Search results for: free word order languages
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 17468

17438 A Word-to-Vector Formulation for Word Representation

Authors: Sandra Rizkallah, Amir F. Atiya

Abstract:

This work presents a novel word-to-vector representation based on embedding words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into account antonymy between words, not only synonymy, because a sphere is well suited to the polar nature of word meaning. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The results show that the proposed approach can capture the essence of the language and can generalize to estimate the correct similarity of any new pair of words.
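The abstract does not describe the training procedure, so the following is only a minimal Python sketch of the geometric idea it describes: vectors are projected onto the unit sphere, similarity is their dot product, and an antonym is modeled as the negated vector. The toy vectors and the helper names `normalize`, `similarity` and `antonym_of` are hypothetical, not the authors' code.

```python
import math

def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]

def similarity(u, v):
    """Dot product of two unit vectors: 1 = same direction, -1 = opposite."""
    return sum(a * b for a, b in zip(u, v))

def antonym_of(v):
    """Model an antonym as the antipodal point, i.e. the negated vector."""
    return [-x for x in v]

# Hypothetical toy vectors, not values learned in the paper.
good = normalize([0.8, 0.3, 0.5])
nice = normalize([0.7, 0.4, 0.5])
bad = antonym_of(good)

print(round(similarity(good, nice), 3))
print(round(similarity(good, bad), 3))  # -1.0
```

Because every vector has unit length, the dot product is bounded in [-1, 1], so a synonym lands near 1 and the negated antonym lands exactly at -1.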

Keywords: natural language processing, word to vector, text similarity, text mining

17437 Phonetics and Phonological Investigation of Geminates and Gemination in Some Indic Languages

Authors: Hifzur Ansary

Abstract:

The aim and scope of the present research are to examine the form of geminates and the process of gemination, with special reference to Indic languages. This work presents the results of a cross-linguistic investigation of word-medial geminate consonants. The study is both theoretical and experimental: it is based not only on impressionistic data from Indic languages but also on an instrumental analysis of the data. The primary data were collected from native speakers, and the secondary data from printed materials such as journals, grammar books and other published articles. The observations made in this study were checked with a number of educated native speakers of Bangla and Telugu. The study focuses exhaustively on geminates and gemination in Bangla (Indo-Aryan language family) and Telugu (Dravidian language family). It attempts to posit the valid geminates in Bangla and Telugu and provides an account of gemination in these languages. It also compares singletons and geminate consonants and describes the distribution of geminate and non-geminate phonemes in Bangla and Telugu. The research also investigates vowel lengthening in Bangla with respect to gemination. Finally, the study explains how gemination processes present in Indian languages are transferred to Indian English.

Keywords: geminate consonant, singleton-geminate contrast, different types of assimilation, gemination derives from borrowed words

17436 Mouthing Patterns in Indian Sign Language

Authors: Neha Kulshreshtha

Abstract:

This paper examines the patterns of 'mouthing', a non-manual marker, and its distribution in Indian Sign Language (ISL). Linguistic research on ISL is an emerging field in which much remains to be done. The little research conducted so far focuses on the structure of ISL in terms of physical or manual markers, so a study of mouthing patterns can give insight into the distribution of this particular non-manual marker. Data were collected from native ISL users through techniques that capture natural signing, for example storytelling and informal conversations. The aim of the study is to identify the various situations in which mouthing is used. Sometimes the mouthing is not actually the articulation of the word as spoken in the local languages. The paper therefore aims to find out whether mouthing patterns in ISL are influenced by local spoken languages, are independent of them, or both. Mouthing patterns have been studied in many sign languages, and an investigation of ISL will reveal whether it patterns with those other sign languages.

Keywords: Indian sign language, mouthing, non-manual marker, spoken language influence

17435 TransDrift: Modeling Word-Embedding Drift Using Transformer

Authors: Nishtha Madaan, Prateek Chaudhury, Nishant Kumar, Srikanta Bedathur

Abstract:

In modern NLP applications, word embeddings are a crucial backbone that can be readily shared across a number of tasks. However, as text distributions change and word semantics evolve over time, downstream applications that use the embeddings can suffer if the word representations do not keep up with the data drift. Keeping word embeddings consistent with the underlying data distribution is therefore a key problem. In this work, we tackle this problem and propose TransDrift, a transformer-based prediction model for word embeddings. Leveraging the flexibility of the transformer, our model accurately learns the dynamics of the embedding drift and predicts future embeddings. In experiments, we compare it with existing methods and show that our model predicts word embeddings significantly more accurately than the baselines. Crucially, by using the predicted embeddings as a backbone for downstream classification tasks, we show that our embeddings lead to superior performance compared with previous methods.

Keywords: NLP applications, transformers, Word2vec, drift, word embeddings

17434 Reasons for Language Words in the Quran and Literary Approaches That Are Persian

Authors: Fateme Mazbanpoor, Sayed Mohammad Amiri

Abstract:

In this article, we examine the Persian words in the Quran and study the reasons for their presence in this holy book. The authors extracted about 70 Persian words from the Quran by consulting reference works (Alalfaz ol Moarab ol Farsieh by Edishir, Almoarab by Javalighi, Almahzab and Etghan by Seuti, The Foreign Vocabulary of the Quran by Arthur Jeffery, etc.). Some of these words are ‘Abarigh’, ‘Estabragh’, ‘Barzakh’, ‘Din’, ‘Zamharir’, ‘Sondos’, ‘Sejil’, ‘Namaregh’ and ‘Fil’. These Persian words entered Arabic, and finally the Quran, in two ways: (1) directly from Persian, and (2) via other languages. The first way: because of Iranian dominance over Hira, Yemen, all of Oman and Bahrain in the Sasanian period, there were political, religious, linguistic, literary and trade ties with these Arab territories, through which Persian influenced Arabic and many Persian loanwords entered Arabic in this period. The second way: since the geography and commerce of the region were dominated by Iran, the Hejaz traded extensively with Mesopotamia and Yemen. Moreover, Arabic, which was a relatively young language at that time, was influenced by other Semitic languages as it expanded its vocabulary (Syriac and Aramaic had themselves been influenced by the languages of Iran). Consequently, owing to the long relationship between Iranians and Arabs, some Persian words took longer routes, through Aramaic and Syriac, to find their way into the Quran.

Keywords: Quran, Persian word, Arabic language, Persian

17433 Effects of Aging on Auditory and Visual Recall Abilities

Authors: Rashmi D. G., Aishwarya G., Niharika M. K.

Abstract:

Purpose: Free recall tasks target cognitive and linguistic processes such as episodic memory, lexical access and retrieval. The free recall paradigm is therefore suitable for assessing memory deterioration caused by aging, although recall also depends on linguistic factors, including the use of first and second languages and relative proficiency in each. Hence, the present study aimed to determine whether aging affects visual and auditory recall abilities. Method: Twenty young adults (mean age: 25.4±0.99) and older adults (mean age: 63.3±3.51) participated in the study. Participants performed a free recall task under two conditions (related and unrelated) and in two modalities (visual and auditory), and were instructed to recall as many items as possible in any order, with no time limit. Results: Free recall performance was calculated as the mean number of correctly recalled items. Although younger participants recalled more items, performance varied across conditions and modalities. Conclusion: The findings of the present study reveal an age-related decline in the efficiency of episodic memory, which is crucial for remembering recent events.
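The scoring rule above (mean number of correctly recalled items) can be sketched in Python as follows. The choices that matching is case-insensitive, that a repeated response counts only once, and that intrusions (words not on the study list) score zero are our assumptions, not the study's stated protocol; the word lists are hypothetical.

```python
from statistics import mean

def correct_recall_count(study_list, responses):
    """Count distinct responses that appear on the study list.

    Free recall ignores order; repeats count once and intrusions score zero.
    """
    studied = {w.lower() for w in study_list}
    return len({r.lower() for r in responses} & studied)

def mean_recall(study_list, responses_by_participant):
    """Mean number of correctly recalled items across participants."""
    return mean(correct_recall_count(study_list, r) for r in responses_by_participant)

# Hypothetical study list and responses from two participants.
study = ["apple", "chair", "river", "lamp"]
group = [["river", "apple", "dog"], ["lamp", "lamp", "chair", "apple"]]
print(mean_recall(study, group))  # 2.5
```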

Keywords: recall, episodic memory, aging, modality

17432 A Study from Language and Culture Perspective of Human Needs in Chinese and Vietnamese Euphemism Languages

Authors: Quoc Hung Le Pham

Abstract:

Human beings are motivated to satisfy physiological and psychological needs. Among the fundamental needs, bodily excretion is the most basic, and excreta are the end products the body discharges. This physiological process is a universal human phenomenon. Although bodily excretion is entirely natural, people of various nationalities have long avoided naming it directly. Terms like ‘shit’ are often regarded as dirty, smelly and vulgar, and they evoke negative associations. It is part of human psychology to avoid such unsightly terms, especially in social situations where one must protect one's image yet still needs to relieve oneself. The best solution is euphemism: people prefer to say ‘answering nature's call’ or ‘to pass a motion’ instead. Speakers of Chinese and Vietnamese prefer euphemisms for bodily excretion, so this research takes this phenomenon as its object and explores the similarities and differences between the euphemisms of the two languages. The focus of this paper is the human physiological phenomenon of excretion. Preliminary results show that the expression of bodily excretion is shaped mainly by linguistic and cultural factors. In linguistic terms, both languages use assonance to refer to bodily discharge, whereas they differ in their use of metonymy, loanwords and personification. In cultural terms, both use metonymy and euphemisms built on semantically contrary words, whereas Chinese euphemisms use allusion and Vietnamese euphemisms do not.

Keywords: cultural factors, euphemism, human needs, language factors

17431 The Lexical Eidos as an Invariant of a Polysemantic Word

Authors: S. Pesina, T. Solonchak

Abstract:

Phenomenological analysis is based not on natural language but on an ideal language capable of carrying ideal meanings: eidos, which represent typical structures or essences. To this end, one must free a subject from its spatio-temporal definiteness and then state its noetic essence (eidos) through free variation in fantasy. In this way, an entirely new objectness, the universal, is in effect created, confirming the thesis that thinking proceeds through generalizations, passing by numerous routes from the specific to the general and from the general, through the specific, to the singular.

Keywords: lexical eidos, phenomenology, noema, polysemantic word, semantic core

17430 Project Marayum: Creating a Community Built Mobile Phone Based, Online Web Dictionary for Endangered Philippine Languages

Authors: Samantha Jade Sadural, Kathleen Gay Figueroa, Noel Nicanor Sison II, Francis Miguel Quilab, Samuel Edric Solis, Kiel Gonzales, Alain Andrew Boquiren, Janelle Tan, Mario Carreon

Abstract:

Of the 185 languages in the Philippines, 28 are endangered, 11 are dying, and 4 are extinct. Language documentation, as a prerequisite to language education, is one way languages can be preserved. Project Marayum is envisioned as a collaboratively built, mobile phone-based, online dictionary platform for Philippine languages. Although many online language dictionaries are available on the Internet, Project Marayum aims to give a language community a sense of ownership of its dictionary, since the dictionary is built and maintained by the community for the community. Starting from a seed dictionary, members of a language community can suggest changes, add new entries and provide language examples. Going beyond word definitions, the platform can be used to gather sample sentences and even audio samples of word usage. These changes are reviewed by the community's language experts, drawn from local state universities or local government units. Approved changes are then added to the dictionary and can be viewed immediately on the Marayum website. A companion mobile phone application allows users to browse the dictionary in remote areas with no Internet connectivity; the dictionary is updated automatically once the user regains Internet access. Project Marayum is still a work in progress: at the time of writing, the Project had just entered its second year. Prototypes are being tested with the Asi language of Romblon island as the initial testbed. In October 2020, Project Marayum will have both a webpage and a mobile application with Asi, Ilocano and Cebuano dictionaries available for use online or for download. The Marayum platform can then be readily expanded to serve more endangered language communities. Project Marayum is funded by the Philippines Department of Science and Technology.
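The following Python sketch models the contribution workflow described above: community suggestions wait in a review queue, only expert-approved changes enter the dictionary, and an offline client pulls updates only when it is online. All class names, fields and the integer version counter are hypothetical illustrations, not Marayum's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class Suggestion:
    word: str
    definition: str
    example: str = ""
    status: str = "pending"  # becomes "approved" or "rejected" after review

@dataclass
class Dictionary:
    """Server-side dictionary: only expert-approved suggestions are published."""
    entries: dict = field(default_factory=dict)
    queue: list = field(default_factory=list)
    version: int = 0

    def suggest(self, suggestion):
        self.queue.append(suggestion)

    def review(self, suggestion, approve):
        self.queue.remove(suggestion)
        suggestion.status = "approved" if approve else "rejected"
        if approve:
            self.entries[suggestion.word] = (suggestion.definition, suggestion.example)
            self.version += 1  # tells clients there is something new to fetch

@dataclass
class OfflineClient:
    """Mobile copy that works offline and catches up when connectivity returns."""
    entries: dict = field(default_factory=dict)
    version: int = 0

    def sync(self, server, online):
        if online and server.version > self.version:
            self.entries = dict(server.entries)
            self.version = server.version
        return self.version

server = Dictionary()
client = OfflineClient()
s = Suggestion("word-a", "placeholder definition")
server.suggest(s)
server.review(s, approve=True)
print(client.sync(server, online=False))  # 0: no connectivity, nothing changes
print(client.sync(server, online=True))   # 1: approved entry downloaded
```

A version counter keeps the offline check cheap: the client compares one integer before copying any entries.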

Keywords: collaborative language dictionary, community-centered lexicography, content management system, software engineering

17429 Immigration Of Language From Anatolia To Greenland

Authors: Onur Kaya

Abstract:

Languages have formed over thousands of years and traveled across the world. On these journeys they travel to, reach and mix with the farthest corners and languages of the world. From this perspective, analyzing the formation, mutual influence, travel, and thus immigration of Anatolian Turkish and the Inuit language of Greenland is significant. First, it is important to analyze the historical connections between the Turks of Anatolia and the Inuit people of Greenland, revealing where the migrations of Turks and Inuit intersected in history and how these connect Greenland and Anatolia. Next, the linguistic qualities of Turkish and the Inuit language must be analyzed; to this end, the relevant linguistic theories, the linguistic features of the two languages and their common points will be emphasized. After these explanations and analyses, the effects of the two languages on each other, their common words, and the presence of these in the written and literary works of both languages will be analyzed and exemplified. Finally, the lecture will focus on two different geographies, Anatolia and Greenland: the historical commonalities of the two societies will be revealed, and their migrations and intersecting locations will be analyzed. Using all this information, and in the light of linguistic theory, the commonalities of the two languages, their influences on each other, the results of these influences, and examples in written works will be presented.

Keywords: greenland, anatolia, turk, inuit, immigration

17428 Math Word Problems: Context and Achievement

Authors: Irena Smetackova

Abstract:

Word problems are an important part of school mathematics because they connect school knowledge with real life. To find out why students consider word problems difficult, it is necessary to consider motivational factors in addition to mathematical knowledge and reading skills. Our goal is to determine whether a familiar or unfamiliar context in a math word problem influences the solving success rate and, if so, whether the reasons are motivational or cognitive. For this purpose, we conducted a three-step study with a group of fifty pupils aged 9-10. In the first step, we asked pupils to create ‘the best’ word problems for a given numerical formula, and a set of 19 word problems with different contexts was selected. In the second step, pupils were asked to rate, without solving, how much they liked each item and how easy it seemed to them. Six word problems with low preference and low estimated success rate were selected and combined with six problems with high preference and high estimated success rate. In the third step, the same pupils were asked to solve these word problems. The analysis showed that pupils' attitudes toward word problems, and their success in solving them, varied with the context. Strong gender patterns were identified both in preferred contexts and in estimated success rates; however, the actual success rates did not differ as strongly. The success gap between word problems with and without preferred contexts was larger than the gap between problems with and without contexts the pupils had real experience of. The hypothesis that motivational factors are more important than cognitive factors was confirmed.

Keywords: mathematics, context of reality, motivation, cognition, word problems

17427 Resume Ranking Using Custom Word2vec and Rule-Based Natural Language Processing Techniques

Authors: Subodh Chandra Shakya, Rajendra Sapkota, Aakash Tamang, Shushant Pudasaini, Sujan Adhikari, Sajjan Adhikari

Abstract:

Much effort has gone into measuring the semantic similarity between text corpora, and techniques for measuring the similarity of two documents have evolved accordingly. One state-of-the-art technique in Natural Language Processing (NLP) is the word-to-vector model, which converts words into word embeddings and measures the similarity between the resulting vectors. We found this approach well suited to resume ranking. This paper therefore implements the word2vec model, together with other NLP techniques, to rank resumes against a particular job description and thereby automate the hiring process. It presents the proposed system and the findings made while building it.
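A minimal Python sketch of embedding-based resume ranking, assuming each document is represented by the average of its word vectors and resumes are ranked by cosine similarity to the job description. The tiny two-dimensional embedding table is hypothetical; in practice it would come from a trained word2vec model, and the paper's rule-based components (such as chunking and information extraction) are not shown.

```python
import math

def doc_vector(tokens, embeddings):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_resumes(job_tokens, resumes, embeddings):
    """Return (name, score) pairs sorted from best to worst match."""
    job_vec = doc_vector(job_tokens, embeddings)
    scored = []
    for name, tokens in resumes.items():
        vec = doc_vector(tokens, embeddings)
        score = cosine(job_vec, vec) if job_vec is not None and vec is not None else 0.0
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-D embeddings standing in for a trained word2vec model.
embeddings = {
    "python": [0.9, 0.1], "sql": [0.8, 0.2],
    "cooking": [0.1, 0.9], "baking": [0.2, 0.8],
}
job = ["python", "sql", "developer"]  # "developer" is out of vocabulary and ignored
resumes = {"A": ["python", "sql"], "B": ["cooking", "baking"]}
ranking = rank_resumes(job, resumes, embeddings)
print(ranking[0][0])  # A
```

Averaging is the simplest way to turn word vectors into a document vector; weighting terms (for example by TF-IDF) is a common alternative.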

Keywords: chunking, document similarity, information extraction, natural language processing, word2vec, word embedding

17426 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for a named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER with 12 distinct entity classes, represented using several tagging schemes; over 118K of the tokens also carry part-of-speech (POS) tags. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is an 88.8% weighted F1-score. These results are especially noteworthy given the challenges posed by Tigrinya's distinct grammatical structure and complex word morphology. The system can be an essential building block for advancing NLP in Tigrinya and related low-resourced languages and can serve as a bridge for cross-referencing against higher-resourced languages.
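The abstract reports a weighted F1-score without defining it. A common definition, sketched below in Python, averages per-class F1 weighted by each class's support (its count in the gold labels). Scoring at the token level and excluding the `O` (outside) tag are our assumptions; the paper may score at the entity level instead. The tags below are hypothetical.

```python
from collections import Counter

def weighted_f1(gold, pred, ignore=("O",)):
    """Support-weighted average of per-label F1 over gold labels not in `ignore`."""
    support = Counter(g for g in gold if g not in ignore)
    total = sum(support.values())
    if total == 0:
        return 0.0
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / count  # count = tp + fn for this label
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / total
    return score

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]
print(round(weighted_f1(gold, pred), 3))  # 0.667
```

Weighting by support means frequent entity classes dominate the score, which is why weighted F1 can look higher than macro F1 on corpora with rare classes.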

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

17425 Language Activation Theory: Unlocking Bilingual Language Processing

Authors: Leorisyl D. Siarot

Abstract:

It is common to hear Filipinos speak two or more languages. This phenomenon invites a closer look at how the mind processes input and produces output in a specific chosen language. This study aimed to generate a theoretical model that explains the interaction of the first and second languages in the human mind. After careful analysis of the gathered data, a theoretical prototype called the Language Activation Model was generated. For every language string, there are three specialized banks: lexico-semantics, morphono-syntax and pragmatics. These banks are interrelated with the banks of other language strings. As the bilingual learns more languages, a new string is replicated and filled with information about the newly learned language. The principles governing the interaction of the first and second languages are expressed as four laws: the law of dominance, the law of availability, the law of usuality and the law of preference. Furthermore, difficulties encountered in learning second languages were also identified.

Keywords: bilingualism, psycholinguistics, second language learning, languages

17424 Free Vibration of Functionally Graded Smart Beams Based on the First Order Shear Deformation Theory

Authors: A. R. Nezamabadi, M. Veiskarami

Abstract:

This paper studies the free vibration of simply supported functionally graded beams with piezoelectric layers based on first-order shear deformation theory. The Young's modulus of the beam is assumed to vary continuously through the beam thickness. The governing equation is derived and then solved using Euler's equation. The effects of the constituent volume fractions and of the applied voltage on the vibration frequency are presented. To verify the accuracy of the present analysis, the results are compared with known data.
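The abstract does not state the grading law. A common choice for functionally graded beams, shown here only as an assumption, is a power-law volume fraction of the top constituent combined with the rule of mixtures, where h is the beam thickness, z is measured from the mid-plane, E_t and E_b are the Young's moduli of the top and bottom constituents, and k ≥ 0 is the power-law index:

```latex
V_t(z) = \left(\frac{z}{h} + \frac{1}{2}\right)^{k}, \qquad
E(z) = E_b + \left(E_t - E_b\right) V_t(z), \qquad
-\frac{h}{2} \le z \le \frac{h}{2}
```

With this law, E equals E_b at the bottom surface (z = -h/2) and E_t at the top surface (z = h/2), and k = 0 gives a homogeneous beam of modulus E_t.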

Keywords: mechanical buckling, functionally graded beam, first order shear deformation theory, free vibration

17423 The French Ekang Ethnographic Dictionary. The Quantum Approach

Authors: Henda Gnakate Biba, Ndassa Mouafon Issa

Abstract:

Dictionaries built on the Western model (for tonic-accent languages) are unsuitable for tonal languages because they do not account for them phonologically, which is why the prosodic and phonological ethnographic dictionary was designed. It is a glossary that expresses the tones and the rhythm of words. It recreates exactly the speaking or singing of a tonal language and allows non-speakers of the language to pronounce its words as if they were native speakers. It is a dictionary adapted to tonal languages. It was built from ethnomusicological theorems and phonological processes, following Jean-Jacques Rousseau's (1776) hypothesis that 'to say and to sing were once the same thing'. Each word in the French dictionary has its counterpart in the ekaŋ language, and each ekaŋ word is written on a musical staff. This ethnographic dictionary is also an inventive, original and innovative research thesis: a contribution to the theoretical, musicological, ethnomusicological and linguistic conceptualization of languages, giving rise to interlocution between the social and cognitive sciences, artistic creation, and the question of modeling in the human sciences (mathematics, computer science, translation automation and artificial intelligence). When this theory is applied to the text of a folksong in any of the world's tonal languages, it reconstructs not only the exact melody, rhythm and harmonies of the song, as if one knew it in advance, but also the exact spoken form of the language. The author believes that the problem of the disappearance and preservation of tonal languages has been structurally resolved, as has one of the greatest cultural equations related to the composition and creation of tonal, polytonal and random music.
The experimentation confirming the theory produced a semi-digital, semi-analog application that translates the tonal languages of Africa (about 2,100 languages) into blues, jazz, world music, polyphonic music, tonal and atonal music, and deterministic and random music. To test this application, I use music notation software to collect the data extracted from my mother tongue, which is already modeled on the musical staves saved in the ethnographic (semiotic) dictionary for automatic translation (volume 2 of the book). Translation runs from writing to writing, from writing to speech, and from writing to music. Mode of operation: you type a text, a structured song (chorus and verse), on your computer and ask the machine for a blues, jazz, world music, variety or other melody. The software runs and offers a choice of harmonies, and then you select your melody.

Keywords: music, language, entanglement, science, research

17422 Foreign Languages and Employability in the European Union

Authors: Paulina Pietrzyk-Kowalec

Abstract:

This paper presents the phenomenon of multilingualism becoming the norm rather than the exception in the European Union. It also seeks to describe the correlation between command of foreign languages and employability. The challenges that today's societies face regarding employability and the realities of the current labor market are increasingly diverse. Thus, one of the crucial tasks of higher education is to prepare students to face this complexity, understand its nuances, and adapt effectively to situations common in corporations based in EU member countries. From this point of view, assessing the impact that European university students' command of foreign languages could have on numerous business sectors becomes vital. It also involves making future professionals aware of the importance of mastering communicative skills in foreign languages that meet the requirements of their prospective employers. A direct connection between higher education institutions and the world of business also allows companies to realize that they should rethink their recruitment and human resources procedures to take the importance of foreign languages into account. This article focuses on the multilingualism policy developed by the European Commission, whose objective is to enable young people to master at least two foreign languages, a skill crucial to their future careers. The article emphasizes the crucial connection between research conducted in higher education institutions and the business sector as a means of reducing current qualification gaps.

Keywords: cross-cultural communication, employability, human resources, language attitudes, multilingualism

17421 The Attitude of Egyptian Nubian University Students towards Arabic and Nubian Languages

Authors: Sanaa Abouras

Abstract:

This research investigates the attitude of Egyptian Nubian university students towards Arabic and the two Nubian languages, Nobiin and Kenuzi-Dongola, which Egyptian Nubians call Fadijja/Fadicca and Kenzi, respectively. Nubians are the people of Nubia, the area between southern Egypt and northern Sudan. Nubia is divided into two parts, one under Egyptian rule and the other under Sudanese rule. Forty participants took part in the study, half male and half female. Twenty of them live in the Nubian region and are enrolled at South Valley University in Aswan, Egypt. They were compared with another twenty Egyptian Nubian university students who live outside the Nubian region and attend various Egyptian universities in Alexandria and Cairo. The hypothesis of this study is that Egyptian Nubian university students tend to have positive attitudes toward both Arabic and the Nubian languages. The research is qualitative and partly quantitative. Observations, questionnaires and interviews were used to collect data in order to explore: (1) the language students prefer to speak at home and in public, and whether language preferences are gender-related; (2) the factors that influence Egyptian Nubian university students' attitudes towards Arabic and the Nubian languages; and (3) the future of these ethnic Nubian languages. Regarding the main research question, the results revealed that students living both inside and outside the Nubian region tend to have positive attitudes towards both Arabic and the Nubian languages.

Keywords: language attitude, minority, Arabic language, Nubian Language

17420 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

New word discovery is a key problem in text information retrieval. Methods for new word discovery tend to be closely tied to individual words, because they generally obtain new words by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated a variety of network texts for the convenience of online life, including network words that are far from standard Chinese expression. Detecting network words is one of the important goals of text information retrieval today. In this paper, we integrate a word embedding model with clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) that detects network words effectively in a corpus. The framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of these vectors to determine the semantic relationship between sentences, and finally discovers network words through semantic replacement between sentences. Experiments verify that the framework not only discovers network words rapidly but also identifies the standard-word meanings of the discovered network words, demonstrating the effectiveness of our work.
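One plausible reading of "semantic replacement between sentences" is sketched below in Python, as an illustration rather than the S³-NWD algorithm itself: if two sentences differ in exactly one token, their sentence vectors are highly similar, and only one of the two differing tokens is a standard word, the other token is treated as a network word meaning that standard word. The precomputed sentence vectors, the `lexicon` set, the 0.9 threshold and the English toy data are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def discover_network_words(sentences, vectors, lexicon, threshold=0.9):
    """Map candidate network words to standard words via one-token substitutions.

    sentences: list of token lists; vectors: sentence vectors in the same order;
    lexicon: set of known standard words.
    """
    found = {}
    for i, a in enumerate(sentences):
        for j, b in enumerate(sentences):
            if i == j or len(a) != len(b):
                continue
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) != 1:
                continue  # only single-token substitutions are considered
            x, y = diffs[0]
            if x not in lexicon and y in lexicon and cosine(vectors[i], vectors[j]) >= threshold:
                found[x] = y
    return found

sentences = [
    ["this", "movie", "is", "gr8"],
    ["this", "movie", "is", "great"],
    ["this", "movie", "is", "long"],
]
vectors = [[0.9, 0.4], [0.88, 0.42], [0.1, 0.95]]  # hypothetical sentence vectors
lexicon = {"this", "movie", "is", "great", "long"}
print(discover_network_words(sentences, vectors, lexicon))  # {'gr8': 'great'}
```

The similarity threshold is what separates a true paraphrase ("gr8" for "great") from a substitution that changes the meaning ("gr8" for "long").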

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

17419 Speech Rhythm Variation in Languages and Dialects: F0, Natural and Inverted Speech

Authors: Imen Ben Abda

Abstract:

Languages have been classified into different rhythm classes. 'Stress-timed' languages are exemplified by English, 'syllable-timed' languages by French and 'mora-timed' languages by Japanese. However, to the best of our knowledge, acoustic studies have not agreed on which rhythm category a given language belongs to and have failed to show empirical evidence for isochrony. Perception seems to be a good approach to categorizing languages into rhythm classes. This study, within the scope of experimental phonetics, reports several perceptual experiments using cues from natural and inverted speech, as well as pitch extracted from speech data. It is an attempt to categorize speech rhythm over a large set of Arabic (Tunisian, Algerian, Lebanese and Moroccan) and English (Welsh, Irish, Scottish and Texan) dialects, as well as other languages such as Chinese, Japanese, French and German. Listeners managed to classify the different languages and dialects into rhythm classes using suprasegmental cues, mainly rhythm and pitch (F0). They also perceived rhythmic differences even among languages and dialects belonging to the same rhythm class. This suggests that there are subclasses within very broad rhythmic typologies.

Keywords: F0, inverted speech, mora-timing, rhythm variation, stress-timing, syllable-timing

17418 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Over the last two decades, Arabic natural language processing (ANLP) has become increasingly important. One of the key issues in ANLP is ambiguity: in Arabic, different pronunciations of the same written word may have different meanings. Ambiguity also affects the effectiveness and efficiency of Machine Translation (MT), and it has limited the usefulness and accuracy of translation from Arabic to English. The lack of Arabic resources makes the ambiguity problem even harder. Additionally, the orthographic level of representation cannot specify the exact meaning of a word. This paper examines Arabic diacritics and uses them to disambiguate words. The proposed word sense disambiguation approach uses a Diacritizer application to diacritize Arabic text and then finds the most accurate sense of an ambiguous word using a Naïve Bayes classifier. Our experimental study shows that using Arabic diacritics with a Naïve Bayes classifier improves the accuracy of choosing the appropriate sense by 23% and also reduces ambiguity in machine translation.
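To illustrate the classification step, the sketch below implements a multinomial Naïve Bayes classifier with Laplace smoothing in plain Python. It assumes that each training instance is a bag of features made up of the diacritized form of the ambiguous word plus its context words; the class `NaiveBayesWSD`, the toy training examples, and this feature design are hypothetical and do not reproduce the authors' Diacritizer pipeline. The target word علم is written in Buckwalter transliteration (`Eilom` for ʿilm, 'knowledge'; `Ealam` for ʿalam, 'flag'), and context words are shown as English glosses for readability.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Multinomial Naive Bayes over bags of features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()              # number of examples per sense
        self.feature_counts = defaultdict(Counter)  # feature frequencies per sense
        self.vocab = set()

    def fit(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # log prior + sum of smoothed log likelihoods (log space avoids underflow)
            score = math.log(n / total)
            denom = sum(self.feature_counts[sense].values()) + self.alpha * v
            for f in features:
                score += math.log((self.feature_counts[sense][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best

# Hypothetical toy data: diacritized target word + context glosses -> sense label
train = [
    (["Eilom", "school", "lesson"], "knowledge"),
    (["Eilom", "student", "book"], "knowledge"),
    (["Ealam", "country", "raised"], "flag"),
    (["Ealam", "national", "country"], "flag"),
]
model = NaiveBayesWSD().fit(train)
print(model.predict(["Eilom", "book"]))  # knowledge
print(model.predict(["Ealam"]))          # flag
```

Because the undiacritized spelling (`Elm`) is identical for both senses, it would carry no evidence here; the diacritized forms are what separate the two classes, which is the intuition behind the abstract's use of diacritics.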

Keywords: Arabic natural language processing, machine learning, machine translation, Naive Bayes classifier, word sense disambiguation

17417 Use of Pragmatic Cues for Word Learning in Bilingual and Monolingual Children

Authors: Isabelle Lorge, Napoleon Katsos

Abstract:

BACKGROUND: Children growing up in a multilingual environment face challenges related to monitoring the speaker’s linguistic abilities, more frequent communication failures, and the need to acquire a large number of words in a limited amount of time compared with monolinguals. As a result, bilingual learners may develop different word learning strategies, rely more on some strategies than others, and engage cognitive resources such as theory of mind and attention skills in different ways. HYPOTHESIS: The goal of our study is to investigate whether multilingual exposure improves the ability to use pragmatic inference for word learning, i.e., to use speaker cues to derive the speaker's referring intentions, often by overcoming lower-level salience effects. The speaker cues we identified as relevant are (a) use of a modifier with or without stress (‘the WET dax’ prompting the choice of the referent that has a dry counterpart), (b) referent extension (‘this is a kitten with a fep’ prompting the choice of the unique rather than the shared object), (c) referent novelty (choosing a novel action rather than a novel object that has already been manipulated), (d) teacher versus random sampling (assuming that the specific examples chosen for a novel word are relevant to the extension of that new category), and finally (e) emotional affect (‘look at the figoo’ uttered in a sad or happy voice). METHOD: To this end, we implemented on a touchscreen computer one task for each of the cues above, in which the child had to pick the referent of a novel word. These word learning tasks (a) to (e) were adapted from previous word learning studies. A total of 113 children (54 in Reception and 59 in Year 1, aged 4 to 6 years) have been tested in a London primary school.
Bilingual or monolingual status and other relevant information (age of onset, proficiency, and literacy for bilinguals) are ascertained through language questionnaires completed by parents (34 of 113 received to date). While we do not yet have the data needed to test for an effect of bilingualism, we can already see that performance is far from ceiling in every task. In some cases, the children’s performance differs radically and qualitatively from adults’, which means that there is scope for quantitative and qualitative differences to arise between language groups. The findings should help explain the puzzling speed and efficiency that bilinguals demonstrate in acquiring competence in two languages.

Keywords: bilingualism, pragmatics, word learning, attention

17416 Expressivity of Word-Formation in English and Russian Advertising Lexicon

Authors: Voronina Ekaterina Borisovna

Abstract:

The article studies the expressivity of advertising lexicon and compares English and Russian advertising lexicons. The material analyzed consisted of English and Russian advertising texts, both printed advertisements and texts extracted from commercials. Some conclusions concerning the expressivity of advertising lexicon are drawn. Expressivity can be included in the semantic structure of words or created by word-formation means. Expressivity produced by morphological derivation involves devices such as derivational affixes and models and types of word formation.

Keywords: advertising lexicon, expressivity, word-formation means, linguistics

17415 Building Semantic-Relatedness Thai Word Ontology for Semantic Analysis

Authors: Gridaphat Sriharee

Abstract:

A semantic-relatedness Thai word ontology can be built by considering word forms and word meanings. This research proposes a methodology for building such an ontology, which can be used for semantic analysis. Four categories of words (similar form and the same meaning, similar form and similar meaning, different form and opposite/same meaning, and different form and similar meaning) are used as the initial words for building the proposed ontology. The ontology can then be extended by considering the dictionary definitions that give the meaning of each word. The use of WordNet to construct the proposed ontology was also investigated and discussed. The quality of the proposed ontology was evaluated, and the results suggest that the proposed methodology is a promising way to construct a well-defined ontology.

Keywords: Thai, NLP, semantics, ontology

17414 Accounting as Addressed in the Qur’aan

Authors: Shahriar M. Saadullah, Abdul-Quddoos Abdul-Basith, Zaki K. Abushawish

Abstract:

As part of academic research in Islamic Accounting, it is important to know how the word Accounting is discussed in the Qur’aan. This paper identifies and analyzes the occurrences of the word Accounting in the Qur’aan. Its methodology identifies the root word of Accounting, Hasaba (حسب), in the Qur’aan with the help of the Islam 360 software and analyzes the relevant words derived from that root. The paper then attempts to connect the findings to contemporary accounting issues. It finds that the root Hasaba (حسب) appears in the Qur’aan 109 times, but is used in the sense of Account, Accountable, or Accounting only 45 times. These words appear in 44 different verses, occurring twice in one of them. The paper divides these verses into eight themes, namely Day of Accounting, without any Accounting, Accounting of Time, Self-Accounting, Swift in Accounting, Accounting is only with God, Awareness and the Good Accounting, and Heedlessness and the Bad Accounting. The way the words Account, Accounting, and Accountable are used in the Qur’aan links to contemporary accounting issues, including ethics, agency theory, and internal control. The links discovered in the paper clearly show the timeless nature of the message of the Qur’aan.

Keywords: accounting, contemporary accounting issues, Qur'aan, root word of accounting hasaba

17413 Using the Smith-Waterman Algorithm to Extract Features in the Classification of Obesity Status

Authors: Rosa Figueroa, Christopher Flores

Abstract:

Text categorization is the problem of assigning a new document to one of a set of predetermined categories, on the basis of a training set of free-text documents whose category membership is known. To train a classification model, it is necessary to extract characteristics in the form of tokens that facilitate the learning and classification process. In text categorization, feature extraction typically uses word sequences, also known as N-grams. In general, documents belonging to the same category are expected to share similar features. The Smith-Waterman (SW) algorithm is a dynamic programming algorithm that performs local sequence alignment in order to find similar regions between two strings or protein sequences. This work explores the SW algorithm as an alternative feature extraction method for text categorization. The dataset used for this purpose contains 2,610 annotated documents labeled with the classes Obese/Non-Obese. The dataset was represented in matrix form using the Bag-of-Words approach, and term frequency-inverse document frequency (TF-IDF) was chosen as the score representing the occurrence of each token in each document. Four feature extraction experiments were conducted: the first used SW, the second used unigrams (single words), the third used bigrams (two-word sequences), and the last used a combination of unigrams and bigrams. To test the effectiveness of the feature sets from the four experiments, a Support Vector Machine (SVM) classifier was tuned using 20% of the dataset. The remaining 80% of the dataset, with 5-fold cross-validation, was used to evaluate and compare the performance of the four feature extraction experiments. Results from the tuning process suggest that SW performs better than N-gram-based feature extraction.
These results were confirmed on the remaining 80% of the dataset, where SW performed best (accuracy = 97.10%, weighted average F-measure = 97.07%). The second-best result was obtained by the combination of unigrams and bigrams (accuracy = 96.04%, weighted average F-measure = 95.97%), closely followed by bigrams (accuracy = 94.56%, weighted average F-measure = 94.46%) and finally unigrams (accuracy = 92.96%, weighted average F-measure = 92.90%).
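The Smith-Waterman recurrence can be applied to word tokens instead of characters or residues. The Python function below is a minimal sketch, assuming a linear gap penalty and the illustrative scores match = +2, mismatch = -1, gap = -1; the abstract does not state the scoring scheme or how aligned regions were converted into TF-IDF features, so treating the aligned token run as a candidate feature is an assumption. The example sentences are invented.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of token lists a and b (linear gap penalty).
    Returns (best_score, aligned_tokens_from_a, aligned_tokens_from_b),
    with '-' marking a gap."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/col = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, out_a[::-1], out_b[::-1]

doc1 = "patient has morbid obesity and diabetes".split()
doc2 = "history of morbid obesity with diabetes".split()
print(smith_waterman(doc1, doc2))
# (5, ['morbid', 'obesity', 'and', 'diabetes'], ['morbid', 'obesity', 'with', 'diabetes'])
```

The best local alignment covers `morbid obesity ... diabetes` while tolerating the `and`/`with` mismatch. A shared region like this is the kind of span that could be kept as a feature, whereas fixed-length N-grams can only approximate it with separate contiguous pieces.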

Keywords: comorbidities, machine learning, obesity, Smith-Waterman algorithm

17412 Contribution of Word Decoding and Reading Fluency on Reading Comprehension in Young Typical Readers of Kannada Language

Authors: Vangmayee V. Subban, Suzan Deelan Pinto, Somashekara Haralakatta Shivananjappa, Shwetha Prabhu, Jayashree S. Bhat

Abstract:

Introduction and Need: During the early years of schooling, instruction mainly focuses on children’s word decoding abilities. However, skilled readers must master all the components of reading, such as word decoding, reading fluency and comprehension, and the relationship among these components during the process of learning to read is less clear. Studies conducted in alphabetic languages report mixed findings on the relative contributions of word decoding and reading fluency to reading comprehension, while the situation in alphasyllabary languages remains unexplored. Aim and Objectives: The aim of the study was to explore the roles of word decoding and reading fluency in the reading comprehension of children aged 5.6 to 8.6 years who are learning to read Kannada. Method: In this cross-sectional study, 60 typically developing children with an equal gender ratio were selected from Kannada-medium schools: 20 each from Grade I (5.6 to 6.6 years), Grade II (6.7 to 7.6 years) and Grade III (7.7 to 8.6 years). Reading fluency and reading comprehension were assessed using grade-level passages selected from the Kannada textbooks of the children's core curriculum; each passage was accompanied by five questions assessing reading comprehension. Pseudoword decoding skills were assessed using 40 pseudowords varying in syllable length and Akshara composition. The pseudowords were formed by interchanging syllables within meaningful words while maintaining the phonotactic constraints of Kannada. The assessment material was subjected to content validation and reliability measures before data were collected from the study sample. Data were collected individually; reading fluency was measured as words correctly read per minute, and pseudoword decoding was scored for reading accuracy.
Results: Descriptive statistics indicated that mean pseudoword reading, reading comprehension and words accurately read per minute all increased with grade. Grade III children performed best, Grade I children lowest, and Grade II children in between, indicating that reading skills gradually improve across grades. Pearson’s correlation coefficients showed moderate, highly significant (p = 0.00) positive correlations among the variables, indicating the interdependency of all three components required for reading. Hierarchical regression analysis revealed that pseudoword decoding explained 37% of the variance in reading comprehension, which was highly significant. When the reading fluency measure was entered subsequently, R-square changed by only 3%, which was not significant. Therefore, pseudoword decoding emerged as the single most significant predictor of reading comprehension during the early grades of reading acquisition. Conclusion: The present study concludes that pseudoword decoding skills contribute more significantly to reading comprehension than reading fluency does during the initial years of schooling in children learning to read Kannada.
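The hierarchical regression step (enter pseudoword decoding first, then add reading fluency and inspect the change in R-square) can be expressed as two nested least-squares fits. The NumPy sketch below assumes ordinary least squares with an intercept; the function names are hypothetical, the significance test for the R-square change (an F-change test) is omitted, and the data must be supplied by the user because the study's data are not given here.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an ordinary least-squares fit of y on X, with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())  # assumes y is not constant
    return 1.0 - ss_res / ss_tot

def hierarchical_r2(step1, step2, y):
    """Enter the step-1 predictors, then add the step-2 predictors.
    Returns (R-square of step 1, R-square of step 2, change in R-square)."""
    r1 = r_squared(step1, y)
    r2 = r_squared(np.column_stack([step1, step2]), y)
    return r1, r2, r2 - r1

# Usage (hypothetical arrays, one value per child):
# r1, r2, delta = hierarchical_r2(decoding, fluency, comprehension)
```

If comprehension were an exact linear function of decoding, the function would return an R-square of 1 for step 1 and a change of 0 for step 2; the study instead reports 0.37 for decoding and a change of 0.03 when fluency is added. Because the models are nested, the change in R-square from an OLS fit is never negative.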

Keywords: alphasyllabary, pseudo-word decoding, reading comprehension, reading fluency

17411 The Multi-Lingual Acquisition Patterns of Elementary, High School and College Students in Angeles City, Philippines

Authors: Dennis Infante, Leonora Yambao

Abstract:

The Philippines is a multilingual community. A Filipino learns at least three languages over a lifetime. Since these languages are learned and picked up simultaneously from the environment, a student naturally develops a language system that combines features of at least three languages: the local language, English and Filipino. This study investigates this phenomenon and aims to propose a theoretical framework for this distinctive pattern of language acquisition among elementary, high school and college students in the three languages used in media, the community, business and school: Kapampangan, the local language; Filipino, the national language; and English. The study randomly selected five students from three participating schools to obtain language samples. The samples were analyzed at the subsentential, sentential and suprasentential levels using grammatical theories, and the data were classified to map out the patterns of substitution or shifting from one language to another.

Keywords: language acquisition, mother tongue, multiculturalism, multilingual education

17410 Effects of Foreign-language Learning on Bilinguals' Production in Both Their Languages

Authors: Natalia Kartushina

Abstract:

Foreign (second) language (L2) learning is highly promoted in modern society, and students are encouraged to study abroad (SA) to achieve the most effective learning outcomes. However, L2 learning has side effects on native language (L1) production: L1 sounds may drift from L1 norms towards those of the L2, even after a short period of L2 learning. This L1 assimilatory drift has been attributed to a strong perceptual association between similar L1 and L2 sounds in the minds of L2 learners, so that a change in the production of an L2 target leads to a change in the production of the related L1 sound. Nowadays, however, it is quite common for speakers to acquire two languages from birth, as is the case in many bilingual communities (e.g., Basque and Spanish in the Basque Country). It remains to be established how foreign language (FL) learning affects native production in individuals who have two native languages, i.e., simultaneous or very early bilinguals. Does FL learning (here a third language, L3) affect both of the bilinguals’ languages or only one? What factors determine which of the bilinguals’ languages is more susceptible to change? The current study examines the effects of L3 (English) learning on vowel production in the two native languages of simultaneous Spanish-Basque bilingual adolescents enrolled in the Erasmus SA English program. Ten bilingual speakers read five Spanish and Basque consonant-vowel-consonant-vowel words two months before their SA and on the day after their return to Spain. Each word contained the target vowel in the stressed syllable and was repeated five times. Acoustic analyses measured vowel openness (F1) and backness (F2). Two possible outcomes were considered. First, we predicted that L3 learning would affect the production of only one language, namely the language used most in contact with English during the SA period.
This prediction stems from recent studies showing that early bilinguals have separate phonological systems for each of their languages, and that late FL learners (as our participants are) who tend to use their L1 in language-mixing contexts have more L2-accented L1 speech. The second possibility was that L3 learning would affect both of the bilinguals’ languages, in line with studies showing that bilinguals’ L1 and L2 phonologies interact and constantly influence each other. The results revealed that speakers who used both languages equally often (balanced users) showed an F1 drift in both languages toward the F1 of the English vowel space. Unbalanced speakers, however, showed a drift only in their less-used language. The results are discussed in light of recent studies suggesting that the amount of language use is a strong predictor of authenticity in speech production, with less language use leading to more foreign-accented speech and, eventually, to language attrition.
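The drift measure described above amounts to comparing a vowel's mean F1 before and after the stay abroad against an English reference value. A minimal Python sketch follows; the function name `f1_drift`, the Hz values, and the English reference are hypothetical, and a real analysis would also examine F2 and normalize formants across speakers.

```python
from statistics import mean

def f1_drift(pre_hz, post_hz, english_ref_hz):
    """Compare one vowel's F1 tokens recorded before and after the SA period.
    Returns (shift_hz, toward_english):
      shift_hz       = mean post-SA F1 minus mean pre-SA F1
      toward_english = True if the post-SA mean lies closer to the English reference"""
    pre, post = mean(pre_hz), mean(post_hz)
    toward = abs(post - english_ref_hz) < abs(pre - english_ref_hz)
    return post - pre, toward

# Hypothetical values: five repetitions per session, English reference of 650 Hz
print(f1_drift([700, 710, 690, 705, 695], [680, 675, 685, 670, 690], 650))
# (-20, True)
```

With these invented values, mean F1 falls from 700 Hz to 680 Hz and moves closer to the 650 Hz reference, so the function reports a shift of -20 Hz toward English.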

Keywords: language-contact, multilingualism, phonetic drift, bilinguals' production

17409 Second-Order Complex Systems: Case Studies of Autonomy and Free Will

Authors: Eric Sanchis

Abstract:

Although there is no definitive consensus on a precise definition of a complex system, a system is generally considered to be complex by nature. The present work illustrates a different point of view: a system becomes complex only with regard to the question posed to it, i.e., with regard to the problem to be solved. A complex system is thus a pair (question, object). Because the number of questions that can be posed to a given object is potentially large, complexity does not present a uniform face. Two types of complex systems are clearly identified: first-order complex systems and second-order complex systems. First-order complex systems physically exist; they are well known because the scientific community has studied them for a long time. In second-order complex systems, complexity results from the system's composition and articulation, which are partially unknown. For some of these systems, there is no evidence that they exist. Vagueness is the keyword characterizing this kind of system. Autonomy and free will, two mental productions of the human cognitive system, can be identified as second-order complex systems. A classification based on the structure of properties makes it possible to discriminate complex properties from the others and to model this kind of second-order complex system. The final outcome is an implementable synthetic property that distinguishes the solid aspects of the actual property from those that are uncertain.

Keywords: autonomy, free will, synthetic property, vaporous complex systems
