Search results for: bag of words (BOW)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1277

1127 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g., “social”, “reward”) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word counts.
An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.
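
The voxel-to-term comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the two-dimensional word vectors, the voxel names, and the helper functions below are all hypothetical, chosen only to show a normalized vector sum being compared to a term vector (or an average of term vectors) by cosine similarity.

```python
import math

def vector_sum(vectors):
    """Component-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; a real semantic space would be learned from study texts.
word_vectors = {
    "vision": [1.0, 0.0],
    "visual": [0.9, 0.1],
    "reward": [0.0, 1.0],
    "money": [0.1, 0.9],
}

# Hypothetical mapping: words from the studies that reported activation in each voxel.
voxel_words = {
    "voxel_a": ["vision", "visual", "visual"],
    "voxel_b": ["reward", "money"],
}

def voxel_vector(words):
    """Represent a voxel as the normalized sum of its associated word vectors."""
    return normalize(vector_sum([word_vectors[w] for w in words]))

def term_proximity(voxel, terms):
    """Cosine similarity between a voxel vector and the average of the term vectors."""
    term_vecs = [word_vectors[t] for t in terms]
    mean_term = [c / len(term_vecs) for c in vector_sum(term_vecs)]
    return cosine_similarity(voxel_vector(voxel_words[voxel]), mean_term)
```

In this toy setup, `term_proximity("voxel_a", ["vision"])` comes out much higher than `term_proximity("voxel_b", ["vision"])`, which is the kind of continuous voxel-term metric the abstract describes.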

Keywords: FMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 448
1126 Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model

Authors: Khenfer Koummich Fatma, Hendel Fatiha, Mesbahi Larbi

Abstract:

This paper presents an approach based on Hidden Markov Models (HMMs) using the HTK tools. The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks such as rotate, raise, close, etc. The system should take into account different levels of environmental noise. The approach has been applied to isolated words representing the robot commands, pronounced in two languages: French and Arabic. The recognition rate obtained is the same for both languages, Arabic and French, for the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added at a signal-to-noise ratio (SNR) of 30 dB; in this case, the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This may be explained by how well the phonetic context of each language holds up when noise is added.
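
As an illustration of the noise condition mentioned above, the sketch below adds Gaussian white noise to a signal at a target SNR, using the standard definition SNR(dB) = 10·log10(P_signal / P_noise). It is not the authors' HTK pipeline; the function names, the fixed random seed, and the pure-Python list representation of the signal are assumptions made for the example.

```python
import math
import random

def mean_power(samples):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in samples) / len(samples)

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus zero-mean Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)  # standard deviation of the noise
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """Estimate the SNR actually achieved, from the clean and noisy signals."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(noise))

# Hypothetical test signal: one second of a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=30)
```

Calling `measured_snr_db(clean, noisy)` on the result should give a value close to the requested 30 dB, with a small deviation because the noise is a finite random sample.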

Keywords: Arabic speech recognition, Hidden Markov Model (HMM), HTK, noise, TIMIT, voice command

Procedia PDF Downloads 385
1125 The Effects of English Contractions on the Application of Syntactic Theories

Authors: Wakkai Hosanna Hussaini

Abstract:

A formal structure of the English clause is composed of at least two elements – subject and verb, in structural grammar and at least one element – predicate, in systemic (functional) and generative grammars. Each of the elements can be represented by a word or group (of words). In modern English structure, very often speakers merge two words as one with the use of an apostrophe. Each of the two words can come from different elements or belong to the same element. In either case, result of the merger is called contraction. Although contractions constitute a part of modern English structure, they are considered informal in nature (more frequently used in spoken than written English) that is why they were initially viewed as constituting an evidence of language deterioration. To our knowledge, no formal syntactic theory yet has been particular on the contractions because of its deviation from the formal rules of syntax that seek to identify the elements that form a clause in English. The inconsistency between the formal rules and a contraction is established when two words representing two elements in a non-contraction are merged as one element to form a contraction. Thus the paper presents the various syntactic issues as effects arising from converting non-contracted to contracted forms. It categorizes English contractions and describes each category according to its syntactic relations (position and relationship) and morphological formation (form and content) as integral part of modern structure of English. This is a position paper as such the methodology is observational, descriptive and explanatory/analytical based on existing related literature. The inventory of English contractions contained in books on syntax forms the data from where specific examples are drawn. It is noted as conclusion that the existing syntactic theories were not originally established to account for English contractions. 
The paper, when published, will further expose the inadequacies of the existing syntactic theories by giving more reasons for the establishment of a more comprehensive syntactic theory for analyzing English clause/sentence structure involving contractions. The method used reveals the extent of the inadequacies in applying the three major syntactic theories: structural, systemic (functional) and generative, on the English contractions. Although no theory is without scope, shying away from the three major theories from recognizing the English contractions need to be broken because of the increasing popularity of its use in modern English structure. The paper, therefore, recommends that as use of contraction gains more popular even in formal speeches today, there is need to establish a syntactic theory to handle its patterns of syntactic relations and morphological formation.

Keywords: application, effects, English contractions, syntactic theories

Procedia PDF Downloads 268
1124 Can Urbanisation Be the Cause for Increasing Urban Poverty: An Exploratory Analysis for India

Authors: Sarmistha Singh

Abstract:

An analysis of trends in urbanization and urban poverty in recent decades shows that poverty is distinctly decreasing in rural areas and increasing in urban areas. It can be argued that higher urbanization is fuelled by migration to the city, which draws in people with less skill and education, who therefore face obstacles to entering the city's mainstream economy. Their share of the workforce in the economy is high; in contrast, they remain neglected. At the same time, low wages and the absence of social security and social dialogue make them insecure, and their livelihoods are vulnerable. The paper therefore explores the relation between urbanization and urban poverty in the city: in other words, how the urbanization process affects the urban space by increasing the number of poor people in the city. The central focus is the mobility of people with less education and skill who move in search of jobs and a better livelihood. Many studies have found that the higher the urbanization, the higher the urban poverty in the city. In other words, poverty is an impact of urbanization. The strategy for addressing urban inequality through ‘dispersal of concentration’, advanced by the World Bank and others, needs to be examined.

Keywords: urbanization, mobility, urban poverty, informal settlements, informal worker

Procedia PDF Downloads 414
1123 Using Synonymy in Translation of Hemingway’s 'A Farewell to Arms' from English into Albanian

Authors: Miranda Enesi, Helena Grillo Mukli

Abstract:

The English word-stock is extremely rich in synonyms, which can be largely accounted for by abundant borrowing. Translation problems encountered by translators in general are usually ‘transfer problems’. Translators face more difficulties in interpreting the meaning of the source language text than in dealing with lexical differences between languages. The aim of the study is to inspect the various strategies used in translating specific words in the novel ‘A Farewell to Arms’ from English into Albanian. For this purpose, examples translated from English into Albanian were examined. The Albanian equivalents show that various strategies were used in order to overcome the problem of rendering words and expressions into the target language. The strategies employed were synonymy, modulation, transposition, calque and word-for-word translation. In addition, this paper shows that translation using synonymy is the most frequently used strategy. In this paper, an attempt is made to examine the nature of contextual synonymy in order to investigate its problematic nature regarding translation. Types of synonymy are analyzed and then examples from English and Albanian versions are provided to examine the overlap between them.

Keywords: equivalence, literal translation, paraphrasing, transfer problems, synonymy

Procedia PDF Downloads 174
1122 Infringement of Patent Rights with Doctrine of Equivalent for Turkey

Authors: Duru Helin Ozaner

Abstract:

Under the doctrine of equivalent, the wording of the claims alone does not define the full area of protection provided by the patent registration. While this widens the boundaries of the protection area, it also obscures the boundaries of the protected area of patents. In addition, it creates distrust for third parties. Therefore, the doctrine of equivalent aims to establish a balance between the rights of patent owners and the legal security of third parties. Turkey's current legal system has been designed to parallel widely applied regulations. Therefore, the regulations regarding the protection provided by patents in the current Turkish legal system are similar to those of many countries. However, infringement through equivalent by third parties is common. This study aims to explain that the protection provided by the patent is not limited to the words of the claims but also includes the wide-ranging protection that the claims receive under the doctrine of equivalent. This study is important for determining the limits of the protection granted to the patent right holder and for indicating the importance of equivalent elements in that protection.

Keywords: patent, infringement, intellectual property, the doctrine of equivalent

Procedia PDF Downloads 214
1121 The Words of the Pandemic in Spillover by David Quammen

Authors: Anna Maria Re

Abstract:

Taking advantage of the ecolinguistic theoretical and practical analysis, the work examines the prophetic, punctual, and at times disturbing language used by David Quammen in Spillover, questioning it from an ecological perspective and contributing to the search for new stories. In the famous volume, the author illustrates a literary history of the great epidemics and pandemics, demonstrating that viruses are nature's inevitable response to man's assault on ecosystems. In doing so, he introduces new words, which have tamed our anxieties in recent years since writing as a human artistic expression can mirror the human conscience. Writing in the Anthropocene, coining a new reference lexicon with respect to what is happening, means offering a form to the idea of survival of the planet, imagining the human being grappling with an environment whose conformation he himself has helped to change with a language that is no longer effective in describing the world as we have known it and that quickly needs a radical overhaul. Following the methodology proposed in Ecolinguistics: language, ecology and the stories we live by, the analysis in the paper will enhance the language that encodes new stories based on: ideologies, framings, metaphors, evaluations, identities, convictions, and salience.

Keywords: Anthropocene, pandemic, spillover, virus, zoonosis

Procedia PDF Downloads 97
1120 Variation of Lexical Choice and Changing Need of Identity Expression

Authors: Thapasya J., Rajesh Kumar

Abstract:

Language plays complex roles in society. The previous studies on language and society explain their interconnected, complementary and complex interactions and, those studies were primarily focused on the variations in the language. Variation being the fundamental nature of languages, the question of personal and social identity navigated through language variation and established that there is an interconnection between language variation and identity. This paper analyses the sociolinguistic variation in language at the lexical level and how the lexical choice of the speaker(s) affects in shaping their identity. It obtains primary data from the lexicon of the Mappila dialect of Malayalam spoken by the members of Mappila (Muslim) community of Kerala. The variation in the lexical choice is analysed by collecting data from the speech samples of 15 minutes from four different age groups of Mappila dialect speakers. Various contexts were analysed and the frequency of borrowed words in each instance is calculated to reach a conclusion on how the variation is happening in the speech community. The paper shows how the lexical choice of the speakers could be socially motivated and involve in shaping and changing identities. Lexical items or vocabulary clearly signal the group identity and personal identity. Mappila dialect of Malayalam was rich in frequent use of borrowed words from Arabic, Persian and Urdu. There was a deliberate attempt to show their identity as a Mappila community member, which was derived from the socio-political situation during those days. This made a clear variation between the Mappila dialect and other dialects of Malayalam at the surface level, which was motivated to create and establish the identity of a person as the member of Mappila community. 
Historically, these kinds of linguistic variation were highly motivated because of the socio-political factors and, intertwined with the historical facts about the origin and spread of Islamism in the region; people from the Mappila community highly motivated to project their identity as a Mappila because of the social insecurities they had to face before accepting that religion. Thus the deliberate inclusion of Arabic, Persian and Urdu words in their speech helped in showing their identity. However, the socio-political situations and factors at the origin of Mappila community have been changed over a period of time. The social motivation for indicating their identity as a Mappila no longer exist and thus the frequency of borrowed words from Arabic, Persian and Urdu have been reduced from their speech. Apart from the religious terms, the borrowed words from these languages are very few at present. The analysis is carried out by the changes in the language of the people according to their age and found to have significant variations between generations and literacy plays a major role in this variation process. The need of projecting a specific identity of an individual would vary according to the change in the socio-political scenario and a variation in language can shape the identity in order to go with the varying socio-political situation in any language.

Keywords: borrowings, dialect, identity, lexical choice, literacy, variation

Procedia PDF Downloads 237
1119 Understanding the Interactive Nature in Auditory Recognition of Phonological/Grammatical/Semantic Errors at the Sentence Level: An Investigation Based upon Japanese EFL Learners’ Self-Evaluation and Actual Language Performance

Authors: Hirokatsu Kawashima

Abstract:

One important element of teaching/learning listening is intensive listening such as listening for precise sounds, words, grammatical, and semantic units. Several classroom-based investigations have been conducted to explore the usefulness of auditory recognition of phonological, grammatical and semantic errors in such a context. The current study reports the results of one such investigation, which targeted auditory recognition of phonological, grammatical, and semantic errors at the sentence level. 56 Japanese EFL learners participated in this investigation, in which their recognition performance of phonological, grammatical and semantic errors was measured on a 9-point scale by learners’ self-evaluation from the perspective of 1) two types of similar English sound (vowel and consonant minimal pair words), 2) two types of sentence word order (verb phrase-based and noun phrase-based word orders), and 3) two types of semantic consistency (verb-purpose and verb-place agreements), respectively, and their general listening proficiency was examined using standardized tests. A number of findings have been made about the interactive relationships between the three types of auditory error recognition and general listening proficiency. Analyses based on the OPLS (Orthogonal Projections to Latent Structure) regression model have disclosed, for example, that the three types of auditory error recognition are linked in a non-linear way: the highest explanatory power for general listening proficiency may be attained when quadratic interactions between auditory recognition of errors related to vowel minimal pair words and that of errors related to noun phrase-based word order are embraced (R2=.33, p=.01).

Keywords: auditory error recognition, intensive listening, interaction, investigation

Procedia PDF Downloads 513
1118 Earnings Management and Tone Management: Evidence from the UK

Authors: Salah Kayed Kayed, Jessica Hong Yang

Abstract:

This study investigates whether earnings management in the audited financial statements is associated with tone management in the narrative sections of the annual report in the UK. Earnings management and narrative disclosure are communication strategies used by managers to communicate with investors and other users. Because both earnings management and narrative disclosure stem from managers, managers can exploit this by manipulating earnings while simultaneously shaping the tone of the qualitative text (narrative information) in their reports, which affects users’ perceptions and can leave users misinformed. The association between earnings and tone management can be explained by self-serving theory, through cognitive reference points. The sample period runs from 2010 to 2015, and the sample comprises all non-financial firms included in the FTSE 350 in any year during the sample period. A list of words from previous research is used to measure the tone in the narrative sections of the annual report. Because this study focuses on managerial strategic choice and the subjective issues that come from management, it uses abnormal tone to capture managerial discretion over tone, and a number of different discretionary accruals proxies to measure earnings management, where accruals management is considered a tool that managers use to change users' perceptions. This research is motivated by a gap in the literature on the association between earnings and tone management. Moreover, if firms that apply earnings management use tone management to mislead investors, it is beneficial for investors, policy makers, standard setters, and other users to know whether there is an association between earnings management and tone management. We believe that this study is fundamental in the accounting context, as it evaluates the communication strategies that are used in firms' financial reports.
Consistent with prior research, it is expected that tone management is positively associated with earnings management. This means that firms that use earnings management have incentives to manipulate in their narrative disclosure through tone of words, to reflect a good perception for users, which will conceal the earnings management techniques used in their reporting.
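
A word-list tone measure of the kind mentioned in this abstract can be sketched as a bag-of-words count. The abstract does not specify the word list or the formula, so everything here is an assumption: the tiny positive and negative lists are hypothetical, and the net-tone ratio (positive minus negative, divided by positive plus negative) is one commonly used form rather than the authors' stated measure. The abnormal-tone adjustment the study relies on is not attempted.

```python
import re
from collections import Counter

# Hypothetical word lists; a real study would take these from published dictionaries.
POSITIVE_WORDS = {"strong", "growth", "profit", "improved"}
NEGATIVE_WORDS = {"loss", "decline", "impairment", "weak"}

def bag_of_words(text):
    """Lower-case the text and count each word, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def net_tone(text):
    """Assumed tone score in [-1, 1]: (positive - negative) / (positive + negative)."""
    counts = bag_of_words(text)
    positive = sum(counts[w] for w in POSITIVE_WORDS)
    negative = sum(counts[w] for w in NEGATIVE_WORDS)
    if positive + negative == 0:
        return 0.0  # no tone words found
    return (positive - negative) / (positive + negative)
```

For example, a sentence with three words from the positive list and one from the negative list scores 0.5. Words outside both lists do not affect the score, which is the main limitation of a pure word-count approach.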

Keywords: earnings management, FTSE 350, narrative disclosure, tone management

Procedia PDF Downloads 277
1117 The Influence of Cognitive Load in the Acquisition of Words through Sentence or Essay Writing

Authors: Breno Barrreto Silva, Agnieszka Otwinowska, Katarzyna Kutylowska

Abstract:

Research comparing lexical learning following the writing of sentences and longer texts with keywords is limited and contradictory. One possibility is that the recursivity of writing may enhance processing and increase lexical learning; another possibility is that the higher cognitive load of complex-text writing (e.g., essays), at least when timed, may hinder the learning of words. In our study, we selected 2 sets of 10 academic keywords matched for part of speech, length (number of characters), frequency (SUBTLEXus), and concreteness, and we asked 90 L1-Polish advanced-level English majors to use the keywords when writing sentences, timed essays (60 minutes), or untimed essays. First, all participants wrote a timed Control essay (60 minutes) without keywords. Then different groups produced Timed essays (60 minutes; n=33), Untimed essays (n=24), or Sentences (n=33) using the two sets of glossed keywords (counterbalanced). The comparability of the participants in the three groups was ensured by matching them for proficiency in English (LexTALE), and for a few measures derived from the control essay: VocD (assessing productive lexical diversity), normed errors (assessing productive accuracy), words per minute (assessing productive written fluency), and holistic scores (assessing overall quality of production). We measured lexical learning (depth and breadth) via an adapted Vocabulary Knowledge Scale (VKS) and a free association test. Cognitive load was measured in the three essays (Control, Timed, Untimed) using normed number of errors and holistic scores (TOEFL criteria). The number of errors and essay scores were obtained from two raters (interrater reliability Pearson’s r=.78-.91). Generalized linear mixed models showed no difference in the breadth and depth of keyword knowledge after writing Sentences, Timed essays, and Untimed essays.
The task-based measurements found that Control and Timed essays had similar holistic scores, but that Untimed essay had better quality than Timed essay. Also, Untimed essay was the most accurate, and Timed essay the most error prone. Concluding, using keywords in Timed, but not Untimed, essays increased cognitive load, leading to more errors and lower quality. Still, writing sentences and essays yielded similar lexical learning, and differences in the cognitive load between Timed and Untimed essays did not affect lexical acquisition.

Keywords: learning academic words, writing essays, cognitive load, english as an L2

Procedia PDF Downloads 73
1116 Preferred Character Size for Oblique Angles

Authors: Photjanat Phimnom, Haruetai Lohasiriwat

Abstract:

In today’s world, LED displays are used to present visual information under various circumstances. Such information is an important intermediary in human information processing. Researchers have investigated diverse factors that influence the effectiveness of this process. Letter size is undoubtedly one major factor that has been tested and for which many standards and guidelines give recommendations. However, these typically assume that the display is viewed from a direct perpendicular position, whereas many real situations require viewing from an angle. The current research aims to study the effect of oblique viewing angle and viewing distance on the ability to recognize letters, numbers, and English words. A total of ten participants volunteered for our 3 x 4 x 4 within-subject study. The independent variables were three distance levels (2, 6, and 12 m), four oblique angles (0, 45, 60, and 75 degrees), and four target types (single letters, single numbers, short words, and long words). Following the method of constant stimuli, we found that a larger oblique angle, ranging from 0 to 75 degrees from the line of sight, results in a significantly higher legibility threshold, that is, a larger required font size (p-value < 0.05). Viewing distance also has a significant effect on the threshold (p-value < 0.05). However, the effect of distance is expected to be confounded by the quality of the screen used in our experiment. Lastly, our results show that single letters and single numbers are recognized at a significantly lower threshold (smaller font size) than both short and long words (p-value < 0.05). Therefore, it is recommended that, when designing information to be presented on an LED display, all possible ranges of oblique angle be taken into account in order to specify the preferred letter size. Additionally, recommended letter sizes for 100% readability in our tested conditions are provided in the paper.

Keywords: letter size, oblique angle, viewing distance, legibility threshold

Procedia PDF Downloads 394
1115 Objectives of the Standardization of Technical Terminology Nowadays in Albanian

Authors: Gani Pllana

Abstract:

In the conditions of the rapid development of technics and technology in recent years, the cooperation of the scientific-technical language with the standard Albanian language is continuing with a higher intensity than before. We notice a vigor of enrichment in the vocabulary of technical terminology, due to the birth and formation of new fields and subfields of technics, technology, as computing, mechatronics, telemetry, a multitude of concepts many of which, on the one hand, are marked with names of the languages they come from, mainly from English, but on the other hand, they meet their needs with the lexical mother tongue composition (by common words being raised to terms) and with the activation of other layers, such as compound word terms. Thus, for example, in the field of computing, we notice in it the inclusion of the ordinary vocabulary for reproductive reasons, like mi, dritare, flamur, adresë, skedar (Engl: mouse, window, flag, address, file), and along with them, the compound word terms, serving to differentiate relevant concepts, like, adresë e hiperlidhjes, adresë e uebit, adresë relative, adresë virtuale (Engl. address hyperlink, web address, relative address, virtual address) etc.

Keywords: common words, Albanian language, technical terminology, standardization

Procedia PDF Downloads 289
1114 A Study Investigating Word Association Behaviour in People with Acquired Language and Communication Disorders

Authors: Angela Maria Fenu

Abstract:

The aim of this study was to better characterize the nature of word association responses in people with aphasia. The participants selected for the experimental group were 4 individuals with mild Broca’s aphasia. The control group consisted of 51 cognitively intact age- and gender-matched individuals. The participants were asked to perform a word association task in which they had to say the first word they thought of when hearing each cue. The cue words (n = 16) were Italian translations of the set of English cue words used in a published study. The participants in the experimental group were administered the word association test every two weeks for a period of two months, during which they received speech-language therapy. A combination of analytical approaches was used to measure the data. To analyse different patterns of word association responses in both groups, the nature of the relationship between the cue and the response was examined: responses were divided into five categories of association. To investigate the similarity between aphasic and non-aphasic subjects, the stereotypy of responses was examined. While certain stimulus words (nouns, adjectives) elicited responses from Broca’s aphasics that tended to resemble those made by non-aphasic subjects, others (adverbs, verbs) tended to elicit responses different from the ones given by normal subjects. This suggests that some mechanisms underlying certain types of associations are degraded in aphasic individuals, while others display little evidence of disruption. The high number of paradigmatic associations given in response to a noun or an adjective might imply that the mechanisms, largely semantic, underlying paradigmatic associations are relatively preserved in Broca’s aphasia, but it might also mean that some words are more easily processed depending on their grammatical class (nouns, adjectives). The most significant variation was noticed when the grammatical class of the cue word was an adverb.
Unlike the normal individuals, the experimental subjects gave the most idiosyncratic associations, which are often produced when the attempt to give a paradigmatic response fails. In turn, the failure to retrieve paradigmatic responses when the cue is an adverb might suggest that Broca’s aphasics are more sensitive to this grammatical class. The findings from this study suggest that research on word associations in people with aphasia can yield important data concerning the specific lexical retrieval impairments that characterize the different types of aphasia and the various treatments that might positively influence the kinds of word association responses affected by language disruption.

Keywords: aphasia therapy, clinical linguistics, word-association behaviour, mental lexicon

Procedia PDF Downloads 88
1113 The Language of Fliptop among Filipino Youth: A Discourse Analysis

Authors: Bong Borero Lumabao

Abstract:

This qualitative research is a study of the lines of Fliptop talks performed by Fliptop rappers, employing Finnegan’s (2008) discourse analysis. The paper aimed to analyze the phonological, morphological, and semantic features of Fliptop talk, to explore the structures in the lines of Fliptop among Filipino youth, and to uncover the various insights that can be gained from it. The corpus of the study included all 20 Fliptop videos downloaded from the YouTube channel of Fliptop. Results revealed that Fliptop contains phonological features such as assonance, consonance, deletion, lengthening, and rhyming. Morphological features include acronym, affixation, blending, borrowing, code-mixing and switching, compounding, conversion or functional shifts, and dysphemism. Semantic features covered the lexical categories, meanings, and words used in the Fliptop talks. The structure of Fliptop revolves around personal attacks (physical attributes), attacks on the bars (rapping skills), extension to family members and friends, antithesis, profane words, figurative language, sexual undertones, anime characters, homosexuality, and the involvement of famous celebrities.

Keywords: discourse analysis, fliptop talks, filipino youth, fliptop videos, Philippines

Procedia PDF Downloads 242
1112 Cross-Language Variation and the ‘Fused’ Zone in Bilingual Mental Lexicon: An Experimental Research

Authors: Yuliya E. Leshchenko, Tatyana S. Ostapenko

Abstract:

Language variation is a widespread linguistic phenomenon which can affect different levels of a language system: phonological, morphological, lexical, syntactic, etc. The scope of possible standard alternations within a particular language is limited by its norms and regulations, which set more or less clear boundaries for what is and is not possible for the speakers. The possibility of lexical variation (alternate usage of lexical items within the same contexts) is based on the fact that the meanings of words are not clearly and rigidly defined in the consciousness of the speakers. Therefore, lexical variation is usually connected with an unstable relationship between words and their referents: a case when a particular lexical item refers to different types of referents, or when a particular referent can be named by various lexical items. We assume that the scope of lexical variation in bilingual speech is generally wider than that observed in monolingual speech because, besides ‘lexical item – referent’ relations, it involves the possibility of cross-language variation of L1 and L2 lexical items. We use the term ‘cross-language variation’ to denote a case when two equivalent words of different languages are treated by a bilingual speaker as freely interchangeable within a common linguistic context. As distinct from code-switching, which is traditionally defined as the conscious use of more than one language within one communicative act, in the case of cross-language lexical variation the speaker does not perceive the alternate lexical items as belonging to different languages and, therefore, does not realize the change of language code. In the paper, the authors present research on lexical variation in adult Komi-Permyak – Russian bilingual speakers. 
The two languages co-exist on the territory of the Komi-Permyak District in Russia (Komi-Permyak as the ethnic language and Russian as the official state language), are usually acquired from birth in a natural linguistic environment and, according to the data of sociolinguistic surveys, are both identified by the speakers as coordinate mother tongues. The experimental research demonstrated that alternation of Komi-Permyak and Russian words within one utterance/phrase is highly frequent both in speech perception and production. Moreover, our participants judged cross-language word combinations like ‘маленькая /Russian/ нывка /Komi-Permyak/’ (‘a little girl’) or ‘мунны /Komi-Permyak/ домой /Russian/’ (‘go home’) to be regular/habitual, containing no violation of any linguistic rules and being equally possible in speech as the equivalent intra-language word combinations (‘учöтик нывка’ /Komi-Permyak/ or ‘идти домой’ /Russian/). All the facts considered, we claim that constant concurrent use of the two languages leads speakers to intuitively interpret a large number of their words as lexical variants that not only relate to the same referent, but also belong to both languages or, more precisely, to neither of them in particular. Consequently, we can suppose that the bilingual mental lexicon includes an extensive ‘fused’ zone of lexical representations that provides the basis for cross-language variation in bilingual speech.

Keywords: bilingualism, bilingual mental lexicon, code-switching, lexical variation

Procedia PDF Downloads 148
1111 Using Maximization Entropy in Developing a Filipino Phonetically Balanced Wordlist for a Phoneme-Level Speech Recognition System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

In this paper, a Filipino phonetically balanced word list consisting of 250 words (PBW250) was constructed for a phoneme-level ASR system for the Filipino language. Entropy maximization is used to obtain phonological balance in the list. The entropy of phonemes in a word is maximized, providing an optimal balance in each word’s phonological distribution, using the Add-Delete Method (PBW algorithm); this is compared to a modified PBW algorithm implemented with a dynamic algorithm approach to obtain optimization. The entropy scores obtained were 4.2791 and 4.2902 for the PBW and modified algorithms, respectively. The PBW250 was recorded by 40 respondents, each providing 2 sets of data. Recordings from 30 respondents were used to train an acoustic model, which was tested on recordings from the other 10 respondents using the HMM Toolkit (HTK). The tests gave a maximum accuracy rate of 97.77% for a speaker-dependent test and 89.36% for a speaker-independent test.
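
As a rough illustration of the entropy criterion mentioned in this abstract, the sketch below computes the Shannon entropy of the phoneme distribution over a word list and greedily adds the word that raises it most. It is not the authors' Add-Delete Method or their modified algorithm; the function names, the greedy selection rule, and the toy phoneme symbols are all assumptions made for illustration.

```python
import math
from collections import Counter


def phoneme_entropy(words):
    """Shannon entropy (in bits) of the phoneme distribution over a word list.

    Each word is a sequence of phoneme symbols (hypothetical toy encoding).
    """
    counts = Counter(p for word in words for p in word)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def greedy_add(candidates, size):
    """Illustrative greedy selection: repeatedly add the candidate word
    that gives the highest entropy for the list built so far."""
    chosen = []
    pool = list(candidates)
    while pool and len(chosen) < size:
        best = max(pool, key=lambda w: phoneme_entropy(chosen + [w]))
        chosen.append(best)
        pool.remove(best)
    return chosen


# Toy usage: three "words" written as tuples of phoneme symbols.
toy_words = [("a", "a"), ("a", "b"), ("c", "d")]
selected = greedy_add(toy_words, 2)
```

A perfectly uniform distribution over N phonemes reaches the maximum entropy of log2(N) bits, which is why higher scores indicate a better-balanced list.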

Keywords: entropy maximization, Filipino language, Hidden Markov Model, phonetically balanced words, speech recognition

Procedia PDF Downloads 457
1110 Bag of Local Features for Person Re-Identification on Large-Scale Datasets

Authors: Yixiu Liu, Yunzhou Zhang, Jianning Chi, Hao Chu, Rui Zheng, Libo Sun, Guanghao Chen, Fangtong Zhou

Abstract:

In the last few years, large-scale person re-identification has attracted a lot of attention in video surveillance because of its potential application in public safety management. However, it remains challenging given the variation in human pose, changing illumination conditions, and the lack of paired samples. Although accuracy has improved significantly, training still depends heavily on sample data. To tackle this problem, a new strategy is proposed that designs the feature representation on the basis of the bag of visual words (BoVW) model, which has been widely used in image retrieval. Local features are extracted, a more discriminative feature representation is obtained by cross-view dictionary learning (CDL), and the assignment map is then obtained through k-means clustering. Finally, BoVW histograms are formed, which encode the images with the statistics of the feature classes in the assignment map. Experiments conducted on the CUHK03, Market1501 and MARS datasets show that the proposed method performs favorably against existing approaches.
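
The final histogram step described above can be sketched as follows, assuming a codebook of cluster centroids has already been learned (for example by k-means). The function names and the toy 2-D descriptors are hypothetical, and the cross-view dictionary learning stage from the abstract is not modelled here.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bovw_histogram(descriptors, codebook):
    """Normalized bag-of-visual-words histogram for one image:
    the fraction of local descriptors assigned to each visual word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy usage: two visual words and four local descriptors from one image.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [0.0, 0.0]]
histogram = bovw_histogram(descriptors, codebook)
```

Normalizing by the number of descriptors keeps histograms comparable across images with different numbers of local features.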

Keywords: bag of visual words, cross-view dictionary learning, person re-identification, reranking

Procedia PDF Downloads 194
1109 Phonetics Problems and Solutions for 5th Grade Students of Turkish Language as a Foreign Language in Demirel College in 2015-2016 Academic Year

Authors: Huseyin Demir

Abstract:

Foreign language learners may make mistakes in pronunciation and writing when they encounter letters that do not exist in their own language. The fifth-grade students who learn Turkish at Demirel College in Georgia are a concrete example. The letters ‘F’, ‘y’, ‘ö’ and ‘ü’ of the Turkish alphabet are the ones they most commonly get wrong. A careful comparative linguistic study found that the mistakes were caused by the absence of these signs in Georgian. An attempt was made to solve these problems through a comparative language teaching method that draws on pronunciation possibilities in other languages spoken or known by the students. First, the other languages known by the students are identified; then sounds in those languages that resemble the difficult Turkish sounds are found and used to minimize the pronunciation problem in Turkish. In this context, visual animations were made for the pronunciation of English words such as year (yr), earn (örn) and fair (fêir), and students were familiarized with the pronunciation of these words through repetition. The study observed that students’ motivation increased and that their mistakes were minimized.

Keywords: pronunciation, Demirel college, motivations, Turkish as a foreign language

Procedia PDF Downloads 251
1108 A Lexicographic Approach to Obstacles Identified in the Ontological Representation of the Tree of Life

Authors: Sandra Young

Abstract:

The biodiversity literature is vast and heterogeneous. In today’s data age, numbers of data integration and standardisation initiatives aim to facilitate simultaneous access to all the literature across biodiversity domains for research and forecasting purposes. Ontologies are being used increasingly to organise this information, but the rationalisation intrinsic to ontologies can hit obstacles when faced with the intrinsic fluidity and inconsistency found in the domains comprising biodiversity. Essentially the problem is a conceptual one: biological taxonomies are formed on the basis of specific, physical specimens yet nomenclatural rules are used to provide labels to describe these physical objects. These labels are ambiguous representations of the physical specimen. An example of this is with the genus Melpomene, the scientific nomenclatural representation of a genus of ferns, but also for a genus of spiders. The physical specimens for each of these are vastly different, but they have been assigned the same nomenclatural reference. While there is much research into the conceptual stability of the taxonomic concept versus the nomenclature used, to the best of our knowledge as yet no research has looked empirically at the literature to see the conceptual plurality or singularity of the use of these species’ names, the linguistic representation of a physical entity. Language itself uses words as symbols to represent real world concepts, whether physical entities or otherwise, and as such lexicography has a well-founded history in the conceptual mapping of words in context for dictionary making. This makes it an ideal candidate to explore this problem. The lexicographic approach uses corpus-based analysis to look at word use in context, with a specific focus on collocated word frequencies (the frequencies of words used in specific grammatical and collocational contexts). 
It allows for inconsistencies and contradictions in the source data and in fact includes these in the word characterisation so that 100% of the available evidence is counted. Corpus analysis is indeed suggested as one of the ways to identify concepts for ontology building, because of its ability to look empirically at data and show patterns in language usage, which can indicate conceptual ideas which go beyond words themselves. In this sense it could potentially be used to identify if the hierarchical structures present within the empirical body of literature match those which have been identified in ontologies created to represent them. The first stages of this research have revealed a hierarchical structure that becomes apparent in the biodiversity literature when annotating scientific species’ names, common names and more general names as classes, which will be the focus of this paper. The next step in the research is focusing on a larger corpus in which specific words can be analysed and then compared with existing ontological structures looking at the same material, to evaluate the methods by means of an alternative perspective. This research aims to provide evidence as to the validity of the current methods in knowledge representation for biological entities, and also shed light on the way that scientific nomenclature is used within the literature.

Keywords: ontology, biodiversity, lexicography, knowledge representation, corpus linguistics

Procedia PDF Downloads 137
1107 Polarity Classification of Social Media Comments in Turkish

Authors: Migena Ceyhan, Zeynep Orhan, Dimitrios Karras

Abstract:

People in modern societies are continuously sharing their experiences, emotions, and thoughts in different areas of life. The information reaches almost everyone in real time and can have an important impact in shaping people’s way of living. This phenomenon is well recognized and used to advantage by market representatives trying to earn the most from it. Given the abundance of information, people and organizations are looking for efficient tools that filter the vast amount of data into important information that is ready to analyze. This paper is a modest contribution in this field, describing the process of automatically classifying social media comments in the Turkish language as positive or negative. Once data is gathered and preprocessed, feature sets of selected single words or groups of words are built according to the characteristics of the language used in the texts. These features are later used to train and test a system with different machine learning algorithms (Naïve Bayes, Sequential Minimal Optimization, J48, and Bayesian Linear Regression). The resulting high accuracies can be important feedback for decision-makers seeking to improve their business strategies.

Keywords: feature selection, machine learning, natural language processing, sentiment analysis, social media reviews

Procedia PDF Downloads 146
1106 Probing Syntax Information in Word Representations with Deep Metric Learning

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, with the development of large-scale pre-trained language models, building vector representations of text through deep neural network models has become standard practice for natural language processing tasks. Performance on downstream tasks shows that the text representations constructed by these models contain linguistic information, but how and to what extent it is encoded is unclear. In this work, a structural probe is proposed to detect whether the vector representation produced by a deep neural network embeds a syntax tree. The probe is trained with the deep metric learning method, so that the distance between word vectors in the metric space it defines encodes the distance between words on the syntax tree, and the norm of word vectors encodes the depth of words on the syntax tree. The experimental results on ELMo and BERT show that the syntax tree is encoded in their parameters and the word representations they produce.
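
The abstract does not state the probe's functional form, so the sketch below is only one possible reading: it assumes a linear map B applied to word vectors, with squared distances between projected vectors compared to gold tree distances and squared norms read as depths. The map B, the function names, and the absolute-error loss are assumptions for illustration and may differ from the paper's deep metric learning objective.

```python
def apply_map(B, h):
    """Multiply matrix B (given as a list of rows) by vector h."""
    return [sum(b * x for b, x in zip(row, h)) for row in B]


def probe_distance(B, h_i, h_j):
    """Squared distance between two word vectors in the probe's metric space."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    return sum(v * v for v in apply_map(B, diff))


def probe_depth(B, h):
    """Squared norm of a projected word vector, read as its depth in the tree."""
    return sum(v * v for v in apply_map(B, h))


def distance_loss(B, vectors, tree_dist):
    """Mean absolute gap between probe distances and gold tree distances
    over all word pairs in one sentence."""
    n = len(vectors)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    gaps = [abs(probe_distance(B, vectors[i], vectors[j]) - tree_dist[i][j])
            for i, j in pairs]
    return sum(gaps) / len(pairs)


# Toy usage: an identity map and three 2-D "word vectors" whose squared
# distances happen to match the hypothetical tree distances exactly.
B = [[1.0, 0.0], [0.0, 1.0]]
vectors = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
tree_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
loss = distance_loss(B, vectors, tree_dist)
```

Training such a probe would mean adjusting B to minimize this loss over many parsed sentences while the underlying language model stays frozen.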

Keywords: deep metric learning, syntax tree probing, natural language processing, word representations

Procedia PDF Downloads 67
1105 Comparative Analysis of Patent Protection between Health System and Enterprises in Shanghai, China

Authors: Na Li, Yunwei Zhang, Yuhong Niu

Abstract:

The study discussed patent protection in the health system and in enterprises in Shanghai. The technical distribution and scope of patent protection of the Shanghai health system and of Shanghai enterprises were compared using IPC classification, co-words analysis, and visual social network methods. Within the IPC A61 area, the results showed a decreasing order of A61B, A61K, A61M, and A61F, so A61B was investigated further. Within A61B, A61B17 had the highest number of authorized patents. Within A61B17, fracture fixation, ligament reconstruction, cardiac surgery, and biopsy detection were fields of common concern to the Shanghai health system and Shanghai enterprises. However, whereas Shanghai enterprises paid attention to cardiac closure, the Shanghai health system was more inclined toward blockages and hemostatic tools. The results also revealed that the scope of patent protection of Shanghai enterprises was relatively centralized. Shanghai enterprises had a series of comprehensive strategies for protecting core patents. In contrast, the Shanghai health system was considered to lack strategic patent protection for core patents.

Keywords: co-words analysis, IPC classification, patent protection, technical distribution

Procedia PDF Downloads 134
1104 Exploring the Neural Mechanisms of Communication and Cooperation in Children and Adults

Authors: Sara Mosteller, Larissa K. Samuelson, Sobanawartiny Wijeakumar, John P. Spencer

Abstract:

This study was designed to examine how humans are able to teach and learn semantic information as well as cooperate in order to jointly achieve sophisticated goals. Specifically, we are measuring individual differences in how these abilities develop from foundational building blocks in early childhood. The current study adapts a paradigm for novel noun learning developed by Samuelson, Smith, Perry, and Spencer (2011) to a hyperscanning paradigm (Cui, Bryant and Reiss, 2012). This project measures coordinated brain activity between a parent and child using simultaneous functional near-infrared spectroscopy (fNIRS) in pairs of 2.5-, 3.5- and 4.5-year-old children and their parents. We are also separately testing pairs of adult friends. Children and parents, or adult friends, are seated across from one another at a table. The parent (in the developmental study) then teaches their child the names of novel toys. An experimenter then tests the child by presenting the objects in pairs and asking the child to retrieve one object by name. Children are asked to choose from both pairs of familiar objects and pairs of novel objects. In order to explore individual differences in cooperation with the same participants, each dyad plays a cooperative game of Jenga, in which their joint score is based on how many blocks they can remove from the tower as a team. A preliminary analysis of the noun-learning task showed that, when presented with 6 word-object mappings, children learned an average of 3 new words (50%) and that the number of objects learned by each child ranged from 2 to 4. Adults initially learned all of the new words but were variable in their later retention of the mappings, which ranged from 50% to 100%. We are currently examining differences in cooperative behavior during the Jenga game, including time spent discussing each move before it is made. 
Ongoing analyses are examining the social dynamics that might underlie the differences between words that were successfully learned and unlearned words for each dyad, as well as the developmental differences observed in the study. Additionally, the Jenga game is being used to better understand individual and developmental differences in social coordination during a cooperative task. At a behavioral level, the analysis maps periods of joint visual attention between participants during the word learning and the Jenga game, using head-mounted eye trackers to assess each participant’s first-person viewpoint during the session. We are also analyzing the coherence in brain activity between participants during novel word-learning and Jenga playing. The first hypothesis is that visual joint attention during the session will be positively correlated with both the number of words learned and with the number of blocks moved during Jenga before the tower falls. The next hypothesis is that successful communication of new words and success in the game will each be positively correlated with synchronized brain activity between the parent and child/the adult friends in cortical regions underlying social cognition, semantic processing, and visual processing. This study probes both the neural and behavioral mechanisms of learning and cooperation in a naturalistic, interactive and developmental context.

Keywords: communication, cooperation, development, interaction, neuroscience

Procedia PDF Downloads 252
1103 Investigating the Associative Network of Color Terms among Turkish University Students: A Cognitive-Based Study

Authors: R. Güçlü, E. Küçüksakarya

Abstract:

Word association (WA) gives the broadest information on how knowledge is structured in the human mind. Cognitive linguistics, psycholinguistics, and applied linguistics are the disciplines that consider WA tests as substantial in gaining insights into the very nature of the human cognitive system and semantic knowledge. In this study, Berlin and Kay’s basic 11 color terms (1969) are presented as the stimuli words to a total number of 300 Turkish university students. The responses are analyzed according to Fitzpatrick’s model (2007), including four categories, namely meaning-based responses, position-based responses, form-based responses, and erratic responses. In line with the findings, the responses to free association tests are expected to give much information about Turkish university students’ psychological structuring of vocabulary, especially morpho-syntactic and semantic relationships among words. To conclude, theoretical and practical implications are discussed to make an in-depth evaluation of how associations of basic color terms are represented in the mental lexicon of Turkish university students.

Keywords: color term, gender, mental lexicon, word association task

Procedia PDF Downloads 131
1102 Electrophysiological Correlates of Statistical Learning in Children with and without Developmental Language Disorder

Authors: Ana Paula Soares, Alexandrina Lages, Helena Oliveira, Francisco-Javier Gutiérrez-Domínguez, Marisa Lousada

Abstract:

From an early age, exposure to a spoken language allows us to implicitly capture the structure underlying the succession of the speech sounds in that language and to segment it into meaningful units (words). Statistical learning (SL), i.e., the ability to pick up patterns in the sensory environment even without intention or consciousness of doing it, is thus assumed to play a central role in the acquisition of the rule-governed aspects of language and possibly to lie behind the language difficulties exhibited by children with developmental language disorder (DLD). The research conducted so far has, however, led to inconsistent results, which might stem from the behavioral tasks used to test SL. In a classic SL experiment, participants are first exposed to a continuous stream (e.g., syllables) in which, unbeknownst to the participants, stimuli are grouped into triplets that always appear together in the stream (e.g., ‘tokibu’, ‘tipolu’), with no pauses between each other (e.g., ‘tokibutipolugopilatokibu’) and without any information regarding the task or the stimuli. Following exposure, SL is assessed by asking participants to discriminate triplets previously presented (‘tokibu’) from new sequences never presented together during exposure (‘kipopi’), i.e., to perform a two-alternative-forced-choice (2-AFC) task. Despite the widespread use of the 2-AFC to test SL, it has come under increasing criticism as it is an offline post-learning task that only assesses the result of the learning that occurred during the previous exposure phase and that might be affected by other factors beyond the computation of regularities embedded in the input, typically the likelihood of two syllables occurring together, a statistic known as transitional probability (TP). 
One solution to overcome these limitations is to assess SL as exposure to the stream unfolds, using online techniques such as event-related potentials (ERPs), which are highly sensitive to the time course of learning in the brain. Here we collected ERPs to examine the neurofunctional correlates of SL in preschool children with DLD and chronological-age typical language development (TLD) controls, who were exposed to an auditory stream containing eight three-syllable nonsense words, four with high TPs and four with low TPs, to further analyze whether the ability of DLD and TLD children to extract word-like units from the stream was modulated by the words’ predictability. Moreover, to ascertain whether previous knowledge of the to-be-learned regularities affected the neural responses to high- and low-TP words, children performed the auditory SL task first under implicit and subsequently under explicit conditions. Although behavioral evidence of SL was not obtained in either group, the neural responses elicited during the exposure phases of the SL tasks differentiated children with DLD from children with TLD. Specifically, the results indicated that only children from the TLD group showed neural evidence of SL, particularly in the SL task performed under explicit conditions, first for the low-TP and subsequently for the high-TP ‘words’. Taken together, these findings support the view that children with DLD show deficits in the extraction of the regularities embedded in the auditory input, which might underlie their language difficulties.
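
To make the transitional probability statistic concrete, the sketch below estimates TP(x → y) as the number of times syllable x is immediately followed by y, divided by the number of times x is followed by anything. The function name and the toy stream are illustrative assumptions; the stream reuses the triplets quoted in the abstract but is not the study's stimulus sequence.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate TP(x -> y) = count(x followed by y) / count(x followed by anything)
    from a sequence of syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable has no successor, so it is excluded from the denominator.
    first_counts = Counter(syllables[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


# Toy stream: 'tokibu' recurs, followed by different triplets each time.
stream = "to ki bu ti po lu to ki bu go pi la to ki bu".split()
tps = transitional_probabilities(stream)
```

In this toy stream the within-word transition ‘to’ → ‘ki’ has a higher TP than the across-word transition ‘bu’ → ‘ti’, which is the kind of contrast a learner could use to locate word boundaries.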

Keywords: development language disorder, statistical learning, transitional probabilities, word segmentation

Procedia PDF Downloads 188
1101 A Study of the Use of English by Thai: A Case Study of English in Thai songs

Authors: Jutharat Nawarungreung

Abstract:

As an international language, English is used as a medium in formal and informal settings, including all kinds of entertainment. The use of English in entertainment is of no less importance and interest, and it has become a valuable tool for EFL learners to learn and improve their language. In addition, the way English is incorporated into other nationalities’ music, and the attitudes of listeners toward it, are of social interest. This research principally aimed to find out the level of comprehensibility of English inserted in Thai pop music. There were three groups of participants, namely Thais, non-native speakers who are non-Thai, and native speakers, with 35 in each group. The research tools comprised song lyrics, interviews, questionnaires, and a video recorder. The participants listened to Thai songs and wrote down the English words they heard and their meanings. They were video-recorded while listening to the songs and then asked about particular actions and facial expressions. Afterwards, they were interviewed about their attitudes toward the incorporation of English into Thai songs. Finally, the participants completed a questionnaire. Data were analysed by comparing all the participants’ pronunciation, which revealed the number of correct and incorrect answers. The study has shown that those who attained the highest level of understanding of the English words in Thai music were Thais, native speakers, and non-native speakers who are non-Thai, respectively.

Keywords: English throughout the world, varieties of English, English in Thai songs, intelligibility, attitudes

Procedia PDF Downloads 354
1100 Problems of Translating Technical Terms from English into Arabic

Authors: Nisreen Naji Al-Khawaldeh, Lara Ahmad Mansour El-Awar

Abstract:

The present study investigated the strategies MA translation students used for translating technical terms, the most common obstacles they encountered in translating such terms, and the motives behind using such terms in their original form despite their translatability into Arabic. To achieve these objectives, a translation test was administered to 100 MA students specialising in translation at both Hashemite University and The University of Jordan. It consisted of two parts: (a) 50 English technical terms to be translated, and (b) two questions concerning the challenges or problems encountered while translating those technical terms and the motives that drive the students to use most of the English technical terms as they are despite their translatability into Arabic. The analysis of the results revealed that MA translation students faced problems in translating technical terms, namely the inability to find the equivalent form for the given technical terms, the use of literal translation, and the wider use of loanwords. In addition, the students used different strategies to translate the technical terms, namely borrowing (i.e., loanwords), paraphrasing, synonymy, naturalization, equivalence, and literal translation. Moreover, it was revealed that most technical terms were used as they are in the source language despite their translatability into Arabic because these technical terms are easier to use in English than in Arabic. Also, when these terms were introduced to the Arab world, they were introduced in English, not in Arabic, so the brain links these objects to their English terms.

Keywords: Arabic, English, technical terms, translation strategies, translation problems

Procedia PDF Downloads 280
1099 Content Based Video Retrieval System Using Principal Object Analysis

Authors: Van Thinh Bui, Anh Tuan Tran, Quoc Viet Ngo, The Bao Pham

Abstract:

Video retrieval is the problem of searching for videos or clips whose content is relatively close to an input image or video. Applications of this kind of retrieval include selecting a video in a folder or recognizing a person in security camera footage. However, recent approaches remain challenged by the diversity of video types, frame transitions and camera positions. In addition, selecting an appropriate measure for the problem remains an open question. To overcome these obstacles, we propose a content-based video retrieval system consisting of several main steps that together yield good performance. From a main video, we extract keyframes and principal objects using the Segmentation of Aggregating Superpixels (SAS) algorithm. After that, Speeded Up Robust Features (SURF) are selected from those principal objects. Then, the bag-of-words model, together with SVM classification, is applied to obtain the retrieval result. Our system is evaluated on over 300 diverse videos, from music, history, movies, sports, and natural scenes to TV programs. Its performance compares promisingly with other approaches.

Keywords: video retrieval, principal objects, keyframe, segmentation of aggregating superpixels, speeded up robust features, bag-of-words, SVM

Procedia PDF Downloads 301
1098 Traditional Terms, Spaces, Forms and Artifacts in Cultural Semiotics of Southwest Nigeria

Authors: Ajibade Adeyemo

Abstract:

The paper examined local terms used for spaces, forms and building practices in southwest Nigeria as cultural semiotics. Housing has more cultural meaning than mere shelter, as shown in building terms such as ‘roof over my head’. The study is significant in the study area because its people were traditionally orally centered until ‘culture contact’ led to graphical presentation and appreciation in the form of drawings, which is a modern language of architecture. This semiotic study will facilitate the understanding of the wholesomeness of traditional building practices and thoughts, in keeping with the traditional multi-sensory appreciation of architecture, urban design and the arts. It will analyze traditional aphoristic words and terms which, like proverbs, are significant in language because of their metaphorical essence. Many such terms in Yoruba, the dominant language of the study area, are often phenomenal, reducing universal terms like the earth and heaven to the simple module of housing. These words are worth investigating because they are symbolic and serve as codes, which are cultural tools of regional ethnic significance. Saussure’s and Peirce’s concepts of semiotics, in line with Eco’s concept of the semiotics of metaphor, will be deployed.

Keywords: traditional terms, spaces, forms, artifacts, cultural semiotics, southwest

Procedia PDF Downloads 275