Search results for: bilingual semantic processing
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4125

4005 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is the broadcasting script, a series of texts including directions and dialogues. The other is blog posts, which contain relatively abstracted contents, stories, and diverse information about the broadcasting contents. Although the two kinds of text cover similar broadcasting contents, the words used in blog posts and in broadcasting scripts differ. To improve the quality of topics, a method that accounts for this word difference is needed. In this paper, we introduce a semantic vocabulary expansion method to address the word difference. We expand the topics of the broadcasting script by incorporating the words in blog posts. Each word in the blog posts is added to the most semantically correlated topics. We use word2vec to obtain the semantic correlation between words in blog posts and topics of scripts. The vocabularies of the topics are updated, and posterior inference is then performed to rearrange the topics. In experiments, we verified that the proposed method can learn more salient topics for broadcasting contents.
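
The expansion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny 2-d vectors stand in for trained word2vec embeddings, a topic is represented by the average of its word vectors, each blog word goes to the single best-matching topic, and the posterior inference step is omitted. All function and variable names are hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_centroid(topic_words, vectors):
    """Average the embeddings of a topic's words (assumed topic representation)."""
    vecs = [vectors[w] for w in topic_words if w in vectors]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def expand_topics(topics, blog_words, vectors):
    """Add each blog word to the topic whose centroid it is most similar to."""
    expanded = {name: list(words) for name, words in topics.items()}
    centroids = {name: topic_centroid(words, vectors) for name, words in topics.items()}
    for word in blog_words:
        if word not in vectors:
            continue  # skip words without an embedding
        best = max(centroids, key=lambda name: cosine(vectors[word], centroids[name]))
        if word not in expanded[best]:
            expanded[best].append(word)
    return expanded

# Toy 2-d embeddings standing in for trained word2vec vectors (hypothetical values).
vectors = {
    "detective": [0.9, 0.1], "crime": [0.8, 0.2],
    "romance": [0.1, 0.9], "wedding": [0.2, 0.8],
    "suspect": [0.85, 0.15], "kiss": [0.15, 0.85],
}
topics = {"mystery": ["detective", "crime"], "love": ["romance", "wedding"]}
expanded = expand_topics(topics, ["suspect", "kiss"], vectors)
print(expanded)
```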

Keywords: broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec

Procedia PDF Downloads 230
4004 A Chinese Nested Named Entity Recognition Model Based on Lexical Features

Authors: Shuo Liu, Dan Liu

Abstract:

In the field of named entity recognition, most of the research has been conducted around simple entities. However, for nested named entities, which still contain entities within entities, it has been difficult to identify them accurately due to their boundary ambiguity. In this paper, a hierarchical recognition model is constructed based on the grammatical structure and semantic features of Chinese text for boundary calculation based on lexical features. The analysis is carried out at different levels in terms of granularity, semantics, and lexicality, respectively, avoiding repetitive work to reduce computational effort and using the semantic features of words to calculate the boundaries of entities to improve the accuracy of the recognition work. The results of the experiments carried out on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 value of 89.27% in recognizing nested named entities, making up for the shortcomings of some previous recognition models and improving the efficiency of recognition of nested named entities.

Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, T-SNE dimensionality reduction algorithm

Procedia PDF Downloads 97
4003 Readability Facing the Irreducible Otherness: Translation as a Third Dimension toward a Multilingual Higher Education

Authors: Noury Bakrim

Abstract:

From the point of view of language morphodynamics, the interpretative Readability of the text-result (the stasis) is not the external hermeneutics of its various potential reading events but the paradigmatic, semantic immanence of its dynamics. In other words, interpretative Readability articulates the potential tension between projection (intentionality of the discursive event) and the result (Readability within the syntagmatic stasis). We then consider that translation represents a metalinguistic conversion of neurocognitive bilingual sub-routines and modular relations much more than a semantic equivalence. Furthermore, the actualizing Readability (the process of rewriting a target text within a target language/genre) builds upon the descriptive level between the generative syntax/semantic form and its paradigmatic potential translatability. Translation corpora give evidence of a certain focus on the positivist stasis of the source text at the expense of its interpretative Readability. For instance, Fluchere's brilliant translation of Miller's Tropic of Cancer into French unconsciously realizes an inversion of the hierarchical relations between Life Thought and Fable: from Life Thought (fable) into Fable (Life Thought). We could regard Bernard Kreiss's translation of Canetti's work die englischen Jahre (les annees anglaises) as another inversion, of the historical scale from individual history into Hegelian history. In order to describe and test both the translation process and its result, we focus on pedagogical practice, which enables various principles grounded in interpretative/actualizing Readability. Henceforth, establishing the analytical uttering dynamics of the source text could be widened by other practices. The reversibility test (target - source text) or the comparison with a second translation in a third language (tertium comparationis A/B and A/C) points to the evidence of an impossible event. Therefore, it does not imply an idealistic/absolute uttering source but the irreducible/non-reproducible intentionality of its production event within the experience of world/discourse. The aim of this paper is to conceptualize translation as the tension between interpretative and actualizing Readability in a new approach grounded in the morphodynamics of language and Translatability (mainly into French) within literary and non-literary texts, articulating theoretical and described pedagogical corpora.

Keywords: readability, translation as deverbalization, translation as conversion, Tertium Comparationis, uttering actualization, translation pedagogy

Procedia PDF Downloads 138
4002 A Bilingual Didactic Sequence about Biological Control to Develop the Scientific Literacy on High School Students

Authors: André Melo Franco Lorena De Barros, Elida Geralda Campos

Abstract:

Bilingual education has only recently started in Brazil’s public schools. This paper presents a didactic sequence of bilingual biology lessons about biological control in the Brazilian Savanna. The sequence was applied in the first year of a bilingual education program in the only public English and Portuguese bilingual high school in Brazil. The aim of this work is to develop and apply a didactic sequence capable of developing scientific literacy through bilingual education associated with Problem Based Learning. The didactic sequence was applied in a class of 30 students and was divided into three lessons. In the first lesson, the students were divided into groups and received a fictional letter from a mayor explaining the problem and asking the students for help: the mayor’s organic soy plantation is being attacked by caterpillars. The students read the text and then raised hypotheses about how they could solve the problem. In the second lesson, the students searched online to verify whether their hypotheses were correct and to find answers to the question proposed. In the third lesson, the groups got together, discussed their results, and wrote a final essay with their answers to the problem proposed. The tools used to acquire information about the didactic sequence were the researcher’s diary, a survey, an interview, and the essays developed by the students. Most of the initial hypotheses could not answer the problem properly. By the second lesson, most of the students could answer properly. During the third lesson, all the groups figured out suitable answers. The forms of biological control, bird habits, and transgenics were studied in depth by the students. This methodology was successful in developing scientific literacy in most of the students, and we also concluded that the quality of learning is directly associated with the effort of each student during the process. [ARAÚJO, Denise Lino de. O que é (e como se faz) sequência didática. Entrepalavras, Fortaleza, v. 3, n. 3, p.322-334, jul. 2013.] [FRANCO, Aline Aparecida et al. Preferência alimentar de Anticarsia gemmatalis Hübner (Lepidoptera: Noctuidae) por cultivares de soja. Científica: Revista de Ciências Agrárias, Jaboticabal, v. 1, n. 42, p.32-38, 29 jan. 2014.] [RIBEIRO, Luis Roberto de Camargo. Aprendizagem baseada em problemas (PBL): Uma experiência no ensino superior. São Carlos: Editora da Universidade Federal de São Carlos Ribeiro, 2008. 151 p.] [TRIVELATO, Sílvia L. Frateschi; TONIDANDEL, Sandra M. Rudella. Ensino Por Investigação: Eixos Organizadores Para Sequências De Ensino De Biologia. Ensaio Pesquisa em Educação em Ciências, Belo Horizonte, v. 17, n. especial, p.97-114, nov. 2015.].

Keywords: Bilingual Education, Environmental Education, Problem Based Learning, Science education

Procedia PDF Downloads 147
4001 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources with different topics. The availability of this enormous data necessitates us to adopt effective computational tools to explore the data. This leads to an intense growing interest in the research community to develop computational methods focused on processing this text data. A line of study focused on condensing the text so that we are able to get a higher level of understanding in a shorter time. The two important tasks to do this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key important words from a text. This makes us familiar with the general topic of a text. In text summarization, we are interested in producing a short-length text which includes important information about the document. The TextRank algorithm, an unsupervised learning method that is an extension of the PageRank (algorithm which is the base algorithm of Google search engine for searching pages and ranking them), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and declare them as a result. However, this algorithm neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as a part of generating the summary to overcome coverage problems.
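
As a rough illustration of the idea above, the sketch below runs a weighted PageRank-style iteration over a graph whose edge weights are pairwise semantic similarities between sentences. It is a minimal sketch, not the author's implementation: the similarity matrix is assumed to be precomputed (for example from sentence embeddings), its values are hypothetical, and the damping factor 0.85 is the value conventionally used for PageRank.

```python
def textrank(similarity, damping=0.85, iterations=50):
    """Weighted PageRank over a symmetric similarity matrix (list of lists)."""
    n = len(similarity)
    scores = [1.0 / n] * n
    # Total edge weight leaving each node, ignoring self-similarity.
    out_weight = [sum(row[j] for j in range(n) if j != i)
                  for i, row in enumerate(similarity)]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j != i and out_weight[j] > 0:
                    # Node j passes on its score in proportion to the edge weight j -> i.
                    rank += similarity[j][i] / out_weight[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores

# Hypothetical pairwise semantic similarities between four sentences.
sim = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.6, 0.1],
    [0.7, 0.6, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
scores = textrank(sim)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The highest-ranked sentences would form the extractive summary; the same iteration applies to keyword extraction if the nodes are candidate words instead of sentences.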

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 45
4000 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse

Authors: Sheena Christabel Pravin, M. Palanivelan

Abstract:

Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or by an early onset of developmental stuttering in childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses and interjections in a speech corpus encompassing bilingual spontaneous utterances, in English and Tamil, from school-going children. Two classifiers, Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment (CLI). The ability of the models to classify the disfluencies was measured in terms of F-measure, Recall, and Precision.
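
For reference, the three evaluation measures named above can be computed from a classifier's confusion counts as in this minimal sketch. The counts are hypothetical and are not results from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F-measure (F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical counts for one disfluency class (e.g. repetitions).
p, r, f = precision_recall_f1(tp=40, fp=10, fn=10)
print(p, r, f)
```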

Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies

Procedia PDF Downloads 188
3999 Code Switching and Language Attitudes of Two 10-11 Years Old Bilingual Child

Authors: Kristiina Teiss

Abstract:

Estonians and children having Estonian as one of their languages have lately become the fastest growing minority or bilingual group in Finland, which underlines the importance of studying this target group. The acquisition of bilingualism by an infant is affected by many different factors, such as the child’s personal traits, language differences, and environmental factors such as people’s attitudes towards languages and bilingualism. In the early years, the most important factor is the children’s interaction with their parents and siblings. This poster gives an overview of the material and some preliminary findings of an ongoing PhD study concerning code-mixing, code-switching and language attitudes of two bilingual 10-11 year old children. Data was collected from two different bilingual families, one living in Tampere, Finland, and one that moved to Tallinn, Estonia, during the study. The data includes audio recordings of the families’ interactions with their children when they were 2-3 years old and again when they were 10-11 years old. The data also includes recorded semi-structured queries of the parents, as well as recorded semi-structured queries of the children at the age of 10-11 years. The features of code-mixing can vary depending on norms or models in the families, or even according to its use by the two parents in the same family. The practices studied in the ongoing longitudinal case study, based on a framework of ethnography, comprise parental conversational strategies and family attitudes as well as CS (code-switching and code-mixing) cases occurring in both child and adult language. The aim of this paper is to find out whether there is a connection between children’s attitudes and their daily language use. It would also be interesting to find some evidence as to whether living in different countries has different impacts on the use of two languages. The results of the dissertation may give some directional suggestions on how language maintenance of Estonian-Finnish bilinguals could be supported, although generalizations cannot be made on the basis of a case study.

Keywords: code switching, Estonian, Finnish, language attitudes

Procedia PDF Downloads 341
3998 A Semantic and Concise Structure to Represent Human Actions

Authors: Tobias Strübing, Fatemeh Ziaeetabar

Abstract:

Humans usually manipulate objects with their hands. To represent these actions in a simple and understandable way, we need a semantic framework. For this purpose, the Semantic Event Chain (SEC) method has already been presented, which considers the touching and non-touching relations between manipulated objects in a scene. This method was improved by a computational model, the so-called enriched Semantic Event Chain (eSEC), which incorporates information on static (e.g. top, bottom) and dynamic spatial relations (e.g. moving apart, getting closer) between objects in an action scene. This leads to better action prediction as well as the ability to distinguish between more actions. Each eSEC manipulation descriptor is a huge matrix with thirty rows and a massive set of spatial relations between each pair of manipulated objects. The current eSEC framework has so far only been used in the category of manipulation actions, which may involve two hands. Here, we would like to extend this approach to a whole-body action descriptor and create a joint activity representation structure. For this purpose, we need a statistical analysis to modify the current eSEC by summarizing it while preserving its features, and we introduce a new version called Enhanced eSEC (e2SEC). This summarization can be done from two points of view: 1) reducing the number of rows in an eSEC matrix, 2) shrinking the set of possible semantic spatial relations. To achieve these, we computed the importance of each matrix row in a statistical way, to see if it is possible to remove a particular one while all manipulations are still distinguishable from each other. On the other hand, we examined which semantic spatial relations can be merged without compromising the unity of the predefined manipulation actions. Therefore, by performing the above analyses, we obtained the new e2SEC framework, which has 20% fewer rows, 16.7% fewer static spatial relations and 11.1% fewer dynamic spatial relations. This simplification, while preserving the salient features of a semantic structure in representing actions, has a tremendous impact on the recognition and prediction of complex actions, as well as the interactions between humans and robots. It also creates a comprehensive platform to integrate with the body limb descriptors and dramatically increases system performance, especially in complex real-time applications such as human-robot interaction prediction.
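
The row-reduction criterion described above (a row may be removed only if all manipulations stay distinguishable) can be illustrated with a small greedy check. This is a simplified stand-in for the statistical importance analysis the abstract mentions, not the authors' procedure; the three-row descriptors and the relation symbols are hypothetical toy data, whereas real eSEC matrices have thirty rows.

```python
def removable_rows(descriptors, n_rows):
    """Greedily drop rows while every pair of actions remains distinguishable.

    descriptors maps an action name to a tuple of rows; each row is a tuple
    of relation symbols. Returns the indices of the rows that must be kept.
    """
    kept = list(range(n_rows))
    for row in range(n_rows):
        candidate = [r for r in kept if r != row]
        # Project every descriptor onto the candidate rows and count distinct results.
        reduced = {tuple(matrix[r] for r in candidate)
                   for matrix in descriptors.values()}
        if len(reduced) == len(descriptors):
            kept = candidate  # no two actions collapsed, so the row is redundant
    return kept

# Hypothetical toy descriptors ("T" touching, "N" not touching, "A" absent).
descriptors = {
    "push": (("T", "T"), ("N", "T"), ("A", "A")),
    "pull": (("T", "T"), ("T", "N"), ("A", "A")),
    "lift": (("T", "N"), ("N", "T"), ("A", "A")),
}
kept = removable_rows(descriptors, n_rows=3)
print(kept)
```

For scale, a 20% reduction of the thirty eSEC rows corresponds to 24 remaining rows.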

Keywords: enriched semantic event chain, semantic action representation, spatial relations, statistical analysis

Procedia PDF Downloads 83
3997 Music Reading Expertise Facilitates Implicit Statistical Learning of Sentence Structures in a Novel Language: Evidence from Eye Movement Behavior

Authors: Sara T. K. Li, Belinda H. J. Chung, Jeffery C. N. Yip, Janet H. Hsiao

Abstract:

Music notation and text reading both involve statistical learning of music or linguistic structures. However, it remains unclear how music reading expertise influences text reading behavior. The present study examined this issue through an eye-tracking study. Chinese-English bilingual musicians and non-musicians read English sentences, Chinese sentences, musical phrases, and sentences in Tibetan, a language novel to the participants, while their eye movements were recorded. Each set of stimuli consisted of two conditions in terms of structural regularity: syntactically correct and syntactically incorrect musical phrases/sentences. The participants then completed a sentence comprehension task (for syntactically correct sentences) or a musical segment/word recognition task to test their comprehension/recognition abilities. The results showed that in reading musical phrases, as compared with non-musicians, musicians had higher accuracy in the recognition task, and had shorter reading times, fewer fixations, and shorter fixation durations when reading syntactically correct (i.e., in diatonic key) than incorrect (i.e., in non-diatonic key/atonal) musical phrases. This result reflects their expertise in music reading. Interestingly, in reading Tibetan sentences, which were novel to both participant groups, non-musicians did not show any behavioral differences between reading syntactically correct and incorrect Tibetan sentences, whereas musicians showed a shorter reading time and had marginally fewer fixations when reading syntactically correct sentences than syntactically incorrect ones. However, none of the musicians reported discovering any structural regularities in the Tibetan stimuli when asked explicitly after the experiment, suggesting that they may have implicitly acquired the structural regularities in Tibetan sentences. This group difference was not observed when they read English or Chinese sentences. This result suggests that music reading expertise facilitates reading texts in a novel language (i.e., Tibetan), but not in languages that the readers are already familiar with (i.e., English and Chinese). This phenomenon may be due to the similarities between reading music notation and reading texts in a novel language, as in both cases the stimuli follow particular statistical structures but do not involve semantic or lexical processing. Thus, musicians may transfer the statistical learning skills stemming from their music notation reading experience to implicitly discover the structures of sentences in a novel language. This speculation is consistent with a recent finding showing that music reading expertise modulates the processing of English nonwords (i.e., words that do not follow morphological or orthographic rules) but not pseudo- or real words. These results suggest that the modulation of language processing by music reading expertise depends on the similarities in the cognitive processes involved. They also have important implications for the benefits of music education for language and cognitive development.

Keywords: eye movement behavior, eye-tracking, music reading expertise, sentence reading, structural regularity, visual processing

Procedia PDF Downloads 351
3996 Evaluating 8D Reports Using Text-Mining

Authors: Benjamin Kuester, Bjoern Eilert, Malte Stonis, Ludger Overmeyer

Abstract:

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Keywords: 8D report, complaint management, evaluation system, text-mining

Procedia PDF Downloads 276
3995 Television: A Tool for Learning English

Authors: Anirudha S. Joshi

Abstract:

The 21st century classroom is filled with a vibrant assortment of learners. In India, the learners’ different socio-economic backgrounds and culturally diverse experiences require the English teacher of the teenage group to be more dynamic, innovative and competent. The rejection of conventional ways of teaching and the warm reception of modern approaches make room for modern devices like television. Instead of calling it an ‘idiot box’, why should a dynamic teacher not use it to develop the students’ skills? The teacher applies various strategies for the learners. One of them is selecting a particular popular TV program in the national language ‘Hindi’ and motivating the constructivist students to take part in activities based on it. This bilingual method enables them to develop speaking, writing and conversational skills in English in a very natural, informal and enthusiastic way.

Keywords: bilingual method, modern approaches, natural way, TV program

Procedia PDF Downloads 367
3994 Assessing the Structure of Non-Verbal Semantic Knowledge: The Evaluation and First Results of the Hungarian Semantic Association Test

Authors: Alinka Molnár-Tóth, Tímea Tánczos, Regina Barna, Katalin Jakab, Péter Klivényi

Abstract:

Supported by neuroscientific findings, the so-called Hub-and-Spoke model of the human semantic system is based on two subcomponents of semantic cognition, namely the semantic control process and semantic representation. Our semantic knowledge is multimodal in nature, as the knowledge system stored in relation to a concept is extensive and broad, while different aspects of the concept may be relevant depending on the purpose. The motivation of our research is to develop a new diagnostic measurement procedure based on the preservation of semantic representation, which is appropriate to the specificities of the Hungarian language and which can be used to compare the non-verbal semantic knowledge of healthy and aphasic persons. The development of the test will broaden the Hungarian clinical diagnostic toolkit, which will allow for more specific therapy planning. The sample of healthy persons (n = 480) was defined on the basis of the latest census data to ensure representativeness. Based on the concept of the Pyramids and Palm Trees Test, and according to the characteristics of the Hungarian language, we have elaborated a test based on different types of semantic information, in which the subjects are presented with three pictures: they have to choose, from the two lower options, the one that best fits the target word above, based on the semantic relation defined. We measured 5 types of semantic knowledge representations: associative relations, taxonomy, motional representations, and concrete as well as abstract verbs. As the first step in our data analysis, we examined whether our results were normally distributed, and since they were not (p < 0.05), we used nonparametric statistics in the further analysis. Using descriptive statistics, we could determine the frequency of correct and incorrect responses, and with this knowledge, we could later adjust and remove the items of questionable reliability. The reliability was tested using Cronbach’s α, and all the results were in an acceptable range of reliability (α = 0.6-0.8). We then tested for potential gender differences using the Mann-Whitney U test; however, we found no difference between the two (p > 0.05). Likewise, age had no effect on the results using one-way ANOVA (p > 0.05); however, the level of education did influence the results (p < 0.05). The relationships between the subtests were examined with the nonparametric Spearman’s rho correlation matrix, which showed statistically significant correlations between the subtests (p < 0.05), signifying a linear relationship between the measured semantic functions. A margin of error of 5% was used in all cases. The research will contribute to the expansion of the clinical diagnostic toolkit and will be relevant for the individualised therapeutic design of treatment procedures. The use of a non-verbal test procedure will allow an early assessment of the most severe language conditions, which is a priority in differential diagnosis. The measurement of reaction time is expected to advance prodrome research, as the tests can be easily conducted in the subclinical phase.
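
As an aside on the reliability measure mentioned above, Cronbach’s α for k items is k/(k-1) multiplied by (1 minus the ratio of the summed item variances to the variance of the total scores). The sketch below computes it for a hypothetical matrix of correct/incorrect responses; the data are illustrative only and do not come from the study.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores holds one list per item, each containing that item's
    score for every respondent (same respondent order in all lists).
    """
    k = len(item_scores)
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical responses: 3 items, 6 respondents, 1 = correct, 0 = incorrect.
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 4))
```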

Keywords: communication disorders, diagnostic toolkit, neurorehabilitation, semantic knowledge

Procedia PDF Downloads 69
3993 Semantic Search Engine Based on Query Expansion with Google Ranking and Similarity Measures

Authors: Ahmad Shahin, Fadi Chakik, Walid Moudani

Abstract:

Our study elaborates a potential solution for a search engine that uses semantic technology to retrieve information and display it meaningfully. Semantic search engines are not widely used over the web, as the majority are still in the Beta stage or under construction. Current applications in semantic search face many problems; the major one is analyzing and calculating the meaning of a query in order to retrieve relevant information. Another problem is the ontology-based index and its updates. Ranking results according to the meaning of concepts and their relation to the query is another challenge. In this paper, we offer a light meta-engine (QESM) which uses Google search, and therefore Google’s index, and adapts its returned results by adding multi-query expansion. The goal was to find a reliable ranking algorithm that involves semantics and uses concepts and meanings to rank results. First, the engine finds synonyms of each query term entered by the user based on a lexical database. Then, query expansion is applied to generate different semantically analogous sentences. These are generated randomly by combining the found synonyms and the original query terms. Our model suggests the use of semantic similarity measures between two sentences. In practice, we used this method to calculate the semantic similarity between each query and the description of each page’s content generated by Google. The generated sentences are sent to the Google engine one by one, and the results are ranked again all together with the adapted ranking method (QESM). Finally, our system places the Google pages with higher similarities at the top of the results. We conducted experiments with 6 different queries. We observed that most results ranked with QESM were altered relative to Google’s originally generated pages. With our experimental queries, QESM frequently achieves better accuracy than Google. In the worst cases, it behaves like Google.
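
The pipeline described above (synonym lookup, query expansion, similarity-based re-ranking) might be sketched as follows. Several simplifications are assumptions of this sketch, not features of QESM: the synonym table is a hypothetical stand-in for a lexical database, all synonym combinations are enumerated rather than sampled randomly, token-overlap (Jaccard) similarity replaces a true semantic sentence-similarity measure, and the result snippets are hard-coded instead of being fetched from a search engine.

```python
from itertools import product

def expand_query(query, synonyms):
    """Generate query variants by swapping each term for itself or one of its synonyms."""
    options = [[term] + synonyms.get(term, []) for term in query.split()]
    return [" ".join(choice) for choice in product(*options)]

def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for a semantic similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def rerank(variants, snippets):
    """Order result snippets by their best similarity to any query variant."""
    scored = [(max(jaccard(v, s) for v in variants), s) for s in snippets]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# Hypothetical lexical entries and result descriptions.
synonyms = {"car": ["automobile"], "cheap": ["inexpensive"]}
variants = expand_query("cheap car", synonyms)
snippets = [
    "weather forecast for the weekend",
    "inexpensive automobile listings",
    "car repair tips",
]
ranked = rerank(variants, snippets)
print(variants)
print(ranked)
```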

Keywords: semantic search engine, Google indexing, query expansion, similarity measures

Procedia PDF Downloads 400
3992 An Ontology for Semantic Enrichment of RFID Systems

Authors: Haitham S. Hamza, Mohamed Maher, Shourok Alaa, Aya Khattab, Hadeal Ismail, Kamilia Hosny

Abstract:

Radio Frequency Identification (RFID) has become a key technology in the emerging concept of the Internet of Things (IoT). Naturally, business applications require the deployment of various RFID systems that are developed by different vendors and use various data formats. This heterogeneity poses a real challenge in developing large-scale IoT systems with RFID, as integration is becoming very complex and challenging. Semantic integration is a key approach to dealing with this challenge. To do so, an ontology for RFID systems needs to be developed in order to semantically annotate RFID systems and hence facilitate their integration. Accordingly, in this paper, we propose an ontology for RFID systems. The proposed ontology can be used to semantically enrich RFID systems and hence improve their usage and reasoning. The usage of the proposed ontology is explained through a simple scenario in the health care domain.

Keywords: RFID, semantic technology, ontology, sparql query language, heterogeneity

Procedia PDF Downloads 439
3991 Mapping of Research Productivity of Balochistan University Faculty: A Bibliometric Study of Pakistan Studies Bilingual/Bi-annual Pakistan Studies, English/Urdu Research Journal from 2015 to 2020

Authors: Muhammad Anwar

Abstract:

The prime objective of this study is to investigate the research productivity of the Pakistan Studies Bilingual/Bi-annual Pakistan Studies, English/Urdu Research Journal from 2015 to 2020. The study also examines the frequency of publications, author contributions, paper length, references, the most productive authors, and the degree of collaboration. It finds that 271 research articles were contributed by faculty members of the University of Balochistan, Quetta. The highest number of papers, 75 (27.67%), was published in 2020, followed by 59 (21.77%) papers in 2019. Vol. 10 and Vol. 11 contributed 36 (13.28%) and 45 (16.00%) research articles, respectively. Two-author papers numbered 179 (66.05%) and three-author papers 62 (22.87%). The degree of collaboration was 0.97. Regarding paper length, the majority of papers, 122 (45.07%), were in the range of 11-15, and 73 (26.93%) articles were in the range of 6-10. The most prolific author was Dr. Noor Ahmed from the Pakistan Study Center, ranked 1st with 15 papers, followed by Dr. Kaleem Bareach, ranked 2nd with 14 articles.
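
For readers unfamiliar with the indicators above, the sketch below shows how a percentage share and a degree of collaboration are typically computed. The degree of collaboration is taken here as the ratio of multi-authored papers to all papers, a definition commonly used in bibliometrics; the abstract does not state which formula was applied, and the single/multi-author counts in the example are hypothetical toy values.

```python
def share(count, total):
    """Percentage share of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)

def degree_of_collaboration(single_authored, multi_authored):
    """Ratio of multi-authored papers to all papers (assumed definition)."""
    return multi_authored / (single_authored + multi_authored)

# Share of two-author papers, using the counts reported in the abstract.
two_author_share = share(179, 271)

# Hypothetical toy counts, chosen only to illustrate the ratio.
dc = degree_of_collaboration(single_authored=3, multi_authored=97)
print(two_author_share, dc)
```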

Keywords: research, bibliometric, bilingual, bi-annual, Pakistan, university, Balochistan

Procedia PDF Downloads 75
3990 Treating Voxels as Words: Word-to-Vector Methods for fMRI Meta-Analyses

Authors: Matthew Baucum

Abstract:

With the increasing popularity of fMRI as an experimental method, psychology and neuroscience can greatly benefit from advanced techniques for summarizing and synthesizing large amounts of data from brain imaging studies. One promising avenue is automated meta-analyses, in which natural language processing methods are used to identify the brain regions consistently associated with certain semantic concepts (e.g. “social”, “reward’) across large corpora of studies. This study builds on this approach by demonstrating how, in fMRI meta-analyses, individual voxels can be treated as vectors in a semantic space and evaluated for their “proximity” to terms of interest. In this technique, a low-dimensional semantic space is built from brain imaging study texts, allowing words in each text to be represented as vectors (where words that frequently appear together are near each other in the semantic space). Consequently, each voxel in a brain mask can be represented as a normalized vector sum of all of the words in the studies that showed activation in that voxel. The entire brain mask can then be visualized in terms of each voxel’s proximity to a given term of interest (e.g., “vision”, “decision making”) or collection of terms (e.g., “theory of mind”, “social”, “agent”), as measured by the cosine similarity between the voxel’s vector and the term vector (or the average of multiple term vectors). Analysis can also proceed in the opposite direction, allowing word cloud visualizations of the nearest semantic neighbors for a given brain region. This approach allows for continuous, fine-grained metrics of voxel-term associations, and relies on state-of-the-art “open vocabulary” methods that go beyond mere word-counts. 
An analysis of over 11,000 neuroimaging studies from an existing meta-analytic fMRI database demonstrates that this technique can be used to recover known neural bases for multiple psychological functions, suggesting this method’s utility for efficient, high-level meta-analyses of localized brain function. While automated text analytic methods are no replacement for deliberate, manual meta-analyses, they seem to show promise for the efficient aggregation of large bodies of scientific knowledge, at least on a relatively general level.
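The voxel-as-vector idea described in this abstract can be sketched in a few lines. This is a toy illustration, not the author's implementation: the two-dimensional word vectors, the voxel names, and all function names are hypothetical, and a real analysis would use embeddings trained on the study texts.

```python
import math

# Hypothetical 2-D word embeddings; real ones would be learned from study texts.
WORD_VECTORS = {
    "vision": [1.0, 0.0],
    "visual": [0.9, 0.1],
    "reward": [0.0, 1.0],
    "social": [0.1, 0.9],
}

# Hypothetical mapping: words from studies that reported activation in each voxel.
VOXEL_WORDS = {
    "voxel_a": ["vision", "visual"],
    "voxel_b": ["reward", "social"],
}


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def voxel_vector(voxel):
    """Normalized sum of the vectors of all words associated with a voxel."""
    vectors = [WORD_VECTORS[w] for w in VOXEL_WORDS[voxel]]
    return normalize([sum(column) for column in zip(*vectors)])


def term_vector(terms):
    """Average of one or more term vectors."""
    vectors = [WORD_VECTORS[t] for t in terms]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def proximity(voxel, terms):
    """Proximity of a voxel to a term or collection of terms."""
    return cosine_similarity(voxel_vector(voxel), term_vector(terms))


for name in VOXEL_WORDS:
    print(name, round(proximity(name, ["vision"]), 3))
```

In this toy setup the voxel whose associated studies mention vision-related words scores higher for the term "vision" than the voxel associated with reward-related words, which is the kind of continuous voxel-term metric the abstract describes.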

Keywords: fMRI, machine learning, meta-analysis, text analysis

Procedia PDF Downloads 420
3989 New Ways of Vocabulary Enlargement

Authors: S. Pesina, T. Solonchak

Abstract:

Lexical invariants, being a sort of stereotype within the frames of ordinary consciousness, are created by the members of a language community as a result of a uniform division of reality. The invariant meaning is formed in a person’s mind gradually in the course of different actualizations of secondary meanings in various contexts. We understand the lexical invariant as an abstract language essence containing a set of semantic components. In one of its configurations, it is the basis of all or a number of the meanings making up the semantic structure of the word.

Keywords: lexical invariant, invariant theories, polysemantic word, cognitive linguistics

Procedia PDF Downloads 285
3988 Code Mixing and Code-Switching Patterns in Kannada-English Bilingual Children and Adults Who Stutter

Authors: Vasupradaa Manivannan, Santosh Maruthy

Abstract:

Background/Aims: Preliminary evidence suggests that code-switching and code-mixing may act as a voluntary coping behavior to avoid stuttering characteristics in children and adults; however, less is known about the types and patterns of code-mixing (CM) and code-switching (CS). Further, it is not known how these differ between children and adults who stutter. This study aimed to identify and compare the CM and CS patterns between Kannada-English bilingual children and adults who stutter. Method: A standard group comparison was made between five children who stutter (CWS) in the age range of 9-13 years and five adults who stutter (AWS) in the age range of 20-25 years. Participants proficient in Kannada (first language, L1) and English (second language, L2) were considered for the study. Two tasks were given to both groups: a) general conversation (GC) with 10 random questions, and b) a narration task (NAR) (story or general topic, for example, a memorable life event), in three different conditions: Mono Kannada (MK), Mono English (ME), and Bilingual (BIL). The children and adults were assessed online (via Zoom session) with a high-quality internet connection. The audio and video samples of the full assessment session were auto-recorded and manually transcribed. The recorded samples were analyzed for the percentage of dysfluencies using SSI-4, and for the CM and CS exhibited by each participant using Matrix Language Frame (MLF) model parameters. The obtained data were analyzed using the Statistical Package for the Social Sciences (SPSS) software package (Version 20.0). Results: The mean, median, and standard deviation values were obtained for the percentage of dysfluencies (%SS) and frequency of CM and CS in Kannada-English bilingual children and adults who stutter for various parameters obtained through the MLF model.
The inferential results indicated that %SS varied significantly between populations (AWS vs CWS), languages (L1 vs L2), and tasks (GC vs NAR) but not across free (BIL) and bound (MK, ME) conditions. It was also found that the frequency of CM and CS patterns varies between CWS and AWS. The AWS had a lower %SS but greater use of CS patterns than CWS, which is due to their excessive coping skills. Language mixing patterns were observed more in L1 than in L2, and the difference was significant in most of the MLF parameters. However, %SS was significantly higher (P<0.05) in L2 than in L1. The CM and CS patterns were more frequent in conditions 1 and 3 than in condition 2, which may be due to higher proficiency in L2 than in L1. Conclusion: The findings highlight the importance of assessing the CM and CS behaviors, their patterns, and the frequency of CM and CS between CWS and AWS on MLF parameters in two different tasks across three conditions. The results help us to understand CM and CS strategies in bilingual persons who stutter.

Keywords: bilinguals, code mixing, code switching, stuttering

Procedia PDF Downloads 50
3987 Recurrent Neural Networks with Deep Hierarchical Mixed Structures for Chinese Document Classification

Authors: Zhaoxin Luo, Michael Zhu

Abstract:

In natural languages, there are always complex semantic hierarchies. Obtaining a feature representation based on these complex semantic hierarchies is key to the success of a model. Several RNN models have recently been proposed that use latent indicators to obtain the hierarchical structure of documents. However, a model that uses only a single-layer latent indicator cannot capture the true hierarchical structure of a language, especially a complex language like Chinese. In this paper, we propose a deep layered model that stacks arbitrarily many RNN layers equipped with latent indicators. By using EM and training it hierarchically, our model solves the computational problem of stacking RNN layers and makes it possible to stack arbitrarily many RNN layers. Our deep hierarchical model not only achieves results comparable to large pre-trained models on the Chinese short text classification problem but also achieves state-of-the-art results on the Chinese long text classification problem.

Keywords: natural language processing, recurrent neural network, hierarchical structure, document classification, Chinese

Procedia PDF Downloads 34
3986 Semantic Based Analysis in Complaint Management System with Analytics

Authors: Francis Alterado, Jennifer Enriquez

Abstract:

Semantic Based Analysis in Complaint Management System with Analytics is an enhanced tool for clients to submit complaints, as well as a mechanism for Palawan Polytechnic College to gather, process, and monitor the status of these complaints. The study includes a mobile application that serves as a remote communication facility between the students and the school management regarding the issues encountered by students and the resolution of every complaint received. In processing the complaints, text mining and clustering algorithms were utilized. Every module of the system was tested, and based on the results, each was 100% free from error before integration was done. System testing was also done by checking the expected functionality of the system, which was 100% functional. The system was tested by 10 students, who forwarded complaints to 10 departments. Based on the results, the students were able to submit complaints, the system was able to process them by identifying the department for which each complaint was intended, and the concerned department was able to give feedback to the student on the complaint received. The system received a rating of 4.7, which corresponds to Excellent.

Keywords: technology adoption, emerging technology, issues challenges, algorithm, text mining, mobile technology

Procedia PDF Downloads 171
3985 A Methodology to Integrate Data in the Company Based on the Semantic Standard in the Context of Industry 4.0

Authors: Chang Qin, Daham Mustafa, Abderrahmane Khiat, Pierre Bienert, Paulo Zanini

Abstract:

Nowadays, companies are facing many challenges in the process of digital transformation, which can be a complex and costly undertaking. Digital transformation involves the collection and analysis of large amounts of data, which can create challenges around data management and governance. Furthermore, it is also challenging to integrate data from multiple systems and technologies. Despite these pains, companies are still pursuing digitalization because, by embracing advanced technologies, they can improve efficiency, quality, decision-making, and customer experience while also creating different business models and revenue streams. This paper focuses on the issue of data being stored in data silos with different schemas and structures. The conventional approaches to addressing this issue involve utilizing data warehousing, data integration tools, data standardization, and business intelligence tools. However, these approaches primarily focus on the grammar and structure of the data and neglect the importance of semantic modeling and semantic standardization, which are essential for achieving data interoperability. In this session, the challenge of data silos in Industry 4.0 is addressed by developing a semantic modeling approach compliant with Asset Administration Shell (AAS) models as an efficient standard for communication in Industry 4.0. The paper highlights how our approach can facilitate the data mapping process and semantic lifting according to existing industry standards such as ECLASS and other industrial dictionaries. It also incorporates the Asset Administration Shell technology to model and map the company’s data and utilizes a knowledge graph for data storage and exploration.

Keywords: data interoperability in industry 4.0, digital integration, industrial dictionary, semantic modeling

Procedia PDF Downloads 63
3984 The Effect of the Vernacular on Code-Switching Hebrew into Palestinian Arabic

Authors: Ward Makhoul

Abstract:

Code-switching (CS) is known as a ubiquitous phenomenon in multilingual societies and countries. The vernacular Palestinian Arabic (PA) variety spoken in Israel, which is used informally for day-to-day conversations only, is one such case. Such conversations appear to contain code-switched instances from Hebrew, the formal and dominant language of the country, even in settings where CS seems unnecessary. This study examines CS practices in PA and investigates the reasons behind these CS instances in controlled settings, as well as the correlation between bilingual dominance and CS. Thirteen participants were interviewed in production tasks to elicit and analyze natural speech containing CS instances, and they also completed a Bilingual Language Profile (BLP) test; the results showed a correlation between language dominance and CS. The acceptability judgment task examined the limits and boundaries of different code-switched linguistic structures.

Keywords: code-switching, Hebrew, Palestinian-Arabic, vernacular

Procedia PDF Downloads 83
3983 Cross-Language Variation and the ‘Fused’ Zone in Bilingual Mental Lexicon: An Experimental Research

Authors: Yuliya E. Leshchenko, Tatyana S. Ostapenko

Abstract:

Language variation is a widespread linguistic phenomenon which can affect different levels of a language system: phonological, morphological, lexical, syntactic, etc. It is obvious that the scope of possible standard alternations within a particular language is limited by a variety of its norms and regulations which set more or less clear boundaries for what is possible and what is not possible for the speakers. The possibility of lexical variation (alternate usage of lexical items within the same contexts) is based on the fact that the meanings of words are not clearly and rigidly defined in the consciousness of the speakers. Therefore, lexical variation is usually connected with an unstable relationship between words and their referents: a case when a particular lexical item refers to different types of referents, or when a particular referent can be named by various lexical items. We assume that the scope of lexical variation in bilingual speech is generally wider than that observed in monolingual speech due to the fact that, besides ‘lexical item – referent’ relations, it involves the possibility of cross-language variation of L1 and L2 lexical items. We use the term ‘cross-language variation’ to denote a case when two equivalent words of different languages are treated by a bilingual speaker as freely interchangeable within a common linguistic context. As distinct from code-switching, which is traditionally defined as the conscious use of more than one language within one communicative act, in the case of cross-language lexical variation the speaker does not perceive the alternate lexical items as belonging to different languages and, therefore, does not realize the change of language code. In the paper, the authors present research on lexical variation in adult Komi-Permyak – Russian bilingual speakers.
The two languages co-exist on the territory of the Komi-Permyak District in Russia (Komi-Permyak as the ethnic language and Russian as the official state language), are usually acquired from birth in a natural linguistic environment and, according to the data of sociolinguistic surveys, are both identified by the speakers as coordinate mother tongues. The experimental research demonstrated that alternation of Komi-Permyak and Russian words within one utterance/phrase is highly frequent both in speech perception and production. Moreover, our participants estimated cross-language word combinations like ‘маленькая /Russian/ нывка /Komi-Permyak/’ (‘a little girl’) or ‘мунны /Komi-Permyak/ домой /Russian/’ (‘go home’) as regular/habitual, containing no violation of any linguistic rules and being as possible in speech as the equivalent intra-language word combinations (‘учöтик нывка’ /Komi-Permyak/ or ‘идти домой’ /Russian/). All the facts considered, we claim that constant concurrent use of the two languages results in the fact that a large number of their words tend to be intuitively interpreted by the speakers as lexical variants not only related to the same referent, but also referring to both languages or, more precisely, to none of them in particular. Consequently, we can suppose that the bilingual mental lexicon includes an extensive ‘fused’ zone of lexical representations that provide the basis for cross-language variation in bilingual speech.

Keywords: bilingualism, bilingual mental lexicon, code-switching, lexical variation

Procedia PDF Downloads 114
3982 Multilingualism and Unification of Teaching

Authors: Mehdi Damaliamiri, Firouzeh Akbari

Abstract:

Teaching literature to children at an early age is of great importance, and there have been different methods to facilitate learning literature. By law, all children going to school in Iran should learn the Persian language and literature. This has been accompanied by two different levels of learning related to urban or rural bilingualism. For bilingual children living in villages, learning literature and a new language (Persian) becomes a big challenge, as it relies on the teacher's translation, while in the city it is easier because children have more exposure to the Persian language. Over recent years, TV and radio programs have been considered effective in changing the trend of learning Persian among children who speak another language, but the scores of students in national Persian language exams show that these programs have not been very effective for bilingual students living in villages. To identify the determinants of weak learning of Persian by bilingual children, two different regions, Turkish-speaking and Kurdish-speaking communities, were chosen to compare their learning of Persian at the first and second levels of elementary school. The learning criteria were based on the syllabification of Persian words, word order in the sentence, and compound sentences. Students were taught in Persian how to recognize syllabification without being allowed to translate the words into their own languages and were asked to produce simple sentences in Persian in response to situational questions. Teaching methods, language relatedness with Persian, and exposure to social media programs, especially TV and radio, were the factors considered to affect the potential of children in learning Persian.

Keywords: bilingualism, Persian, education, literature

Procedia PDF Downloads 47
3981 Social Media Idea Ontology: A Concept for Semantic Search of Product Ideas in Customer Knowledge through User-Centered Metrics and Natural Language Processing

Authors: Martin Häusl, Maximilian Auch, Johannes Forster, Peter Mandl, Alexander Schill

Abstract:

In order to survive on the market, companies must constantly develop improved and new products. These products are designed to serve the needs of their customers in the best possible way. The creation of new products is also called innovation and is primarily driven by a company’s internal research and development department. However, a new approach that involves external knowledge in the innovation process has been emerging for some years now. This approach is called open innovation and identifies customer knowledge as the most important source in the innovation process. This paper presents a concept of using social media posts as an external source to support the open innovation approach in its initial phase, the ideation phase. For this purpose, the social media posts are semantically structured with the help of an ontology, and the authors are evaluated using graph-theoretical metrics such as density. For the structuring and evaluation of relevant social media posts, we also use findings from Natural Language Processing, e.g., Named Entity Recognition, specific dictionaries, Triple Tagger, and Part-of-Speech Tagger. The selection and evaluation of the tools used are discussed in this paper. Using our ontology and metrics to structure social media posts enables users to semantically search these posts for new product ideas and thus gain improved insight into external sources such as customer needs.
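The abstract names density as one of the graph-theoretical metrics but does not say how the author graph is built. As a minimal sketch, assuming an undirected graph whose nodes are post authors and whose edges are interactions between them (both assumptions, as are the names below), density is the number of edges present divided by the number of possible edges.

```python
def graph_density(nodes, edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Treat (a, b) and (b, a) as the same edge and ignore self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(unique_edges) / (n * (n - 1))


# Hypothetical authors and interactions, for illustration only.
authors = {"ana", "ben", "cho", "dev"}
interactions = [("ana", "ben"), ("ben", "cho"), ("cho", "ana"), ("ben", "ana")]
print(graph_density(authors, interactions))
```

A density near 1 would indicate that most authors in the graph interact with each other directly, while a value near 0 indicates a sparse network.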

Keywords: idea ontology, innovation management, semantic search, open information extraction

Procedia PDF Downloads 164
3980 Reconstruction of Visual Stimuli Using Stable Diffusion with Text Conditioning

Authors: ShyamKrishna Kirithivasan, Shreyas Battula, Aditi Soori, Richa Ramesh, Ramamoorthy Srinath

Abstract:

The human brain, among the most complex and mysterious parts of the body, offers vast potential for exploration. Unraveling how it works, especially in neural perception and cognition, is the domain of neural decoding. This work harnesses advancements in generative AI, particularly in visual computing, to elucidate how the brain comprehends visual stimuli observed by humans. The paper endeavors to reconstruct human-perceived visual stimuli using Functional Magnetic Resonance Imaging (fMRI). The fMRI data is processed through pre-trained deep-learning models to recreate the stimuli. A new architecture named LatentNeuroNet is introduced, with the aim of achieving the utmost semantic fidelity in stimuli reconstruction. The approach employs a Latent Diffusion Model (LDM), Stable Diffusion v1.5, emphasizing semantic accuracy and generating superior-quality outputs. This addresses the limitations of prior methods, such as GANs, known for poor semantic performance and inherent instability. Text conditioning within the LDM's denoising process is handled by extracting text from the brain's ventral visual cortex region. This extracted text is processed by a Bootstrapping Language-Image Pre-training (BLIP) encoder before it is injected into the denoising process. In conclusion, an architecture is developed that successfully reconstructs the perceived visual stimuli, and this research provides enough evidence to identify the most influential regions of the brain responsible for cognition and perception.

Keywords: BLIP, fMRI, latent diffusion model, neural perception

Procedia PDF Downloads 42
3979 Beliefs, Practices and Identity about Bilingualism: Korean-Australian Immigrant Parents and Family Language Policies

Authors: Eun Kyong Park

Abstract:

This study explores the relationships between immigrant parents’ beliefs about bilingualism, family literacy practices, and their children’s identity development in Sydney, Australia. This project examines how these parents’ ideological beliefs and knowledge are related to their provision of family literacy practices and management of the environment for their bilingual children based on family language policy (FLP). This is a follow-up study of the author’s prior thesis that presented Korean immigrant mothers’ beliefs and decision-making in support of their children’s bilingualism. It includes fathers’ perspectives within the participating families as a whole by foregrounding their perceptions of bilingual and identity development. It adopts a qualitative approach with twelve immigrant mothers and fathers living in a Korean-Australian community whose child attends one of the community’s Korean language programs. This time, it includes introspective and self-evocative auto-ethnographic data. The initial data set collected in the first part of this study demonstrated that the mothers provided rich, diverse, and specific family literacy activities for their children. These mothers selected specific practices to facilitate their child’s bilingual development at home. The second part of the data was collected over a three-month period: 1) a focus group interview with mothers; 2) a brief self-report of fathers; 3) the researcher’s reflective diary. To analyze these multiple data sources, thematic analysis and coding were used to reveal the parents’ ideologies surrounding bilingualism and bilingual identities. The study will highlight the complexity of language and literacy practices in the family domain interrelated with sociocultural factors. This project makes an original contribution to the field of bilingualism and FLP and a methodological contribution by introducing auto-ethnographic input on this community’s lived practices.
This project will empower Korean-Australian immigrant families and other multilingual communities to reflect on their beliefs and practices for their emerging bilingual children. It will also enable educators and policymakers to access authentic information about how bilingualism is practiced within these immigrant families in multiple ways and help build a culturally appropriate partnership between home and school community.

Keywords: bilingualism, beliefs, identity, family language policy, Korean immigrant parents in Australia

Procedia PDF Downloads 105
3978 Exploring Syntactic and Semantic Features for Text-Based Authorship Attribution

Authors: Haiyan Wu, Ying Liu, Shaoyun Shi

Abstract:

Authorship attribution aims to identify the authors of anonymous documents by extracting features from their texts. Many previous works on authorship attribution focus on statistical style features (e.g., sentence/word length) and content features (e.g., frequent words, n-grams). Modeling these features with regression or other transparent machine learning methods gives a portrait of the authors' writing style. But these methods do not capture syntactic (e.g., dependency relationship) or semantic (e.g., topics) information. In recent years, some researchers have modeled syntactic trees or latent semantic information with neural networks. However, few works consider them together. Besides, predictions by neural networks are difficult to explain, and explainability is vital in authorship attribution tasks. In this paper, we not only utilize the statistical style and content features but also take advantage of both syntactic and semantic features. Unlike in an end-to-end neural model, feature selection and prediction are two separate steps in our method. An attentive n-gram network is utilized to select useful features, and logistic regression is applied to give predictions and an understandable representation of writing style. Experiments show that our extracted features can improve on the state-of-the-art methods on three benchmark datasets.
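The statistical style and content features mentioned in this abstract can be illustrated with a small sketch. It covers only hand-crafted features (average word length, average sentence length, character n-gram counts); it does not implement the attentive n-gram network or the logistic regression step, and the function names and tokenization rules are assumptions made for illustration.

```python
import re
from collections import Counter


def style_features(text):
    """Simple statistical style features: average word and sentence length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    avg_sentence_len = len(words) / len(sentences) if sentences else 0.0
    return {"avg_word_len": avg_word_len, "avg_sentence_len": avg_sentence_len}


def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams after whitespace normalization."""
    cleaned = re.sub(r"\s+", " ", text.lower())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


sample = "I am here. You are there."
print(style_features(sample))
print(char_ngrams(sample, n=3).most_common(3))
```

Features of this kind are easy to inspect, which is why they pair naturally with a transparent classifier when the goal is an explainable portrait of writing style.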

Keywords: authorship attribution, attention mechanism, syntactic feature, feature extraction

Procedia PDF Downloads 103
3977 Integration of Immigrant Students into Local Education System

Authors: Suheyla Demirkol Orak

Abstract:

The need for inclusive education is one of the most important consequences of both regular and irregular immigration. The situation of Syrian immigrants is even more severe than other cases of immigration in world history, since a massive immigration wave has affected the socio-economic profiles of countries worldwide. When Syrians emigrated from Syria to countries all over the world, they aimed to survive and leave the war behind, but survival is not possible without resolving language-related problems. Humans exist and preserve their existence through their language. This is a matter of concern for the integration of Syrians into the hosting countries. Many countries are running various programs to integrate Syrians into the majority groups through either assimilation or adaptation policies. Turkey has received the lion's share of Syrian immigration, and accordingly its language education system should be analyzed rigorously in order to devise a well-matched program for the integration of Syrians. This study aims to generate an inclusive education model for catalyzing the integration of immigrant Syrian students into the majority socio-economic group by overcoming the language barrier. The identity of the immigrants is prioritized. The study follows a narrative literature review, which reviews and critiques relevant literature and offers a new conceptualization derived from it. As the outcome of the review, a critical, localized bilingual education model that prioritizes the identity of the target community was designed. The main bilingual education programs and most countries' bilingual education policies were reviewed critically, and suggestions were listed primarily for Syrian immigrants in Turkey; other countries may benefit from them by localizing the practices.

Keywords: bi/multilingual education, sheltered education, immigrants, glocalization, submersion program, immersion program

Procedia PDF Downloads 47
3976 Reading and Writing of Biscriptal Children with and Without Reading Difficulties in Two Alphabetic Scripts

Authors: Baran Johansson

Abstract:

This PhD dissertation aimed to explore children’s writing and reading in L1 (Persian) and L2 (Swedish). It adds new perspectives to reading and writing studies of bilingual biscriptal children with and without reading and writing difficulties (RWD). The study used standardised tests to examine linguistic and cognitive skills related to word reading and writing fluency in both languages. Furthermore, all participants produced two texts (one descriptive and one narrative) in each language. The writing processes and the writing products of these children were explored using logging methodologies (Eye and Pen) for both languages. In addition, this study investigated how two bilingual children with RWD presented themselves through writing across their languages. To my knowledge, studies utilizing standardised tests and logging tools to investigate bilingual children’s word reading and writing fluency across two different alphabetic scripts are scarce. There have been few studies analysing how bilingual children construct meaning in their writing, and none have focused on children who write in two different alphabetic scripts or those with RWD. Therefore, some aspects of the systemic functional linguistics (SFL) perspective were employed to examine how two participants with RWD created meaning in their written texts in each language. The results revealed that children with and without RWD had higher writing fluency in all measures (e.g., text length, writing speed) in their L2 compared to their L1. Word reading abilities in both languages were found to influence their writing fluency. The findings also showed that bilingual children without reading difficulties performed 1 standard deviation below the mean when reading words in Persian. However, their reading performance in Swedish aligned with the expected age norms, suggesting greater efficiency in reading Swedish than in Persian.
Furthermore, the results showed that the level of orthographic depth, consistency between graphemes and phonemes, and orthographic features can probably explain these differences across languages. The analysis of meaning-making indicated that the participants with RWD exhibited varying levels of difficulty, which influenced their knowledge and usage of writing across languages. For example, the participant with poor word recognition (PWR) presented himself similarly across genres, irrespective of the language in which he wrote. He employed the listing technique similarly across his L1 and L2. However, the participant with mixed reading difficulties (MRD) had difficulties with both transcription and text production. He produced spelling errors and frequently paused in both languages. He also struggled with word retrieval and producing coherent texts, consistent with studies of monolingual children with poor comprehension or with developmental language disorder. The results suggest that the mother tongue instruction provided to the participants has not been sufficient for them to become balanced biscriptal readers and writers in both languages. Therefore, increasing the number of hours dedicated to mother tongue instruction and motivating the children to participate in these classes could be potential strategies to address this issue.

Keywords: reading, writing, reading and writing difficulties, bilingual children, biscriptal

Procedia PDF Downloads 39