Search results for: habeas corpus
137 A Greedy Alignment Algorithm Supporting Medication Reconciliation
Authors: David Tresner-Kirsch
Abstract:
Reconciling patient medication lists from multiple sources is a critical task supporting the safe delivery of patient care. Manual reconciliation is a time-consuming and error-prone process, and recently attempts have been made to develop efficiency- and safety-oriented automated support for professionals performing the task. An important capability of any such support system is automated alignment – finding which medications from a list correspond to which medications from a different source, regardless of misspellings, naming differences (e.g. brand name vs. generic), or changes in treatment (e.g. switching a patient from one antidepressant class to another). This work describes a new algorithmic solution to this alignment task, using a greedy matching approach based on string similarity, edit distances, concept extraction and normalization, and synonym search derived from the RxNorm nomenclature. The accuracy of this algorithm was evaluated against a gold-standard corpus of 681 medication records; this evaluation found that the algorithm predicted alignments with 99% precision and 91% recall. This performance is sufficient to support decision support applications for medication reconciliation.
Keywords: clinical decision support, medication reconciliation, natural language processing, RxNorm
Procedia PDF Downloads 283
136 Machine Learning-Based Workflow for the Analysis of Project Portfolio
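The greedy matching idea in the abstract above can be illustrated with a minimal sketch. It is not the author's algorithm: it uses only string similarity from Python's `difflib`, leaves out edit distances, concept extraction and RxNorm synonym search, and the function names, the 0.6 threshold and the sample medication lists are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def greedy_align(list_a, list_b, threshold=0.6):
    """Greedily pair entries of list_a with entries of list_b.

    All candidate pairs are scored, then accepted from the highest score
    down; each entry is used at most once, and pairs scoring below
    `threshold` (an assumed cut-off) are left unaligned.
    """
    scored = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(list_a)
         for j, b in enumerate(list_b)),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining candidates are all weaker
        if i in used_a or j in used_b:
            continue  # one side is already matched
        used_a.add(i)
        used_b.add(j)
        pairs.append((list_a[i], list_b[j]))
    return pairs


# Hypothetical medication lists from two sources.
home_list = ["Lisinopril 10 mg", "metformin 500mg", "Zoloft 50 mg"]
clinic_list = ["metformin 500 mg", "lisinopril 10mg", "atorvastatin 20 mg"]
aligned = greedy_align(home_list, clinic_list)
```

Pure string similarity cannot link a brand name to its generic name, which is presumably where the synonym search mentioned in the abstract comes in.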
Authors: Jean Marie Tshimula, Atsushi Togashi
Abstract:
We develop a data-science approach that provides interactive visualization and predictive models for finding insights in projects' historical data, so that stakeholders can understand opportunities in the African market that might otherwise escape them within the online project portfolio of the African Development Bank. This machine learning-based web application identifies the market trends of the fastest-growing economies across the continent as well as skyrocketing sectors that have a significant impact on the future of business in Africa. Accordingly, the approach is tailored to predict where investment is most needed. Moreover, we create a corpus that includes the descriptions of more than 1,200 projects covering approximately 14 sectors and designed for some of 53 African countries. We then sift through this large amount of semi-structured data to extract small details likely to contain directions to follow. We apply a combination of Latent Dirichlet Allocation and Random Forests in the analysis module of our methodology to highlight the most relevant topics that investors may focus on when investing in Africa.
Keywords: machine learning, topic modeling, natural language processing, big data
Procedia PDF Downloads 167
135 Domain Adaptive Dense Retrieval with Query Generation
Authors: Rui Yin, Haojie Wang, Xun Li
Abstract:
Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.
Keywords: dense retrieval, query generation, contrastive learning, unsupervised training
Procedia PDF Downloads 101
134 Recurrent Patterns of Netspeak among Selected Nigerians on WhatsApp Platform: A Quest for Standardisation
Authors: Lily Chimuanya, Esther Ajiboye, Emmanuel Uba
Abstract:
One of the consequences of online communication is the birth of new orthography genres characterised by novel conventions of abbreviation and acronyms usually referred to as Netspeak. Netspeak, also known as internet slang, is a style of writing mainly used in online communication to limit the length of text characters and to save time. The aim of this study is to evaluate how second language users of the English language have internalised this new convention of writing; identify the recurrent patterns of Netspeak; and assess the consistency of the use of the identified patterns in relation to their meanings. The study is corpus-based, and data drawn from WhatsApp chat pages of selected groups of Nigerian English speakers show a large number of inconsistencies in the patterns of Netspeak and their meanings. The study argues that rather than emphasise the negative impact of Netspeak on the communicative competence of second language users, studies should focus on suggesting models as yardsticks for standardising the usage of Netspeak and indeed all other emerging language conventions resulting from online communication. This stance stems from the inevitable global language transformation that is imminent with the coming of age of information technology.
Keywords: abbreviation, acronyms, Netspeak, online communication, standardisation
Procedia PDF Downloads 390
133 Quantifying User-Related, System-Related, and Context-Related Patterns of Smartphone Use
Authors: Andrew T. Hendrickson, Liven De Marez, Marijn Martens, Gytha Muller, Tudor Paisa, Koen Ponnet, Catherine Schweizer, Megan Van Meer, Mariek Vanden Abeele
Abstract:
Quantifying and understanding the myriad ways people use their phones and how that impacts their relationships, cognitive abilities, mental health, and well-being is increasingly important in our phone-centric society. However, most studies on the patterns of phone use have focused on theory-driven tests of specific usage hypotheses using self-report questionnaires or analyses of smaller datasets. In this work we present a series of analyses from a large corpus of over 3000 users that combine data-driven and theory-driven analyses to identify reliable smartphone usage patterns and clusters of similar users. Furthermore, we compare the stability of user clusters across user- and system-initiated sessions, as well as during the hypothesized ritualized behavior times directly before and after sleeping. Our results indicate support for some hypothesized usage patterns but present a more complete and nuanced view of how people use smartphones.
Keywords: data mining, experience sampling, smartphone usage, health and well being
Procedia PDF Downloads 162
132 Exploring Goal Setting by Foreign Language Learners in Virtual Exchange
Authors: Suzi M. S. Cavalari, Tim Lewis
Abstract:
Teletandem is a bilingual model of virtual exchange in which two partners from different countries (who speak different languages) meet synchronously and regularly over a period of 8 weeks to learn each other’s mother tongue (or language of proficiency). At São Paulo State University (UNESP), participants should answer a questionnaire before starting the exchanges, in which one of the questions refers to setting a goal to be accomplished with the help of the teletandem partner. In this context, the present presentation aims to examine the goal-setting activity of 79 Brazilians who participated in Portuguese-English teletandem exchanges over a period of four years (2012-2015). The theoretical background is based on goal setting and self-regulated learning theories, which propose that appropriate, efficient goals are focused on the learning process (not on the product) and are specific, proximal (short-term) and moderately difficult. The data set consisted of 79 initial questionnaires retrieved from the MulTeC (Multimodal Teletandem Corpus). Results show that only approximately 10% of goals can be considered appropriate. Features of these goals are described in relation to specificities of the teletandem context. Based on the results, three mechanisms that can help learners to set attainable goals are discussed.
Keywords: foreign language learning, goal setting, teletandem, virtual exchange
Procedia PDF Downloads 184
131 An Eco-Translatology Approach to the Translation of Spanish Tourism Advertising in Digital Communication in Chinese
Authors: Mingshu Liu, Laura Santamaria, Xavier Carmaniu Mainadé
Abstract:
As one of the sectors most affected by the COVID-19 pandemic, tourism is facing challenges in revitalizing the industry. At the same time, this is a good opportunity to take advantage of digital communication as an effective tool for tourism promotion. Our proposal aims to verify the linguistic operations on online platforms in China. The research is based on the theory of Eco-Translatology put forward by Gengshen Hu, whose contribution focuses on the translator's adaptation to the ecosystem environment and the three elaborated parameters (linguistic, cultural and communicative). We also relate it to Even-Zohar's and Toury's theoretical postulates on the Polysystem to elaborate an interdisciplinary methodology. Such a methodology allows us to analyze personal treatments and phraseology in the target text. As for the corpus, we adopt the official Spanish-language website of Turismo de España as the source text and the postings on the two major social networks in China, Weibo and WeChat, in 2019. Through qualitative analysis, we conclude that, in the tourism advertising campaign on Chinese social networks, chengyu (Chinese phraseology) and honorific titles are used very frequently.
Keywords: digital communication, eco-translatology, polysystem theory, tourism advertising
Procedia PDF Downloads 227
130 Short Text Classification for Saudi Tweets
Authors: Asma A. Alsufyani, Maram A. Alharthi, Maha J. Althobaiti, Manal S. Alharthi, Huda Rizq
Abstract:
Twitter is one of the most popular microblogging sites that allows users to publish short text messages called 'tweets'. Increasing the number of accounts to follow (followings) increases the number of tweets that will be displayed from different topics in an unclassified manner in the timeline of the user. Therefore, it can be a vital solution for many Twitter users to have their tweets in a timeline classified into general categories to save the user’s time and to provide easy and quick access to tweets based on topics. In this paper, we developed a classifier for timeline tweets trained on a dataset consisting of 3600 tweets in total, which were collected from Saudi Twitter and annotated manually. We experimented with the well-known Bag-of-Words approach to text classification, and we used support vector machines (SVM) in the training process. The trained classifier performed well on a test dataset, with an average F1-measure equal to 92.3%. The classifier has been integrated into an application, which practically proved the classifier’s ability to classify timeline tweets of the user.
Keywords: corpus creation, feature extraction, machine learning, short text classification, social media, support vector machine, Twitter
Procedia PDF Downloads 154
129 Memory Types in Hemodialysis (HD) Patients; A Study Based on Hemodialysis Duration, Zahedan: South East of Iran
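As a rough illustration of the Bag-of-Words representation mentioned in the abstract above, the following sketch turns short texts into count vectors. It covers only the feature-extraction step; the SVM training is omitted, whitespace tokenization is a simplifying assumption, and the function names and sample texts are hypothetical rather than taken from the paper.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def bag_of_words(text, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Hypothetical training texts standing in for annotated tweets.
training_texts = ["match tonight match", "new phone"]
vocab = build_vocabulary(training_texts)
features = [bag_of_words(t, vocab) for t in training_texts]
```

Vectors of this kind would then be passed, together with the manual category labels, to a classifier such as an SVM.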
Authors: Behnoush Sabayan, Ali Alidadi, Saeid Ebarhimi, N. M. Bakhshani
Abstract:
Hemodialysis (HD) patients are at a high risk of atherosclerotic and vascular disease; in addition, little information is available on the impact of HD on the brain structure of these patients. We studied brain abnormalities in HD patients. The aim of this study was to investigate the effect of long-term HD on the brain structure of HD patients. Non-contrast MRI was used to evaluate imaging findings. Our study included 80 HD patients, of whom 39 had less than six months of HD and 41 had a history of HD of more than six months. The population had a mean age of 51.60 years, and 27.5% were female. According to the study, patients who had been hemodialyzed for a long time (median time of HD up to 4 years) had more small vessel ischemia than patients who underwent HD for a shorter term (median time 3 to 5 months). Most of the small vessel ischemia was located in pre-ventricular, subcortical and white matter (1.33 ± 0.471, 1.23 ± 0.420 and 1.39 ± 0.490). However, other brain abnormalities, such as central pons abnormality, global brain atrophy, thinning of the corpus callosum and frontal lobe atrophy, were also found (P<0.01). The present study demonstrated that patients who were under HD for a longer time had small vessel ischemia, and we conclude that this small vessel ischemia might be a causative mechanism of brain atrophy in chronic hemodialysis patients. However, additional research is needed in this area.
Keywords: Hemodialysis Patients, Duration of Hemodialysis, MRI, Zahedan
Procedia PDF Downloads 213
128 Patronage Network and Ideological Manipulations in Translation of Literary Texts: A Case Study of George Orwell's “1984” in Persian Translation in the Period 1980 to 2015
Authors: Masoud Hassanzade Novin, Bahloul Salmani
Abstract:
The process of translation is not merely a matter of linguistic aspects. It must also be considered within the cultural framework of both the source and target text cultures. In the 20th century, the translation process and translated texts were confronted with a new aspect, considered mostly within the patronage framework and ideological grillwork of the target language. To scrutinize these factors in the translation process, both micro-element and macro-element factors can be taken into consideration. For the purpose of this study, through qualitative research based on a critical discourse analysis approach, the novel “1984” by George Orwell was chosen as the corpus for a contrastive analysis against its Persian translations. Results of the study revealed some distortions embedded in the target texts, which were overshadowed by ideological aspects and the patronage network. The outcomes of the manipulated terms differed across categories, which revealed the manipulation aspects in the translated texts.
Keywords: critical discourse analysis, ideology, patronage network, translated texts
Procedia PDF Downloads 320
127 A Cognitive Semantic Analysis of the Metaphorical Extensions of Come out and Take Over
Authors: Raquel Rossini, Edelvais Caldeira
Abstract:
The aim of this work is to investigate the motivation for the metaphorical uses of two verb combinations: come out and take over. Drawing on cognitive semantics theories, image schemas and metaphors, we attempt to demonstrate that: a) the metaphorical senses of both 'come out' and 'take over' extend from the central (spatial) senses of both the verbs and the particles in such verb combinations; and b) the particles 'out' and 'over' also contribute to the whole meaning of the verb combinations. In order to do so, a random selection of 579 concordance lines for come out and 1,412 for take over was obtained from the Corpus of Contemporary American English – COCA. One of the main procedures adopted in the present work was the establishment of verb and particle central senses. The research questions addressed in this study are as follows: a) how does the identification of trajector and landmark help reveal patterns that contribute to the identification of the semantic network of these two verb combinations?; b) what is the relationship between the schematic structures attributed to the particles and the metaphorical uses found in empirical data?; and c) what conceptual metaphors underlie the mappings from the source to the target domains? The results demonstrated that not only the lexical verbs come and take, but also the particles out and over, play an important role in the different meanings of come out and take over. In addition, image schemas and conceptual metaphors were found to be helpful in establishing the motivations for the metaphorical uses of these linguistic structures.
Keywords: cognitive linguistics, English syntax, multi-word verbs, prepositions
Procedia PDF Downloads 154
126 Deep Learning Based-Object-classes Semantic Classification of Arabic Texts
Authors: Imen Elleuch, Wael Ouarda, Gargouri Bilel
Abstract:
We propose in this paper a deep learning-based approach to classifying text in order to enrich an Arabic ontology based on the object classes of Gaston Gross. These object classes are defined by taking into account the syntactic and semantic features of the treated language. Our proposed approach is thus a hybrid one: on the one hand, it is based on the object classes, which represent a knowledge-based approach to text classification; on the other hand, it uses a deep learning approach that relies on word embeddings to classify text. We applied our approach to a corpus constructed from an Arabic dictionary. The resulting semantic classification of text will enrich the Arabic object-class ontology: new classes can be added to the ontology, or the features that characterize each object class can be expanded. The results are compared with a similar work that addresses the same subject with a classical linguistic approach to the semantic classification of text. This comparison highlights our hybrid approach, which can be improved by broadening the dataset used in the deep learning process.
Keywords: deep-learning approach, object-classes, semantic classification, Arabic
Procedia PDF Downloads 86
125 Raising Test of English for International Communication (TOEIC) Scores through Purpose-Driven Vocabulary Acquisition
Authors: Edward Sarich, Jack Ryan
Abstract:
In contrast to learning new vocabulary incidentally in one’s first language, foreign language vocabulary is often acquired purposefully, because a lack of natural exposure requires it to be studied in an artificial environment. It follows then that foreign language vocabulary may be more efficiently acquired if it is purpose-driven, or linked to a clear and desirable outcome. The research described in this paper relates to the early stages of what is seen as a long-term effort to measure the effectiveness of a methodology for purpose-driven foreign language vocabulary instruction, specifically by analyzing whether directed studying from high-frequency vocabulary lists leads to an improvement in Test of English for International Communication (TOEIC) scores. The research was carried out in two sections of a first-year university English composition class at a small university in Japan. The results seem to indicate that purposeful study from relevant high-frequency vocabulary lists can contribute to raising TOEIC scores and that the test preparation methodology used in this study was thought by students to be beneficial in helping them to prepare to take this high-stakes test.
Keywords: corpus vocabulary, language assessment, second language vocabulary acquisition, TOEIC test preparation
Procedia PDF Downloads 149
124 Study of Multimodal Resources in Interactions Involving Children with Autistic Spectrum Disorders
Authors: Fernanda Miranda da Cruz
Abstract:
This paper aims to systematize, descriptively and analytically, the relations between language, body and material world explored in a specific empirical context: everyday co-presence interactions between children diagnosed with Autistic Spectrum Disorder (ASD) and various interlocutors. We work with 20 hours of an audiovisual corpus in Brazilian Portuguese. The analysis focuses on 1) daily interactions involving the presence or participation of subjects with a diagnosis of ASD, based on an embodied interaction perspective; 2) the status and role of gestures, body and material world in the construction and constitution of human interaction and its relation to linguistic-cognitive processes and Autistic Spectrum Disorders; 3) questions related to the field of video analysis, such as procedures for recording interactions in complex environments (involving many participants, use of objects and body movement); the construction of audiovisual corpora for linguistic-interaction research; and the invitation to a visual analytical mentality of human social interactions involving not only the verbal aspects that constitute them, but also the physical space, the body and the material world.
Keywords: autism spectrum disease, multimodality, social interaction, non-verbal interactions
Procedia PDF Downloads 113
123 Multi-Granularity Feature Extraction and Optimization for Pathological Speech Intelligibility Evaluation
Authors: Chunying Fang, Haifeng Li, Lin Ma, Mancai Zhang
Abstract:
Speech intelligibility assessment is an important measure for evaluating the functional outcomes of surgical and non-surgical treatment, speech therapy and rehabilitation. The assessment of pathological speech plays an important role in assisting the experts. Pathological speech is usually non-stationary and mutational; in this paper, we describe a multi-granularity combined feature scheme, which is optimized by a hierarchical visual method. First, pathological features at different granularity levels are extracted: BAFS (basic acoustics feature set), the local spectral characteristics MSCC (Mel s-transform cepstrum coefficients), and nonlinear dynamic characteristics based on chaotic analysis. Then, radar chart and F-score are proposed to optimize the features through hierarchical visual fusion. The feature set could be reduced from 526 to 96 dimensions. The experimental results show that the new features with a support vector machine (SVM) have the best performance, with a recognition rate of 84.4% on the NKI-CCRT corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.
Keywords: pathological speech, multi-granularity feature, MSCC (Mel s-transform cepstrum coefficients), F-score, radar chart
Procedia PDF Downloads 282
122 One-Shot Text Classification with Multilingual-BERT
Authors: Hsin-Yang Wang, K. M. A. Salam, Ying-Jia Lin, Daniel Tan, Tzu-Hsuan Chou, Hung-Yu Kao
Abstract:
Detecting user intent from natural language expressions has a wide variety of use cases in different natural language processing applications. Recently, few-shot training has seen a spike in usage in commercial domains. Due to the lack of significant sample features, downstream task performance has been limited or has led to unstable results across different domains. As a state-of-the-art method, the pre-trained BERT model, which gathers sentence-level information from a large text corpus, shows improvement on several NLP benchmarks. In this research, we propose a method that changes multi-class classification tasks into binary classification tasks and then uses the confidence score to rank the results. As a language model, BERT performs well on sequence data. In our experiment, we change the objective from predicting labels to finding the relations between words in sequence data. Our proposed method achieved 71.0% accuracy on the internal intent detection dataset and 63.9% accuracy on the HuffPost dataset. Acknowledgment: This work was supported by NCKU-B109-K003, which is the collaboration between National Cheng Kung University, Taiwan, and SoftBank Corp., Tokyo.
Keywords: OSML, BERT, text classification, one shot
Procedia PDF Downloads 100
121 Filipino And Malaysian Travel Bloggers: Adverbial Intensifiers Used in Blog Description
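The reformulation described in the abstract above, turning one multi-class decision into a set of binary decisions and ranking labels by confidence, can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: a simple token-overlap scorer stands in for the BERT-based confidence score, and the function names, labels and label descriptions are hypothetical.

```python
def overlap_score(utterance: str, label_description: str) -> float:
    """Stand-in binary scorer: Jaccard overlap of lowercase tokens.

    Assumption: in the setting the abstract describes, a pre-trained
    language model would supply this confidence instead.
    """
    a = set(utterance.lower().split())
    b = set(label_description.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_labels(utterance, label_descriptions, scorer=overlap_score):
    """Ask one yes/no question per label, then rank labels by confidence."""
    scored = [(scorer(utterance, description), label)
              for label, description in label_descriptions.items()]
    return [label for _, label in sorted(scored, reverse=True)]


# Hypothetical intent labels, each with a single describing example.
labels = {
    "weather": "what is the weather forecast",
    "music": "play a song or music",
}
ranking = rank_labels("play some music", labels)
```

Because each label is scored independently, adding a new intent needs only a new description rather than a retrained multi-class output layer, which is one plausible reason the binary framing suits one-shot settings.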
Authors: Arvin Ludovice
Abstract:
Modern ways of communicating and connecting people are now at their easiest, and the blog is one of them. Blogs are highly relevant in informing people about different aspects, interests, and fields. The evidentiality and testimony of many people can be easily accessed. However, blog descriptions are written to persuade people, and language choice is one means of doing so, notably adverbial intensifiers. Measuring the language on a scale of its intensity subdues the intensity per se. The present study determines, scrutinizes and analyses the adverbial intensifiers used in Filipino and Malaysian travel blogs. The corpus consists of 30 top travel blogs written by Filipinos and 30 top travel blogs written by Malaysians, for a total of 60 travel blogs. The application AntConc was used to tag the relevant intensifiers. A frequency distribution of the scores is used to identify the most common intensifiers used by travel bloggers from the Philippines and Malaysia. The scale or degree of intensifiers is taken from Quirk's degrees of intensifiers as the basis for the functions of intensifiers. The results show that Malaysian travel blogs are more expressive in their use of adverbial intensifiers than Filipino travel blogs; in the ranking of intensifiers, boosters are the most used in expressing language choice. The conclusion states that Malaysian travel bloggers make greater use of the functionality of adverbial intensifiers. The pedagogical implications are also stated, to deepen and underline their significance and importance in language teaching.
Keywords: adverbial intensifiers, blogs, Filipino and Malaysian blogs, AntConc
Procedia PDF Downloads 182
120 Explicitation as a Non-Professional Translation Universal: Evidence from the Translation of Promotional Material
Authors: Julieta Alos
Abstract:
Following the explicitation hypothesis, it has been proposed that explicitation is a translation universal, i.e., one of those features that characterize translated texts, and cannot be traced back to interference from a particular language. The explicitation hypothesis has been enthusiastically endorsed by some scholars, and firmly rejected by others. Focusing on the translation of promotional material from English into Arabic, specifically in the luxury goods market, the aims of this study are twofold: First, to contribute to the debate regarding the notion of explicitation in order to advance our understanding of what has become a contentious concept. Second, to add to the growing body of literature on non-professional translation by shedding light on this particular aspect of it. To this end, our study uses a combination of qualitative and quantitative methods to explore a corpus of brochures pertaining to the luxury industry, translated into Arabic at the local marketing agencies promoting the brands in question, by bilingual employees who have no translation training. Our data reveals a preference to avoid creative language choices in favor of more direct advertising messages, suggestive of a general tendency towards explicitation in non-professional translation, beyond what is dictated by the grammatical and stylistic constraints of Arabic. We argue, further, that this translation approach is at odds with the principles of luxury advertising, which emphasize implicitness and ambiguity, and view language as an extension of the creative process involved in the production of the luxury item.
Keywords: English-Arabic translation, explicitation, non-professional translation, promotional texts
Procedia PDF Downloads 374
119 A Contrastive Rhetoric Study: The Use of Textual and Interpersonal Metadiscoursal Markers in Persian and English Newspaper Editorials
Authors: Habibollah Mashhady, Moslem Fatollahi
Abstract:
This study contrasts the use of metadiscoursal markers in English and Persian newspaper editorials as persuasive text types. These markers are linguistic elements in the text that do not add to its propositional content; rather, they serve to realize Halliday’s (1985) textual and interpersonal functions of language. First, some of the most common markers from five subcategories (Text Connectives, Illocution Markers, Hedges, Emphatics, and Attitude Markers) were identified in both English and Persian newspapers. Then, the frequency of occurrence of these markers was recorded in the English and Persian corpora, consisting of 44 randomly selected editorials (18,000 words in each) from several English and Persian newspapers. After that, using a two-way chi-square analysis, the overall observed χ² value was found to be highly significant, so the null hypothesis of no difference was confidently rejected. Finally, in order to determine the contribution of each subcategory to the overall χ² value, one-way chi-square analyses were applied to the individual subcategories. The results indicated that the differences in only two of the five subcategories of markers were statistically significant. This difference is then attributed to the differing spirits prevailing in the linguistic communities involved. Regarding the minor research question, it was found that, in contrast to English writers, Persian writers are more writer-oriented in their writings.
Keywords: metadiscoursal markers, textual meta-function, interpersonal meta-function, persuasive texts, English and Persian newspaper editorials
Procedia PDF Downloads 571
118 Internet Memes: A Mirror of Culture and Society
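The two-way chi-square analysis mentioned in the abstract above compares observed marker frequencies with the frequencies expected if language and marker subcategory were independent. The sketch below computes that statistic for a 2 x 5 table. The counts are invented for illustration only and are not the study's data; the function name is likewise an assumption.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts, NOT the study's data. Columns: Text Connectives,
# Illocution Markers, Hedges, Emphatics, Attitude Markers.
counts = [
    [120, 15, 60, 45, 30],  # English editorials (hypothetical)
    [110, 12, 35, 70, 28],  # Persian editorials (hypothetical)
]
statistic, dof = chi_square(counts)
```

The statistic would then be compared with the critical value of the chi-square distribution for the given degrees of freedom (4 for a 2 x 5 table) to decide whether to reject the null hypothesis of no difference.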
Authors: Alexandra-Monica Toma
Abstract:
As the internet became a ruling force of society, computer-mediated communication has enriched its methods of conveying meaning by combining linguistic means with visual means of expressivity. One of the elements of cyberspace is what we call a meme, a succinct, visually engaging tool used to communicate ideas or emotions, usually in a funny or ironic manner. Coined by Richard Dawkins in the late 1970s to refer to cultural genes, this term now denominates a special type of vernacular language used to share content on the internet. This research aims to analyse the basic mechanism underlying meme creation as a blend of innovation and imitation and will examine some of the most widely used image macros remixed to generate new content, while also pointing out success strategies. Moreover, this paper discusses whether memes can transcend the light-hearted and playful mood they mirror and become biting and sharp cultural comments. The study also uses the concept of multimodality and stresses how the text interacts with the image, discussing three types of relations between the two: symmetry, amplification, and contradiction. We will furthermore show that memes are cultural artifacts and virtual tropes highly dependent on context and societal issues by using a corpus of memes related to the COVID-19 pandemic.
Keywords: context, computer-mediated communication, memes, multimodality
Procedia PDF Downloads 183
117 The Morphology of Sri Lankan Text Messages
Authors: Chamindi Dilkushi Senaratne
Abstract:
Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre in itself, worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of the text language of urban bilinguals in Sri Lanka. It is a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate, skilled performance by users of the language who possess an in-depth knowledge of linguistic systems, creating new words and thereby conveying their linguistic identity and individual and group solidarity via the message. The findings of the study solidify arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code-mixing theories are used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study reveals processes such as omission, initialism, insertion and alternation, in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.
Keywords: bilingual, deviations, morphology, texts
Procedia PDF Downloads 267
116 End-to-End Spanish-English Sequence Learning Translation Model
Authors: Vidhu Mitha Goutham, Ruma Mukherjee
Abstract:
The low availability of well-trained, unlimited, dynamic-access models for specific languages makes it hard for corporate users to adopt quick translation techniques and incorporate them into product solutions. Although translation tasks increasingly require a dynamic sequence learning curve, stable, cost-free open-source models are scarce. We survey and compare current translation techniques and propose a modified sequence-to-sequence model repurposed with attention techniques. Sequence learning using an encoder-decoder model is now paving the way for higher precision levels in translation. Using a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder background, we use Fairseq tools to produce an end-to-end bilingually trained Spanish-English machine translation model including source language detection. We obtain competitive results using a duo-lingo-corpus trained model to provide for prospective, ready-made plug-in use for compound sentences and document translations. Our model serves as a decent system for large, organizational data translation needs. While acknowledging its shortcomings and future scope, it also identifies itself as a well-optimized deep neural network model and solution.
Keywords: attention, encoder-decoder, Fairseq, Seq2Seq, Spanish, translation
Procedia PDF Downloads 174
115 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity
Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang
Abstract:
Word discovery is a key problem in text information retrieval technology. Methods in new word discovery tend to be closely related to words because they generally obtain new word results by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate the word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) to detect network words effectively from the corpus. This framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. The experiment verifies that the framework not only discovers network words rapidly but also recovers the standard-word meaning of the discovered network words, which reflects the effectiveness of our work.
Keywords: text information retrieval, natural language processing, new word discovery, information extraction
Procedia PDF Downloads 94
114 Epistemic Stance in Chinese Medicine Translation: A Systemic Functional Perspective
Authors: Yan Yue
Abstract:
Epistemic stance refers to the writer’s judgement about the certainty of a proposition, which demonstrates the writer’s degree of commitment to and confidence in the status of the information. Epistemic stance can greatly affect the validity or reliability of a statement; however, to date, it has received little attention in translation studies, especially from the perspective of systemic functional linguistics (SFL) and in relation to the translator’s domain knowledge. This study is corpus-based research carried out from an SFL perspective, which investigates translators’ epistemic stance patterns in Chinese medicine discourse translations by translators with and without medical domain knowledge. Overall, our findings show that all translators tend to be neither too assertive nor too doubtful about Chinese medicine statements, and they all tend to express their epistemic stance in a subjective rather than objective way. Individually, there is a clear pattern of epistemic stance marked off by translators’ medical expertise, which further consolidates the previous finding that epistemic asymmetry is most salient between lay people and professionals. However, contrary to our hypothesis, translators who are clinicians and have more medical knowledge are found to be more tentative about TCM statements than translators who are not clinicians. This finding could serve to refine statements about the relation between a writer’s domain knowledge and epistemic stance-taking, and to inform the current debate on whether Chinese medicine texts should only be translated by clinicians.
Keywords: epistemic stance, domain knowledge, SFL, medical translation
Procedia PDF Downloads 145
113 How Unicode Glyphs Revolutionized the Way We Communicate
Authors: Levi Corallo
Abstract:
Language typed by humans on computers and cell phones differs significantly from previous modes of written language exchange. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been the use of symbols or glyphs, primarily Emojis, globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and a machine learning perspective. The Unicode system will be explored, and methods of encoding will be juxtaposed with current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored, as well as topics across current research methods dealing with Emojis, such as sentiment analysis and predictive text models. This study proposes that sequential analysis is a significant feature for analyzing Unicode characters in a corpus with machine learning. Current models that try to learn or translate the meaning of Emojis should start by learning from bi- and tri-grams of Emoji, as well as by observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entirely new vernacular, referred to here as ‘typed language’, will also be delineated across my analysis of Unicode glyphs from both a semantic and a technical perspective.
Keywords: unicode, text symbols, emojis, glyphs, communication
Procedia PDF Downloads 194
112 Analysis of Linguistic Disfluencies in Bilingual Children’s Discourse
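As an illustration of the bi- and tri-gram idea mentioned in this abstract, the following is a minimal sketch and not the author's method. The code-point ranges in `EMOJI_RANGES` and the function names are assumptions chosen for brevity; the ranges only approximate the emoji set and ignore ZWJ sequences, variation selectors, and skin-tone modifiers (which would be counted as separate symbols). N-grams are taken over the emoji that appear in a message, in order, with intervening text skipped.

```python
from collections import Counter

# Assumption: a rough approximation of emoji code points, not the full
# Unicode emoji definition (ZWJ sequences and modifiers are not handled).
EMOJI_RANGES = ((0x1F300, 0x1FAFF), (0x2600, 0x27BF))


def is_emoji(ch: str) -> bool:
    """Return True if the character falls inside one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def emoji_ngrams(message: str, n: int) -> list[tuple[str, ...]]:
    """Return n-grams over the emoji in a message, in order of appearance."""
    emojis = [ch for ch in message if is_emoji(ch)]
    return [tuple(emojis[i:i + n]) for i in range(len(emojis) - n + 1)]


def count_emoji_ngrams(corpus: list[str], n: int) -> Counter:
    """Count emoji n-grams across every message in a corpus."""
    counts: Counter = Counter()
    for message in corpus:
        counts.update(emoji_ngrams(message, n))
    return counts
```

Calling `count_emoji_ngrams(messages, 2)` and `count_emoji_ngrams(messages, 3)` on a list of strings gives bigram and trigram frequency tables that could serve as sequential features for a downstream model.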
Authors: Sheena Christabel Pravin, M. Palanivelan
Abstract:
Speech disfluencies are common in spontaneous speech. The primary purpose of this study was to distinguish linguistic disfluencies from stuttering disfluencies in bilingual Tamil–English (TE) speaking children. The secondary purpose was to determine whether their disfluencies are mediated by native language dominance and/or by an early onset of developmental stuttering in childhood. A detailed study was carried out to identify the prosodic and acoustic features that uniquely represent the disfluent regions of speech. This paper focuses on statistical modeling of repetitions, prolongations, pauses, and interjections in a speech corpus of bilingual spontaneous utterances, in English and Tamil, from school-going children. Two classifiers, Hidden Markov Models (HMM) and the Multilayer Perceptron (MLP), which is a class of feed-forward artificial neural network, were compared in the classification of disfluencies. The results of the classifiers document the patterns of disfluency in spontaneous speech samples of school-aged children to distinguish between Children Who Stutter (CWS) and Children with Language Impairment (CLI). The ability of the models to classify the disfluencies was measured in terms of F-measure, Recall, and Precision.
Keywords: bi-lingual, children who stutter, children with language impairment, hidden markov models, multi-layer perceptron, linguistic disfluencies, stuttering disfluencies
Procedia PDF Downloads 216
111 Information Disclosure And Financial Sentiment Index Using a Machine Learning Approach
Authors: Alev Atak
Abstract:
In this paper, we aim to create a financial sentiment index by investigating companies’ voluntary information disclosures. We retrieve structured content from BIST 100 companies’ financial reports for the period 1998-2018 and extract relevant financial information for sentiment analysis through Natural Language Processing. We measure strategy-related disclosures and their cross-sectional variation and classify report content into generic sections using synonym lists divided into four main categories according to their liquidity risk profile, risk positions, intra-annual information, and exposure to risk. We use Word Error Rate and Cosine Similarity for comparing and measuring text similarity and derivation in sets of texts. In addition to performing text extraction, we will provide a range of text analysis options, such as readability metrics, word counts using pre-determined lists (e.g., forward-looking, uncertainty, tone, etc.), and comparison with a reference corpus (at the word, part-of-speech, and semantic levels). We thereby create an adequate analytical tool and a financial dictionary to depict the importance of granular financial disclosure for investors to identify risk-taking behavior correctly and hence make the aggregated effects traceable.
Keywords: financial sentiment, machine learning, information disclosure, risk
Procedia PDF Downloads 94
110 Recognizing an Individual, Their Topic of Conversation and Cultural Background from 3D Body Movement
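The two text-comparison measures named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: tokens are obtained by whitespace splitting, cosine similarity is computed over raw term counts, and the function names are hypothetical.

```python
from collections import Counter
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Word error rate is sensitive to word order, while cosine similarity over term counts is not, so the two measures capture different aspects of how far one report section departs from another.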
Authors: Gheida J. Shahrour, Martin J. Russell
Abstract:
The 3D body movement signals captured during human-human conversation include clues not only to the content of people’s communication but also to their culture and personality. This paper is concerned with automatic extraction of this information from body movement signals. For the purpose of this research, we collected a novel corpus from 27 subjects and arranged them into groups according to their culture. We arranged each group into pairs, and each pair communicated with each other about different topics. A state-of-the-art recognition system is applied to the problems of person, culture, and topic recognition. We borrowed modeling, classification, and normalization techniques from speech recognition. We used Gaussian Mixture Modeling (GMM) as the main technique for building our three systems, obtaining 77.78%, 55.47%, and 39.06% accuracy from the person, culture, and topic recognition systems respectively. In addition, we combined the above GMM systems with Support Vector Machines (SVM) to obtain 85.42%, 62.50%, and 40.63% accuracy for person, culture, and topic recognition respectively. Although direct comparison among these three recognition systems is difficult, it seems that our person recognition system performs best for both GMM and GMM-SVM, suggesting that inter-subject differences (i.e. subjects’ personality traits) are a major source of variation. When removing these traits from the culture and topic recognition systems using the Nuisance Attribute Projection (NAP) and the Intersession Variability Compensation (ISVC) techniques, we obtained 73.44% and 46.09% accuracy from the culture and topic recognition systems respectively.
Keywords: person recognition, topic recognition, culture recognition, 3D body movement signals, variability compensation
Procedia PDF Downloads 540
109 Craniopharyngiomas: Surgical Techniques: The Combined Interhemispheric Sub-Commissural Translaminaterminalis Approach to Tumors in and Around the Third Ventricle: Neurological and Functional Outcome
Authors: Pietro Mortini, Marco Losa
Abstract:
Objective: Resection of large lesions growing into the third ventricle remains a demanding surgery, sometimes at risk of severe post-operative complications. Transcallosal and transcortical routes have been considered the approaches of choice to access the third ventricle; however, neurological consequences such as memory loss have been reported. We report clinical results of the previously described combined interhemispheric sub-commissural translaminaterminalis approach (CISTA) for the resection of large lesions located in the third ventricle. Methods: The authors conducted a retrospective analysis of 10 patients who were operated on through the CISTA for the resection of lesions growing into the third ventricle. Results: Total resection was achieved in all cases. Cognitive worsening occurred in only one case. No perioperative deaths were recorded and, at last follow-up, all patients were alive. One year after surgery 80% of patients had an excellent outcome with a KPS 100 and Glasgow Outcome score (GOS) Conclusion: The CISTA represents a safe and effective alternative to transcallosal and transcortical routes to resect lesions growing into the third ventricle. It allows for a multiangle trajectory to access the third ventricle with a wide working area free from critical neurovascular structures, without any section of the corpus callosum, the anterior commissure, or the fornix.
Keywords: craniopharyngioma, surgery, sub-commissural translaminaterminalis approach (CISTA)
Procedia PDF Downloads 293
108 Using English Discourse Markers by Saudi EFL Learners: A Descriptive Approach
Authors: Sadeq Al Yaari, Fayza Al Hammadi, Nassr Almaflehi, Ayman Al Yaari, Adham Al Yaari, Montaha Al Yaari, Aayah Al Yaari, Sajedah Al Yaari
Abstract:
Background: The language of EFL learners is of special interest to linguists. Little research has addressed issues concerning English Discourse Markers (EDMs) among Saudi EFL learners. Aims: Employing a corpus-based descriptive analysis, the current study attempts to detect EDMs in the talk of Saudi EFL learners, along with their frequency, use, usage, etc., in comparison with other EFL learners as well as native speakers. Methods: Two hundred Saudi EFL learners were randomly selected from 20 public and private schools (ten students from each school) across the Kingdom of Saudi Arabia (KSA). Subjects were individually recorded while they were studying English in class. Recordings were then linguistically and statistically analyzed by the researchers. Conclusion: Results illustrate that “and”, “but” and “also” are the most frequent EDMs in the talk of Saudi EFL learners. These devices are used randomly by Saudi EFL learners, who confuse their use (appropriateness) with their usage (correctness) due to the influence of their L1 (Arabic). Compared with other EFL learners (native and non-native), Saudi EFL learners use fewer EDMs. These results confirm the claim that EFL learners use EDMs less than native speakers do. This paper, although preliminary in nature, can help arrive at a better understanding of the use of EDMs by Saudi EFL learners. Further, it can also offer insights into how these EDMs are used in Arab Gulf countries. The researchers decided to conduct an in-depth study into the use of EDMs in the oral work of Saudi EFL learners.
Keywords: English discourse markers, Saudi EFL learners, use, usage, frequency, native speakers
Procedia PDF Downloads 45