Search results for: analyzing text data
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 26144

Search results for: analyzing text data

25964 A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction

Authors: Smita Pallavi, Raj Ratn Pranesh, Sumit Kumar

Abstract:

Information representation as tables is compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to appropriate the corresponding cell in dataframe, which can be stored as comma-separated values, database, excel, and multiple other usable formats.

Keywords: table extraction, optical character recognition, image processing, text extraction, morphological transformation

Procedia PDF Downloads 117
25963 Sentence Variation in Academic Writing: A Contrastive Study of the Variation of Sentence Types between Male and Female ESL Writers

Authors: Fatima Muhammad Shitu

Abstract:

This paper focuses on the variation of sentence types in English academic writing. The major focus is on whether variation in sentence types can be attributable to the linguistic and most of all the gender of the writers. The objective of this paper is to analyze the sentence types produced by Male and Female ESL writers and to determine whether writers vary the frequency and use of sentence types across the text depending on the rhetorical choices of the writers to construct identity. This study is hinged on the functionalist approach to analyzing academic writing in use. For the purpose of this study, a corpus of 20 academic papers was created and the use of sentences types was analyzed. The data for the study was collated using percentages. In this case, the number of occurrences of the different sentence types were analyzed, calculated and then converted to percentages for each group i.e., male and female ESL writers. The results from these analyses were compared and contrasted in order to determine whether Male and Female ESL writer vary their sentence types, and, or employed the same or different sentence types in their texts. The conclusion is that Male and Female ESL writers not only vary in their use of sentence types in academic writings but also differ.

Keywords: sentence variation, ESL, gender, academic writing

Procedia PDF Downloads 294
25962 Urban Areas Management in Developing Countries: Analysis of the Urban Areas Crossed with Risk of Storm Water Drains, Aswan-Egypt

Authors: Omar Hamdy, Schichen Zhao, Hussein Abd El-Atty, Ayman Ragab, Muhammad Salem

Abstract:

One of the most risky areas in Aswan is Abouelreesh, which is suffering from flood disasters, as heavy deluge inundates urban areas causing considerable damage to buildings and infrastructure. Moreover, the main problem was the urban sprawl towards this risky area. This paper aims to identify the urban areas located in the risk areas prone to flash floods. Analyzing this phenomenon needs a lot of data to ensure satisfactory results; however, in this case the official data and field data were limited, and therefore, free sources of satellite data were used. This paper used ArcGIS tools to obtain the storm water drains network by analyzing DEM files. Additionally, historical imagery in Google Earth was studied to determine the age of each building. The last step was to overlay the urban area layer and the storm water drains layer to identify the vulnerable areas. The results of this study would be helpful to urban planners and government officials to make the disasters risk estimation and develop primary plans to recover the risky area, especially urban areas located in torrents.

Keywords: risk area, DEM, storm water drains, GIS

Procedia PDF Downloads 426
25961 A U-Net Based Architecture for Fast and Accurate Diagram Extraction

Authors: Revoti Prasad Bora, Saurabh Yadav, Nikita Katyal

Abstract:

In the context of educational data mining, the use case of extracting information from images containing both text and diagrams is of high importance. Hence, document analysis requires the extraction of diagrams from such images and processes the text and diagrams separately. To the author’s best knowledge, none among plenty of approaches for extracting tables, figures, etc., suffice the need for real-time processing with high accuracy as needed in multiple applications. In the education domain, diagrams can be of varied characteristics viz. line-based i.e. geometric diagrams, chemical bonds, mathematical formulas, etc. There are two broad categories of approaches that try to solve similar problems viz. traditional computer vision based approaches and deep learning approaches. The traditional computer vision based approaches mainly leverage connected components and distance transform based processing and hence perform well in very limited scenarios. The existing deep learning approaches either leverage YOLO or faster-RCNN architectures. These approaches suffer from a performance-accuracy tradeoff. This paper proposes a U-Net based architecture that formulates the diagram extraction as a segmentation problem. The proposed method provides similar accuracy with a much faster extraction time as compared to the mentioned state-of-the-art approaches. Further, the segmentation mask in this approach allows the extraction of diagrams of irregular shapes.

Keywords: computer vision, deep-learning, educational data mining, faster-RCNN, figure extraction, image segmentation, real-time document analysis, text extraction, U-Net, YOLO

Procedia PDF Downloads 101
25960 Prosperous Digital Image Watermarking Approach by Using DCT-DWT

Authors: Prabhakar C. Dhavale, Meenakshi M. Pawar

Abstract:

In this paper, everyday tons of data is embedded on digital media or distributed over the internet. The data is so distributed that it can easily be replicated without error, putting the rights of their owners at risk. Even when encrypted for distribution, data can easily be decrypted and copied. One way to discourage illegal duplication is to insert information known as watermark, into potentially valuable data in such a way that it is impossible to separate the watermark from the data. These challenges motivated researchers to carry out intense research in the field of watermarking. A watermark is a form, image or text that is impressed onto paper, which provides evidence of its authenticity. Digital watermarking is an extension of the same concept. There are two types of watermarks visible watermark and invisible watermark. In this project, we have concentrated on implementing watermark in image. The main consideration for any watermarking scheme is its robustness to various attacks

Keywords: watermarking, digital, DCT-DWT, security

Procedia PDF Downloads 389
25959 Short Text Classification for Saudi Tweets

Authors: Asma A. Alsufyani, Maram A. Alharthi, Maha J. Althobaiti, Manal S. Alharthi, Huda Rizq

Abstract:

Twitter is one of the most popular microblogging sites that allows users to publish short text messages called 'tweets'. Increasing the number of accounts to follow (followings) increases the number of tweets that will be displayed from different topics in an unclassified manner in the timeline of the user. Therefore, it can be a vital solution for many Twitter users to have their tweets in a timeline classified into general categories to save the user’s time and to provide easy and quick access to tweets based on topics. In this paper, we developed a classifier for timeline tweets trained on a dataset consisting of 3600 tweets in total, which were collected from Saudi Twitter and annotated manually. We experimented with the well-known Bag-of-Words approach to text classification, and we used support vector machines (SVM) in the training process. The trained classifier performed well on a test dataset, with an average F1-measure equal to 92.3%. The classifier has been integrated into an application, which practically proved the classifier’s ability to classify timeline tweets of the user.

Keywords: corpus creation, feature extraction, machine learning, short text classification, social media, support vector machine, Twitter

Procedia PDF Downloads 124
25958 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some examples of failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. R studios were used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, then using SPSS to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. A relative positivity and negativity value were then calculated by dividing the frequency of phrases with the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. Model constructed from BING predicts the outcome of drug performance in clinical trials with an overall percentage of 65.3%. AFINN model had a lower accuracy at predicting outcomes compared to the BING model at 62.5% but was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made to later iterations of the model to sufficiently increase the accuracy.

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 124
25957 Moral Wrongdoers: Evaluating the Value of Moral Actions Performed by War Criminals

Authors: Jean-Francois Caron

Abstract:

This text explores the value of moral acts performed by war criminals, and the extent to which they should alleviate the punishment these individuals ought to receive for violating the rules of war. Without neglecting the necessity of retribution in war crimes cases, it argues from an ethical perspective that we should not rule out the possibility of considering lesser punishments for war criminals who decide to perform a moral act, as it might produce significant positive moral outcomes. This text also analyzes how such a norm could be justified from a moral perspective.

Keywords: war criminals, pardon, amnesty, retribution

Procedia PDF Downloads 250
25956 Architectural Experience of the Everyday in Phuket Old Town

Authors: Thirayu Jumsai na Ayudhya

Abstract:

Initial attempts to understand about what architecture means to people as they go about their everyday life through my previous research revealed that knowledge such as environmental psychology, environmental perception, environmental aesthetics, did not adequately address a perceived need for the contextualized and holistic theoretical framework. In my previous research, it is found that people’s making senses of their everyday architecture can be described in terms of four super‐ordinate themes; (1) building in urban (text), (2) building in (text), (3) building in human (text), (4) and building in time (text). For more comprehensively understanding of how people make sense of their everyday architectural experience, in this ongoing research Phuket Old town was selected as the focal urban context where the distinguish character of Chino-Portuguese is remarkable. It is expected that in a unique urban context like Phuket old town unprecedented super-ordinate themes will be unveiled through the reflection of people’s everyday experiences. The ongoing research of people’s architectural experience conducted in Phuket Island, Thailand, will be presented succinctly. The research will address the question of how do people make sense of their everyday architecture/buildings especially in a unique urban context, Phuket Old town, and identify ways in which people make sense of their everyday architecture. Participant-Produced-Photograph (PPP) and Interpretative Phenomenological Analysis (IPA) are adopted as main methodologies. PPP allows people to express experiences of their everyday urban context freely without any interference or forced-data generating by researchers. With IPA methodology a small pool of participants is considered desirable given the detailed level of analysis required and its potential to produce a meaningful outcome.

Keywords: architectural experience, the everyday architecture, Phuket, Thailand

Procedia PDF Downloads 269
25955 A Deep Learning Approach to Subsection Identification in Electronic Health Records

Authors: Nitin Shravan, Sudarsun Santhiappan, B. Sivaselvan

Abstract:

Subsection identification, in the context of Electronic Health Records (EHRs), is identifying the important sections for down-stream tasks like auto-coding. In this work, we classify the text present in EHRs according to their information, using machine learning and deep learning techniques. We initially describe briefly about the problem and formulate it as a text classification problem. Then, we discuss upon the methods from the literature. We try two approaches - traditional feature extraction based machine learning methods and deep learning methods. Through experiments on a private dataset, we establish that the deep learning methods perform better than the feature extraction based Machine Learning Models.

Keywords: deep learning, machine learning, semantic clinical classification, subsection identification, text classification

Procedia PDF Downloads 182
25954 Ontology for a Voice Transcription of OpenStreetMap Data: The Case of Space Apprehension by Visually Impaired Persons

Authors: Said Boularouk, Didier Josselin, Eitan Altman

Abstract:

In this paper, we present a vocal ontology of OpenStreetMap data for the apprehension of space by visually impaired people. Indeed, the platform based on produsage gives a freedom to data producers to choose the descriptors of geocoded locations. Unfortunately, this freedom, called also folksonomy leads to complicate subsequent searches of data. We try to solve this issue in a simple but usable method to extract data from OSM databases in order to send them to visually impaired people using Text To Speech technology. We focus on how to help people suffering from visual disability to plan their itinerary, to comprehend a map by querying computer and getting information about surrounding environment in a mono-modal human-computer dialogue.

Keywords: TTS, ontology, open street map, visually impaired

Procedia PDF Downloads 267
25953 Exploring the Landscape of Information Visualization through a Mark Lombardi Lens

Authors: Alon Friedman, Antonio Sanchez Chinchon

Abstract:

This bibliometric study takes an artistic and storytelling approach to explore the term ”information visualization.” Analyzing over 1008 titles collected from databases that specialize in data visualization research, we examine the titles of these publications to report on the characteristics and development trends in the field. Employing a qualitative methodology, we delve into the titles of these publications, extracting leading terms and exploring the cooccurrence of these terms to gain deeper insights. By systematically analyzing the leading terms and their relationships within the titles, we shed light on the prevailing themes that shape the landscape of ”information visualization” by employing the artist Mark Lombardi’s techniques to visualize our findings. By doing so, this study provides valuable insights into bibliometrics visualization while also opening new avenues for leveraging art and storytelling to enhance data representation.

Keywords: bibliometrics analysis, Mark Lombardi design, information visualization, qualitative methodology

Procedia PDF Downloads 47
25952 Optimizing Communications Overhead in Heterogeneous Distributed Data Streams

Authors: Rashi Bhalla, Russel Pears, M. Asif Naeem

Abstract:

In this 'Information Explosion Era' analyzing data 'a critical commodity' and mining knowledge from vertically distributed data stream incurs huge communication cost. However, an effort to decrease the communication in the distributed environment has an adverse influence on the classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost and accuracy. This paper proposes a method based on Bayesian inference to reduce the communication volume in a heterogeneous distributed environment while retaining prediction accuracy. Our experimental evaluation reveals that a significant reduction in communication can be achieved across a diverse range of dataset types.

Keywords: big data, bayesian inference, distributed data stream mining, heterogeneous-distributed data

Procedia PDF Downloads 128
25951 On the Relationship between the Concepts of "[New] Social Democracy" and "Democratic Socialism"

Authors: Gintaras Mitrulevičius

Abstract:

This text, which is based on the conference report, seeks to briefly examine the relationship between the concepts of social democracy and democratic socialism, drawing attention to the essential aspects of its development and, in particular, discussing the contradictions in the relationship between these concepts in the modern period. In the preparation of this text, such research methods as historical, historical-comparative methods were used, as well as methods of analyzing, synthesizing, and generalizing texts. The history of the use of terms in social democracy and democratic socialism shows that these terms were used alternately and almost synonymously. At the end of the 20th century, traditional social democracy was transformed into the so-called "new social democracy." Many of the new social democrats do not consider themselves democratic socialists and avoid the historically characteristic identification of social democracy with democratic socialism. It has become quite popular to believe that social democracy is a separate ideology from democratic socialism. Or that it has become a variant of the ideology of liberalism. This is a testimony to the crisis of ideological self-awareness of social democracy. Since the beginning of the 21st century, social democracy has also experienced a growing crisis of electoral support. This, among other things, led to her slight shift to the left. In this context, some social democrats are once again talking about democratic socialism. The rise of the ideas of democratic socialism in the United States was catalyzed by Bernie Sanders. But the proponents of democratic socialism in the United States have different concepts of democratic socialism. In modern Europe, democratic socialism is also spoken of by leftists of non-social democratic origin, whose understanding is different from that of democratic socialism inherent in classical social democracy. Some political scientists also single out the concepts in question. Analysis of the problem shows that there are currently several concepts of democratic socialism on the spectrum of the political left, both social-democratic and non-social-democratic.

Keywords: democratic socializm, socializm, social democracy, new social democracy, political ideologies

Procedia PDF Downloads 88
25950 Study on the Focus of Attention of Special Education Students in Primary School

Authors: Tung-Kuang Wu, Hsing-Pei Hsieh, Ying-Ru Meng

Abstract:

Special Education in Taiwan has been facing difficulties including shortage of teachers and lack in resources. Some students need to receive special education are thus not identified or admitted. Fortunately, information technologies can be applied to relieve some of the difficulties. For example, on-line multimedia courseware can be used to assist the learning of special education students and take pretty much workload from special education teachers. However, there may exist cognitive variations between students in special or regular educations, which suggests the design of online courseware requires different considerations. This study aims to investigate the difference in focus of attention (FOA) between special and regular education students of primary school in viewing the computer screen. The study is essential as it helps courseware developers in determining where to put learning elements that matter the most on the right position of screen. It may also assist special education specialists to better understand the subtle differences among various subtypes of learning disabilities. This study involves 76 special education students (among them, 39 are students with mental retardation, MR, and 37 are students with learning disabilities, LDs) and 42 regular education students. The participants were asked to view a computer screen showing a picture partitioned into 3 × 3 areas with each area filled with text or icon. The subjects were then instructed to mark on the prior given paper sheets, which are also partitioned into 3 × 3 grids, the areas corresponding to the pictures on the computer screen that they first set their eyes on. The data are then collected and analyzed. Major findings are listed: 1. In both text and icon scenario, significant differences exist in the first preferred FOA between special and regular education students. The first FOA for the former is mainly on area 1 (upper left area, 53.8% / 51.3% for MR / LDs students in text scenario; and 53.8% / 56.8% for MR / LDs students in icons scenario), while the latter on area 5 (middle area, 50.0% and 57.1% in text and icons scenarios). 2. The second most preferred area in text scenario for students with MR and LDs are area 2 (upper-middle, 20.5%) and 5 (middle area, 24.3%). In icons scenario, the results are similar, but lesser in percentage. 3. Students with LDs that show similar preference (either in text or icons scenarios) in FOA to regular education students tend to be of some specific sub-type of learning disabilities. For instance, students with LDs that chose area 5 (middle area, either in text or icon scenario) as their FOA are mostly ones that have reading or writing disability. Also, three (out of 13) subjects in this category, after going through the rediagnosis process, were excluded from being learning disabilities. In summary, the findings suggest when designing multimedia courseware for students with MR and LDs, the essential learning elements should be placed on area 1, 2 and 5. In addition, FOV preference may also potentially be used as an indicator for diagnosing students with LDs.

Keywords: focus of attention, learning disabilities, mental retardation, on-line multimedia courseware, special education

Procedia PDF Downloads 141
25949 Insight-Based Evaluation of a Map-Based Dashboard

Authors: Anna Fredriksson Häägg, Charlotte Weil, Niklas Rönnberg

Abstract:

Map-based dashboards are used for data exploration every day. The present study used an insight-based methodology for evaluating a map-based dashboard that presents research findings of water management and ecosystem services in the Amazon. In addition to analyzing the insights gained from using the dashboard, the evaluation method was compared to standardized questionnaires and task-based evaluations. The result suggests that the dashboard enabled the participants to gain domain-relevant, complex insights regarding the topic presented. Furthermore, the insight-based analysis highlighted unexpected insights and hypotheses regarding causes and potential adaptation strategies for remediation. Although time- and resource-consuming, the insight-based methodology was shown to have the potential of thoroughly analyzing how end users can utilize map-based dashboards for data exploration and decision making. Finally, the insight-based methodology is argued to evaluate tools in scenarios more similar to real-life usage compared to task-based evaluation methods.

Keywords: visual analytics, dashboard, insight-based evaluation, geographic visualization

Procedia PDF Downloads 90
25948 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques

Authors: Mei-Yi Wu, Shang-Ming Huang

Abstract:

The online shoppers nowadays often search the product information on the Internet using some keywords of products. To use this kind of information searching model, shoppers should have a preliminary understanding about their interesting products and choose the correct keywords. However, if the products are first contact (for example, the worn clothes or backpack of passengers which you do not have any idea about the brands), these products cannot be retrieved due to insufficient information. In this paper, we discuss and study the applications in E-commerce using image retrieval and text mining techniques. We design a reasonable E-commerce application system containing three layers in the architecture to provide users product information. The system can automatically search and retrieval similar images and corresponding web pages on Internet according to the target pictures which taken by users. Then text mining techniques are applied to extract important keywords from these retrieval web pages and search the prices on different online shopping stores with these keywords using a web crawler. Finally, the users can obtain the product information including photos and prices of their favorite products. The experiments shows the efficiency of proposed system.

Keywords: mobile image retrieval, text mining, product information service system, online marketing

Procedia PDF Downloads 329
25947 The Application of a Hybrid Neural Network for Recognition of a Handwritten Kazakh Text

Authors: Almagul Assainova , Dariya Abykenova, Liudmila Goncharenko, Sergey Sybachin, Saule Rakhimova, Abay Aman

Abstract:

The recognition of a handwritten Kazakh text is a relevant objective today for the digitization of materials. The study presents a model of a hybrid neural network for handwriting recognition, which includes a convolutional neural network and a multi-layer perceptron. Each network includes 1024 input neurons and 42 output neurons. The model is implemented in the program, written in the Python programming language using the EMNIST database, NumPy, Keras, and Tensorflow modules. The neural network training of such specific letters of the Kazakh alphabet as ә, ғ, қ, ң, ө, ұ, ү, h, і was conducted. The neural network model and the program created on its basis can be used in electronic document management systems to digitize the Kazakh text.

Keywords: handwriting recognition system, image recognition, Kazakh font, machine learning, neural networks

Procedia PDF Downloads 226
25946 Corpus Linguistics as a Tool for Translation Studies Analysis: A Bilingual Parallel Corpus of Students’ Translations

Authors: Juan-Pedro Rica-Peromingo

Abstract:

Nowadays, corpus linguistics has become a key research methodology for Translation Studies, which broadens the scope of cross-linguistic studies. In the case of the study presented here, the approach used focuses on learners with little or no experience to study, at an early stage, general mistakes and errors, the correct or incorrect use of translation strategies, and to improve the translational competence of the students. Led by Sylviane Granger and Marie-Aude Lefer of the Centre for English Corpus Linguistics of the University of Louvain, the MUST corpus (MUltilingual Student Translation Corpus) is an international project which brings together partners from Europe and worldwide universities and connects Learner Corpus Research (LCR) and Translation Studies (TS). It aims to build a corpus of translations carried out by students including both direct (L2 > L1) an indirect (L1 > L2) translations, from a great variety of text types, genres, and registers in a wide variety of languages: audiovisual translations (including dubbing, subtitling for hearing population and for deaf population), scientific, humanistic, literary, economic and legal translation texts. This paper focuses on the work carried out by the Spanish team from the Complutense University (UCMA), which is part of the MUST project, and it describes the specific features of the corpus built by its members. All the texts used by UCMA are either direct or indirect translations between English and Spanish. Students’ profiles comprise translation trainees, foreign language students with a major in English, engineers studying EFL and MA students, all of them with different English levels (from B1 to C1); for some of the students, this would be their first experience with translation. The MUST corpus is searchable via Hypal4MUST, a web-based interface developed by Adam Obrusnik from Masaryk University (Czech Republic), which includes a translation-oriented annotation system (TAS). A distinctive feature of the interface is that it allows source texts and target texts to be aligned, so we can be able to observe and compare in detail both language structures and study translation strategies used by students. The initial data obtained point out the kind of difficulties encountered by the students and reveal the most frequent strategies implemented by the learners according to their level of English, their translation experience and the text genres. We have also found common errors in the graduate and postgraduate university students’ translations: transfer errors, lexical errors, grammatical errors, text-specific translation errors, and cultural-related errors have been identified. Analyzing all these parameters will provide more material to bring better solutions to improve the quality of teaching and the translations produced by the students.

Keywords: corpus studies, students’ corpus, the MUST corpus, translation studies

Procedia PDF Downloads 117
25945 Entropy in a Field of Emergence in an Aspect of Linguo-Culture

Authors: Nurvadi Albekov

Abstract:

Communicative situation is a basis, which designates potential models of ‘constructed forms’, a motivated basis of a text, for a text can be assumed as a product of the communicative situation. It is within the field of emergence the models of text, that can be potentially prognosticated in a certain communicative situation, are designated. Every text can be assumed as conceptual system structured on the base of certain communicative situation. However in the process of ‘structuring’ of a certain model of ‘conceptual system’ consciousness of a recipient is able act only within the border of the field of emergence for going out of this border indicates misunderstanding of the communicative situation. On the base of communicative situation we can witness the increment of meaning where the synergizing of the informative model of communication, formed by using of the invariant units of a language system, is a result of verbalization of the communicative situation. The potential of the models of a text, prognosticated within the field of emergence, also depends on the communicative situation. The conception ‘the field of emergence’ is interpreted as a unit of the language system, having poly-directed universal structure, implying the presence of the core, the center and the periphery, including different levels of means of a functioning system of language, both in terms of linguistic resources, and in terms of extra linguistic factors interaction of which results increment of a text. The conception ‘field of emergence’ is considered as the most promising in the analysis of texts: oral, written, printed and electronic. As a unit of the language system field of emergence has several properties that predict its use during the study of a text in different levels. This work is an attempt analysis of entropy in a text in the aspect of lingua-cultural code, prognosticated within the model of the field of emergence. The article describes the problem of entropy in the field of emergence, caused by influence of the extra-linguistic factors. The increasing of entropy is caused not only by the fact of intrusion of the language resources but by influence of the alien culture in a whole, and by appearance of non-typical for this very culture symbols in the field of emergence. The borrowing of alien lingua-cultural symbols into the lingua-culture of the author is a reason of increasing the entropy when constructing a text both in meaning and in structuring level. It is nothing but artificial formatting of lexical units that violate stylistic unity of a phrase. It is marked that one of the important characteristics descending the entropy in the field of emergence is a typical similarity of lexical and semantic resources of the different lingua-cultures in aspects of extra linguistic factors.

Keywords: communicative situation, field of emergence, lingua-culture, entropy

Procedia PDF Downloads 335
25944 The Difference of Learning Outcomes in Reading Comprehension between Text and Film as The Media in Indonesian Language for Foreign Speaker in Intermediate Level

Authors: Siti Ayu Ningsih

Abstract:

This study aims to find the differences outcomes in learning reading comprehension with text and film as media on Indonesian Language for foreign speaker (BIPA) learning at intermediate level. By using quantitative and qualitative research methods, the respondent of this study is a single respondent from D'Royal Morocco Integrative Islamic School in grade nine from secondary level. Quantitative method used to calculate the learning outcomes that have been given the appropriate action cycle, whereas qualitative method used to translate the findings derived from quantitative methods to be described. The technique used in this study is the observation techniques and testing work. Based on the research, it is known that the use of the text media is more effective than the film for intermediate level of Indonesian Language for foreign speaker learner. This is because, when using film the learner does not have enough time to take note the difficult vocabulary and don't have enough time to look for the meaning of the vocabulary from the dictionary. While the use of media texts shows the better effectiveness because it does not require additional time to take note the difficult words. For the words that are difficult or strange, the learner can immediately find its meaning from the dictionary. The presence of the text is also very helpful for Indonesian Language for foreign speaker learner to find the answers according to the questions more easily. By matching the vocabulary of the question into the text references.

Keywords: Indonesian language for foreign speaker, learning outcome, media, reading comprehension

Procedia PDF Downloads 174
25943 The Oral Production of University EFL Students: An Analysis of Tasks, Format, and Quality in Foreign Language Development

Authors: Vera Lucia Teixeira da Silva, Sandra Regina Buttros Gattolin de Paula

Abstract:

The present study focuses on academic literacy and addresses the impact of semantic-discursive resources on the constitution of genres that are produced in such context. The research considers the development of writing in the academic context in Portuguese. Researches that address academic literacy and the characteristics of the texts produced in this context are rare, mainly with focus on the development of writing, considering three variables: the constitution of the writer, the perception of the reader/interlocutor and the organization of the informational text flow. The research aims to map the semantic-discursive resources of the written register in texts of several genres and produced by students in the first semester of the undergraduate course in Letters. The hypothesis raised is that writing in the academic environment is not a recurrent literacy practice for these learners and can be explained by the ontogenetic and phylogenetic nature of language development. Qualitative in nature, the present research has as empirical data texts produced in a half-yearly course of Reading and Textual Production; these data result from the proposition of four different writing proposals, in a total of 600 texts. The corpus is analyzed based on semantic-discursive resources, seeking to contemplate relevant aspects of language (grammar, discourse and social context) that reveal the choices made in the reader/writer interrelationship and the organizational flow of the Text. Among the semantic-discursive resources, the analysis includes three resources, including (a) appraisal and negotiation to understand the attitudes negotiated (roles of the participants of the discourse and their relationship with the other); (b) ideation to explain the construction of the experience (activities performed and participants); and (c) periodicity to outline the flow of information in the organization of the text according to the genre it instantiates. The results indicate the organizational difficulties of the flow of the text information. Cartography contributes to the understanding of the way writers use language in an effort to present themselves, evaluate someone else’s work, and communicate with readers.

Keywords: academic writing, Portuguese mother tongue, semantic-discursive resources, academic context

Procedia PDF Downloads 91
25942 Analyzing Environmental Emotive Triggers in Terrorist Propaganda

Authors: Travis Morris

Abstract:

The purpose of this study is to measure the intersection of environmental security entities in terrorist propaganda. To the best of author’s knowledge, this is the first study of its kind to examine this intersection within terrorist propaganda. Rosoka, natural language processing software and frame analysis are used to advance our understanding of how environmental frames function as emotive triggers. Violent jihadi demagogues use frames to suggest violent and non-violent solutions to their grievances. Emotive triggers are framed in a way to leverage individual and collective attitudes in psychological warfare. A comparative research design is used because of the differences and similarities that exist between two variants of violent jihadi propaganda that target western audiences. Analysis is based on salience and network text analysis, which generates violent jihadi semantic networks. Findings indicate that environmental frames are used as emotive triggers across both data sets, but also as tactical and information data points. A significant finding is that certain core environmental emotive triggers like “water,” “soil,” and “trees” are significantly salient at the aggregate level across both data sets. All environmental entities can be classified into two categories, symbolic and literal. Importantly, this research illustrates how demagogues use environmental emotive triggers in cyber space from a subcultural perspective to mobilize target audiences to their ideology and praxis. Understanding the anatomy of propaganda construction is necessary in order to generate effective counter narratives in information operations. This research advances an additional method to inform practitioners and policy makers of how environmental security and propaganda intersect.

Keywords: propaganda analysis, emotive triggers environmental security, frames

Procedia PDF Downloads 114
25941 Natural Language Processing; the Future of Clinical Record Management

Authors: Khaled M. Alhawiti

Abstract:

This paper investigates the future of medicine and the use of Natural language processing. The importance of having correct clinical information available online is remarkable; improving patient care at affordable costs could be achieved using automated applications to use the online clinical information. The major challenge towards the retrieval of such vital information is to have it appropriately coded. Majority of the online patient reports are not found to be coded and not accessible as its recorded in natural language text. The use of Natural Language processing provides a feasible solution by retrieving and organizing clinical information, available in text and transforming clinical data that is available for use. Systems used in NLP are rather complex to construct, as they entail considerable knowledge, however significant development has been made. Newly formed NLP systems have been tested and have established performance that is promising and considered as practical clinical applications.

Keywords: clinical information, information retrieval, natural language processing, automated applications

Procedia PDF Downloads 378
25940 Increased Reaction and Movement Times When Text Messaging during Simulated Driving

Authors: Adriana M. Duquette, Derek P. Bornath

Abstract:

Reaction Time (RT) and Movement Time (MT) are important components of everyday life that have an effect on the way in which we move about our environment. These measures become even more crucial when an event can be caused (or avoided) in a fraction of a second, such as the RT and MT required while driving. The purpose of this study was to develop a more simple method of testing RT and MT during simulated driving with or without text messaging, in a university-aged population (n = 170). In the control condition, a randomly-delayed red light stimulus flashed on a computer interface after the participant began pressing the ‘gas’ pedal on a foot switch mat. Simple RT was defined as the time between the presentation of the light stimulus and the initiation of lifting the foot from the switch mat ‘gas’ pedal; while MT was defined as the time after the initiation of lifting the foot, to the initiation of depressing the switch mat ‘brake’ pedal. In the texting condition, upon pressing the ‘gas’ pedal, a ‘text message’ appeared on the computer interface in a dialog box that the participant typed on their cell phone while waiting for the light stimulus to turn red. In both conditions, the sequence was repeated 10 times, and an average RT (seconds) and average MT (seconds) were recorded. Condition significantly (p = .000) impacted overall RTs, as the texting condition (0.47 s) took longer than the no-texting (control) condition (0.34 s). Longer MTs were also recorded during the texting condition (0.28 s) than in the control condition (0.23 s), p = .001. Overall increases in Response Time (RT + MT) of 189 ms during the texting condition would equate to an additional 4.2 meters (to react to the stimulus and begin braking) if the participant had been driving an automobile at 80 km per hour. In conclusion, increasing task complexity due to the dual-task demand of text messaging during simulated driving caused significant increases in RT (41%), MT (23%) and Response Time (34%), thus further strengthening the mounting evidence against text messaging while driving.

Keywords: simulated driving, text messaging, reaction time, movement time

Procedia PDF Downloads 497
25939 A Method for Clinical Concept Extraction from Medical Text

Authors: Moshe Wasserblat, Jonathan Mamou, Oren Pereg

Abstract:

Natural Language Processing (NLP) has made a major leap in the last few years, in practical integration into medical solutions; for example, extracting clinical concepts from medical texts such as medical condition, medication, treatment, and symptoms. However, training and deploying those models in real environments still demands a large amount of annotated data and NLP/Machine Learning (ML) expertise, which makes this process costly and time-consuming. We present a practical and efficient method for clinical concept extraction that does not require costly labeled data nor ML expertise. The method includes three steps: Step 1- the user injects a large in-domain text corpus (e.g., PubMed). Then, the system builds a contextual model containing vector representations of concepts in the corpus, in an unsupervised manner (e.g., Phrase2Vec). Step 2- the user provides a seed set of terms representing a specific medical concept (e.g., for the concept of the symptoms, the user may provide: ‘dry mouth,’ ‘itchy skin,’ and ‘blurred vision’). Then, the system matches the seed set against the contextual model and extracts the most semantically similar terms (e.g., additional symptoms). The result is a complete set of terms related to the medical concept. Step 3 –in production, there is a need to extract medical concepts from the unseen medical text. The system extracts key-phrases from the new text, then matches them against the complete set of terms from step 2, and the most semantically similar will be annotated with the same medical concept category. As an example, the seed symptom concepts would result in the following annotation: “The patient complaints on fatigue [symptom], dry skin [symptom], and Weight loss [symptom], which can be an early sign for Diabetes.” Our evaluations show promising results for extracting concepts from medical corpora. The method allows medical analysts to easily and efficiently build taxonomies (in step 2) representing their domain-specific concepts, and automatically annotate a large number of texts (in step 3) for classification/summarization of medical reports.

Keywords: clinical concepts, concept expansion, medical records annotation, medical records summarization

Procedia PDF Downloads 106
25938 Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier

Authors: Saurabh Farkya, Govinda Surampudi

Abstract:

Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. This paper shows an implementation of support vector machines for the purpose of Devnagari Script recognition. The Support Vector Machines was trained with Multi Domain features; Transform Domain and Spatial Domain or Structural Domain feature. Transform Domain includes the wavelet feature of the character. Structural Domain consists of Distance Profile feature and Gradient feature. The Segmentation of the text document has been done in 3 levels-Line Segmentation, Word Segmentation, and Character Segmentation. The pre-processing of the characters has been done with the help of various Morphological operations-Otsu's Algorithm, Erosion, Dilation, Filtration and Thinning techniques. The Algorithm was tested on the self-prepared database, a collection of various handwriting. Further, Unicode was used to convert recognized Devnagari text into understandable computer document. The document so obtained is an array of codes which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using concatenation technique.

Keywords: Character Recognition (OCR), Text to Speech (TTS), Support Vector Machines (SVM), Library of Support Vector Machines (LIBSVM)

Procedia PDF Downloads 468
25937 Systemic Functional Grammar Analysis of Barack Obama's Second Term Inaugural Speech

Authors: Sadiq Aminu, Ahmed Lamido

Abstract:

This research studies Barack Obama’s second inaugural speech using Halliday’s Systemic Functional Grammar (SFG). SFG is a text grammar which describes how language is used, so that the meaning of the text can be better understood. The primary source of data in this research work is Barack Obama’s second inaugural speech which was obtained from the internet. The analysis of the speech was based on the ideational and textual metafunctions of Systemic Functional Grammar. Specifically, the researcher analyses the Process Types and Participants (ideational) and the Theme/Rheme (textual). It was found that material process (process of doing) was the most frequently used ‘Process type’ and ‘We’ which refers to the people of America was the frequently used ‘Theme’. Application of the SFG theory, therefore, gives a better meaning to Barack Obama’s speech.

Keywords: ideational, metafunction, rheme, textual, theme

Procedia PDF Downloads 126
25936 A Study on Information Structure in the Vajrachedika-Prajna-paramita Sutra and Translation Aspect

Authors: Yoon-Cheol Park

Abstract:

This research focuses on examining the information structures in the old Chinese character-Korean translation of the Vajrachedika-prajna-paramita sutra. The background of this research comes from the fact that there were no previous researches which looked into the information structures in the target text of the Vajrachedika-prajna-paramita sutra by now. The existing researches on the Buddhist scripture translation mainly put weight on message conveyance by literal and semantic translation methods. But the message conveyance from one language to another has a necessity to be delivered with equivalent information structure. Thus, this research is intended to investigate on the flow of old and new information in the target text of Buddhist scripture, compared with source text. The Vajrachedika-prajna-paramita sutra unlike other Buddhist scriptures is composed of conversational structures between Buddha and his disciple, Suboli. This implies that the information flow can be changed by utterance context and some propositions. So, this research tries to analyze the flow of old and new information within the source and target text. As a result of analysis, this research can discover the following facts; firstly, there are the differences of the information flow in the message conveyance between the old Chinese character and Korean by language features. The old Chinese character reveals that old-new information flow is developed, while Korean indicates new-old information flow because of word order. Secondly, the source text of the Vajrachedika-prajna-paramita sutra includes abstruse terminologies, jargon and abstract words. These make influence on the target text and cause the change of the information flow. But the repetitive expressions of these words provide the old information in the target text. Lastly, the Vajrachedika-prajna-paramita sutra offers the expository structure from conversations between Buddha and Suboli. It means that the information flow is developed in the way of explaining specific subjects and of paraphrasing unfamiliar phrases and expressions. From the results of analysis above, this research can verify that the information structures in the target text of the Vajrachedika-prajna-paramita sutra are changed by specific subjects and terminologies, developed with the new-old information flow by repetitive expressions or word order and reveal the information structures familiar to target culture. It also implies that the translation of the Vajrachedika-prajna-paramita sutra as a religious book needs the message conveyance to take into account the information structures of two languages.

Keywords: abstruse terminologies, the information structure, new and old information, old Chinese character-Korean translation

Procedia PDF Downloads 345
25935 Contextual Distribution for Textual Alignment

Authors: Yuri Bizzoni, Marianne Reboul

Abstract:

Our program compares French and Italian translations of Homer’s Odyssey, from the XVIth to the XXth century. We focus on the third point, showing how distributional semantics systems can be used both to improve alignment between different French translations as well as between the Greek text and a French translation. Although we focus on French examples, the techniques we display are completely language independent.

Keywords: classical receptions, computational linguistics, distributional semantics, Homeric poems, machine translation, translation studies, text alignment

Procedia PDF Downloads 404