Search results for: qualitative text analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 30159

Search results for: qualitative text analysis

30069 Adaptation of Projection Profile Algorithm for Skewed Handwritten Text Line Detection

Authors: Kayode A. Olaniyi, Tola. M. Osifeko, Adeola A. Ogunleye

Abstract:

Text line segmentation is an important step in document image processing. It represents a labeling process that assigns the same label using distance metric probability to spatially aligned units. Text line detection techniques have successfully been implemented mainly in printed documents. However, processing of the handwritten texts especially unconstrained documents has remained a key problem. This is because the unconstrained hand-written text lines are often not uniformly skewed. The spaces between text lines may not be obvious, complicated by the nature of handwriting and, overlapping ascenders and/or descenders of some characters. Hence, text lines detection and segmentation represents a leading challenge in handwritten document image processing. Text line detection methods that rely on the traditional global projection profile of the text document cannot efficiently confront with the problem of variable skew angles between different text lines. Hence, the formulation of a horizontal line as a separator is often not efficient. This paper presents a technique to segment a handwritten document into distinct lines of text. The proposed algorithm starts, by partitioning the initial text image into columns, across its width into chunks of about 5% each. At each vertical strip of 5%, the histogram of horizontal runs is projected. We have worked with the assumption that text appearing in a single strip is almost parallel to each other. The algorithm developed provides a sliding window through the first vertical strip on the left side of the page. It runs through to identify the new minimum corresponding to a valley in the projection profile. Each valley would represent the starting point of the orientation line and the ending point is the minimum point on the projection profile of the next vertical strip. The derived text-lines traverse around any obstructing handwritten vertical strips of connected component by associating it to either the line above or below. A decision of associating such connected component is made by the probability obtained from a distance metric decision. The technique outperforms the global projection profile for text line segmentation and it is robust to handle skewed documents and those with lines running into each other.

Keywords: connected-component, projection-profile, segmentation, text-line

Procedia PDF Downloads 106
30068 An Eco-Translatology Approach to the Translation of Spanish Tourism Advertising in Digital Communication in Chinese

Authors: Mingshu Liu, Laura Santamaria, Xavier Carmaniu Mainadé

Abstract:

As one of the sectors most affected by the COVID-19 pandemic, tourism is facing challenges in revitalizing the industry. But at the same time, it would be a good opportunity to take advantage of digital communication as an effective tool for tourism promotion. Our proposal aims to verify the linguistic operations on online platforms in China. The research is carried out based on the theory of Eco-traductology put forward by Gengshen Hu, whose contribution focuses on the translator's adaptation to the ecosystem environment and the three elaborated parameters (linguistic, cultural and communicative). We also relate it to Even-Zohar's and Toury's theoretical postulates on the Polysystem to elaborate on interdisciplinary methodology. Such a methodology allows us to analyze personal treatments and phraseology in the target text. As for the corpus, we adopt the official Spanish-language website of Turismo de España as the source text and the postings on the two major social networks in China, Weibo and Wechat, in 2019. Through qualitative analysis, we conclude that, in the tourism advertising campaign on Chinese social networks, chengyu (Chinese phraseology) and honorific titles are used very frequently.

Keywords: digital communication, eco-traductology, polysystem theory, tourism advertising

Procedia PDF Downloads 212
30067 Developed Text-Independent Speaker Verification System

Authors: Mohammed Arif, Abdessalam Kifouche

Abstract:

Speech is a very convenient way of communication between people and machines. It conveys information about the identity of the talker. Since speaker recognition technology is increasingly securing our everyday lives, the objective of this paper is to develop two automatic text-independent speaker verification systems (TI SV) using low-level spectral features and machine learning methods. (i) The first system is based on a support vector machine (SVM), which was widely used in voice signal processing with the aim of speaker recognition involving verifying the identity of the speaker based on its voice characteristics, and (ii) the second is based on Gaussian Mixture Model (GMM) and Universal Background Model (UBM) to combine different functions from different resources to implement the SVM based.

Keywords: speaker verification, text-independent, support vector machine, Gaussian mixture model, cepstral analysis

Procedia PDF Downloads 30
30066 From Text to Data: Sentiment Analysis of Presidential Election Political Forums

Authors: Sergio V Davalos, Alison L. Watkins

Abstract:

User generated content (UGC) such as website post has data associated with it: time of the post, gender, location, type of device, and number of words. The text entered in user generated content (UGC) can provide a valuable dimension for analysis. In this research, each user post is treated as a collection of terms (words). In addition to the number of words per post, the frequency of each term is determined by post and by the sum of occurrences in all posts. This research focuses on one specific aspect of UGC: sentiment. Sentiment analysis (SA) was applied to the content (user posts) of two sets of political forums related to the US presidential elections for 2012 and 2016. Sentiment analysis results in deriving data from the text. This enables the subsequent application of data analytic methods. The SASA (SAIL/SAI Sentiment Analyzer) model was used for sentiment analysis. The application of SASA resulted with a sentiment score for each post. Based on the sentiment scores for the posts there are significant differences between the content and sentiment of the two sets for the 2012 and 2016 presidential election forums. In the 2012 forums, 38% of the forums started with positive sentiment and 16% with negative sentiment. In the 2016 forums, 29% started with positive sentiment and 15% with negative sentiment. There also were changes in sentiment over time. For both elections as the election got closer, the cumulative sentiment score became negative. The candidate who won each election was in the more posts than the losing candidates. In the case of Trump, there were more negative posts than Clinton’s highest number of posts which were positive. KNIME topic modeling was used to derive topics from the posts. There were also changes in topics and keyword emphasis over time. Initially, the political parties were the most referenced and as the election got closer the emphasis changed to the candidates. The performance of the SASA method proved to predict sentiment better than four other methods in Sentibench. The research resulted in deriving sentiment data from text. In combination with other data, the sentiment data provided insight and discovery about user sentiment in the US presidential elections for 2012 and 2016.

Keywords: sentiment analysis, text mining, user generated content, US presidential elections

Procedia PDF Downloads 167
30065 We Wonder If They Mind: An Empirical Inquiry into the Narratological Function of Mind Wandering in Readers of Literary Texts

Authors: Tina Ternes, Florian Kleinau

Abstract:

The study investigates the content and triggers of mind wandering (MW) in readers of fictional texts. It asks whether readers’ MW is productive (text-related) or unproductive (text-unrelated). Methodologically, it bridges the gap between narratological and data-driven approaches by utilizing a sentence-by-sentence self-paced reading paradigm combined with thought probes in the reading of an excerpt of A. L. Kennedy’s “Baby Blue”. Results show that the contents of MW can be linked to text properties. We validated the role of self-reference in MW and found prediction errors to be triggers of MW. Results also indicate that the content of MW often travels along the lines of the text at hand and can thus be viewed as productive and integral to interpretation.

Keywords: narratology, mind wandering, reading fiction, meta cognition

Procedia PDF Downloads 64
30064 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression consists of sequence characters which allow describing a text path. Usually, in clinical research, regular expressions are manually created by programmers together with domain experts. Lately, there have been several efforts to investigate how to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm named REX was designed, and then, implemented as a simplified method to create regexes to classify Spanish text automatically. In order to classify ambiguous cases, such as, when multiple labels are assigned to a testing example, REX includes an information gain method Two sets of data were used to evaluate the algorithm’s effectiveness in clinical text classification tasks. The results indicate that the regular expression based classifier proposed in this work performs statically better regarding accuracy and F-measure than Support Vector Machine and Naïve Bayes for both datasets.

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

Procedia PDF Downloads 302
30063 Implementation of Lean Production in Business Enterprises: A Literature-Based Content Analysis of Implementation Procedures

Authors: P. Pötters, A. Marquet, B. Leyendecker

Abstract:

The objective of this paper is to investigate different implementation approaches for the implementation of Lean production in companies. Furthermore, a structured overview of those different approaches is to be made. Therefore, the present work is intended to answer the following research question: What differences and similarities exist between the various systematic approaches and phase models for the implementation of Lean Production? To present various approaches for the implementation of Lean Production discussed in the literature, a qualitative content analysis was conducted. Within the framework of a qualitative survey, a selection of texts dealing with lean production and its introduction was examined. The analysis presents different implementation approaches from the literature, covering the descriptive aspect of the study. The study also provides insights into similarities and differences among the implementation approaches, which are drawn from the analysis of latent text contents and author interpretations. In this study, the focus is on identifying differences and similarities among systemic approaches for implementing Lean Production. The research question takes into account the main object of consideration, objectives pursued, starting point, procedure, and endpoint of the implementation approach. The study defines the concept of Lean Production and presents various approaches described in literature that companies can use to implement Lean Production successfully. The study distinguishes between five systemic implementation approaches and seven phase models to help companies choose the most suitable approach for their implementation project. The findings of this study can contribute to enhancing transparency regarding the existing approaches for implementing Lean Production. This can enable companies to compare and contrast the available implementation approaches and choose the most suitable one for their specific project.

Keywords: implementation, lean production, phase models, systematic approaches

Procedia PDF Downloads 78
30062 News Publication on Facebook: Emotional Analysis of Hooks

Authors: Gemma Garcia Lopez

Abstract:

The goal of this study is to perform an emotional analysis of the hooks used in Facebook by three of the most important daily newspapers in the USA. These hook texts are used to get the user's attention and invite him to read the news and linked contents. Thanks to the emotional analysis in text, made with the tool of IBM, Tone Analyzer, we discovered that more than 30% of the hooks can be classified emotionally as joy, sadness, anger or fear. This study gathered the publications made by The New York Times, USA Today and The Washington Post during a random day. The results show that the choice of words by the journalist, can expose the reader to different emotions before clicking on the content. In the three cases analyzed, the absence of emotions in some cases, and the presence of emotions in text in others, appear in very similar percentages. Therefore, beyond the objectivity and veracity of the content, a new factor could come into play: the emotional influence on the reader as a mediatic manipulation tool.

Keywords: emotional analysis of newspapers hooks, emotions on Facebook, newspaper hooks on Facebook, news publication on Facebook

Procedia PDF Downloads 142
30061 Move Analysis of Death Row Statements: An Explanatory Study Applied to Death Row Statements in Texas Department of Criminal Justice Website

Authors: Giya Erina

Abstract:

Linguists have analyzed the rhetorical structure of various forensic genres, but only a few have investigated the complete structure of death row statements. Unlike other forensic text types, such as suicide or ransom notes, the focus of death row statement analysis is not the authenticity or falsity of the text, but its intended meaning and its communicative purpose. As it constitutes their last statement before their execution, there are probably many things that inmates would like to express. This study mainly examines the rhetorical moves of 200 death row statements from the Texas Department of Criminal Justice website using rhetorical move analysis. The rhetorical moves identified in the statements will be classified based on their communicative purpose, and they will be grouped into moves and steps. A move structure will finally be suggested from the most common or characteristic moves and steps, as well as some sub-moves. However, because of some statements’ atypicality, some moves may appear in different parts of the texts or not at all.

Keywords: Death row statements, forensic linguistics, genre analysis, move analysis

Procedia PDF Downloads 284
30060 Unraveling the Threads of Madness: Henry Russell’s 'The Maniac' as an Advocate for Deinstitutionalization in the Nineteenth Century

Authors: T. J. Laws-Nicola

Abstract:

Henry Russell was best known as a composer of more than 300 songs. Many of his compositions were popular for both their sentimental texts, as in ‘The Old Armchair,’ and those of a more political nature, such as ‘Woodsman, Spare That Tree!’ Indeed, Russell had written such songs of advocacy as those associated with abolitionism (‘The Slave Ship’) and environmentalism (‘Woodsman, Spare that Tree!’). ‘The Maniac’ is his only composition addressing the issue of institutionalization. The text is borrowed and adapted from the monodrama The Captive by M.G. ‘Monk’ Lewis. Through an analysis of form, harmony, melody, text, and thematic development and interactions between text and music we can approach a clearer understanding of ‘The Maniac’ and how the text and music interact. Select periodicals, such as The London Times, provide contemporary critical review for ‘The Maniac.’ Additional nineteenth century songs whose texts focus on madness and/or institutionalization will assist in building a stylistic and cultural context for ‘The Maniac.’ Through comparative analyses of ‘The Maniac’ with a body of songs that focus on similar topics, we can approach a clear understanding of the song as a vehicle for deinstitutionalization.

Keywords: 19th century song, institutionalization, M. G. Lewis, Henry Russell

Procedia PDF Downloads 514
30059 Indecisiveness in 'The Road Not Taken' by Robert Frost: An Expressive Critical Analysis

Authors: Kurt S. Candilas

Abstract:

This expressive critical study is an effort to bring in light new interpretation of Robert Frost poem 'The Road Not Taken' as a reflection of his indecisiveness in life. Specifically, it aims at examining Frost’s inner being, emphasizing his own self and experiences in the poem or text. The study employs the qualitative research design which made use of discourse analysis using the critical theory of expressivism as the main guide. In acquiring the data of the study, the art of historiography is used such as autobiographical and/or biographical notes, sources documents, and web information. In executing the methods involved in this study, it is observed that the poem shows a naturalist implicatures, expressing Frost’s strong feelings and emotions being devoid of free will and a narrow bit of confusions and ambiguities with his indecisions in life.

Keywords: The Road Not Taken, expressivism, indecisiveness, naturalist implicatures

Procedia PDF Downloads 328
30058 A Study on Sentiment Analysis Using Various ML/NLP Models on Historical Data of Indian Leaders

Authors: Sarthak Deshpande, Akshay Patil, Pradip Pandhare, Nikhil Wankhede, Rushali Deshmukh

Abstract:

Among the highly significant duties for any language most effective is the sentiment analysis, which is also a key area of NLP, that recently made impressive strides. There are several models and datasets available for those tasks in popular and commonly used languages like English, Russian, and Spanish. While sentiment analysis research is performed extensively, however it is lagging behind for the regional languages having few resources such as Hindi, Marathi. Marathi is one of the languages that included in the Indian Constitution’s 8th schedule and is the third most widely spoken language in the country and primarily spoken in the Deccan region, which encompasses Maharashtra and Goa. There isn’t sufficient study on sentiment analysis methods based on Marathi text due to lack of available resources, information. Therefore, this project proposes the use of different ML/NLP models for the analysis of Marathi data from the comments below YouTube content, tweets or Instagram posts. We aim to achieve a short and precise analysis and summary of the related data using our dataset (Dates, names, root words) and lexicons to locate exact information.

Keywords: multilingual sentiment analysis, Marathi, natural language processing, text summarization, lexicon-based approaches

Procedia PDF Downloads 50
30057 Entropy in a Field of Emergence in an Aspect of Linguo-Culture

Authors: Nurvadi Albekov

Abstract:

Communicative situation is a basis, which designates potential models of ‘constructed forms’, a motivated basis of a text, for a text can be assumed as a product of the communicative situation. It is within the field of emergence the models of text, that can be potentially prognosticated in a certain communicative situation, are designated. Every text can be assumed as conceptual system structured on the base of certain communicative situation. However in the process of ‘structuring’ of a certain model of ‘conceptual system’ consciousness of a recipient is able act only within the border of the field of emergence for going out of this border indicates misunderstanding of the communicative situation. On the base of communicative situation we can witness the increment of meaning where the synergizing of the informative model of communication, formed by using of the invariant units of a language system, is a result of verbalization of the communicative situation. The potential of the models of a text, prognosticated within the field of emergence, also depends on the communicative situation. The conception ‘the field of emergence’ is interpreted as a unit of the language system, having poly-directed universal structure, implying the presence of the core, the center and the periphery, including different levels of means of a functioning system of language, both in terms of linguistic resources, and in terms of extra linguistic factors interaction of which results increment of a text. The conception ‘field of emergence’ is considered as the most promising in the analysis of texts: oral, written, printed and electronic. As a unit of the language system field of emergence has several properties that predict its use during the study of a text in different levels. This work is an attempt analysis of entropy in a text in the aspect of lingua-cultural code, prognosticated within the model of the field of emergence. The article describes the problem of entropy in the field of emergence, caused by influence of the extra-linguistic factors. The increasing of entropy is caused not only by the fact of intrusion of the language resources but by influence of the alien culture in a whole, and by appearance of non-typical for this very culture symbols in the field of emergence. The borrowing of alien lingua-cultural symbols into the lingua-culture of the author is a reason of increasing the entropy when constructing a text both in meaning and in structuring level. It is nothing but artificial formatting of lexical units that violate stylistic unity of a phrase. It is marked that one of the important characteristics descending the entropy in the field of emergence is a typical similarity of lexical and semantic resources of the different lingua-cultures in aspects of extra linguistic factors.

Keywords: communicative situation, field of emergence, lingua-culture, entropy

Procedia PDF Downloads 344
30056 Linguistic Analysis of Holy Scriptures: A Comparative Study of Islamic Jurisprudence and the Western Hermeneutical Tradition

Authors: Sana Ammad

Abstract:

The tradition of linguistic analysis in Islam and Christianity has developed independently of each other in lieu of the social developments specific to their historical context. However, recently increasing number of Muslim academics educated in the West have tried to apply the Western tradition of linguistic interpretation to the Qur’anic text while completely disregarding the Islamic linguistic tradition used and developed by the traditional scholars over the centuries. The aim of the paper is to outline the linguistic tools and methods used by the traditional Islamic scholars for the purpose of interpretating the Holy Qur’an and shed light on how they contribute towards a better understanding of the text compared to their Western counterparts. This paper carries out a descriptive-comparative study of the linguistic tools developed and perfected by the traditional scholars in Islam for the purpose of textual analysis of the Qur’an as they have been described in the authentic works of Usul Al Fiqh (Jurisprudence) and the principles of textual analysis employed by the Western hermeneutical tradition for the study of the Bible. First, it briefly outlines the independent historical development of the two traditions emphasizing the final normative shape that they have taken. Then it draws a comparison of the two traditions highlighting the similarities and the differences existing between them. In the end, the paper demonstrates the level of academic excellence achieved by the traditional linguistic scholars in their efforts to develop appropriate tools of textual interpretation and how these tools are more suitable for interpreting the Qur’an compared to the Western principles. Since the aim of interpreters of both the traditions is to try and attain an objective understanding of the Scriptures, the emphasis of the paper shall be to highlight how well the Islamic method of linguistic interpretation contributes to an objective understanding of the Qur’anic text. The paper concludes with the following findings: The Western hermeneutical tradition of linguistic analysis developed within the Western historical context. However, the Islamic method of linguistic analysis is much more highly developed and complex and serves better the purpose of objective understanding of the Holy text.

Keywords: Islamic jurisprudence, linguistic analysis, textual interpretation, western hermeneutics

Procedia PDF Downloads 311
30055 Academic Literacy: Semantic-Discursive Resource and the Relationship with the Constitution of Genre for the Development of Writing

Authors: Lucia Rottava

Abstract:

The present study focuses on academic literacy and addresses the impact of semantic-discursive resources on the constitution of genres that are produced in such context. The research considers the development of writing in the academic context in Portuguese. Researches that address academic literacy and the characteristics of the texts produced in this context are rare, mainly with focus on the development of writing, considering three variables: the constitution of the writer, the perception of the reader/interlocutor and the organization of the informational text flow. The research aims to map the semantic-discursive resources of the written register in texts of several genres and produced by students in the first semester of the undergraduate course in letters. The hypothesis raised is that writing in the academic environment is not a recurrent literacy practice for these learners and can be explained by the ontogenetic and phylogenetic nature of language development. Qualitative in nature, the present research has as empirical data texts produced in a half-yearly course of Reading and Textual Production; these data result from the proposition of four different writing proposals, in a total of 600 texts. The corpus is analyzed based on semantic-discursive resources, seeking to contemplate relevant aspects of language (grammar, discourse and social context) that reveal the choices made in the reader/writer interrelationship and the organizational flow of the text. Among the semantic-discursive resources, the analysis includes three resources, including (a) appraisal and negotiation to understand the attitudes negotiated (roles of the participants of the discourse and their relationship with the other); (b) ideation to explain the construction of the experience (activities performed and participants); and (c) periodicity to outline the flow of information in the organization of the text according to the genre it instantiates. The results indicate the organizational difficulties of the flow of the text information. Cartography contributes to the understanding of the way writers use language in an effort to present themselves, evaluate someone else’s work, and communicate with readers.

Keywords: academic writing, portuguese mother tongue, semantic-discursive resources, sistemic funcional linguistic

Procedia PDF Downloads 110
30054 A Study on Information Structure in the Vajrachedika-Prajna-paramita Sutra and Translation Aspect

Authors: Yoon-Cheol Park

Abstract:

This research focuses on examining the information structures in the old Chinese character-Korean translation of the Vajrachedika-prajna-paramita sutra. The background of this research comes from the fact that there were no previous researches which looked into the information structures in the target text of the Vajrachedika-prajna-paramita sutra by now. The existing researches on the Buddhist scripture translation mainly put weight on message conveyance by literal and semantic translation methods. But the message conveyance from one language to another has a necessity to be delivered with equivalent information structure. Thus, this research is intended to investigate on the flow of old and new information in the target text of Buddhist scripture, compared with source text. The Vajrachedika-prajna-paramita sutra unlike other Buddhist scriptures is composed of conversational structures between Buddha and his disciple, Suboli. This implies that the information flow can be changed by utterance context and some propositions. So, this research tries to analyze the flow of old and new information within the source and target text. As a result of analysis, this research can discover the following facts; firstly, there are the differences of the information flow in the message conveyance between the old Chinese character and Korean by language features. The old Chinese character reveals that old-new information flow is developed, while Korean indicates new-old information flow because of word order. Secondly, the source text of the Vajrachedika-prajna-paramita sutra includes abstruse terminologies, jargon and abstract words. These make influence on the target text and cause the change of the information flow. But the repetitive expressions of these words provide the old information in the target text. Lastly, the Vajrachedika-prajna-paramita sutra offers the expository structure from conversations between Buddha and Suboli. It means that the information flow is developed in the way of explaining specific subjects and of paraphrasing unfamiliar phrases and expressions. From the results of analysis above, this research can verify that the information structures in the target text of the Vajrachedika-prajna-paramita sutra are changed by specific subjects and terminologies, developed with the new-old information flow by repetitive expressions or word order and reveal the information structures familiar to target culture. It also implies that the translation of the Vajrachedika-prajna-paramita sutra as a religious book needs the message conveyance to take into account the information structures of two languages.

Keywords: abstruse terminologies, the information structure, new and old information, old Chinese character-Korean translation

Procedia PDF Downloads 353
30053 Text Based Shuffling Algorithm on Graphics Processing Unit for Digital Watermarking

Authors: Zayar Phyo, Ei Chaw Htoon

Abstract:

In a New-LSB based Steganography method, the Fisher-Yates algorithm is used to permute an existing array randomly. However, that algorithm performance became slower and occurred memory overflow problem while processing the large dimension of images. Therefore, the Text-Based Shuffling algorithm aimed to select only necessary pixels as hiding characters at the specific position of an image according to the length of the input text. In this paper, the enhanced text-based shuffling algorithm is presented with the powered of GPU to improve more excellent performance. The proposed algorithm employs the OpenCL Aparapi framework, along with XORShift Kernel including the Pseudo-Random Number Generator (PRNG) Kernel. PRNG is applied to produce random numbers inside the kernel of OpenCL. The experiment of the proposed algorithm is carried out by practicing GPU that it can perform faster-processing speed and better efficiency without getting the disruption of unnecessary operating system tasks.

Keywords: LSB based steganography, Fisher-Yates algorithm, text-based shuffling algorithm, OpenCL, XORShiftKernel

Procedia PDF Downloads 131
30052 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner

Authors: Beier Zhu, Rui Zhang, Qi Song

Abstract:

Text contained within fixed-layout documents can be of great semantic value and so requires a high localization accuracy, such as ID cards, invoices, cheques, and passports. Recently, algorithms based on deep convolutional networks achieve high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually, which ignores the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network simultaneously locates word bounding points, which takes the layout information into account. The bounding-box regression network inputs the features pooled from arbitrarily sized RoIs and refine the localizations. These two networks share their convolutional features and are trained jointly. A typical type of fixed-layout documents: ID cards, is selected to evaluate the effectiveness of the proposed system. These networks are trained on data cropped from nature scene images, and synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates high accuracy word bounding boxes and achieves state-of-the-art performance.

Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization

Procedia PDF Downloads 177
30051 Emotions in Health Tweets: Analysis of American Government Official Accounts

Authors: García López

Abstract:

The Government Departments of Health have the task of informing and educating citizens about public health issues. For this, they use channels like Twitter, key in the search for health information and the propagation of content. The tweets, important in the virality of the content, may contain emotions that influence the contagion and exchange of knowledge. The goal of this study is to perform an analysis of the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, National Institutes of Health, NIH, the government agency HHSGov, and the professional organization PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, corresponding to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with the emotional analysis tool in text. The results showed that their tweets contain an important emotional load, a determining factor in the success of their communications. This exposes that official accounts also use subjective language and contain emotions. The predominance of emotion joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, a key in the work of informing that government health departments have.

Keywords: emotions in tweets, emotion detection in the text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content

Procedia PDF Downloads 120
30050 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)

Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss

Abstract:

In this paper, we present a system for offline recognition cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical without explicit segmentation used embedded training to perform and enhance the character models. Extraction features preceded by baseline estimation are statistical and geometric to integrate both the peculiarities of the text and the pixel distribution characteristics in the word image. These features are modelled using hidden Markov models and trained by embedded training. The experiments on images of the benchmark IFN/ENIT database show that the proposed system improves recognition.

Keywords: recognition, handwriting, Arabic text, HMMs, embedded training

Procedia PDF Downloads 335
30049 Enhance the Power of Sentiment Analysis

Authors: Yu Zhang, Pedro Desouza

Abstract:

Since big data has become substantially more accessible and manageable due to the development of powerful tools for dealing with unstructured data, people are eager to mine information from social media resources that could not be handled in the past. Sentiment analysis, as a novel branch of text mining, has in the last decade become increasingly important in marketing analysis, customer risk prediction and other fields. Scientists and researchers have undertaken significant work in creating and improving their sentiment models. In this paper, we present a concept of selecting appropriate classifiers based on the features and qualities of data sources by comparing the performances of five classifiers with three popular social media data sources: Twitter, Amazon Customer Reviews, and Movie Reviews. We introduced a couple of innovative models that outperform traditional sentiment classifiers for these data sources, and provide insights on how to further improve the predictive power of sentiment analysis. The modelling and testing work was done in R and Greenplum in-database analytic tools.

Keywords: sentiment analysis, social media, Twitter, Amazon, data mining, machine learning, text mining

Procedia PDF Downloads 336
30048 Evaluation Means in English and Russian Academic Discourse: Through Comparative Analysis towards Translation

Authors: Albina Vodyanitskaya

Abstract:

Given the culture- and language-specific nature of evaluation, this phenomenon is widely studied around the linguistic world and may be regarded as a challenge for translators. Evaluation penetrates all the levels of a scientific text, influences its composition and the reader’s attitude towards the information presented. One of the most challenging and rarely studied phenomena is the individual style of the scientific writer, which is mostly reflected in the use of evaluative language means. The evaluative and expressive potential of a scientific text is becoming more and more welcoming area for researchers, which stems in the shift towards anthropocentric paradigm in linguistics. Other reasons include: the cognitive and psycholinguistic processes that accompany knowledge acquisition, a genre-determined nature of a scientific text, the increasing public concern about the quality of scientific papers and some such. One more important issue, is the fact that linguists all over the world still argue about the definition of evaluation and its functions in the text. The author analyzes various approaches towards the study of evaluation and scientific texts. A comparative analysis of English and Russian dissertations and other scientific papers with regard to evaluative language means reveals major differences and similarities between English and Russian scientific style. Though standardized and genre-specific, English scientific texts contain more figurative and expressive evaluative means than the Russian ones, which should be taken into account while translating scientific papers. The processes that evaluation undergoes while being expressed by means of a target language are also analyzed. The author offers a target-language-dependent strategy for the translation of evaluation in English and Russian scientific texts. The findings may contribute to the theory and practice of translation and can increase scientific writers’ awareness of inter-language and intercultural differences in evaluative language means.

Keywords: academic discourse, evaluation, scientific text, scientific writing, translation

Procedia PDF Downloads 337
30047 Developing a Model of Teaching Writing Based On Reading Approach through Reflection Strategy for EFL Students of STKIP YPUP

Authors: Eny Syatriana, Ardiansyah

Abstract:

The purpose of recent study was to develop a learning model on writing, based on the reading texts which will be read by the students using reflection strategy. The strategy would allow the students to read the text and then they would write back the main idea and to develop the text by using their own sentences. So, the writing practice was begun by reading an interesting text, then the students would develop the text which has been read into their writing. The problem questions are (1) what kind of learning model that can develop the students writing ability? (2) what is the achievement of the students of STKIP YPUP through reflection strategy? (3) is the using of the strategy effective to develop students competence In writing? (4) in what level are the students interest toward the using of a strategy In writing subject? This development research consisted of some steps, they are (1) need analysis (2) model design (3) implementation (4) model evaluation. The need analysis was applied through discussion among the writing lecturers to create a learning model for writing subject. To see the effectiveness of the model, an experiment would be delivered for one class. The instrument and learning material would be validated by the experts. In every steps of material development, there was a learning process, where would be validated by an expert. The research used development design. These Principles and procedures or research design and development .This study, researcher would do need analysis, creating prototype, content validation, and limited empiric experiment to the sample. In each steps, there should be an assessment and revision to the drafts before continue to the next steps. The second year, the prototype would be tested empirically to four classes in STKIP YPUP for English department. Implementing the test greatly was done through the action research and followed by evaluation and validation from the experts.

Keywords: learning model, reflection, strategy, reading, writing, development

Procedia PDF Downloads 350
30046 Investigating Dynamic Transition Process of Issues Using Unstructured Text Analysis

Authors: Myungsu Lim, William Xiu Shun Wong, Yoonjin Hyun, Chen Liu, Seongi Choi, Dasom Kim, Namgyu Kim

Abstract:

The amount of real-time data generated through various mass media has been increasing rapidly. In this study, we had performed topic analysis by using the unstructured text data that is distributed through news article. As one of the most prevalent applications of topic analysis, the issue tracking technique investigates the changes of the social issues that identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of documents that cover an entire period at the same time and analyzing the occurrence of each topic by the period of occurrence. However, this traditional issue tracking approach has limitation that it cannot discover dynamic mutation process of complex social issues. The purpose of this study is to overcome the limitations of the existing issue tracking method. We first derived core issues of each period, and then discover the dynamic mutation process of various issues. In this study, we further analyze the mutation process from the perspective of the issues categories, in order to figure out the pattern of issue flow, including the frequency and reliability of the pattern. In other words, this study allows us to understand the components of the complex issues by tracking the dynamic history of issues. This methodology can facilitate a clearer understanding of complex social phenomena by providing mutation history and related category information of the phenomena.

Keywords: Data Mining, Issue Tracking, Text Mining, topic Analysis, topic Detection, Trend Detection

Procedia PDF Downloads 386
30045 Examining Reading Comprehension Skills Based on Different Reading Comprehension Frameworks and Taxonomies

Authors: Seval Kula-Kartal

Abstract:

Developing students’ reading comprehension skills is an aim that is difficult to accomplish and requires to follow long-term and systematic teaching and assessment processes. In these processes, teachers need tools to provide guidance to them on what reading comprehension is and which comprehension skills they should develop. Due to a lack of clear and evidence-based frameworks defining reading comprehension skills, especially in Turkiye, teachers and students mostly follow various processes in the classrooms without having an idea about what their comprehension goals are and what those goals mean. Since teachers and students do not have a clear view of comprehension targets, strengths, and weaknesses in students’ comprehension skills, the formative feedback processes cannot be managed in an effective way. It is believed that detecting and defining influential comprehension skills may provide guidance both to teachers and students during the feedback process. Therefore, in the current study, some of the reading comprehension frameworks that define comprehension skills operationally were examined. The aim of the study is to develop a simple and clear framework that can be used by teachers and students during their teaching, learning, assessment, and feedback processes. The current study is qualitative research in which documents related to reading comprehension skills were analyzed. Therefore, the study group consisted of recourses and frameworks which made big contributions to theoretical and operational definitions of reading comprehension. A content analysis was conducted on the resources included in the study group. To determine the validity of the themes and sub-categories revealed as the result of content analysis, three educational assessment experts were asked to examine the content analysis results. The Fleiss’ Cappa coefficient revealed that there is consistency among themes and categories defined by three different experts. The content analysis of the reading comprehension frameworks revealed that comprehension skills could be examined under four different themes. The first and second themes focus on understanding information given explicitly or implicitly within a text. The third theme includes skills used by the readers to make connections between their personal knowledge and the information given in the text. Lastly, the fourth theme focus on skills used by readers to examine the text with a critical view. The results suggested that fundamental reading comprehension skills can be examined under four themes. Teachers are recommended to use these themes in their reading comprehension teaching and assessment processes. Acknowledgment: This research is supported by Pamukkale University Scientific Research Unit within the project, whose title is Developing A Reading Comprehension Rubric.

Keywords: reading comprehension, assessing reading comprehension, comprehension taxonomies, educational assessment

Procedia PDF Downloads 70
30044 A Clustering Algorithm for Massive Texts

Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen

Abstract:

Internet users have to face the massive amount of textual data every day. Organizing texts into categories can help users dig the useful information from large-scale text collection. Clustering, in fact, is one of the most promising tools for categorizing texts due to its unsupervised characteristic. Unfortunately, most of traditional clustering algorithms lose their high qualities on large-scale text collection. This situation mainly attributes to the high- dimensional vectors generated from texts. To effectively and efficiently cluster large-scale text collection, this paper proposes a vector reconstruction based clustering algorithm. Only the features that can represent the cluster are preserved in cluster’s representative vector. This algorithm alternately repeats two sub-processes until it converges. One process is partial tuning sub-process, where feature’s weight is fine-tuned by iterative process. To accelerate clustering velocity, an intersection based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The other process is overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features useless to represent the cluster are removed from cluster’s representative vector. Experimental results on the three text collections (including two small-scale and one large-scale text collections) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.

Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process

Procedia PDF Downloads 411
30043 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification applies mostly natural language processing (NLP) and other AI-guided techniques to automatically classify text in a faster and more accurate manner. This paper discusses the subject of using predictive maintenance to manage incident tickets inside the sociality. It focuses on proposing a tool that treats and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to realize the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and was given a high-quality result.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 78
30042 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 131
30041 Symmetric Key Encryption Algorithm Using Indian Traditional Musical Scale for Information Security

Authors: Aishwarya Talapuru, Sri Silpa Padmanabhuni, B. Jyoshna

Abstract:

Cryptography helps in preventing threats to information security by providing various algorithms. This study introduces a new symmetric key encryption algorithm for information security which is linked with the "raagas" which means Indian traditional scale and pattern of music notes. This algorithm takes the plain text as input and starts its encryption process. The algorithm then randomly selects a raaga from the list of raagas that is assumed to be present with both sender and the receiver. The plain text is associated with the thus selected raaga and an intermediate cipher-text is formed as the algorithm converts the plain text characters into other characters, depending upon the rules of the algorithm. This intermediate code or cipher text is arranged in various patterns in three different rounds of encryption performed. The total number of rounds in the algorithm is equal to the multiples of 3. To be more specific, the outcome or output of the sequence of first three rounds is again passed as the input to this sequence of rounds recursively, till the total number of rounds of encryption is performed. The raaga selected by the algorithm and the number of rounds performed will be specified at an arbitrary location in the key, in addition to important information regarding the rounds of encryption, embedded in the key which is known by the sender and interpreted only by the receiver, thereby making the algorithm hack proof. The key can be constructed of any number of bits without any restriction to the size. A software application is also developed to demonstrate this process of encryption, which dynamically takes the plain text as input and readily generates the cipher text as output. Therefore, this algorithm stands as one of the strongest tools for information security.

Keywords: cipher text, cryptography, plaintext, raaga

Procedia PDF Downloads 270
30040 Learning about the Strengths and Weaknesses of Urban Climate Action Plans

Authors: Prince Dacosta Aboagye, Ayyoob Sharifi

Abstract:

Cities respond to climate concerns mainly through their climate action plans (CAPs). A comprehensive content analysis of the dynamics in existing urban CAPs is not well represented in the literature. This literature void presents a difficulty in appreciating the strengths and weaknesses of urban CAPs. Here, we perform a qualitative content analysis (QCA) on CAPs from 278 cities worldwide and use text-mining tools to map and visualize the relevant data. Our analysis showed a decline in the number of CAPs developed and published following the global COVID-19 lockdown period. Evidently, megacities are leading the deep decarbonisation agenda. We also observed a transition from developing mainly mitigation-focused CAPs pre-COP21 to both mitigation and adaptation CAPs. A lack of inclusiveness in local climate planning was common among European and North American cities. The evidence is a catalyst for understanding the trends in existing urban CAPs to shape future urban climate planning.

Keywords: urban, climate action plans, strengths, weaknesses

Procedia PDF Downloads 79