Search results for: text representation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2463

2373 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Sidi Yang, Haiyi Zhang

Abstract:

Twitter is a microblogging platform where millions of users share their attitudes, views, and opinions daily. Using a probabilistic Latent Dirichlet Allocation (LDA) topic model to discern the most popular topics in Twitter data is an effective, computationally efficient way to analyze a large set of tweets. Sentiment analysis provides an effective method to show the emotions and sentiments found in each tweet and an efficient way to summarize the results in a manner that is clearly understood. The primary goal of this paper is to explore text mining by extracting and analyzing useful information from unstructured text using two approaches, LDA topic modelling and sentiment analysis, applied to Twitter plain text data in English. These two methods allow people to mine data more effectively and efficiently. LDA topic modelling and sentiment analysis can also be applied to provide insights in business and scientific fields.

Keywords: text mining, Twitter, topic model, sentiment analysis

Procedia PDF Downloads 179
2372 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner

Authors: Beier Zhu, Rui Zhang, Qi Song

Abstract:

Text contained within fixed-layout documents, such as ID cards, invoices, cheques, and passports, can be of great semantic value and so requires high localization accuracy. Recently, algorithms based on deep convolutional networks have achieved high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually, which ignores the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network simultaneously locates word bounding points, which takes the layout information into account. The bounding-box regression network takes as input the features pooled from arbitrarily sized RoIs and refines the localizations. These two networks share their convolutional features and are trained jointly. A typical type of fixed-layout document, the ID card, is selected to evaluate the effectiveness of the proposed system. The networks are trained on data cropped from natural scene images and on synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates word bounding boxes with high accuracy and achieves state-of-the-art performance.

Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization

Procedia PDF Downloads 194
2371 Second Representation of Modules over Commutative Rings

Authors: Jawad Abuhlail, Hamza Hroub

Abstract:

Let R be a commutative ring. Representation theory studies the representation of R-modules as (possibly finite) sums of special types of R-submodules. Here we are interested in a class of R-modules between the class of semisimple R-modules and the class of R-modules that can be written as (possibly finite) sums of secondary R-submodules (we know that every simple R-submodule is secondary). We investigate R-modules which can be written as (possibly finite) sums of second R-submodules (we call those modules second representable). Moreover, we investigate the class of (main) second attached prime ideals related to a module with such representation. We provide sufficient conditions for an R-module M to get a (minimal) second representation. We also found the collection of second attached prime ideals for some types of second representable R-modules, in particular within the class of injective R-modules. As we know that every simple R-submodule is second and every second R-submodule is secondary, we can see the importance of the second representable R-module.

Keywords: lifting modules, second attached prime ideals, second representations, secondary representations, semisimple modules, second submodules

Procedia PDF Downloads 192
2370 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)

Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss

Abstract:

In this paper, we present a system for offline recognition of cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical, works without explicit segmentation, and uses embedded training to build and enhance the character models. The extracted features, preceded by baseline estimation, are statistical and geometric, so as to integrate both the peculiarities of the text and the pixel distribution characteristics in the word image. These features are modelled using hidden Markov models and trained by embedded training. Experiments on images from the benchmark IFN/ENIT database show that the proposed system improves recognition.

Keywords: recognition, handwriting, Arabic text, HMMs, embedded training

Procedia PDF Downloads 354
2369 Me and My Selfie: Identity Building Through Self Representation in Social Media

Authors: Revytia Tanera

Abstract:

This research is a pilot study examining the rise of the selfie trend in relation to individual self representation and identity building in social media. Symbolic interactionism theory is used as the concept of the desired self image, and Cooley’s looking glass-self concept is used to analyze the mechanical reflection of ourselves: how people perform their “digital self” in social media. In-depth interviews were conducted with a non-random sample of people who own a smartphone with a front camera and are active in social media. This research tries to find out whether the selfie trend has any influence on each individual's identity building. Through analysis of the interview results, it can be concluded that people take selfie photos in order to express themselves and to boost their confidence. This study suggests a follow-up and more in-depth analysis of identity and self representation across various age groups.

Keywords: self representation, selfie, social media, symbolic interaction, looking glass-self

Procedia PDF Downloads 297
2368 Poetics of the Connecting ha’: A Textual Study in the Poetry of Al-Husari Al-Qayrawani

Authors: Mahmoud al-Ashiriy

Abstract:

This paper begins from the idea that the real history of literature is the history of its style. Rhyme, as is known, is not merely the last letter, which has received a great deal of analysis and investigation, but a collection of other values in addition to its different markings. This paper explores the work of the connecting ha’ and its effectiveness in shaping the text of poetry, since it establishes vocal rhythms in addition to its role in indicating references through the pronoun, both vertically, through the sequence of verses in the poem, and horizontally, through the sentences surrounding a single verse. If the scientific formation of prosody stopped at possibilities and prohibitions, literary criticism and poetry studies should explore what lies beyond the rule: the aesthetic horizon of poetic effectiveness, which varies from one text to another, one poet to another, one literary period to another, and one poetic taste to another. The paper then explores this poetic essence in the texts of the famous Andalusian poet Al-Husari Al-Qayrawani through his well-known Daliyya (a poem whose verses end with the letter D), and the role of the connecting ha’ in fulfilling its text and accomplishing its poetics. From there it turns to the diwan (the large collection of poems) as a higher text that surpasses the individual text or poem, and to the effectiveness of this phenomenon in accomplishing the poetics of Al-Husari Al-Qayrawani, who is one of the pillars of Arabic poetics in Andalusia.

Keywords: Al-Husari Al-Qayrawani, poetics, rhyme, stylistics, science of the text

Procedia PDF Downloads 573
2367 An Assessment of Female Representation in Philippine Cinema in Comparison to American Cinema (1975 to 2020)

Authors: Amanda Julia Binay, Patricia Elise Suarez

Abstract:

Female representation in media is an important subject in the discussion of gender equality, especially in impactful and influential media like film. As the Filipino film industry continues to grow and evolve, the need for analysis on Filipino female representation on screen is imperative. Additionally, there has been limited research made on female representation in the Philippine film scene. Thus, the paper aims to analyze the presence and evolution of female representation in Philippine cinema and compare the findings with that of American films to see how Filipino filmmakers hold their own against the standards of international movements that call for more and better female representation, especially in Hollywood. The participants selected were Filipino and American films released within the years 1975 to 2020 in five (5) year intervals. Twenty (20) critically acclaimed and highest-grossing Filipino films and twenty (20) critically acclaimed and highest-grossing Hollywood films were then subject to the Bechdel and Peirce tests to obtain statistical measures of their female representation. The findings of the study reveal that the presence of female representation in Philippine film history has been consistent and has continued to grow and evolve throughout the years, with strong female leads with vibrant characteristics and diverse stories. However, analysis of female representation regarding American films has shown an extreme lack thereof with more misogynistic, sexist, and limiting ideals. Thus, the study concludes that the state of female representation in Philippine cinema and film industry holds its own when compared to American cinema and film industry and even outperforms it in many aspects of female representation, such as consistent inclusion and depiction of multi-dimensional female leads and female relationships. 
Hence, the study implies that women’s consistent presence in Philippine cinema mirrors Filipino women’s prominent role in Philippine society and that American cinema must continue to make efforts to change their portrayals of female characters, leads, and relationships to make them more grounded in reality.

Keywords: female representation, gender studies, feminism, Philippine cinema, American cinema, Bechdel test, Peirce test, comparative analysis

Procedia PDF Downloads 380
2366 A Clustering Algorithm for Massive Texts

Authors: Ming Liu, Chong Wu, Bingquan Liu, Lei Chen

Abstract:

Internet users face a massive amount of textual data every day. Organizing texts into categories can help users extract useful information from large-scale text collections. Clustering is one of the most promising tools for categorizing texts because it is unsupervised. Unfortunately, most traditional clustering algorithms lose quality on large-scale text collections, mainly because of the high-dimensional vectors generated from texts. To cluster large-scale text collections effectively and efficiently, this paper proposes a clustering algorithm based on vector reconstruction. Only the features that can represent the cluster are preserved in the cluster’s representative vector. The algorithm alternately repeats two sub-processes until it converges. The first is the partial tuning sub-process, where each feature’s weight is fine-tuned iteratively. To accelerate clustering, an intersection-based similarity measurement and its corresponding neuron adjustment function are proposed and implemented in this sub-process. The second is the overall tuning sub-process, where the features are reallocated among different clusters. In this sub-process, the features that are useless for representing the cluster are removed from the cluster’s representative vector. Experimental results on three text collections (two small-scale and one large-scale) demonstrate that our algorithm obtains high quality on both small-scale and large-scale text collections.

Keywords: vector reconstruction, large-scale text clustering, partial tuning sub-process, overall tuning sub-process

Procedia PDF Downloads 435
2365 Representation of Reality in Nigerian Poetry

Authors: Zainab Abdulkarim

Abstract:

Literature is the study of life, a source of knowledge. It involves the truth about many things in life. Most creative artistes, especially the poets, are representatives of the voices of the people. This set of artistes has been the critic of all those involved in the development of their nation. This paper examines how Nigerian poets go further, not just by writing but by showing the different ways the country has been convoluted. This paper intends to show the power and ability literature has in representation. That power is to represent the important values of life. There is no doubt that literature asserts truth. Through the various poems examined in this paper, Nigerian poets are shown to portray the realities of the nation.

Keywords: literature, poets, reality, representation

Procedia PDF Downloads 314
2364 Image Transform Based on Integral Equation-Wavelet Approach

Authors: Yuan Yan Tang, Lina Yang, Hong Li

Abstract:

The harmonic model is a very important approximation for the image transform. The harmonic model converts an image into an arbitrary shape; however, this model cannot be described by any fixed mathematical function. Instead, it is represented by a partial differential equation (PDE) with boundary conditions. Developing an efficient method to solve such a PDE is therefore highly significant for the image transform. In this paper, a novel Integral Equation-Wavelet based method is presented, which consists of three steps: (1) The partial differential equation is converted into a boundary integral equation and representation by an indirect method. (2) The boundary integral equation and representation are changed to a plane integral equation and representation by the boundary measure formula. (3) The plane integral equation and representation are then solved by a method we call wavelet collocation. Our approach has two main advantages: the shape of an image is arbitrary, and the program code is independent of the boundary. The performance of our method is evaluated by numerical experiments.

Keywords: harmonic model, partial differential equation (PDE), integral equation, integral representation, boundary measure formula, wavelet collocation

Procedia PDF Downloads 558
2363 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of entity relation discovery in unstructured data, a well-covered topic in the literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be infeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations, a cooccurrence graph. Indeed, each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than on opposite sides of the text. Some authors have used sliding windows for this problem: they count all the occurrences within a sliding window running over the whole text. In this paper we generalise this technique, arriving at a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
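The weighted-distance idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `weighted_cooccurrence`, the `window` parameter, and the 1/distance weight are all assumptions, since the abstract does not state the weighting function. Each pair of entity mentions closer than `window` tokens is counted once.

```python
from collections import defaultdict

def weighted_cooccurrence(tokens, entities, window=5):
    """Build a weighted cooccurrence graph as {(entity_a, entity_b): weight}.

    Assumption: two mentions at positions i < j contribute 1 / (j - i)
    when j - i < window, so closer mentions count for more.
    """
    graph = defaultdict(float)
    # Keep only the positions where a named item occurs.
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in entities]
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            distance = j - i
            if distance >= window:
                break  # positions are sorted, so later mentions are farther
            if u != v:
                # Undirected edge: store the pair in sorted order.
                graph[tuple(sorted((u, v)))] += 1.0 / distance
    return dict(graph)

verse = "abraham begat isaac and isaac begat jacob".split()
graph = weighted_cooccurrence(verse, {"abraham", "isaac", "jacob"}, window=5)
```

Setting every weight to 1 instead of 1/distance recovers a plain sliding-window count, which is the baseline the abstract generalises.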

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 151
2362 Representation of the Kurdish Opposition: From Periphery to Center

Authors: Songul Miftakhov

Abstract:

This study explores the political representation and engagement of the Eastern and Southeastern Anatolia regions, known to have a dense Kurdish population and referred to hereafter as the Eastern region, in the Turkish parliament between 1946 and 1980. Traditional local notables held most of the privileges of representation, given their connections with political parties. They integrated into right-wing parties for political and economic reasons. At the same time, they kept control over the channels of local political involvement. As a result, political representation and presence were monopolized at the central, local, and civil society levels. One group of Kurdish intellectuals was marginalized from the parliament after addressing issues in Eastern Anatolia and trying to develop solutions outside the mainstream. Some of them took part in the Kurdish oppositional left wing in the 1960s and shook the power of established notables in the 1970s in local administrations or as independent members of the parliament.

Keywords: Kurdish representation, parliament, local notables, Eastern and Southeastern Anatolia

Procedia PDF Downloads 153
2361 Symmetric Key Encryption Algorithm Using Indian Traditional Musical Scale for Information Security

Authors: Aishwarya Talapuru, Sri Silpa Padmanabhuni, B. Jyoshna

Abstract:

Cryptography helps in preventing threats to information security by providing various algorithms. This study introduces a new symmetric key encryption algorithm for information security which is linked with the "raagas" which means Indian traditional scale and pattern of music notes. This algorithm takes the plain text as input and starts its encryption process. The algorithm then randomly selects a raaga from the list of raagas that is assumed to be present with both sender and the receiver. The plain text is associated with the thus selected raaga and an intermediate cipher-text is formed as the algorithm converts the plain text characters into other characters, depending upon the rules of the algorithm. This intermediate code or cipher text is arranged in various patterns in three different rounds of encryption performed. The total number of rounds in the algorithm is equal to the multiples of 3. To be more specific, the outcome or output of the sequence of first three rounds is again passed as the input to this sequence of rounds recursively, till the total number of rounds of encryption is performed. The raaga selected by the algorithm and the number of rounds performed will be specified at an arbitrary location in the key, in addition to important information regarding the rounds of encryption, embedded in the key which is known by the sender and interpreted only by the receiver, thereby making the algorithm hack proof. The key can be constructed of any number of bits without any restriction to the size. A software application is also developed to demonstrate this process of encryption, which dynamically takes the plain text as input and readily generates the cipher text as output. Therefore, this algorithm stands as one of the strongest tools for information security.

Keywords: cipher text, cryptography, plaintext, raaga

Procedia PDF Downloads 289
2360 Residential Architecture and Its Representation in Movies: Bangkok's Spatial Research in the Study of Thai Cinematography

Authors: Janis Matvejs

Abstract:

Visual representation of a city creates unique perspectives that make it possible to interpret the urban environment and to understand a space that is culturally created and territorially organized. Residential complexes are an essential part of cities, and cinema is a specific form of representation of these areas. Very little research has explored how these areas are depicted in Thai movies. The aim of this research is to interpret the discourse of residential areas of Bangkok throughout the 20th and 21st centuries and to examine essential changes in the residential structure. Specific cinematic formal techniques in relation to the urban image were used. The movie review results were compared with changes in Bangkok’s residential development. The movie analysis showed that residential areas are frequently used in Thai cinematography and that they make up an integral part of urban visual perception.

Keywords: Bangkok, cinema, residential area, representation, visual perception

Procedia PDF Downloads 194
2359 The Effects of Watching Text-Relevant Video Segments with/without Subtitles on Vocabulary Development of Arabic as a Foreign Language Learners

Authors: Amirreza Karami, Hawraa Nafea Hameed Alzouwain, Freddie A. Bowles

Abstract:

This study investigates the effects of watching text-relevant video segments with/without subtitles on vocabulary development of Arabic as a Foreign Language (AFL) learners. The participants of the study were assigned to two groups: one control group and one experimental group. The control group received no video-based instruction while the experimental group watched a text-relevant video segment in three stages: pre, while, and post-instruction. The preliminary results of the pre-test and post-test show that watching text-relevant video segments through following a pre-while-post procedure can help the vocabulary development of AFL learners more than non-video-based instruction.

Keywords: text-relevant video segments, vocabulary development, Arabic as a Foreign Language, AFL, pre-while-post instruction

Procedia PDF Downloads 165
2358 The Representation of Female Characters by Women Directors in Surveillance Spaces in Turkish Cinema

Authors: Berceste Gülçin Özdemir

Abstract:

The representation of women characters in cinema has been discussed for centuries. In cinema where dominant narrative codes prevail and scopophilic views exist over women characters, passive stereotypes of women are observed in the representation of women characters. In films shot from a woman’s point of view in Turkish Cinema, and even in films outside the mainstream in which the stories of women characters are told, the fact that women characters are discussed on the basis of feminist film theories triggers the question: ‘Are feminist films produced in Turkish Cinema?’ The spaces used in the representation of women characters are observed to convert characters into passive subjects on the basis of the space factor in the narrative. The representation of women characters in the possible surveillance spaces integrates the characters and compresses them in these spaces. In this study, narrative analysis was used to investigate the representation of women characters in surveillance spaces. For the study framework, case-study films are selected first; then the representations of women characters in surveillance spaces are discussed through narrative analysis using feminist film theories. Two questions are discussed with feminist film theories: ‘Why do especially women directors represent their female characters to viewers by representing them in surveillance spaces?’ and ‘Can this type of presentation contribute to the feminist film practice and become important with regard to feminist film theories?’ The representation of women characters in a passive and observed way in the surveillance spaces of the narrative also calls into question the discourses of films outside the mainstream. As films that produce alternative discourses and reveal different cinematic languages, those outside the mainstream are expected to bring other points of view to the representation of women characters in spaces as well. 
These questionings are selected as the baseline and Turkish films such as Watch Tower and Mustang, directed by women, were examined. This examination paves the way for discussions regarding the women characters in surveillance spaces. Outcomes can be argued from the viewpoint of representation in the genre by feminist film theories. In the context of feminist film theories and feminist film practice, alternatives should be found that can corporally reveal the existence of women in both the representation of women characters in spaces and in the usage of the space factor.

Keywords: feminist film theory, representation, space, women directors

Procedia PDF Downloads 287
2357 A Study of Various Ontology Learning Systems from Text and a Look into Future

Authors: Fatima Al-Aswadi, Chan Yong

Abstract:

With the large volume of unstructured data on the web increasing day by day, the motivation to represent the knowledge in this data in machine-processable form has grown. Ontology is one of the major cornerstones of representing information in a more meaningful way on the semantic Web. The goal of ontology learning from text is to elicit and represent domain knowledge in machine-readable form. This paper aims to give a follow-up review of ontology learning systems from text and some of their defects. Furthermore, it discusses how far the ontology learning process may improve in the future.

Keywords: concept discovery, deep learning, ontology learning, semantic relation, semantic web

Procedia PDF Downloads 521
2356 Principle Components Updates via Matrix Perturbations

Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook

Abstract:

This paper highlights a new approach to online principal component analysis (OPCA). Given a data matrix X ∈ R^(m×n), we characterise the online updates of its covariance as a matrix perturbation problem. Up to the principal components, it turns out that online updates of the batch PCA can be captured by a symmetric matrix perturbation of the batch covariance matrix. We have shown that as n → n0 >> 1, the batch covariance and its update become almost similar. Finally, we utilize our new setup of online updates to find a bound on the angle distance between the principal components of X and its update.
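A small numerical sketch may help make the "covariance update as symmetric perturbation" idea concrete. The abstract gives no formulas, so everything below is an assumption for illustration: samples are stored as rows, the covariance uses 1/n normalisation, and the helper names (`mean`, `covariance`, `perturbation`) are hypothetical. Under those assumptions, appending one sample x to n samples changes the covariance C by the standard rank-one identity E = -C/(n+1) + n/(n+1)^2 * d d^T, where d = x - mean; E is symmetric and shrinks roughly like 1/n.

```python
def mean(rows):
    """Column-wise mean of a list of equal-length sample rows."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def covariance(rows):
    """Batch covariance with 1/n normalisation (an assumption of this sketch)."""
    n, mu = len(rows), mean(rows)
    d = len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
             for b in range(d)] for a in range(d)]

def perturbation(rows, x):
    """Symmetric matrix E with covariance(rows + [x]) == covariance(rows) + E."""
    n, mu, c = len(rows), mean(rows), covariance(rows)
    d = len(mu)
    delta = [x[k] - mu[k] for k in range(d)]  # deviation of the new sample
    return [[-c[a][b] / (n + 1) + n * delta[a] * delta[b] / (n + 1) ** 2
             for b in range(d)] for a in range(d)]

samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
new_sample = [0.0, 4.0]
E = perturbation(samples, new_sample)
updated = covariance(samples + [new_sample])  # equals covariance(samples) + E
```

Because both terms of E carry a factor of order 1/n, the updated covariance approaches the batch covariance as n grows, which is consistent with the abstract's remark that the two become almost similar for large n.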

Keywords: online data updates, covariance matrix, online principal component analysis, matrix perturbation

Procedia PDF Downloads 195
2355 Teaching Pragmatic Coherence in Literary Text: Analysis of Chimamanda Adichie’s Americanah

Authors: Joy Aworo-Okoroh

Abstract:

Literary texts are mirrors of real-life situations. Authors therefore choose the linguistic items that best encode their intended meanings and messages. However, words mean more than they seem. The meaning of words is not static; rather, it is dynamic, as words constantly enter into relationships within a context. Literary texts can only be meaningful if all pragmatic cues are identified and interpreted. Drawing upon Teun van Dijk's theory of local pragmatic coherence, it is established that words enter into relations in a text and that these relations account for sequential speech acts in the text. Comprehension of the text depends on the interpretation of these relations. To show the relevance of pragmatic coherence in literary text analysis, ten conversations were selected from Americanah in order to give a clear idea of the pragmatic relations used. The conversations were analysed, identifying the speech act and epistemic relations inherent in them. A subtle analysis of the structure of the conversations was also carried out. It was discovered that justification is the most commonly used relation and that the meaning of the text depends on the interpretation of these instances of pragmatic coherence. The study concludes that, to teach literature in English effectively, pragmatic coherence should be incorporated, as words mean more than they say.

Keywords: pragmatic coherence, epistemic coherence, speech act, Americanah

Procedia PDF Downloads 136
2354 The Impact of Text Modifications on Ethiopian Students’ Reading Comprehension and Motivation

Authors: Asefa Kenefergib, Dawit Amogne, Yinager Teklesellassie

Abstract:

A study investigated the effects of text modifications on reading comprehension and motivation among Ethiopian secondary school students. A total of 120 students participated, initially taking a reading comprehension pretest and completing a reading motivation questionnaire. Afterward, they were divided into three groups: control, simplified, and elaborated. Each group then took part in a reading comprehension posttest and another reading motivation questionnaire following an eight-week instructional intervention. Despite initial differences, both the simplified and elaborated text groups showed comparable levels of reading motivation and comprehension. The data were analyzed using SPSS version 25, with a one-way ANOVA used to assess the effectiveness of the modified texts in enhancing reading comprehension. The results indicated that the experimental groups performed significantly better on the posttest compared to the control group, suggesting that text modifications can positively influence students' comprehension skills. Furthermore, the impact of text modifications on student reading motivation was assessed using a one-way ANOVA. The findings revealed that both the elaborated and simplified text groups scored higher than the control group in various dimensions of reading motivation, including reading efficacy, curiosity, challenge, compliance, and reading work avoidance. However, the control and simplified groups had nearly similar mean scores in the dimension of reading competition. These results clearly demonstrate that modifying texts can enhance EFL learners' reading motivation and comprehension.

Keywords: simplification, elaboration, reading motivation, reading comprehension

Procedia PDF Downloads 38
2353 First Time Voters Representation of Leadership as Exemplified by 2016 Presidentiables

Authors: Fevy Kae Mateo, Kimberly Javier, Alyzza Marie Palles

Abstract:

Leadership is a process of relationship involving interaction with other people. Leaders emphasise authority, which executes and implements regulations, maintains the rules, and leads to a better future. First-time voters are very significant because they are the stakeholders in the type of leader to be deployed. They also have the capacity to engage the government and can be agents of change. The objective of the study is to identify the strengths and weaknesses of a leader. Moreover, the study identifies the qualities of a leader. Finally, the study determines first-time voters’ representation of a leader. A Focus Group Discussion was carried out with two groups of first-time voters aged 18 to 21 years. Verbatim transcripts of the discussion were analyzed using Thematic Analysis. Overall results showed superordinate themes for the weaknesses of a leader: lack of transparency in the government, poor communication strategy, valuing experience over potential, and other contributory factors; for the strengths of a leader: analytical skill, emotional intelligence in political work, analytical ability, and economic status in political participation; and, finally, for the representation of a leader: positive representation of a leader and negative representation of a leader.

Keywords: first time voters, focus group discussion, leadership, qualitative research design

Procedia PDF Downloads 251
2352 A Similarity Measure for Classification and Clustering in Image Based Medical and Text Based Banking Applications

Authors: K. P. Sandesh, M. H. Suman

Abstract:

Text processing plays an important role in information retrieval, data mining, and web search. Measuring the similarity between documents is an important operation in the text processing field. In this project, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (1) the feature appears in both documents; (2) the feature appears in only one document; and (3) the feature appears in neither document. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems, especially in the banking and health sectors. The results show that the performance obtained by the proposed measure is better than that achieved by the other measures.

Keywords: document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms

Procedia PDF Downloads 518
2351 Visual Text Analytics Technologies for Real-Time Big Data: Chronological Evolution and Issues

Authors: Siti Azrina B. A. Aziz, Siti Hafizah A. Hamid

Abstract:

New approaches to analyzing and visualizing data streams in real time are important for prompt decision making. Financial market trading and surveillance, large-scale emergency response and crowd control are some example scenarios that require real-time analytics and data visualization. This situation has led to the development of techniques and tools that support humans in analyzing the source data. With the emergence of Big Data and social media, new techniques and tools are required in order to process the streaming data. Today, a range of tools that implement some of these functionalities is available. In this paper, we present a chronological evaluation of the evolution of technologies supporting real-time analytics and visualization of data streams. Based on research papers published from 2002 to 2014, we gathered the general information, main techniques, challenges and open issues. The techniques for streaming text visualization are identified based on the Text Visualization Browser in chronological order. This paper aims to review the evolution of streaming text visualization techniques and tools, as well as to discuss the problems and challenges for each of the identified tools.

Keywords: information visualization, visual analytics, text mining, visual text analytics tools, big data visualization

Procedia PDF Downloads 399
2350 Assessment of the Validity of Sentiment Analysis as a Tool to Analyze the Emotional Content of Text

Authors: Trisha Malhotra

Abstract:

Sentiment analysis is a recent field of study that computationally assesses the emotional nature of a body of text. To assess its test validity, sentiment analysis was carried out on the emotional corpus of text from a personal 15-day mood diary. Self-reported mood scores varied more or less in step with the daily mood evaluation scores given by the software. On further assessment, it was found that while sentiment analysis was good at assessing ‘global’ mood, it was not able to ‘locally’ identify and differentially score synonyms of various emotional words. It is further critiqued for treating the intensity of an emotion as universal across cultures. Finally, the software is shown not to account for emotional complexity in sentences because it treats emotions as strictly positive or negative. Hence, it is posited that a better output could be two (positive and negative) affect scores for the same body of text.
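The closing suggestion of two separate affect scores can be sketched as follows. This is a minimal illustration under stated assumptions: the word lists, weights, and the function name `affect_scores` are hypothetical and are not the software or lexicon evaluated in the study.

```python
import re

# Hypothetical mini-lexicons with illustrative weights.
POSITIVE = {"happy": 1.0, "calm": 0.5, "relieved": 0.5}
NEGATIVE = {"sad": 1.0, "anxious": 0.5, "tired": 0.5}

def affect_scores(text):
    """Return (positive_score, negative_score) instead of one net score."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(POSITIVE.get(t, 0.0) for t in tokens)
    neg = sum(NEGATIVE.get(t, 0.0) for t in tokens)
    return pos, neg

entry = "Happy about the news but still anxious and tired."
pos, neg = affect_scores(entry)
print(pos, neg)   # both scores are reported
print(pos - neg)  # a single net score would hide the mixed emotion
```

For this mixed diary entry the net score collapses to zero, while the pair of scores keeps the positive and negative components visible.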

Keywords: analysis, data, diary, emotions, mood, sentiment

Procedia PDF Downloads 269
2349 Motion Effects of Arabic Typography on Screen-Based Media

Authors: Ibrahim Hassan

Abstract:

Motion typography is one of the most important types of display-based visual communication. Through digital display media, we can control the text properties (size, direction, thickness, color, etc.). The use of motion typography in visual communication has given it several forms. We need to adjust the terminology and clarify the differences between them, because relying on the term motion typography, which is a general term, is not enough to separate the different communicative functions of moving text. In this paper, we discuss the different effects of motion typography on Arabic writing and how we can achieve harmony between the movement and the letterform, and, through our experiments, we present a new type of text movement.

Keywords: Arabic typography, motion typography, kinetic typography, fluid typography, temporal typography

Procedia PDF Downloads 160
2348 Recognition of Grocery Products in Images Captured by Cellular Phones

Authors: Farshideh Einsele, Hassan Foroosh

Abstract:

In this paper, we present a robust algorithm to recognize extracted text from grocery product images captured by mobile phone cameras. Recognition of such text is challenging since text in grocery product images varies in its size, orientation, style, and illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since text degradations cannot be appropriately defined using well-known geometric transformations such as translation, rotation, affine transformation and shearing, we use the whole character black pixels as our feature vector. Classification is performed with a minimum distance classifier using the maximum likelihood criterion, which delivers a very promising Character Recognition Rate (CRR) of 89%. We achieve a considerably higher Word Recognition Rate (WRR) of 99% when using lower-level linguistic knowledge about product words during the recognition process.

Keywords: camera-based OCR, feature extraction, document, image processing, grocery products

Procedia PDF Downloads 406
2347 Instance Selection for MI-Support Vector Machines

Authors: Amy M. Kwon

Abstract:

Support vector machine (SVM) is a well-known algorithm in machine learning due to its superior performance, and it also functions well in multiple-instance (MI) problems. Our study proposes a schematic algorithm to select instances based on Hausdorff distance, which can be adapted to SVMs as input vectors under the MI setting. Based on experiments on five benchmark datasets, our strategy for adapting representation outperformed the original approach. In addition, task execution times (TETs) were reduced by more than 80% based on MissSVM. Hence, this representation adaptation is worth considering for SVMs under the MI setting.
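For readers unfamiliar with the Hausdorff distance named in the abstract, the following is a minimal sketch of the standard definition applied to two bags of instances. It shows only the distance itself; the abstract does not describe the selection algorithm, so none is implied here. The bags and the function names are hypothetical.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Hypothetical bags of 2-D instances (illustrative only).
bag_a = [(0.0, 0.0), (1.0, 0.0)]
bag_b = [(0.0, 0.0), (4.0, 0.0)]

print(hausdorff(bag_a, bag_b))
```

The distance is driven by the single worst-matched instance, here the point (4.0, 0.0) in `bag_b`, whose nearest neighbour in `bag_a` is 3.0 away.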

Keywords: support vector machine, Margin, Hausdorff distance, representation selection, multiple-instance learning, machine learning

Procedia PDF Downloads 34
2346 Pragmatic Survey of Precedence as Linguistic 'Déjà Vu' in Political Text and Talk

Authors: Zarine Avetisyan

Abstract:

In both language and literature, there exists a theory of the recurrence of text and talk chunks, which brings us to the notion of precedence. Precedence as a pragma-linguistic phenomenon is still little known, and the main objective of the present research is to revisit and reveal it thoroughly. In line with this objective, analysis of political text and talk provides abundant relevant data for illustrating the phenomenon of precedence. The analysis focuses on certain pragmatic universals (e.g. intention) and categories (e.g. speech techniques) which lead to the disclosure of the present object of study.

Keywords: intention, precedence, political discourse, pragmatic universals

Procedia PDF Downloads 430
2345 The Winning Possibility of Female Candidate in Korea

Authors: Minjeoung Kim

Abstract:

Until the 19th assembly, the majority of Korean female members of parliament (MPs) had been elected through proportional representation, but in the 20th general election the number of women MPs elected through district representation was slightly higher than the number elected through proportional representation. The chance of women candidates winning is not as low as we assume. Therefore, this study aims to reveal which factors, other than political party, influence the election of women candidates, because the effect of political party is already well known. Gangnam Eul is selected because a female candidate was elected there in spite of the low percentage of votes won by her political party. According to the survey, the female candidate was elected thanks to her policies and election pledges. Therefore, women candidates can be elected not only when they are nominated by their party in a safe constituency but also on the strength of good policies and election pledges in an unsafe constituency. In addition, the education level, age, and profession of voters influenced support for the female candidate.

Keywords: women candidates, 20th general election, winning in the district representation, policies and election pledges

Procedia PDF Downloads 253
2344 Automatic Assignment of Geminate and Epenthetic Vowel for Amharic Text-to-Speech System

Authors: Tadesse Anberbir, Felix Bankole, Tomio Takara, Girma Mamo

Abstract:

In the development of a text-to-speech synthesizer, automatic derivation of correct pronunciation from the grapheme form of a text is a central problem. Deriving phonological features that are not shown in orthography is particularly challenging. In the Amharic language, geminates and epenthetic vowels are crucial for proper pronunciation, but neither is shown in orthography. In this paper, we proposed and integrated a morphological analyzer into an Amharic Text-to-Speech system, mainly to predict geminates and epenthetic vowel positions, and prepared a duration modeling method. The Amharic Text-to-Speech system (AmhTTS) is a parametric and rule-based system that adopts a cepstral method and uses a source filter model for speech production and a Log Magnitude Approximation (LMA) filter as the vocal tract filter. The naturalness of the system after employing the duration modeling was evaluated by a sentence listening test, and we achieved an average Mean Opinion Score (MOS) of 3.4 (68%), which is moderate. By modeling the duration of geminates and controlling the locations of epenthetic vowels, we are able to synthesize good quality speech. Our system is suitable for customization to other Ethiopian languages with limited resources.

Keywords: Amharic, gemination, speech synthesis, morphology, epenthesis

Procedia PDF Downloads 87