Search results for: text representation
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2463

2433 Semantic Indexing Improvement for Textual Documents: Contribution of Classification by Fuzzy Association Rules

Authors: Mohsen Maraoui

Abstract:

With the aim of improving natural language processing applications such as information retrieval, machine translation, and lexical disambiguation, we focus on a statistical approach to semantic indexing for multilingual text documents based on the conceptual network formalism. We propose to use this formalism as an indexing language to represent the descriptive concepts and their weights. These concepts represent the content of the document. Our contribution is based on two steps. In the first step, we propose the extraction of index terms using the multilingual lexical resource EuroWordNet (EWN). In the second step, we pass from the representation of index terms to the representation of index concepts through the conceptual network formalism. This network is generated using the EWN resource and passes through a classification step based on an association rules model (in an attempt to discover the non-taxonomic, or contextual, relations between the concepts of a document). These are latent relations buried in the text and carried by the semantic context of the co-occurrence of concepts in the document. Our proposed indexing approach can be applied to text documents in various languages because it is based on a linguistic method adapted to the language through a multilingual thesaurus. Next, we apply the same statistical process regardless of the language in order to extract the significant concepts and their associated weights. We show that the proposed indexing approach provides encouraging results.

Keywords: concept extraction, conceptual network formalism, fuzzy association rules, multilingual thesaurus, semantic indexing

Procedia PDF Downloads 141
2432 Anatomical Survey for Text Pattern Detection

Authors: S. Tehsin, S. Kausar

Abstract:

The ultimate aim of machine intelligence is to explore and materialize human capabilities, one of which is the ability to detect various text objects within one or more images displayed on any canvas, including prints, videos, or electronic displays. Multimedia data has increased rapidly in past years. Textual information present in multimedia contains important information about the image/video content. However, the human ability to detect and differentiate text within an image, which people exercise routinely, still needs to be reproduced technologically for computers. Hence, in this paper, a feature set based on an anatomical study of the human text detection system is proposed. Subsequent examination shows that the extracted features proved instrumental to text detection.

Keywords: biologically inspired vision, content based retrieval, document analysis, text extraction

Procedia PDF Downloads 444
2431 Innovative Pictogram Chinese Characters Representation

Authors: J. H. Low, S. H. Hew, C. O. Wong

Abstract:

This paper proposes an innovative approach to representing pictogram Chinese characters. The advantage of this approach is that it uses an extraordinary representation of the pictogram Chinese character. This representation is created according to the evolution of the original pictogram Chinese characters. The purpose of this innovation is to assist learners of Chinese as a second language (SCL) in learning Chinese, specifically in memorizing Chinese characters. Commonly, SCL learners become frustrated and give up easily when memorizing Chinese characters by rote. Our innovative representation helps them memorize Chinese characters through visual storytelling. This innovative representation enhances the Chinese language learning experience of SCL learners.

Keywords: Chinese e-learning, innovative Chinese character representation, knowledge management, language learning

Procedia PDF Downloads 487
2430 3D Text Toys: Creative Approach to Experiential and Immersive Learning for World Literacy

Authors: Azyz Sharafy

Abstract:

3D Text Toys is an innovative and creative approach that utilizes 3D text objects to enhance creativity, literacy, and basic learning in an enjoyable and gamified manner. By using 3D Text Toys, children can develop their creativity, visually learn words and texts, and apply their artistic talents within their creative abilities. This process incorporates haptic engagement with 2D and 3D texts, word building, and mechanical construction of everyday objects, thereby facilitating better word and text retention. The concept involves constructing visual objects made entirely out of 3D text/words, where each component of the object represents a word or text element. For instance, a bird can be recreated using words or text shaped like its wings, beak, legs, head, and body, resulting in a 3D representation of the bird purely composed of text. This can serve as an art piece or a learning tool in the form of a 3D text toy. These 3D text objects or toys can be crafted using natural materials such as leaves, twigs, strings, or ropes, or they can be made from various physical materials using traditional crafting tools. Digital versions of these objects can be created using 2D or 3D software on devices like phones, laptops, iPads, or computers. To transform digital designs into physical objects, computerized machines such as CNC routers, laser cutters, and 3D printers can be utilized. Once the parts are printed or cut out, students can assemble the 3D texts by gluing them together, resulting in natural or everyday 3D text objects. These objects can be painted to create artistic pieces or text toys, and the addition of wheels can transform them into moving toys. One of the significant advantages of this visual and creative object-based learning process is that students not only learn words but also derive enjoyment from the process of creating, painting, and playing with these objects. The ownership and creation process further enhances comprehension and word retention. 
Moreover, for individuals with learning disabilities such as dyslexia, ADD (Attention Deficit Disorder), or other learning difficulties, the visual and haptic approach of 3D Text Toys can serve as an additional creative and personalized learning aid. The application of 3D Text Toys extends to both the English language and any other global written language. The adaptation and creative application may vary depending on the country, space, and native written language. Furthermore, the implementation of this visual and haptic learning tool can be tailored to teach foreign languages based on age level and comprehension requirements. In summary, this creative, haptic, and visual approach has the potential to serve as a global literacy tool.

Keywords: 3D text toys, creative, artistic, visual learning for world literacy

Procedia PDF Downloads 64
2429 A Review of Research on Pre-training Technology for Natural Language Processing

Authors: Moquan Gong

Abstract:

In recent years, with the rapid development of deep learning, pre-training technology for natural language processing has made great progress. Early natural language processing long relied on word vector methods such as Word2Vec to encode text. These word vector methods can also be regarded as static pre-training techniques. However, this context-free text representation brings very limited improvement to subsequent natural language processing tasks and cannot solve the problem of word polysemy. ELMo proposes a context-sensitive text representation method that can effectively handle polysemy. Since then, pre-trained language models such as GPT and BERT have been proposed one after another. Among them, the BERT model has significantly improved performance on many typical downstream tasks, greatly promoting technological development in the field, and natural language processing has since entered the era of dynamic pre-training technology. Since then, a large number of pre-trained language models based on BERT and XLNet have continued to emerge, and pre-training technology has become an indispensable mainstream technology in the field of natural language processing. This article first gives an overview of pre-training technology and its development history and introduces in detail the classic pre-training technology in the field of natural language processing, including early static pre-training technology and classic dynamic pre-training technology. It then briefly sorts out a series of enlightening pre-training technologies, including improved models based on BERT and XLNet. On this basis, it analyzes the problems faced by current pre-training technology research. Finally, it looks forward to the future development trend of pre-training technology.

Keywords: natural language processing, pre-training, language model, word vectors

Procedia PDF Downloads 57
2428 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires us to adopt effective computational tools to explore it. This has led to intense and growing interest in the research community in developing computational methods for processing text data. One line of study focuses on condensing the text so that we are able to get a higher level of understanding in a shorter time. The two important tasks for doing this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the most important words in a text, which familiarizes us with its general topic. In text summarization, we are interested in producing a short text that includes the important information in the document. The TextRank algorithm, an unsupervised learning method that extends PageRank (the algorithm underlying the Google search engine for searching and ranking pages), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and return them as a result. However, this algorithm neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as a part of generating the summary to overcome coverage problems.
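For readers unfamiliar with the baseline, the keyword-extraction side of TextRank can be sketched as below. This is a minimal illustration of the plain, unweighted algorithm only; it does not include the semantic-similarity extension the abstract proposes. The function name, the window size, and the toy token list are assumptions made for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words with PageRank over an undirected co-occurrence graph."""
    # Link every pair of distinct words that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration: each word passes its score equally to its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores


# Toy input (hypothetical): already tokenized and stripped of stop words.
tokens = "graph ranking text graph summary text graph keyword".split()
top_words, word_scores = textrank_keywords(tokens, top_k=2)
print(top_words)
```

A semantic variant along the lines the abstract describes would presumably replace the equal split among neighbors with edge weights derived from a similarity measure; that choice is not specified in the abstract and is left out here.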

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 70
2427 Perceiving Text-Worlds as a Cognitive Mechanism to Understand Surah Al-Kahf

Authors: Awatef Boubakri, Khaled Jebahi

Abstract:

Using Text World Theory (TWT), we attempted to understand how mental representations (text worlds) and perceptions can be construed by readers of Quranic texts. To this end, Surah Al-Kahf was purposefully selected given the fact that while each of its stories is narrated, different levels of discourse intervene, which might result in a confused reader who might find it hard to keep track of which discourse he or she is processing. This surah was studied using specifically-designed text-world diagrams. The findings suggest that TWT can be used to help solve problems of ambiguity at the level of discourse in Quranic texts and to help construct a thinking reader whose cognitive constructs (text worlds / mental representations) are built through reflecting on the various and often changing components of discourse world, text world, and sub-worlds.

Keywords: Al-Kahf, Surah, cognitive, processing, discourse

Procedia PDF Downloads 88
2426 A Quantitative Evaluation of Text Feature Selection Methods

Authors: B. S. Harish, M. B. Revanasiddappa

Abstract:

Due to the rapid growth of text documents in digital form, automated text classification has become an important research topic in the last two decades. The major challenges of text document representation are high dimensionality, sparsity, volume, and semantics. Since terms are the only features that can be found in documents, the selection of good terms (features) plays a very important role. In text classification, feature selection is a strategy that can be used to improve classification effectiveness, computational efficiency, and accuracy. In this paper, we present a quantitative analysis of the most widely used feature selection (FS) methods, viz. Term Frequency-Inverse Document Frequency (tfidf), Mutual Information (MI), Information Gain (IG), Chi-Square (χ²), Term Frequency-Relevance Frequency (tfrf), Term Strength (TS), Ambiguity Measure (AM), and Symbolic Feature Selection (SFS), to classify text documents. We evaluated all the feature selection methods on standard datasets such as 20 Newsgroups, the 4 University dataset, and Reuters-21578.
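As an illustration of one of the listed methods, the sketch below scores a term against a class with the standard 2x2 chi-square statistic. The toy corpus, labels, and function name are assumptions for the example and are not taken from the paper's experiments.

```python
def chi_square(docs, labels, term, target):
    """Chi-square score of `term` for class `target` from a 2x2 contingency table."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            n11 += 1  # term present, document in class
        elif has_term:
            n10 += 1  # term present, document outside class
        elif in_class:
            n01 += 1  # term absent, document in class
        else:
            n00 += 1  # term absent, document outside class
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0  # term or class is constant across the corpus
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"match", "team"}, {"vote", "party"}, {"party", "match"}]
labels = ["sport", "sport", "politics", "politics"]

print(chi_square(docs, labels, "party", "politics"))
print(chi_square(docs, labels, "match", "sport"))
```

Terms would then be ranked by score and the top ones kept as features. In this toy corpus, "party" separates the classes perfectly and so scores higher than "match", which also appears in a politics document.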

Keywords: classifiers, feature selection, text classification

Procedia PDF Downloads 458
2425 Identification and Evaluation of Environmental Concepts in Paulo Coelho's "The Alchemist"

Authors: Tooba Sabir, Asima Jaffar, Namra Sabir, Mohammad Amjad Sabir

Abstract:

Ecocriticism is the study of the relationship between humans and the environment, which has been represented in literature since the very beginning of the pastoral tradition. However, the analysis of such representation is new compared to other critical approaches such as Psychoanalysis, Marxism, Post-colonialism, Modernism, and many others. Ecocritics seek to find information on anthropocentrism, ecocentrism, ecofeminism, eco-Marxism, the representation of the environment and environmental concepts, and several other topics. In the current study, the representations of environmental concepts were ecocritically analyzed in Paulo Coelho’s The Alchemist, one of the most read novels throughout the world, having been translated into many languages. Analysis of the text revealed representations of environmental ideas such as landscapes and tourism, biodiversity, land-sea displacement, environmental disasters and warfare, desert winds, and sand dunes. 'This desert was once a sea' throws light on different theories of land-sea displacement, one being the plate-tectonic theory, which proposes that Earth’s lithosphere is divided into large and small plates continuously moving toward, away from, or parallel to each other, resulting in land-sea displacement. Another is the continental drift theory, which holds that one large landmass, Pangea, broke into smaller pieces of land that moved relative to each other and formed the continents of the present time. The cause of desertification may, however, be natural, i.e., climate change, or artificial, i.e., human activities. Imagery of the environmental concepts is detailed at some instances in the novel and less striking at others, but is still capable of arousing readers’ imagination.
The study suggests that ecocritical justifications of environmental concepts in the text will increase the interactions between literature and the environment, which should be encouraged in order to raise environmental awareness among readers.

Keywords: biodiversity, ecocritical analysis, ecocriticism, environmental disasters, landscapes

Procedia PDF Downloads 264
2424 Literary Theatre and Embodied Theatre: A Practice-Based Research in Exploring the Authorship of a Performance

Authors: Rahul Bishnoi

Abstract:

Theatre, as Anne Ubersfeld calls it, is a paradox. At once, it is both a literary work and a physical representation. Theatre as a text is eternal, reproducible, and identical, while as a performance, theatre is momentary and never identical to previous performances. In this dual existence of theatre, who is the author? Is the author the playwright who writes the dramatic text, the director who orchestrates the performance, or the actor who embodies the text? From the poststructuralist lens of Barthes, the author is dead. Barthes’ argument of discrete temporality, i.e., the author is the before and the text is the after, does not hold true for theatre. A published literary work is written, edited, printed, distributed, and then consumed by the reader. On the other hand, theatrical production is immediate; an actor performs and the audience witnesses it instantaneously. Time, so to speak, no longer separates the author, the text, and the reader. The question of authorship gets further complicated in Augusto Boal’s “Theatre of the Oppressed” movement, where the audience is a direct participant in the performance, like the actors. In this research, through an experimental performance, the duality of theatre is explored with the authorship discourse, and the conventional definition of authorship is subjected to additional complexity by erasing the distinction between an actor and the audience. The design/methodology of the experimental performance is as follows: The audience will be asked to produce a text under an anonymous virtual alias. The text, as it is being produced, will be read and performed by the actor. The audience, who are also collectively “authoring” the text, will watch this performance and write further until everyone has contributed one input each. The cycle of writing, reading, performing, witnessing, and writing will continue until the end.
The intention is to create a dynamic system of writing/reading with the embodiment of the text through the actor. The actor is giving up the power to the audience to write the spoken word, stage instruction and direction while still keeping the agency of interpreting that input and performing in the chosen manner. This rapid conversation between the actor and the audience also creates a conversion of authorship. The main conclusion of this study is a perspective on the nature of dynamic authorship of theatre containing a critical enquiry of the collaboratively produced text, an individually performed act, and a collectively witnessed event. Using practice as a methodology, this paper contests the poststructuralist notion of the author as merely a ‘scriptor’ and breaks it further by involving the audience in the authorship as well.

Keywords: practice based research, performance studies, post-humanism, Avant-garde art, theatre

Procedia PDF Downloads 110
2423 The Untranslatability of the Qur’an

Authors: Mina Elhjouji

Abstract:

The aim of this paper is to raise awareness of the untranslatability of the Qur’an and to suggest some solutions that can help the translator transfer as much of the meaning as possible from the source text to the target text. After the introduction, the miraculous character of the Qur’an shall be illustrated. Then, the difficulty of translating religious texts will be shown in terms of different causes: thematic, cultural, and linguistic. Some examples shall illustrate each type of difficulty. Finally, some strategies that can help translate the Qur’an’s meanings will be suggested.

Keywords: translation, religious text, untranslatability, The Qur’an miracle, communicative theory

Procedia PDF Downloads 11
2422 Towards a Large Scale Deep Semantically Analyzed Corpus for Arabic: Annotation and Evaluation

Authors: S. Alansary, M. Nagi

Abstract:

This paper presents an approach to conducting semantic annotation of an Arabic corpus using the Universal Networking Language (UNL) framework. UNL is intended to be a promising strategy for providing a large collection of semantically annotated texts with formal, deep semantics rather than shallow semantics. The result would constitute a semantic resource (semantic graphs) that is editable and that integrates various phenomena, including predicate-argument structure, scope, tense, thematic roles, and rhetorical relations, into a single semantic formalism for knowledge representation. The paper will also present the Interactive Analysis tool for automatic semantic annotation (IAN). In addition, the disambiguation and transformation rules, which are the cornerstone of the proposed methodology, will be presented. Semantic annotation using UNL has been applied to a corpus of 20,000 Arabic sentences representing the most frequent structures in the Arabic Wikipedia. The representation is illustrated at different linguistic levels, starting from the morphological level and passing through the syntactic level until the semantic representation is reached. The output has been evaluated using the F-measure and scored 90%. This demonstrates how powerful the formal environment is, as it enables intelligent text processing and search.

Keywords: semantic analysis, semantic annotation, Arabic, universal networking language

Procedia PDF Downloads 582
2421 Social Media Mining with R. Twitter Analyses

Authors: Diana Codat

Abstract:

Tweet analysis is a part of text mining. Each document is a written text, so it is possible to apply the usual text search techniques, in particular by switching to the bag-of-words representation. But tweets have peculiarities. Some may enrich the analysis: their length is calibrated (at least as far as public messages are concerned), special characters make it possible to identify authors (@) and themes (#), and the tweet and retweet mechanisms make it possible to follow the diffusion of information. Conversely, other characteristics may disrupt the analysis. Because space is limited, authors often use abbreviations and emoticons to express feelings, and they do not pay much attention to spelling. All this creates noise that can complicate the task. Tweets carry a lot of potentially interesting information, and their exploitation is one of the main axes of social network analysis. We show how to access Twitter-related messages. We begin with a study of the properties of the tweets and follow up with the exploitation of the content of the messages. We work in R with the 'twitteR' package. The study of tweets is a strong focus of social network analysis because Twitter has become an important vector of communication. This example shows that it is easy to initiate an analysis from data extracted directly online. The data preparation phase is of great importance.
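The preparation step described above (isolating authors, themes, and retweets before building a bag of words) can be sketched as follows. The abstract works in R with 'twitteR'; this sketch is in Python, uses only the standard library on an already-downloaded tweet string, and does not reproduce any 'twitteR' call. The function name and the sample tweet are assumptions.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"[a-z']+")


def analyze_tweet(text):
    """Split a raw tweet into mentions, hashtags, and a bag of ordinary words."""
    mentions = MENTION.findall(text)
    hashtags = HASHTAG.findall(text)
    # Drop URLs, mentions, and hashtags so they do not pollute the word counts.
    cleaned = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    bag = Counter(WORD.findall(cleaned))
    return {
        "mentions": mentions,
        "hashtags": hashtags,
        "is_retweet": text.startswith("RT @"),
        "bag": bag,
    }


# Hypothetical sample message.
sample = "RT @alice: Text mining is fun, mining tweets is fun! #rstats #textmining http://t.co/x"
result = analyze_tweet(sample)
print(result["mentions"], result["hashtags"], result["bag"].most_common(3))
```

Abbreviations, emoticons, and misspellings pass through this tokenizer untouched, which is the noise the abstract warns about; handling them would need extra normalization rules.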

Keywords: data mining, language R, social networks, Twitter

Procedia PDF Downloads 184
2420 Non Commutative Lᵖ Spaces as Hilbert Modules

Authors: Salvatore Triolo

Abstract:

We discuss the possibility of extending the well-known Gelfand-Naimark-Segal representation to modules over a C*-algebra. We focus our attention on the case of Hilbert modules. We consider, in particular, the problem of the existence of a faithful representation. Non-commutative Lᵖ-spaces are shown to constitute examples of a class of CQ*-algebras. Finally, we show that any semisimple proper CQ*-algebra (X, A#), with A# a W*-algebra, can be represented as a CQ*-algebra of measurable operators in Segal’s sense.

Keywords: Gelfand-Naimark-Segal representation, CQ*-algebras, faithful representation, non-commutative Lᵖ-spaces, operator in Hilbert spaces

Procedia PDF Downloads 248
2419 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

In order to settle the problem of acquiring biological cases related to design problems, a biological case acquisition method based on text mining is presented. Through the construction of a corpus text vector space and knowledge mining, the feature selection, similarity measure, and case retrieval methods for text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is searched using the vector space model combined with functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method using an example text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 260
2418 Modeling Generalization in the Acquired Equivalence Paradigm with the Successor Representation

Authors: Troy M. Houser

Abstract:

The successor representation balances flexible and efficient reinforcement learning by learning to predict the future, given the present. As such, the successor representation models stimuli in terms of the future states they lead to. Therefore, two stimuli that are perceptually dissimilar but lead to the same future state will come to be represented more similarly. This closely resembles an older behavioral paradigm, the acquired equivalence paradigm, which measures the generalization of learned associations. Here, we test via computational modeling the plausibility that the successor representation is the mechanism by which people generalize knowledge learned in the acquired equivalence paradigm. Computational evidence suggests that this is a plausible mechanism for acquired equivalence and thus can guide future empirical work on individual differences in associative-based generalization.
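The claim that stimuli sharing a successor come to be represented similarly can be checked on a toy example. The sketch below computes the successor representation of a small, hypothetical transition matrix by iterating M ← I + γTM, which is the standard definition, and compares rows by cosine similarity. The states, transition structure, and discount factor are assumptions and do not reproduce the paper's model.

```python
import math


def successor_representation(T, gamma=0.9, iterations=100):
    """Iterate M <- I + gamma * T @ M, which converges to sum_k (gamma*T)^k."""
    n = len(T)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iterations):
        M = [
            [
                identity[i][j] + gamma * sum(T[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical task: stimuli A1 and A2 both lead to outcome X; B leads to Z.
STATES = ["A1", "A2", "B", "X", "Z"]
T = [
    [0, 0, 0, 1, 0],  # A1 -> X
    [0, 0, 0, 1, 0],  # A2 -> X
    [0, 0, 0, 0, 1],  # B  -> Z
    [0, 0, 0, 0, 0],  # X is terminal
    [0, 0, 0, 0, 0],  # Z is terminal
]

M = successor_representation(T)
print("A1 vs A2:", cosine(M[0], M[1]))
print("A1 vs B: ", cosine(M[0], M[2]))
```

A1 and A2 share no perceptual features in this encoding (their one-hot components differ), yet their rows overlap through the shared successor X, whereas A1 and B do not overlap at all.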

Keywords: acquired equivalence, successor representation, generalization, decision-making

Procedia PDF Downloads 27
2417 Socio-Cultural Representations through Lived Religions in Dalrymple’s Nine Lives

Authors: Suman

Abstract:

In the continuous interaction between the past and the present that historiography is, each time history gets re/written, a new representation emerges. This new representation is a reflection of the earlier archives and their interpretations, fragmented remembrances of the past, as well as reactions to the present. Memory, or lack thereof, and stereotyping generally play a major role in this representation. William Dalrymple’s Nine Lives: In Search of the Sacred in Modern India (2009) is one such written account that sets out to narrate the representations of religion and culture of India and contemporary reactions to it. Dalrymple’s nine saints belong to different castes, sects, religions, and regions. By dealing with their religions and expressions of those religions, and through the lived mysticism of these nine individuals, the book engages with some important issues like class, caste, and gender in the contexts provided by historical as well as present India. The paper studies the development of religion and the accompanying feeling of religiosity in modern as well as historical contexts through a study of these elements in the book. Since the language used in the creation of texts, and the literary texts thus produced, create a new reality that questions the stereotypes of the past and in turn often end up creating new stereotypes or stereotypical representations, the paper seeks to actively engage with the text in order to identify and study such stereotypes, along with their changing representations. Through a detailed examination of the book, the paper seeks to unravel whether some socio-cultural stereotypes existed earlier, and whether new stereotypes develop from Dalrymple’s point of view as an outsider writing on issues that are deeply rooted in the cultural milieu of the country. For this analysis, the paper draws on the psycho-literary theories of stereotyping and representation.

Keywords: stereotyping, representation, William Dalrymple, religion

Procedia PDF Downloads 310
2416 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of register variation in three major text domains, namely social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. The study uses a moderately sized corpus of Bangla mass media text that contains nearly one million words collected from different media sources such as newspapers, magazines, advertisements, and periodicals. The analysis of corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. At first, the subject domains of the texts are classified according to two parameters, namely 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties such as function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency counts to track how domain-specific words and terms may be marked as decisive indicators in specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is an unreliable indicator, as its lexicological identity has no bearing on specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation.
In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation to the domains to which they belong. The advantage of this analysis lies in the proper identification of the linguistic factors, which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing text identification strategies for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 174
2415 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models on this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. In other cases, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
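The TFIDF baseline discussed above can be sketched in a few lines. This is a minimal version using relative term frequency and an unsmoothed logarithmic inverse document frequency; the paper's exact weighting scheme is not given in the abstract, and the three toy "patents" are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: (count / len(doc)) * math.log(n / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus standing in for patent texts.
docs = [
    "battery electrode lithium cell".split(),
    "lithium battery anode cell".split(),
    "engine piston fuel cell".split(),
]
vecs = tfidf_vectors(docs)
print("doc0 vs doc1:", cosine(vecs[0], vecs[1]))
print("doc0 vs doc2:", cosine(vecs[0], vecs[2]))
```

Because "cell" occurs in every document, its weight is zero, so the first and third documents end up with no overlap even though they share a word. Down-weighting ubiquitous terms in this way is what lets TFIDF make fine-grained distinctions on technical text.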

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 175
2414 Automatic Tagging and Accuracy in Assamese Text Data

Authors: Chayanika Hazarika Bordoloi

Abstract:

This paper is an attempt to work on a highly inflectional language called Assamese. It is one of the national languages of India, yet very little has been achieved in terms of computational research. Building a language processing tool for a natural language is not straightforward, as the standard and the language representation change at various levels. This paper presents the inflectional suffixes of Assamese verbs and shows how statistical tools, along with linguistic features, can improve tagging accuracy. Conditional random fields (a CRF tool) were used to train on and automatically tag the text data; accuracy improved after linguistic features were fed into the training data. Because Assamese is highly inflectional, it is challenging to standardize its morphology. Inflectional suffixes are used as a feature of the text data. In order to analyze the inflections of Assamese word forms, a list comprising all possible suffixes that the various categories can take was prepared. Assamese words can be classified into inflected classes (noun, pronoun, adjective and verb) and uninflected classes (adverb and particle). The corpus used for this morphological analysis has a very large number of tokens. It is a mixed corpus, and it has given satisfactory accuracy. The accuracy rate of the tagger has gradually improved with the modified training data.
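
As a rough illustration of how a suffix list can be turned into per-token features for a CRF tagger, here is a minimal sketch. The function names, the feature names, and the context-window design are assumptions, and the example suffixes are English placeholders, not the Assamese suffix list prepared in the paper.

```python
def suffix_features(token, suffixes, max_len=3):
    """Build a feature dict for one token from a list of known suffixes.

    All feature names here are hypothetical; a real tagger would use the
    suffix inventory and feature template chosen by its authors.
    """
    features = {"token": token}
    # Raw character endings, useful even when no listed suffix matches.
    for n in range(1, max_len + 1):
        if len(token) > n:
            features[f"last_{n}"] = token[-n:]
    # Longest listed suffix that leaves a non-empty stem.
    matches = [s for s in suffixes if token.endswith(s) and len(token) > len(s)]
    if matches:
        longest = max(matches, key=len)
        features["suffix"] = longest
        features["stem"] = token[: -len(longest)]
    else:
        features["suffix"] = ""
        features["stem"] = token
    return features


def sentence_features(tokens, suffixes):
    """Add neighbouring-token context, as CRF feature templates often do."""
    rows = []
    for i, token in enumerate(tokens):
        feats = suffix_features(token, suffixes)
        feats["prev_token"] = tokens[i - 1] if i > 0 else "<BOS>"
        feats["next_token"] = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
        rows.append(feats)
    return rows


# Placeholder suffixes and tokens (English, for illustration only).
placeholder_suffixes = ["ed", "d", "ing", "s"]
for row in sentence_features(["she", "walked", "home"], placeholder_suffixes):
    print(row)
```

Feature dicts of this shape are what CRF toolkits typically consume, one dict per token and one list of dicts per sentence.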

Keywords: CRF, morphology, tagging, tagset

Procedia PDF Downloads 194
2413 Structural Analysis of Kamaluddin Behzad's Works Based on Roland Barthes' Theory of Communication, 'Text and Image'

Authors: Mahsa Khani Oushani, Mohammad Kazem Hasanvand

Abstract:

Text and image have always been two important components in Iranian layout. The interactive connection between text and image has shaped the art of book design with multiple patterns. In this research, the structure and visual elements in the research data were first analyzed; then the position of the text element and the image element in relation to each other was studied based on Roland Barthes' three theories of text and image, and the results were compared and interpreted. The purpose of this study is to investigate the pattern of text and image in the works of Kamaluddin Behzad based on Roland Barthes' three communication theories: 1) descriptive communication, 2) reference communication, and 3) matched communication. The research questions are: what is the relationship between text and image in Behzad's works, and how is it defined according to Roland Barthes' theory? The research follows a structuralist approach with a descriptive-analytical method. The information was collected from documents (library sources) and online databases. Findings show that the image is the dominant element in Behzad's drawings and has created a reference relationship in their layout; in some cases, however, a different relationship emerges in which, despite the preference given to the image on the page, the text is dispersed proportionally and plays a more active role within the image. The text and the image support each other equally on the page; Roland Barthes equates this connection.

Keywords: text, image, Kamaluddin Behzad, Roland Barthes, communication theory

Procedia PDF Downloads 192
2412 Emotions in Health Tweets: Analysis of American Government Official Accounts

Authors: García López

Abstract:

The Government Departments of Health have the task of informing and educating citizens about public health issues. For this, they use channels like Twitter, which is key in the search for health information and the propagation of content. The tweets, important in the virality of the content, may contain emotions that influence the contagion and exchange of knowledge. The goal of this study is to perform an analysis of the emotional projection of health information shared on Twitter by official American accounts: the disease control account CDCgov, the National Institutes of Health account NIH, the government agency account HHSGov, and the professional organization account PublicHealth. For this, we used Tone Analyzer, an International Business Machines Corporation (IBM) tool specialized in emotion detection in text, corresponding to the categorical model of emotion representation. For 15 days, all tweets from these accounts were analyzed with this tool. The results showed that their tweets carry an important emotional load, a determining factor in the success of their communications. This shows that official accounts also use subjective language and express emotions. The predominance of joy over sadness and the strong presence of emotions in their tweets stimulate the virality of content, which is key to the informational work of government health departments.

Keywords: emotions in tweets, emotion detection in the text, health information on Twitter, American health official accounts, emotions on Twitter, emotions and content

Procedia PDF Downloads 142
2411 Network Word Discovery Framework Based on Sentence Semantic Vector Similarity

Authors: Ganfeng Yu, Yuefeng Ma, Shanliang Yang

Abstract:

Word discovery is a key problem in text information retrieval. Methods for new word discovery tend to be closely tied to words because they generally obtain new words by analyzing words. With the popularity of social networks, individual netizens and online self-media have generated various network texts for the convenience of online life, including network words that are far from standard Chinese expression. How to detect network words is one of the important goals in the field of text information retrieval today. In this paper, we integrate a word embedding model and clustering methods to propose a network word discovery framework based on sentence semantic similarity (S³-NWD) that detects network words effectively in a corpus. The framework constructs sentence semantic vectors through a distributed representation model, uses the similarity of sentence semantic vectors to determine the semantic relationship between sentences, and finally realizes network word discovery by means of semantic replacement between sentences. The experiment verifies that the framework not only discovers network words rapidly but also identifies the standard-word meaning of the discovered network words, which reflects the effectiveness of our work.
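
The sentence-vector similarity step can be sketched as follows. This is a simplified illustration, not the S³-NWD framework itself: it assumes a sentence vector is the plain average of its word vectors, and the tiny `toy_embeddings` table and all function names are hypothetical.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the word vectors of the tokens that have an embedding.

    Averaging is an assumption made for this sketch; the abstract only
    says that a distributed representation model is used.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_sentence(query_tokens, candidates, embeddings):
    """Return (index, score) of the candidate sentence closest to the query."""
    query_vec = sentence_vector(query_tokens, embeddings)
    best_index, best_score = None, -1.0
    if query_vec is None:
        return best_index, best_score
    for i, tokens in enumerate(candidates):
        vec = sentence_vector(tokens, embeddings)
        if vec is None:
            continue
        score = cosine(query_vec, vec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical 2-dimensional embeddings, invented for illustration.
toy_embeddings = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}
candidates = [["great", "movie"], ["bad", "movie"]]
print(most_similar_sentence(["good", "movie"], candidates, toy_embeddings))
```

Once the closest standard sentence is found, the tokens that differ between the two sentences are natural candidates for the replacement step the abstract describes.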

Keywords: text information retrieval, natural language processing, new word discovery, information extraction

Procedia PDF Downloads 95
2410 Morphological Processing of Punjabi Text for Sentiment Analysis of Farmer Suicides

Authors: Jaspreet Singh, Gurvinder Singh, Prabhsimran Singh, Rajinder Singh, Prithvipal Singh, Karanjeet Singh Kahlon, Ravinder Singh Sawhney

Abstract:

Morphological evaluation of Indian languages is one of the burgeoning fields in the area of Natural Language Processing (NLP). The evaluation of a language is an eminent task in the era of information retrieval and text mining. The extraction and classification of knowledge from text can be exploited for sentiment analysis and morphological evaluation. This study combines morphological evaluation and sentiment analysis for the task of classifying farmer suicide cases reported in the Punjab state of India. The pre-processing of Punjabi text involves morphological evaluation and normalization of Punjabi word tokens, followed by training of the proposed model using deep learning classification on Punjabi-language text extracted from online Punjabi news reports. The class-wise accuracies of sentiment prediction for four negatively oriented classes of farmer suicide cases are 93.85%, 88.53%, 83.3%, and 95.45%, respectively. The overall accuracy of sentiment classification obtained using the proposed framework on 275 Punjabi text documents is 90.29%.

Keywords: deep neural network, farmer suicides, morphological processing, punjabi text, sentiment analysis

Procedia PDF Downloads 326
2409 African Women in Power: An Analysis of the Representation of Nigerian Business Women in Television

Authors: Ifeanyichukwu Valerie Oguafor

Abstract:

Women have generally been categorized and placed low in the hierarchy of the business industry, sometimes highly regarded and at other times barely so. The social construction of womanhood does not, in any sense, support a woman going into business, let alone succeeding in it, because it is believed that it is a man’s world. In a typical patriarchal setting, a woman is expected to know nothing more than domestic roles. For some women, this is not the case, as they have been able to break these barriers and excel in business amidst these social settings and stereotypes. This study examines media representation of Nigerian business women, using content analysis of TV interviews as media text and framing analysis as an approach in qualitative methodology. The study further aims to analyse media frames of two Nigerian business women: Folorunsho Alakija, a business woman in the petroleum industry with a current net worth of 1.1 billion U.S. dollars, who emerged as the richest black woman in the world in 2014; and Mosunmola Abudu, a media magnate in Nigeria who launched Africa’s first global black entertainment and lifestyle network in 2013. This study used six predefined frames: the business woman, the myth of business women, the non-traditional woman, women in leading roles, the family woman, the religious woman, and the philanthropist woman to analyse the representation of Nigerian business women in the media. The analysis of the aforementioned frames on TV interviews with these women reveals that the media perpetually reproduce existing gender stereotypes and do not challenge patriarchy. Women face challenges in trying to succeed in business while trying to keep their homes stable. This study concludes that the media represent and reproduce gender stereotypes in spite of the expectation of empowering women. The media reduce these women’s success to insignificance rather than presenting them as role models for women in society.

Keywords: representation of business women in the media, business women in Nigeria, framing in the media, patriarchy, women's subordination

Procedia PDF Downloads 161
2408 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research takes a practice-as-research approach that prioritises the choreographic process as a mode of inquiry. Within this context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, and intertextuality between text and movement in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within the narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” from the perspective of J. L. Austin's speech act theory, based on fragmented text from “pidato adat,” which is used as the opening speech in Randai. Jacques Derrida's theory of deconstruction will also give the text a different meaning. The process of creating the choreography will also help to lay out the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text in two works, Joseph Gonzales's “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references for producing different methods of making choreography. A semiotic foundation will support reading occurrences within dance discourses as texts. The method used in this research is qualitative and includes an interview and a simulation of the concept to obtain an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 96
2407 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary school age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. A total of 293 school children from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children grasped that an argument must essentially be justified, the more supports they produced, the fewer clauses they produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 166
2406 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre in itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of the text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by users of the language who possess an in-depth knowledge of linguistic systems, which they use to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidify arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 269
2405 Hierarchical Piecewise Linear Representation of Time Series Data

Authors: Vineetha Bettaiah, Heggere S. Ranganath

Abstract:

This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for the representation of time series data in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments at the best point or midpoint until the error between the approximating line and the original curve becomes less than a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and the general shape of the time series. The representation permits coarse-to-fine processing at different levels of detail, allows flexible definition of similarity based on mathematical measures or general time series shape, and supports time series data mining operations including query by content, clustering and classification based on whole or subsequence similarity.
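
The recursive partitioning described above can be sketched as a small binary tree builder. This is an illustrative reading of the abstract, not the authors' algorithm: it assumes the "best point" is the sample with the largest vertical deviation from the approximating line, it starts from the two endpoints instead of detecting perceptually important points, and the names `line_error`, `split_segment`, and `breakpoints` are hypothetical.

```python
def line_error(series, start, end):
    """Return (max_abs_error, index) for the line joining two samples.

    The error is the vertical distance between each interior sample and
    the straight line from series[start] to series[end].
    """
    y0, y1 = series[start], series[end]
    worst_err, worst_idx = 0.0, start
    for i in range(start + 1, end):
        interpolated = y0 + (y1 - y0) * (i - start) / (end - start)
        err = abs(series[i] - interpolated)
        if err > worst_err:
            worst_err, worst_idx = err, i
    return worst_err, worst_idx


def split_segment(series, start, end, threshold):
    """Recursively split [start, end] until the error is below threshold.

    Each node records its own error, so shallow levels of the tree give a
    coarse approximation and deeper levels a finer one.
    """
    err, idx = line_error(series, start, end)
    node = {"start": start, "end": end, "error": err, "children": []}
    # Split only at a strictly interior point to guarantee termination.
    if err >= threshold and start < idx < end:
        node["children"] = [
            split_segment(series, start, idx, threshold),
            split_segment(series, idx, end, threshold),
        ]
    return node


def breakpoints(node):
    """Collect the break points of the finest level, in time order."""
    if not node["children"]:
        return [node["start"], node["end"]]
    left, right = node["children"]
    return breakpoints(left)[:-1] + breakpoints(right)


# Toy triangular series, invented for illustration.
series = [0, 1, 2, 3, 2, 1, 0]
tree = split_segment(series, 0, len(series) - 1, threshold=0.5)
print(breakpoints(tree))
```

Splitting at the midpoint, the other option the abstract mentions, would only change how `idx` is chosen; the tree structure and the stopping rule stay the same.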

Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation

Procedia PDF Downloads 275
2404 “Octopub”: Geographical Sentiment Analysis Using Named Entity Recognition from Social Networks for Geo-Targeted Billboard Advertising

Authors: Oussama Hafferssas, Hiba Benyahia, Amina Madani, Nassima Zeriri

Abstract:

Although data nowadays takes multiple forms, from text to images and from audio to video, text is still the most widely used form at the public level. At the academic and research level, and unlike other forms, text can be considered the easiest form to process. Therefore, a branch of data mining research, called "Text Mining", has always existed in its shadow. Its concept is just like data mining’s: finding valuable patterns in large collections and tremendous volumes of data, in this case text. Named entity recognition (NER) is one of text mining’s disciplines; it aims to extract and classify references such as proper names, locations, expressions of time and dates, organizations, and more in a given text. Our approach, "Octopub", does not aim to find new ways to improve the named entity recognition process; rather, it is about finding a new and smart way to use NER to extract the sentiments of millions of people, using social networks as a limitless information source and marketing for product promotion as the main domain of application.

Keywords: text mining, named entity recognition (NER), sentiment analysis, social media networks (SN, SMN), business intelligence (BI), marketing

Procedia PDF Downloads 589