Search results for: text analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27951

Search results for: text analysis

27921 Intonation Salience as an Underframe to Text Intonation Models

Authors: Tatiana Stanchuliak

Abstract:

It is common knowledge that intonation is not laid over a ready text. On the contrary, intonation forms and accompanies the text on the level of its birth in the speaker’s mind. As a result, intonation plays one of the fundamental roles in the process of transferring a thought into external speech. Intonation structure can highlight the semantic significance of textual elements and become a ranging mark in understanding the information structure of the text. Intonation functions by means of prosodic characteristics, one of which is intonation salience, whose function in texts results in making some textual elements more prominent than others. This function of intonation, therefore, performs as organizing. It helps to form the frame of key elements of the text. The study under consideration made an attempt to look into the inner nature of salience and create a sort of a text intonation model. This general goal brought to some more specific intermediate results. First, there were established degrees of salience on the level of the smallest semantic element - intonation group, as well as prosodic means of creating salience, were examined. Second, the most frequent combinations of prosodic means made it possible to distinguish patterns of salience, which then became constituent elements of a text intonation model. Third, the analysis of the predicate structure allowed to divide the whole text into smaller parts, or units, which performed a specific function in the developing of the general communicative intention. It appeared that such units can be found in any text and they have common characteristics of their intonation arrangement. These findings are certainly very important both for the theory of intonation and their practical application.

Keywords: accentuation , inner speech, intention, intonation, intonation functions, models, patterns, predicate, salience, semantics, sentence stress, text

Procedia PDF Downloads 252
27920 Towards Logical Inference for the Arabic Question-Answering

Authors: Wided Bakari, Patrice Bellot, Omar Trigui, Mahmoud Neji

Abstract:

This article constitutes an opening to think of the modeling and analysis of Arabic texts in the context of a question-answer system. It is a question of exceeding the traditional approaches focused on morphosyntactic approaches. Furthermore, we present a new approach that analyze a text in order to extract correct answers then transform it to logical predicates. In addition, we would like to represent different levels of information within a text to answer a question and choose an answer among several proposed. To do so, we transform both the question and the text into logical forms. Then, we try to recognize all entailment between them. The results of recognizing the entailment are a set of text sentences that can implicate the user’s question. Our work is now concentrated on an implementation step in order to develop a system of question-answering in Arabic using techniques to recognize textual implications. In this context, the extraction of text features (keywords, named entities, and relationships that link them) is actually considered the first step in our process of text modeling. The second one is the use of techniques of textual implication that relies on the notion of inference and logic representation to extract candidate answers. The last step is the extraction and selection of the desired answer.

Keywords: NLP, Arabic language, question-answering, recognition text entailment, logic forms

Procedia PDF Downloads 330
27919 Literature Review on Text Comparison Techniques: Analysis of Text Extraction, Main Comparison and Visual Representation Tools

Authors: Andriana Mkrtchyan, Vahe Khlghatyan

Abstract:

The choice of a profession is one of the most important decisions people make throughout their life. With the development of modern science, technologies, and all the spheres existing in the modern world, more and more professions are being arisen that complicate even more the process of choosing. Hence, there is a need for a guiding platform to help people to choose a profession and the right career path based on their interests, skills, and personality. This review aims at analyzing existing methods of comparing PDF format documents and suggests that a 3-stage approach is implemented for the comparison, that is – 1. text extraction from PDF format documents, 2. comparison of the extracted text via NLP algorithms, 3. comparison representation using special shape and color psychology methodology.

Keywords: color psychology, data acquisition/extraction, data augmentation, disambiguation, natural language processing, outlier detection, semantic similarity, text-mining, user evaluation, visual search

Procedia PDF Downloads 55
27918 Principle Components Updates via Matrix Perturbations

Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook

Abstract:

This paper highlights a new approach to look at online principle components analysis (OPCA). Given a data matrix X R,^m x n we characterise the online updates of its covariance as a matrix perturbation problem. Up to the principle components, it turns out that online updates of the batch PCA can be captured by symmetric matrix perturbation of the batch covariance matrix. We have shown that as n→ n0 >> 1, the batch covariance and its update become almost similar. Finally, utilize our new setup of online updates to find a bound on the angle distance of the principle components of X and its update.

Keywords: online data updates, covariance matrix, online principle component analysis, matrix perturbation

Procedia PDF Downloads 185
27917 Perceiving Text-Worlds as a Cognitive Mechanism to Understand Surah Al-Kahf

Authors: Awatef Boubakri, Khaled Jebahi

Abstract:

Using Text World Theory (TWT), we attempted to understand how mental representations (text worlds) and perceptions can be construed by readers of Quranic texts. To this end, Surah Al-Kahf was purposefully selected given the fact that while each of its stories is narrated, different levels of discourse intervene, which might result in a confused reader who might find it hard to keep track of which discourse he or she is processing. This surah was studied using specifically-designed text-world diagrams. The findings suggest that TWT can be used to help solve problems of ambiguity at the level of discourse in Quranic texts and to help construct a thinking reader whose cognitive constructs (text worlds / mental representations) are built through reflecting on the various and often changing components of discourse world, text world, and sub-worlds.

Keywords: Al-Kahf, Surah, cognitive, processing, discourse

Procedia PDF Downloads 72
27916 Glossematics and Textual Structure

Authors: Abdelhadi Nadjer

Abstract:

The structure of the text to the systemic school -(glossématique-Helmslev). At the beginning of the note we have a cursory look around the concepts of general linguistics The science that studies scientific study of human language based on the description and preview the facts away from the trend of education than we gave a detailed overview the founder of systemic school and most important customers and more methods and curriculum theory and analysis they extend to all humanities, practical action each offset by a theoretical and the procedure can be analyzed through the elements that pose as another method we talked to its links with other language schools where they are based on the sharp criticism of the language before and deflected into consideration for the field of language and its erection has outside or language network and its participation in the actions (non-linguistic) and after that we started our Valglosamatik analytical structure of the text is ejected text terminal or all of the words to was put for expression. This text Negotiable divided into types in turn are divided into classes and class should not be carrying a contradiction and be inclusive. It is on the same materials as described relationships that combine language and seeks to describe their relations and identified.

Keywords: text, language schools, linguistics, human language

Procedia PDF Downloads 446
27915 Method of Complex Estimation of Text Perusal and Indicators of Reading Quality in Different Types of Commercials

Authors: Victor N. Anisimov, Lyubov A. Boyko, Yazgul R. Almukhametova, Natalia V. Galkina, Alexander V. Latanov

Abstract:

Modern commercials presented on billboards, TV and on the Internet contain a lot of information about the product or service in text form. However, this information cannot always be perceived and understood by consumers. Typical sociological focus group studies often cannot reveal important features of the interpretation and understanding information that has been read in text messages. In addition, there is no reliable method to determine the degree of understanding of the information contained in a text. Only the fact of viewing a text does not mean that consumer has perceived and understood the meaning of this text. At the same time, the tools based on marketing analysis allow only to indirectly estimate the process of reading and understanding a text. Therefore, the aim of this work is to develop a valid method of recording objective indicators in real time for assessing the fact of reading and the degree of text comprehension. Psychophysiological parameters recorded during text reading can form the basis for this objective method. We studied the relationship between multimodal psychophysiological parameters and the process of text comprehension during reading using the method of correlation analysis. We used eye-tracking technology to record eye movements parameters to estimate visual attention, electroencephalography (EEG) to assess cognitive load and polygraphic indicators (skin-galvanic reaction, SGR) that reflect the emotional state of the respondent during text reading. We revealed reliable interrelations between perceiving the information and the dynamics of psychophysiological parameters during reading the text in commercials. Eye movement parameters reflected the difficulties arising in respondents during perceiving ambiguous parts of text. EEG dynamics in rate of alpha band were related with cumulative effect of cognitive load. SGR dynamics were related with emotional state of the respondent and with the meaning of text and type of commercial. EEG and polygraph parameters together also reflected the mental difficulties of respondents in understanding text and showed significant differences in cases of low and high text comprehension. We also revealed differences in psychophysiological parameters for different type of commercials (static vs. video, financial vs. cinema vs. pharmaceutics vs. mobile communication, etc.). Conclusions: Our methodology allows to perform multimodal evaluation of text perusal and the quality of text reading in commercials. In general, our results indicate the possibility of designing an integral model to estimate the comprehension of reading the commercial text in percent scale based on all noticed markers.

Keywords: reading, commercials, eye movements, EEG, polygraphic indicators

Procedia PDF Downloads 152
27914 Electroencephalogram during Natural Reading: Theta and Alpha Rhythms as Analytical Tools for Assessing a Reader’s Cognitive State

Authors: D. Zhigulskaya, V. Anisimov, A. Pikunov, K. Babanova, S. Zuev, A. Latyshkova, K. Сhernozatonskiy, A. Revazov

Abstract:

Electrophysiology of information processing in reading is certainly a popular research topic. Natural reading, however, has been relatively poorly studied, despite having broad potential applications for learning and education. In the current study, we explore the relationship between text categories and spontaneous electroencephalogram (EEG) while reading. Thirty healthy volunteers (mean age 26,68 ± 1,84) participated in this study. 15 Russian-language texts were used as stimuli. The first text was used for practice and was excluded from the final analysis. The remaining 14 were opposite pairs of texts in one of 7 categories, the most important of which were: interesting/boring, fiction/non-fiction, free reading/reading with an instruction, reading a text/reading a pseudo text (consisting of strings of letters that formed meaningless words). Participants had to read the texts sequentially on an Apple iPad Pro. EEG was recorded from 12 electrodes simultaneously with eye movement data via ARKit Technology by Apple. EEG spectral amplitude was analyzed in Fz for theta-band (4-8 Hz) and in C3, C4, P3, and P4 for alpha-band (8-14 Hz) using the Friedman test. We found that reading an interesting text was accompanied by an increase in theta spectral amplitude in Fz compared to reading a boring text (3,87 µV ± 0,12 and 3,67 µV ± 0,11, respectively). When instructions are given for reading, we see less alpha activity than during free reading of the same text (3,34 µV ± 0,20 and 3,73 µV ± 0,28, respectively, for C4 as the most representative channel). The non-fiction text elicited less activity in the alpha band (C4: 3,60 µV ± 0,25) than the fiction text (C4: 3,66 µV ± 0,26). A significant difference in alpha spectral amplitude was also observed between the regular text (C4: 3,64 µV ± 0,29) and the pseudo text (C4: 3,38 µV ± 0,22). These results suggest that some brain activity we see on EEG is sensitive to particular features of the text. We propose that changes in theta and alpha bands during reading may serve as electrophysiological tools for assessing the reader’s cognitive state as well as his or her attitude to the text and the perceived information. These physiological markers have prospective practical value for developing technological solutions and biofeedback systems for reading in particular and for education in general.

Keywords: EEG, natural reading, reader's cognitive state, theta-rhythm, alpha-rhythm

Procedia PDF Downloads 70
27913 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text Classification is the methodology to classify any given text into the respective category from a given set of categories. It is highly important and vital to use proper set of pre-processing , feature selection and classification techniques to achieve this purpose. In this paper we have used different ensemble techniques along with variance in feature selection parameters to see the change in overall accuracy of the result and also on some other individual class based features which include precision value of each individual category of the text. After subjecting our data through pre-processing and feature selection techniques , different individual classifiers were tested first and after that classifiers were combined to form ensembles to increase their accuracy. Later we also studied the impact of decreasing the classification categories on over all accuracy of data. Text classification is highly used in sentiment analysis on social media sites such as twitter for realizing people’s opinions about any cause or it is also used to analyze customer’s reviews about certain products or services. Opinion mining is a vital task in data mining and text categorization is a back-bone to opinion mining.

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 219
27912 In-Context Meta Learning for Automatic Designing Pretext Tasks for Self-Supervised Image Analysis

Authors: Toktam Khatibi

Abstract:

Self-supervised learning (SSL) includes machine learning models that are trained on one aspect and/or one part of the input to learn other aspects and/or part of it. SSL models are divided into two different categories, including pre-text task-based models and contrastive learning ones. Pre-text tasks are some auxiliary tasks learning pseudo-labels, and the trained models are further fine-tuned for downstream tasks. However, one important disadvantage of SSL using pre-text task solving is defining an appropriate pre-text task for each image dataset with a variety of image modalities. Therefore, it is required to design an appropriate pretext task automatically for each dataset and each downstream task. To the best of our knowledge, the automatic designing of pretext tasks for image analysis has not been considered yet. In this paper, we present a framework based on In-context learning that describes each task based on its input and output data using a pre-trained image transformer. Our proposed method combines the input image and its learned description for optimizing the pre-text task design and its hyper-parameters using Meta-learning models. The representations learned from the pre-text tasks are fine-tuned for solving the downstream tasks. We demonstrate that our proposed framework outperforms the compared ones on unseen tasks and image modalities in addition to its superior performance for previously known tasks and datasets.

Keywords: in-context learning (ICL), meta learning, self-supervised learning (SSL), vision-language domain, transformers

Procedia PDF Downloads 66
27911 Math and Religion in Arvo Pärt's Out of the Depths

Authors: Ismael Lins Patriota

Abstract:

Arvo Pärt is an Estonian composer who started his musical career under the influence of twelve-tone music and dodecaphonism. From 1968 to 1976, he isolated himself to search for a new path as a composer. In this period, he converted to Russian orthodoxy and changed his composing to tintinnabuli, a musical technique combining triadic chords with simple melodies. The recent analysis of Pärt’s output demonstrates that mathematics remained an influence after the invention of tintinnabuli. The present discussion deals with the relationship between math and religion in his work Out of the Depths (1980), proposing a musical-text approach and examining the minimum elements of the piece, such as motives and sub-phrases, which is the main focus of this work, considering text patterns and the role of the organ, which also uses the tintinnabuli system. The analysis of these elements demonstrates that Pärt uses math as a formal element, and the composer combines musical parameters to execute a personal and innovative interpretation of the text.

Keywords: Arvo Pärt, Out of the Depths, math, religion, analysis

Procedia PDF Downloads 72
27910 Poetics of the Connecting ha’: A Textual Study in the Poetry of Al-Husari Al-Qayrawani

Authors: Mahmoud al-Ashiriy

Abstract:

This paper begins from the idea that the real history of literature is the history of its style. And since the rhyme –as known- is not merely the last letter, that have received a lot of analysis and investigation, but it is a collection of other values in addition to its different markings. This paper will explore the work of the connecting ha’ and its effectiveness in shaping the text of poetry, since it establishes vocal rhythms in addition to its role in indicating references through the pronoun, vertically through the poem through the sequence of its verses, also horizontally through what environs the one verse of sentences. If the scientific formation of prosody stopped at the possibilities and prohibitions; literary criticism and poetry studies should explore what is above the rule of aesthetic horizon of poetic effectiveness that varies from a text to another, a poet to another, a literary period to another, or from a poetic taste to another. Then the paper will explore this poetic essence in the texts of the famous Andalusian Poet Al-Husari Al-Qayrawani through his well-known Daliyya (a poem that its verses end with the letter D), and the role of the connecting ha’ in fulfilling its text and the accomplishment of its poetics, departing from this to the diwan (the big collection of poems) also as a higher text that surpasses the text/poem, and through what it represents of effectiveness the work of the phenomenon in accomplishing the poetics of the poem of Al-Husari Al-Qayrawani who is one of the pillars of Arabic poetics in Andalusia.

Keywords: Al-Husari Al-Qayrawni, poetics, rhyme, stylistics, science of the text

Procedia PDF Downloads 554
27909 Polycode Texts in Communication of Antisocial Groups: Functional and Pragmatic Aspects

Authors: Ivan Potapov

Abstract:

Background: The aim of this paper is to investigate poly code texts in the communication of youth antisocial groups. Nowadays, the notion of a text has numerous interpretations. Besides all the approaches to defining a text, we must take into account semiotic and cultural-semiotic ones. Rapidly developing IT, world globalization, and new ways of coding of information increase the role of the cultural-semiotic approach. However, the development of computer technologies leads also to changes in the text itself. Polycode texts play a more and more important role in the everyday communication of the younger generation. Therefore, the research of functional and pragmatic aspects of both verbal and non-verbal content is actually quite important. Methods and Material: For this survey, we applied the combination of four methods of text investigation: not only intention and content analysis but also semantic and syntactic analysis. Using these methods provided us with information on general text properties, the content of transmitted messages, and each communicants’ intentions. Besides, during our research, we figured out the social background; therefore, we could distinguish intertextual connections between certain types of polycode texts. As the sources of the research material, we used 20 public channels in the popular messenger Telegram and data extracted from smartphones, which belonged to arrested members of antisocial groups. Findings: This investigation let us assert that polycode texts can be characterized as highly intertextual language unit. Moreover, we could outline the classification of these texts based on communicants’ intentions. The most common types of antisocial polycode texts are a call to illegal actions and agitation. What is more, each type has its own semantic core: it depends on the sphere of communication. However, syntactic structure is universal for most of the polycode texts. Conclusion: Polycode texts play important role in online communication. The results of this investigation demonstrate that in some social groups using these texts has a destructive influence on the younger generation and obviously needs further researches.

Keywords: text, polycode text, internet linguistics, text analysis, context, semiotics, sociolinguistics

Procedia PDF Downloads 121
27908 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

In order to settle the problem of acquiring case in biological related to design problems, a biometrics instance acquisition method based on text mining is presented. Through the construction of corpus text vector space and knowledge mining, the feature selection, similarity measure and case retrieval method of text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is retrieved by using the vector space model combined with the functional keywords to obtain the biological domain examples related to the design problems. Finally, we verify the validity of this method by taking the example of text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 249
27907 Research on the Landscape of Xi'an Ancient City Based on the Poetry Text of Tang Dynasty

Authors: Zou Yihui

Abstract:

The integration of the traditional landscape of the ancient city and the poet's emotions and symbolization into ancient poetry is the unique cultural gene and spiritual core of the historical city, and re-understanding the historical landscape pattern from the poetry is conducive to continuing the historical city context and improving the current situation of the gradual decline of the poetry of the modern historical urban landscape. Starting from Tang poetry uses semantic analysis methods、combined with text mining technology, entry mining, word frequency analysis, and cluster analysis of the landscape information of Tang Chang'an City were carried out, and the method framework for analyzing the urban landscape form based on poetry text was constructed. Nearly 160 poems describing the landscape of Tang Chang'an City were screened, and the poetic landscape characteristics of Tang Chang'an City were sorted out locally in order to combine with modern urban spatial development to continue the urban spatial context.

Keywords: Tang Chang'an City, poetic texts, semantic analysis, historical landscape

Procedia PDF Downloads 36
27906 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well in other cases: in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 159
27905 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document Analysis is an important research field that aims to gather the information by analyzing the data in documents. As one of the important targets for many fields is to understand what people actually want, sentimental analysis field has been one of the vital fields that are tightly related to the document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim of this research is to detect the emotions from text documents based on enriching the lexicon with adapting their content based on semantic patterns extraction. The proposed approach has been presented, and different experiments are applied by different perspectives to reveal the positive impact of the proposed approach on the classification results.

Keywords: document analysis, sentimental analysis, emotion detection, WEKA tool, NRC lexicon

Procedia PDF Downloads 416
27904 Developing House’s Model to Assess the Translation of Key Cultural Texts

Authors: Raja Al-Ghamdi

Abstract:

This paper aims to systematically assess the translation of key cultural texts. The paper, therefore, proposes a modification of the discourse analysis model for translation quality assessment introduced by the linguist Juliane House (1977, 1997, 2015). The data for analysis has been chosen from a religious text that has never been investigated before. It is an overt translation of the biography of Prophet Mohammad. The book is written originally in Arabic and translated into English. A soft copy of the translation, entitled The Sealed Nectar, is posted on numerous websites including the Internet Archive library which offers a free access to everyone. The text abounds with linguistic and cultural phenomena relevant to Islamic and Arab lingua-cultural context which make its translation a challenge, as well as its assessment. Interesting findings show that (1) culturemes are rich points and both the translator’s subjectivity and intervention are apparent in mediating them, (2) given the nature of historical narration, the source text reflects the author’s positive shading, whereas the target text reflects the translator’s axiological orientation as neutrally shaded, and, (3) linguistic gaps, metaphorical expressions and intertextuality are major stimuli to compensation strategies.

Keywords: Arabic-English discourse analysis, key cultural texts, overt translation, quality assessment

Procedia PDF Downloads 271
27903 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 317
27902 Role of Gender in Apparel Stores' Consumer Review: A Sentiment Analysis

Authors: Sarif Ullah Patwary, Matthew Heinrich, Brandon Payne

Abstract:

The ubiquity of web 2.0 platforms, in the form of wikis, social media (e.g., Facebook, Twitter, etc.) and online review portals (e.g., Yelp), helps shape today’s apparel consumers’ purchasing decision. Online reviews play important role towards consumers’ apparel purchase decision. Each of the consumer reviews carries a sentiment (positive, negative or neutral) towards products. Commercially, apparel brands and retailers analyze sentiment of this massive amount of consumer review data to update their inventory and bring new products in the market. The purpose of this study is to analyze consumer reviews of selected apparel stores with a view to understand, 1) the difference of sentiment expressed through men’s and woman’s text reviews, 2) the difference of sentiment expressed through men’s and woman’s star-based reviews, and 3) the difference of sentiment between star-based reviews and text-based reviews. A total of 9,363 reviews (1,713 men and 7,650 women) were collected using Yelp Dataset Challenge. Sentiment analysis of collected reviews was carried out in two dimensions: star-based reviews and text-based reviews. Sentiment towards apparel stores expressed through star-based reviews was deemed: 1) positive for 3 or 4 stars 2) negative for 1 or 2 stars and 3) neutral for 3 stars. Sentiment analysis of text-based reviews was carried out using Bing Liu dictionary. The analysis was conducted in IPyhton 5.0. Space. The sentiment analysis results revealed the percentage of positive text reviews by men (80%) and women (80%) were identical. Women reviewers (12%) provided more neutral (e.g., 3 out of 5 stars) star reviews than men (6%). Star-based reviews were more negative than the text-based reviews. In other words, while 80% men and women wrote positive reviews for the stores, less than 70% ended up giving 4 or 5 stars in those reviews. One of the key takeaways of the study is that star reviews provide slightly negative sentiment of the consumer reviews. Therefore, in order to understand sentiment towards apparel products, one might need to combine both star and text aspects of consumer reviews. This study used a specific dataset consisting of selected apparel stores from particular geographical locations (the information was not given for privacy concern). Future studies need to include more data from more stores and locations to generalize the findings of the study.

Keywords: apparel, consumer review, sentiment analysis, gender

Procedia PDF Downloads 149
27901 Social Media Mining with R. Twitter Analyses

Authors: Diana Codat

Abstract:

Tweets' analysis is part of text mining. Each document is a written text. It's possible to apply the usual text search techniques, in particular by switching to the bag-of-words representation. But the tweets induce peculiarities. Some may enrich the analysis. Thus, their length is calibrated (at least as far as public messages are concerned), special characters make it possible to identify authors (@) and themes (#), the tweet and retweet mechanisms make it possible to follow the diffusion of the information. Conversely, other characteristics may disrupt the analyzes. Because space is limited, authors often use abbreviations, emoticons to express feelings, and they do not pay much attention to spelling. All this creates noise that can complicate the task. The tweets carry a lot of potentially interesting information. Their exploitation is one of the main axes of the analysis of the social networks. We show how to access Twitter-related messages. We will initiate a study of the properties of the tweets, and we will follow up on the exploitation of the content of the messages. We will work under R with the package 'twitteR'. The study of tweets is a strong focus of analysis of social networks because Twitter has become an important vector of communication. This example shows that it is easy to initiate an analysis from data extracted directly online. The data preparation phase is of great importance.

Keywords: data mining, language R, social networks, Twitter

Procedia PDF Downloads 164
27900 Multimodal Sentiment Analysis With Web Based Application

Authors: Shreyansh Singh, Afroz Ahmed

Abstract:

Sentiment Analysis intends to naturally reveal the hidden mentality that we hold towards an entity. The total of this assumption over a populace addresses sentiment surveying and has various applications. Current text-based sentiment analysis depends on the development of word embeddings and Machine Learning models that take in conclusion from enormous text corpora. Sentiment Analysis from text is presently generally utilized for consumer loyalty appraisal and brand insight investigation. With the expansion of online media, multimodal assessment investigation is set to carry new freedoms with the appearance of integral information streams for improving and going past text-based feeling examination using the new transforms methods. Since supposition can be distinguished through compelling follows it leaves, like facial and vocal presentations, multimodal opinion investigation offers good roads for examining facial and vocal articulations notwithstanding the record or printed content. These methodologies use the Recurrent Neural Networks (RNNs) with the LSTM modes to increase their performance. In this study, we characterize feeling and the issue of multimodal assessment investigation and audit ongoing advancements in multimodal notion examination in various spaces, including spoken surveys, pictures, video websites, human-machine, and human-human connections. Difficulties and chances of this arising field are additionally examined, promoting our theory that multimodal feeling investigation holds critical undiscovered potential.

Keywords: sentiment analysis, RNN, LSTM, word embeddings

Procedia PDF Downloads 105
27899 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on the practice as research that will prioritise the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” in the perspective of speech acts theory by J.L Austin based on fragmented text from “pidato adat” which has been used as opening speech in Randai. Looking at the theory of deconstruction by Jacque Derrida also will give a different meaning from the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text from two works by Joseph Gonzales, “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The perspective from the semiotic foundation will support how occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 86
27898 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary school age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. 293 school children from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children captured that essentially an argument must be justified, the more the number of supports produced, the fewer the clauses the children produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 153
27897 The Morphology of Sri Lankan Text Messages

Authors: Chamindi Dilkushi Senaratne

Abstract:

Communicating via a text or an SMS (Short Message Service) has become an integral part of our daily lives. With the increase in the use of mobile phones, text messaging has become a genre by itself worth researching and studying. It is undoubtedly a major phenomenon revealing language change. This paper attempts to describe the morphological processes of text language of urban bilinguals in Sri Lanka. It will be a typological study based on 500 English text messages collected from urban bilinguals residing in Colombo. The messages are selected by categorizing the deviant forms of language use apparent in text messages. These stylistic deviations are a deliberate skilled performance by the users of the language possessing an in-depth knowledge of linguistic systems to create new words and thereby convey their linguistic identity and individual and group solidarity via the message. The findings of the study solidifies arguments that the manipulation of language in text messages is both creative and appropriate. In addition, code mixing theories will be used to identify how existing morphological processes are adapted by bilingual users in Sri Lanka when texting. The study will reveal processes such as omission, initialism, insertion and alternation in addition to other identified linguistic features in text language. The corpus reveals the most common morphological processes used by Sri Lankan urban bilinguals when sending texts.

Keywords: bilingual, deviations, morphology, texts

Procedia PDF Downloads 257
27896 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data

Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis

Abstract:

Policy making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics, therefore, is a key to finding holistic solutions. Analysis of text based information on the policy problem, using Natural Language Processing (NLP) and Text analysis techniques, can support modelling of public policy problem situations in a more objective way based on domain experts knowledge and scientific evidence. The objective behind this study is to support modelling of public policy problem situations, using text analysis of verbal descriptions of the problem. We propose a formal methodology for analysis of qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study describes the outline of an algorithm used to automate the initial step of a larger methodological approach, which is so far done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support a better problem structuring. A small prototype for this step is also presented.

Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction

Procedia PDF Downloads 577
27895 Understanding the Qualitative Nature of Product Reviews by Integrating Text Processing Algorithm and Usability Feature Extraction

Authors: Cherry Yieng Siang Ling, Joong Hee Lee, Myung Hwan Yun

Abstract:

The quality of a product to be usable has become the basic requirement in consumer’s perspective while failing the requirement ends up the customer from not using the product. Identifying usability issues from analyzing quantitative and qualitative data collected from usability testing and evaluation activities aids in the process of product design, yet the lack of studies and researches regarding analysis methodologies in qualitative text data of usability field inhibits the potential of these data for more useful applications. While the possibility of analyzing qualitative text data found with the rapid development of data analysis studies such as natural language processing field in understanding human language in computer, and machine learning field in providing predictive model and clustering tool. Therefore, this research aims to study the application capability of text processing algorithm in analysis of qualitative text data collected from usability activities. This research utilized datasets collected from LG neckband headset usability experiment in which the datasets consist of headset survey text data, subject’s data and product physical data. In the analysis procedure, which integrated with the text-processing algorithm, the process includes training of comments onto vector space, labeling them with the subject and product physical feature data, and clustering to validate the result of comment vector clustering. The result shows 'volume and music control button' as the usability feature that matches best with the cluster of comment vectors where centroid comments of a cluster emphasized more on button positions, while centroid comments of the other cluster emphasized more on button interface issues. When volume and music control buttons are designed separately, the participant experienced less confusion, and thus, the comments mentioned only about the buttons' positions. While in the situation where the volume and music control buttons are designed as a single button, the participants experienced interface issues regarding the buttons such as operating methods of functions and confusion of functions' buttons. The relevance of the cluster centroid comments with the extracted feature explained the capability of text processing algorithms in analyzing qualitative text data from usability testing and evaluations.

Keywords: usability, qualitative data, text-processing algorithm, natural language processing

Procedia PDF Downloads 273
27894 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some examples of failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. R studios were used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, then using SPSS to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. A relative positivity and negativity value were then calculated by dividing the frequency of phrases with the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. Model constructed from BING predicts the outcome of drug performance in clinical trials with an overall percentage of 65.3%. AFINN model had a lower accuracy at predicting outcomes compared to the BING model at 62.5% but was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made to later iterations of the model to sufficiently increase the accuracy.

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 139
27893 Text-to-Speech in Azerbaijani Language via Transfer Learning in a Low Resource Environment

Authors: Dzhavidan Zeinalov, Bugra Sen, Firangiz Aslanova

Abstract:

Most text-to-speech models cannot operate well in low-resource languages and require a great amount of high-quality training data to be considered good enough. Yet, with the improvements made in ASR systems, it is now much easier than ever to collect data for the design of custom text-to-speech models. In this work, our work on using the ASR model to collect data to build a viable text-to-speech system for one of the leading financial institutions of Azerbaijan will be outlined. NVIDIA’s implementation of the Tacotron 2 model was utilized along with the HiFiGAN vocoder. As for the training, the model was first trained with high-quality audio data collected from the Internet, then fine-tuned on the bank’s single speaker call center data. The results were then evaluated by 50 different listeners and got a mean opinion score of 4.17, displaying that our method is indeed viable. With this, we have successfully designed the first text-to-speech model in Azerbaijani and publicly shared 12 hours of audiobook data for everyone to use.

Keywords: Azerbaijani language, HiFiGAN, Tacotron 2, text-to-speech, transfer learning, whisper

Procedia PDF Downloads 28
27892 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP

Procedia PDF Downloads 81