Search results for: short text analysis
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 29381

29321 Multi-Class Text Classification Using Ensembles of Classifiers

Authors: Syed Basit Ali Shah Bukhari, Yan Qiang, Saad Abdul Rauf, Syed Saqlaina Bukhari

Abstract:

Text classification assigns a given text to the appropriate category from a predefined set of categories. Achieving this requires a proper set of pre-processing, feature selection and classification techniques. In this paper, we use different ensemble techniques, together with varied feature selection parameters, to observe the change in overall accuracy and in class-level measures such as the precision of each individual text category. After applying pre-processing and feature selection to our data, we first tested individual classifiers and then combined them into ensembles to increase accuracy. We also studied the impact of reducing the number of classification categories on overall accuracy. Text classification is widely used in sentiment analysis on social media sites such as Twitter to gauge people’s opinions about a cause, and to analyze customers’ reviews of products or services. Opinion mining is a vital task in data mining, and text categorization is the backbone of opinion mining.
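
As a rough illustration of the ensemble idea described in this abstract, the sketch below combines the predictions of several classifiers by majority vote and compares accuracies. The function names `majority_vote` and `accuracy`, the labels and the toy predictions are all hypothetical. The keywords mention bagging and AdaBoost, which combine classifiers differently, so this is a simplified stand-in and not the authors' method.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by hard voting.

    predictions_per_classifier: list of equal-length label lists,
    one list per classifier.
    """
    combined = []
    for votes in zip(*predictions_per_classifier):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the true one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy data (hypothetical): true labels and three imperfect classifiers
truth = ["sports", "politics", "tech", "sports"]
clf_a = ["sports", "politics", "sports", "sports"]
clf_b = ["sports", "tech", "tech", "politics"]
clf_c = ["tech", "politics", "tech", "sports"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
for name, preds in [("A", clf_a), ("B", clf_b), ("C", clf_c), ("ensemble", ensemble)]:
    print(name, accuracy(preds, truth))
```

In this toy case each classifier makes different mistakes, so the vote corrects them; on real data the gain depends on how correlated the classifiers' errors are.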

Keywords: Natural Language Processing, Ensemble Classifier, Bagging Classifier, AdaBoost

Procedia PDF Downloads 207
29320 A Socio-Pragmatic Investigation of Gender Enactment in New Month Text Messages

Authors: Esther Robert, Romanus Aboh

Abstract:

This paper undertakes a socio-pragmatic investigation of gender enactment in new month text messages. The study employs Gumperz’s Interactional Sociolinguistics as its theoretical point of reference to investigate how people create meaning through social interaction. This theory analyses social interaction on the basis of contextualization cues and presuppositions. The study explores the appropriateness of the language used in texting. The data consist of text messages collected from the mobile phones of people of different genders. The study observes remarkable differences between genders in the use of informal language, and reveals that men and women differ remarkably in conversational interaction as well as in writing. Women are observed to be emotional, orderly, meticulous and detailed, and to observe certain grammatical rules, whereas men are casual and brief and appear to pay less attention to grammatical rules. The study also shows women as relaxing and as showing love, care and concern through emotive, spirit-raising and touching language, while men are direct, short, and straight to the point. The study attributes this behaviour to brain-wiring and concludes that this is why language and communication matter more to women than to men, as reflected in their new month text messages.

Keywords: difference, emotionalised expressions, gender, texting

Procedia PDF Downloads 220
29319 In-Context Meta Learning for Automatic Designing Pretext Tasks for Self-Supervised Image Analysis

Authors: Toktam Khatibi

Abstract:

Self-supervised learning (SSL) includes machine learning models that are trained on one aspect and/or one part of the input to learn other aspects and/or part of it. SSL models are divided into two different categories, including pre-text task-based models and contrastive learning ones. Pre-text tasks are some auxiliary tasks learning pseudo-labels, and the trained models are further fine-tuned for downstream tasks. However, one important disadvantage of SSL using pre-text task solving is defining an appropriate pre-text task for each image dataset with a variety of image modalities. Therefore, it is required to design an appropriate pretext task automatically for each dataset and each downstream task. To the best of our knowledge, the automatic designing of pretext tasks for image analysis has not been considered yet. In this paper, we present a framework based on In-context learning that describes each task based on its input and output data using a pre-trained image transformer. Our proposed method combines the input image and its learned description for optimizing the pre-text task design and its hyper-parameters using Meta-learning models. The representations learned from the pre-text tasks are fine-tuned for solving the downstream tasks. We demonstrate that our proposed framework outperforms the compared ones on unseen tasks and image modalities in addition to its superior performance for previously known tasks and datasets.

Keywords: in-context learning (ICL), meta learning, self-supervised learning (SSL), vision-language domain, transformers

Procedia PDF Downloads 48
29318 The Analysis of Defects Prediction in Injection Molding

Authors: Mehdi Moayyedian, Kazem Abhary, Romeo Marian

Abstract:

This paper presents an evaluation of a plastic defect in injection molding, known as the short shot defect, before it occurs in the process. The aim of the paper is to evaluate the parameters that affect the possibility of a short shot. The analysis is conducted using SolidWorks Plastics and the Taguchi method to determine the most significant parameters. The Finite Element Method (FEM) is employed to analyze two circular flat polypropylene plates of 1 mm thickness. Filling time, part cooling time, pressure holding time and melt temperature are chosen as process parameters, and gate type as the geometric parameter. A methodology is presented to predict the possibility of short shot occurrence. The analysis determined that melt temperature is the most influential parameter, with a contribution of 74.25%, followed by filling time with 22% and gate type with 3.69%. It also determined that the optimum levels leading to a reduction in the possibility of short shot are gate type at level 1, filling time at level 3 and melt temperature at level 3. In summary, the most significant parameters affecting the possibility of short shot are melt temperature, filling time, and gate type.

Keywords: injection molding, plastic defects, short shot, Taguchi method

Procedia PDF Downloads 198
29317 A Stylistic Analysis of the Short Story ‘The Escape’ by Qaisra Shahraz

Authors: Huma Javed

Abstract:

Stylistics is a broad term concerned with both literature and linguistics, which adds to its significance. This research aims to analyze Qaisra Shahraz’s short story ‘The Escape’ from a stylistic viewpoint. The study focuses on three aspects of the short story: grammatical category, lexical category, and figures of speech. The research design is both explorative and descriptive. The analysis of the data shows that the writer uses more nouns than other lexical items, which suggests that the story has a descriptive rather than narrative style.

Keywords: The Escape, stylistics, grammatical category, lexical category, figure of speech

Procedia PDF Downloads 194
29316 Use of Short Piles for Stabilizing the Side Slope of the Road Embankment along the Canal

Authors: Monapat Sasingha, Suttisak Soralump

Abstract:

This research presents the behavior of a road embankment slope along a canal stabilized by short piles. A centrifuge machine was used in the investigation to model the water level conditions in the canal. The centrifuge tests were performed at 35 g. Visual analysis of the soil movement was performed to evaluate the failure behavior. In conclusion, the use of short piles proved to be an effective solution for stabilizing the canal slope. However, a certain amount of settlement was found behind the short pile rows.

Keywords: centrifuge test, slope failure, embankment, stability of slope

Procedia PDF Downloads 235
29315 Math and Religion in Arvo Pärt's Out of the Depths

Authors: Ismael Lins Patriota

Abstract:

Arvo Pärt is an Estonian composer who started his musical career under the influence of twelve-tone music and dodecaphonism. From 1968 to 1976, he isolated himself to search for a new path as a composer. In this period, he converted to Russian orthodoxy and changed his composing to tintinnabuli, a musical technique combining triadic chords with simple melodies. The recent analysis of Pärt’s output demonstrates that mathematics remained an influence after the invention of tintinnabuli. The present discussion deals with the relationship between math and religion in his work Out of the Depths (1980), proposing a musical-text approach and examining the minimum elements of the piece, such as motives and sub-phrases, which is the main focus of this work, considering text patterns and the role of the organ, which also uses the tintinnabuli system. The analysis of these elements demonstrates that Pärt uses math as a formal element, and the composer combines musical parameters to execute a personal and innovative interpretation of the text.

Keywords: Arvo Pärt, Out of the Depths, math, religion, analysis

Procedia PDF Downloads 54
29314 Polycode Texts in Communication of Antisocial Groups: Functional and Pragmatic Aspects

Authors: Ivan Potapov

Abstract:

Background: The aim of this paper is to investigate polycode texts in the communication of youth antisocial groups. Nowadays, the notion of a text has numerous interpretations. Among the approaches to defining a text, the semiotic and cultural-semiotic ones must be taken into account. Rapidly developing IT, globalization, and new ways of coding information increase the role of the cultural-semiotic approach. However, the development of computer technologies also leads to changes in the text itself. Polycode texts play an increasingly important role in the everyday communication of the younger generation. Therefore, research into the functional and pragmatic aspects of both verbal and non-verbal content is important. Methods and Material: For this study, we applied a combination of four methods of text investigation: intention analysis, content analysis, semantic analysis and syntactic analysis. These methods provided us with information on general text properties, the content of transmitted messages, and each communicant’s intentions. In addition, during our research we established the social background, which allowed us to distinguish intertextual connections between certain types of polycode texts. As sources of research material, we used 20 public channels in the popular messenger Telegram and data extracted from smartphones that belonged to arrested members of antisocial groups. Findings: This investigation lets us assert that polycode texts can be characterized as highly intertextual language units. Moreover, we could outline a classification of these texts based on communicants’ intentions. The most common types of antisocial polycode texts are calls to illegal actions and agitation. What is more, each type has its own semantic core, which depends on the sphere of communication. However, the syntactic structure is universal for most polycode texts. Conclusion: Polycode texts play an important role in online communication. The results of this investigation demonstrate that in some social groups the use of these texts has a destructive influence on the younger generation and clearly needs further research.

Keywords: text, polycode text, internet linguistics, text analysis, context, semiotics, sociolinguistics

Procedia PDF Downloads 106
29313 A Comparative Study about the Use of SMS in Formal Writing of the Students in Universities

Authors: Sajjad Hussain

Abstract:

Technology has revolutionized communication around the globe. Its use and users are multiplying with every passing minute. The current study reveals the effect of SMS on the formal writing of students. Students are regular users of this service and have become addicted to its short language. This short language is understandable to a particular community but not to everyone, as it does not adhere to Standard English writing practices. Data collected from quizzes, assignment texts and questionnaires support the postulate that students frequently use it in their formal writing. Certain corrective measures need to be taken to address the issue. Second language learners have been found to use it to a greater extent.

Keywords: information technology, SMS, messaging, communication, social media, internet, language

Procedia PDF Downloads 515
29312 Poetics of the Connecting ha’: A Textual Study in the Poetry of Al-Husari Al-Qayrawani

Authors: Mahmoud al-Ashiriy

Abstract:

This paper begins from the idea that the real history of literature is the history of its style. Rhyme, as is well known, is not merely the last letter of the verse, which has received a great deal of analysis and investigation; it is a collection of other values in addition to its different markings. The paper explores the work of the connecting ha’ and its effectiveness in shaping the poetic text: it establishes vocal rhythms and also indicates references through the pronoun, both vertically, through the sequence of verses in the poem, and horizontally, through the sentences surrounding a single verse. While the science of prosody stops at what is permitted and what is prohibited, literary criticism and poetry studies should explore what lies beyond the rule, namely the aesthetic horizon of poetic effectiveness, which varies from one text, poet, literary period or poetic taste to another. The paper then explores this poetic essence in the texts of the famous Andalusian poet Al-Husari Al-Qayrawani through his well-known Daliyya (a poem whose verses end with the letter D) and the role of the connecting ha’ in completing the text and accomplishing its poetics. From there it moves to the diwan (the large collection of poems) as a higher text that surpasses the individual poem, and to the effectiveness of this phenomenon in accomplishing the poetics of Al-Husari Al-Qayrawani, one of the pillars of Arabic poetics in Andalusia.

Keywords: Al-Husari Al-Qayrawani, poetics, rhyme, stylistics, science of the text

Procedia PDF Downloads 537
29311 Research on the Landscape of Xi'an Ancient City Based on the Poetry Text of Tang Dynasty

Authors: Zou Yihui

Abstract:

The integration of the ancient city's traditional landscape with the poet's emotions and symbolism in ancient poetry is the unique cultural gene and spiritual core of the historical city. Re-understanding the historical landscape pattern through poetry helps to continue the context of the historical city and to counter the gradual loss of poetic quality in the modern historical urban landscape. Starting from Tang poetry, this study uses semantic analysis methods combined with text mining technology to carry out entry mining, word frequency analysis, and cluster analysis of the landscape information of Tang Chang'an City, and constructs a method framework for analyzing urban landscape form based on poetry text. Nearly 160 poems describing the landscape of Tang Chang'an City were screened, and the poetic landscape characteristics of the city were sorted out locally so that they can be combined with modern urban spatial development to continue the urban spatial context.

Keywords: Tang Chang'an City, poetic texts, semantic analysis, historical landscape

Procedia PDF Downloads 9
29310 The Acquisition of Case in Biological Domain Based on Text Mining

Authors: Shen Jian, Hu Jie, Qi Jin, Liu Wei Jie, Chen Ji Yi, Peng Ying Hong

Abstract:

To address the problem of acquiring biological cases related to design problems, a biological case acquisition method based on text mining is presented. Through the construction of a corpus text vector space and knowledge mining, the feature selection, similarity measure and case retrieval methods for text in the field of biology are studied. First, we establish a vector space model of the corpus in the biological field and complete the preprocessing steps. Then, the corpus is searched using the vector space model combined with functional keywords to obtain biological domain examples related to the design problems. Finally, we verify the validity of the method with an example text.

Keywords: text mining, vector space model, feature selection, biologically inspired design

Procedia PDF Downloads 229
29309 A Proposed Approach for Emotion Lexicon Enrichment

Authors: Amr Mansour Mohsen, Hesham Ahmed Hassan, Amira M. Idrees

Abstract:

Document analysis is an important research field that aims to gather information by analyzing the data in documents. Because understanding what people actually want is an important goal in many fields, sentiment analysis has become a vital field closely related to document analysis. This research focuses on analyzing text documents to classify each document according to its opinion. The aim is to detect emotions in text documents by enriching the lexicon, adapting its content through semantic pattern extraction. The proposed approach is presented, and experiments from different perspectives reveal its positive impact on the classification results.

Keywords: document analysis, sentimental analysis, emotion detection, WEKA tool, NRC lexicon

Procedia PDF Downloads 399
29308 Developing House’s Model to Assess the Translation of Key Cultural Texts

Authors: Raja Al-Ghamdi

Abstract:

This paper aims to systematically assess the translation of key cultural texts. The paper, therefore, proposes a modification of the discourse analysis model for translation quality assessment introduced by the linguist Juliane House (1977, 1997, 2015). The data for analysis has been chosen from a religious text that has never been investigated before. It is an overt translation of the biography of Prophet Mohammad. The book is written originally in Arabic and translated into English. A soft copy of the translation, entitled The Sealed Nectar, is posted on numerous websites including the Internet Archive library which offers a free access to everyone. The text abounds with linguistic and cultural phenomena relevant to Islamic and Arab lingua-cultural context which make its translation a challenge, as well as its assessment. Interesting findings show that (1) culturemes are rich points and both the translator’s subjectivity and intervention are apparent in mediating them, (2) given the nature of historical narration, the source text reflects the author’s positive shading, whereas the target text reflects the translator’s axiological orientation as neutrally shaded, and, (3) linguistic gaps, metaphorical expressions and intertextuality are major stimuli to compensation strategies.

Keywords: Arabic-English discourse analysis, key cultural texts, overt translation, quality assessment

Procedia PDF Downloads 253
29307 Text Similarity in Vector Space Models: A Comparative Study

Authors: Omid Shahmirzadi, Adam Lugowski, Kenneth Younge

Abstract:

Automatic measurement of semantic text similarity is an important task in natural language processing. In this paper, we evaluate the performance of different vector space models to perform this task. We address the real-world problem of modeling patent-to-patent similarity and compare TFIDF (and related extensions), topic models (e.g., latent semantic indexing), and neural models (e.g., paragraph vectors). Contrary to expectations, the added computational cost of text embedding methods is justified only when: 1) the target text is condensed; and 2) the similarity comparison is trivial. Otherwise, TFIDF performs surprisingly well, in particular for longer and more technical texts or for making finer-grained distinctions between nearest neighbors. Unexpectedly, extensions to the TFIDF method, such as adding noun phrases or calculating term weights incrementally, were not helpful in our context.
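
To make the TFIDF baseline in this abstract concrete, here is a minimal sketch of TF-IDF weighting with cosine similarity. The smoothed IDF formula, the whitespace tokenizer, the function names and the three toy documents are assumptions chosen for illustration; the paper's exact weighting and preprocessing are not given in the abstract.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical patent-like snippets
docs = [
    "rotor blade cooling channel for gas turbine",
    "gas turbine blade with internal cooling channel",
    "method for encrypting wireless network traffic",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

The two turbine snippets share most of their terms and score higher than the unrelated pair, which overlaps only on a common function word.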

Keywords: big data, patent, text embedding, text similarity, vector space model

Procedia PDF Downloads 142
29306 Social Media Mining with R. Twitter Analyses

Authors: Diana Codat

Abstract:

Tweet analysis is part of text mining. Each document is a written text, so the usual text search techniques can be applied, in particular by switching to the bag-of-words representation. But tweets have peculiarities. Some may enrich the analysis: their length is calibrated (at least as far as public messages are concerned), special characters make it possible to identify authors (@) and themes (#), and the tweet and retweet mechanisms make it possible to follow the diffusion of information. Conversely, other characteristics may disrupt the analysis. Because space is limited, authors often use abbreviations and emoticons to express feelings, and they do not pay much attention to spelling. All this creates noise that can complicate the task. Tweets carry a lot of potentially interesting information, and their exploitation is one of the main axes of social network analysis. We show how to access Twitter-related messages. We then begin a study of the properties of the tweets and follow up with the exploitation of the message content. We work in R with the package 'twitteR'. The study of tweets is a strong focus of social network analysis because Twitter has become an important vector of communication. This example shows that it is easy to initiate an analysis from data extracted directly online. The data preparation phase is of great importance.
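
The tweet-specific features mentioned in this abstract (authors via @, themes via #, retweets, bag of words) can be sketched as follows. The paper works in R with the 'twitteR' package; this sketch uses Python's standard library instead, and the function name `tweet_features`, the regular expressions and the sample tweet are assumptions for illustration only.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")

def tweet_features(text):
    """Split a tweet into mentions, hashtags, and a bag of the remaining words."""
    mentions = MENTION.findall(text)
    hashtags = HASHTAG.findall(text)
    # Remove URLs, mentions and hashtags before counting ordinary words
    stripped = URL.sub(" ", text)
    stripped = MENTION.sub(" ", stripped)
    stripped = HASHTAG.sub(" ", stripped)
    words = re.findall(r"[a-z']+", stripped.lower())
    return {
        "is_retweet": text.startswith("RT "),  # simple convention-based check
        "mentions": mentions,
        "hashtags": hashtags,
        "bag_of_words": Counter(words),
    }

# Hypothetical tweet
sample = "RT @rstats_fan: Text mining is fun, fun, fun! #rstats #textmining https://example.com/post"
features = tweet_features(sample)
print(features["mentions"], features["hashtags"], features["bag_of_words"]["fun"])
```

Abbreviations, emoticons and misspellings pass through this tokenizer untouched, which is the kind of noise the abstract says the data preparation phase has to handle.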

Keywords: data mining, language R, social networks, Twitter

Procedia PDF Downloads 152
29305 Role of Gender in Apparel Stores' Consumer Review: A Sentiment Analysis

Authors: Sarif Ullah Patwary, Matthew Heinrich, Brandon Payne

Abstract:

The ubiquity of web 2.0 platforms, in the form of wikis, social media (e.g., Facebook, Twitter, etc.) and online review portals (e.g., Yelp), helps shape today’s apparel consumers’ purchasing decisions. Online reviews play an important role in consumers’ apparel purchase decisions. Each consumer review carries a sentiment (positive, negative or neutral) towards products. Commercially, apparel brands and retailers analyze the sentiment of this massive amount of consumer review data to update their inventory and bring new products to the market. The purpose of this study is to analyze consumer reviews of selected apparel stores in order to understand 1) the difference in sentiment expressed through men’s and women’s text reviews, 2) the difference in sentiment expressed through men’s and women’s star-based reviews, and 3) the difference in sentiment between star-based reviews and text-based reviews. A total of 9,363 reviews (1,713 by men and 7,650 by women) were collected from the Yelp Dataset Challenge. Sentiment analysis of the collected reviews was carried out in two dimensions: star-based reviews and text-based reviews. Sentiment towards apparel stores expressed through star-based reviews was deemed 1) positive for 4 or 5 stars, 2) negative for 1 or 2 stars, and 3) neutral for 3 stars. Sentiment analysis of text-based reviews was carried out using the Bing Liu dictionary. The analysis was conducted in IPython 5.0. The results revealed that the percentages of positive text reviews by men (80%) and women (80%) were identical. Women reviewers (12%) provided more neutral (e.g., 3 out of 5 stars) star reviews than men (6%). Star-based reviews were more negative than text-based reviews: while 80% of men and women wrote positive reviews for the stores, less than 70% ended up giving 4 or 5 stars in those reviews. One of the key takeaways of the study is that star reviews convey a slightly more negative sentiment than the review text. Therefore, in order to understand sentiment towards apparel products, one might need to combine both the star and text aspects of consumer reviews. This study used a specific dataset consisting of selected apparel stores from particular geographical locations (the information was not given for privacy reasons). Future studies need to include more data from more stores and locations to generalize the findings.
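
The two sentiment dimensions compared in this abstract can be sketched as a star-to-label mapping next to a lexicon-based text score. The mapping assumed here (4 or 5 stars positive, 3 neutral, 1 or 2 negative) matches the abstract's later reference to "4 or 5 stars". The tiny word lists stand in for the Bing Liu dictionary, and the function names and sample reviews are hypothetical.

```python
def star_sentiment(stars):
    """Map a 1-5 star rating to a label (assumed: 4-5 positive, 3 neutral, 1-2 negative)."""
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"

# Tiny illustrative lexicon; a real study would load a full opinion lexicon
POSITIVE = {"great", "love", "friendly", "perfect"}
NEGATIVE = {"rude", "poor", "overpriced", "terrible"}

def text_sentiment(review):
    """Label a review by counting positive minus negative lexicon words."""
    words = [word.strip(".,!?").lower() for word in review.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical (stars, text) reviews
reviews = [
    (3, "Great selection and friendly staff, but overpriced."),
    (5, "Love this store!"),
    (1, "Rude staff and poor quality."),
]
labels = [(star_sentiment(stars), text_sentiment(text)) for stars, text in reviews]
for pair in labels:
    print(pair)
```

The first toy review is neutral by stars but positive by text, which is the kind of mismatch behind the abstract's suggestion to combine both signals.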

Keywords: apparel, consumer review, sentiment analysis, gender

Procedia PDF Downloads 139
29304 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part-of-speech (POS) tagging has always been a challenging task in Natural Language Processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from the annotated Nepali corpus. Both methods are applied to the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM, achieving an accuracy of 95.43%. An error analysis of where the mismatches took place is discussed in detail.
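
For readers unfamiliar with the decoding step named in this abstract, the following is a minimal sketch of Viterbi decoding over an HMM tagger. The two-tag tagset, the English toy words and every probability are invented for illustration (the paper's Nepali corpus and parameters are not available here), and the probability floor for unseen events is an assumed smoothing choice.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under an HMM."""
    def lp(p):
        # Log-probability with a floor so unseen events do not give log(0)
        return math.log(max(p, floor))

    # best[i][tag] = (log-prob of the best path ending in tag at i, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score + lp(emit_p[t].get(word, 0)), prev)
        best.append(column)

    # Backtrack from the best final tag to recover the full path
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.6, "bark": 0.1}, "VERB": {"dogs": 0.05, "bark": 0.7}}

tagged = viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p)
print(tagged)
```

Working in log space avoids numerical underflow on long sentences, and the dynamic programme visits each word once per pair of tags instead of enumerating every possible tag sequence.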

Keywords: hidden markov model, natural language processing, POS tagging, viterbi algorithm

Procedia PDF Downloads 303
29303 Multimodal Sentiment Analysis With Web Based Application

Authors: Shreyansh Singh, Afroz Ahmed

Abstract:

Sentiment analysis aims to automatically reveal the underlying attitude that we hold towards an entity. The aggregate of this sentiment over a population represents opinion polling and has various applications. Current text-based sentiment analysis depends on the development of word embeddings and machine learning models that learn sentiment from large text corpora. Sentiment analysis from text is currently widely used for customer satisfaction assessment and brand perception analysis. With the growth of online media, multimodal sentiment analysis is set to bring new opportunities, as complementary data streams arrive for improving and going beyond text-based sentiment analysis using new transform methods. Since sentiment can be detected through the affective traces it leaves, such as facial and vocal displays, multimodal sentiment analysis offers promising avenues for analyzing facial and vocal expressions in addition to the transcript or textual content. These approaches use Recurrent Neural Networks (RNNs) with LSTM models to improve their performance. In this study, we define sentiment and the problem of multimodal sentiment analysis, and review recent developments in multimodal sentiment analysis in different domains, including spoken reviews, images, video blogs, and human-machine and human-human interactions. Challenges and opportunities of this emerging field are also discussed, supporting our thesis that multimodal sentiment analysis holds significant untapped potential.

Keywords: sentiment analysis, RNN, LSTM, word embeddings

Procedia PDF Downloads 90
29302 The Syntactic Features of Islamic Legal Texts and Their Implications for Translation

Authors: Rafat Y. Alwazna

Abstract:

Certain religious texts are deemed part of legal texts that are characterised by high sensitivity and sacredness. Amongst such religious texts are Islamic legal texts that are replete with Islamic legal terms that designate particular legal concepts peculiar to Islamic legal system and legal culture. However, from the syntactic perspective, Islamic legal texts prove lengthy, condensed and convoluted, with little use of punctuation system, but with an extensive use of subordinations and co-ordinations, which separate the main verb from the subject, and which, of course, carry a heavy load of legal detail. The present paper seeks to examine the syntactic features of Islamic legal texts through analysing a short text of Islamic jurisprudence in an attempt at exploring the syntactic features that characterise this type of legal text. A translation of this text into legal English is then exercised to find the translation implications that have emerged as a result of the English translation. Based on these implications, the paper compares and contrasts the syntactic features of Islamic legal texts to those of legal English texts. Finally, the present paper argues that there are a number of syntactic features of Islamic legal texts, such as nominalisation, passivisation, little use of punctuation system, the use of the Arabic cohesive device, etc., which are also possessed by English legal texts except for the last feature and with some variations. The paper also claims that when rendering an Islamic legal text into legal English, certain implications emerge, such as the necessity of a sentence break, the omission of the cohesive device concerned and the increase in the use of nominalisation, passivisation, passive participles, and so on.

Keywords: English legal texts, Islamic legal texts, nominalisation, participles, passivisation, syntactic features, translation implications

Procedia PDF Downloads 189
29301 Text Analysis to Support Structuring and Modelling a Public Policy Problem-Outline of an Algorithm to Extract Inferences from Textual Data

Authors: Claudia Ehrentraut, Osama Ibrahim, Hercules Dalianis

Abstract:

Policy making situations are real-world problems that exhibit complexity in that they are composed of many interrelated problems and issues. To be effective, policies must holistically address the complexity of the situation rather than propose solutions to single problems. Formulating and understanding the situation and its complex dynamics is therefore key to finding holistic solutions. Analysis of text-based information on the policy problem, using Natural Language Processing (NLP) and text analysis techniques, can support modelling of public policy problem situations in a more objective way based on domain experts’ knowledge and scientific evidence. The objective of this study is to support modelling of public policy problem situations using text analysis of verbal descriptions of the problem. We propose a formal methodology for the analysis of qualitative data from multiple information sources on a policy problem to construct a causal diagram of the problem. The analysis process aims at identifying key variables, linking them by cause-effect relationships and mapping that structure into a graphical representation that is adequate for designing action alternatives, i.e., policy options. This study describes the outline of an algorithm used to automate the initial step of a larger methodological approach, which has so far been done manually. In this initial step, inferences about key variables and their interrelationships are extracted from textual data to support better problem structuring. A small prototype for this step is also presented.

Keywords: public policy, problem structuring, qualitative analysis, natural language processing, algorithm, inference extraction

Procedia PDF Downloads 561
29300 Predicting Success and Failure in Drug Development Using Text Analysis

Authors: Zhi Hao Chow, Cian Mulligan, Jack Walsh, Antonio Garzon Vico, Dimitar Krastev

Abstract:

Drug development is resource-intensive, time-consuming, and increasingly expensive with each developmental stage. The success rates of drug development are also relatively low, and the resources committed are wasted with each failed candidate. As such, a reliable method of predicting the success of drug development is in demand. The hypothesis was that some failed drug candidates are pushed through developmental pipelines based on false confidence and may possess common linguistic features identifiable through sentiment analysis. Here, the concept of using text analysis to discover such features in research publications and investor reports as predictors of success was explored. RStudio was used to perform text mining and lexicon-based sentiment analysis to identify affective phrases and determine their frequency in each document, and SPSS was then used to determine the relationship between our defined variables and the accuracy of predicting outcomes. A total of 161 publications were collected and categorised into 4 groups: (i) Cancer treatment, (ii) Neurodegenerative disease treatment, (iii) Vaccines, and (iv) Others (containing all other drugs that do not fit into the 3 categories). Text analysis was then performed on each document using 2 separate datasets (BING and AFINN) in R within the category of drugs to determine the frequency of positive or negative phrases in each document. Relative positivity and negativity values were then calculated by dividing the frequency of phrases by the word count of each document. Regression analysis was then performed with SPSS statistical software on each dataset (values from using the BING or AFINN dataset during text analysis) using a random selection of 61 documents to construct a model. The remaining documents were then used to determine the predictive power of the models. The model constructed from BING predicted the outcome of drug performance in clinical trials with an overall accuracy of 65.3%. The AFINN model had a lower accuracy at predicting outcomes than the BING model, at 62.5%, and was not effective at predicting the failure of drugs in clinical trials. Overall, the study did not show significant efficacy of the model at predicting outcomes of drugs in development. Many improvements may need to be made to later iterations of the model to sufficiently increase the accuracy.
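
The relative positivity and negativity values described above can be illustrated with a minimal Python sketch. The study itself worked in R with the BING and AFINN lexicons; the word sets, the function name `relative_sentiment`, and the sample sentence below are hypothetical stand-ins, and the sketch counts single words rather than phrases.

```python
import re

# Hypothetical toy lexicons for illustration only.
# They are NOT the BING or AFINN word lists used in the study.
POSITIVE = {"promising", "effective", "robust", "significant"}
NEGATIVE = {"failed", "toxic", "adverse", "weak"}

def relative_sentiment(text):
    """Return (relative positivity, relative negativity) for one document.

    Each value is the number of lexicon hits divided by the document's
    total word count, as described in the abstract.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos / len(words), neg / len(words)

# Invented example sentence (12 words, 2 positive hits, 1 negative hit).
doc = "The candidate showed promising and robust efficacy, but adverse events were reported."
pos_rate, neg_rate = relative_sentiment(doc)
```

Normalising by word count keeps long and short documents comparable, which is presumably why the values were used as regression inputs rather than raw counts.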

Keywords: data analysis, drug development, sentiment analysis, text-mining

Procedia PDF Downloads 125
29299 Understanding the Qualitative Nature of Product Reviews by Integrating Text Processing Algorithm and Usability Feature Extraction

Authors: Cherry Yieng Siang Ling, Joong Hee Lee, Myung Hwan Yun

Abstract:

Usability has become a basic requirement from the consumer's perspective, and a product that fails to meet it ends up unused. Identifying usability issues by analyzing quantitative and qualitative data collected from usability testing and evaluation activities aids the product design process, yet the lack of studies on analysis methodologies for qualitative text data in the usability field limits the potential of these data for more useful applications. Meanwhile, the rapid development of data analysis fields, such as natural language processing for understanding human language computationally and machine learning for providing predictive models and clustering tools, has opened up the possibility of analyzing qualitative text data. Therefore, this research aims to study the applicability of a text processing algorithm to the analysis of qualitative text data collected from usability activities. This research utilized datasets collected from an LG neckband headset usability experiment; the datasets consist of headset survey text data, subject data, and product physical data. The analysis procedure, which integrated the text-processing algorithm, includes training the comments onto a vector space, labeling them with the subject and product physical feature data, and clustering to validate the result of comment vector clustering. The results show 'volume and music control button' as the usability feature that best matches the clusters of comment vectors: centroid comments of one cluster emphasized button positions, while centroid comments of the other cluster emphasized button interface issues. When the volume and music control buttons were designed separately, the participants experienced less confusion, and thus the comments mentioned only the buttons' positions. When the volume and music control buttons were designed as a single button, the participants experienced interface issues with the buttons, such as the operating methods of functions and confusion over the functions' buttons. The relevance of the cluster centroid comments to the extracted feature demonstrated the capability of text processing algorithms in analyzing qualitative text data from usability testing and evaluations.
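
The notion of a "centroid comment" can be sketched as follows. The abstract does not say how comments were embedded or clustered, so this example assumes plain bag-of-words count vectors and an already-formed cluster; `centroid_comment` and the sample comments are hypothetical.

```python
import math
from collections import Counter

def vectorize(comment, vocab):
    """Bag-of-words count vector over a fixed vocabulary (an assumed embedding)."""
    counts = Counter(comment.lower().split())
    return [counts[w] for w in vocab]

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid_comment(comments):
    """Return the comment whose vector is most similar to the cluster mean."""
    vocab = sorted({w for c in comments for w in c.lower().split()})
    vectors = [vectorize(c, vocab) for c in comments]
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    best = max(range(len(comments)), key=lambda i: cosine(vectors[i], centroid))
    return comments[best]

# Invented comments standing in for one cluster of survey responses.
cluster = [
    "volume button position is hard to reach",
    "button position feels awkward",
    "the button position is too far back",
]
representative = centroid_comment(cluster)
```

Reading the comment nearest each centroid is one simple way to summarise what a cluster is about, which matches how the abstract contrasts the two clusters.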

Keywords: usability, qualitative data, text-processing algorithm, natural language processing

Procedia PDF Downloads 253
29298 Intertextuality in Choreography: Investigation of Text and Movements in Making Choreography

Authors: Muhammad Fairul Azreen Mohd Zahid

Abstract:

Speech, text, and movement intensify aspects of creating choreography by connecting with emotional entanglements, tradition, literature, and other texts. This research focuses on practice as research and prioritises the choreography process as an inquiry approach. With the driven context, the study intervenes in critical conjunctions of choreographic theory, bringing together new reflections on the moving body, spaces of action, as well as intertextuality between text and movements in making choreography. Throughout the process, the researcher will introduce the level of deliberation from speech through movements and text to express emotion within a narrative context of an “illocutionary act.” This practice as research will produce a different meaning from the “utterance text” to “utterance movements” from the perspective of the speech act theory of J. L. Austin, based on fragmented text from “pidato adat,” which has been used as the opening speech in Randai. Looking at the theory of deconstruction by Jacques Derrida will also give a different meaning to the text. Nevertheless, the process of creating the choreography will also help to lay the basic normative structure implicit in “constative” (statement text/movement) and “performative” (command text/movement). Through this process, the researcher will also look at several methods of using text in two works, Joseph Gonzales's “Becoming King-The Pakyung Revisited” and Crystal Pite's “The Statement,” as references to produce different methods in making choreography. The semiotic foundation will support reading occurrences within dance discourses as texts through a semiotic lens. The method used in this research is qualitative, which includes an interview and simulation of the concept to get an outcome.

Keywords: intertextuality, choreography, speech act, performative, deconstruction

Procedia PDF Downloads 68
29297 Written Argumentative Texts in Elementary School: The Development of Text Structure and Its Relation to Reading Comprehension

Authors: Sara Zadunaisky Ehrlich, Batia Seroussi, Anat Stavans

Abstract:

Text structure is a parameter of text quality. This study investigated the structure of written argumentative texts produced by elementary-school-age children. We set two objectives: to identify and trace the structural components of the argumentative texts and to investigate whether reading comprehension skills were correlated with text structure. A total of 293 schoolchildren from 2nd to 5th grades were asked to write two argumentative texts about informal or everyday life controversial topics and completed two reading tasks that targeted different levels of text comprehension. The findings indicated, on the one hand, significant developmental differences between mature and more novice writers in terms of text length and mean proportion of clauses produced for a better elaboration of the different text components. On the other hand, with certain fluctuations, no meaningful differences were found in terms of presence of text structure: at all grade levels, elementary school children produced the basic and minimal structure that included the writer's argument and reasons or arguments' supports. Counter-arguments were scarce even in the upper grades. While the children grasped that an argument must essentially be justified, the greater the number of supports produced, the fewer clauses the children produced. Last, weak to mild relations were found between reading comprehension and argumentative text structure. Nevertheless, children who scored higher on sophisticated questions that require inferential or world knowledge displayed more elaborated structures in terms of text length and size of supports to the writer's argument. These findings indicate how school-age children perceive the basic template of an argument with future implications regarding how to elaborate written arguments.

Keywords: argumentative text, text structure, elementary school children, written argumentations

Procedia PDF Downloads 129
29296 Identification of Text Domains and Register Variation through the Analysis of Lexical Distribution in a Bangla Mass Media Text Corpus

Authors: Mahul Bhattacharyya, Niladri Sekhar Dash

Abstract:

The present research paper is an experimental attempt to investigate the nature of variation in the register in three major text domains, namely, social, cultural, and political texts collected from the corpus of Bangla printed mass media texts. This present study uses a corpus of a moderate amount of Bangla mass media text that contains nearly one million words collected from different media sources like newspapers, magazines, advertisements, periodicals, etc. The analysis of corpus data reveals that each text has certain lexical properties that not only control its identity but also mark its uniqueness across the domains. At first, the subject domains of the texts are classified by two parameters, namely, 'Genre' and 'Text Type'. Next, some empirical investigations are made to understand how the domains vary from each other in terms of lexical properties, covering both function and content words. Here the method of comparative-cum-contrastive matching of lexical load across domains is invoked through word frequency count to track how domain-specific words and terms may be marked as decisive indicators in the act of specifying the textual contexts and subject domains. The study shows that the common lexical stock that percolates across all text domains is quite dicey in nature, as its lexicological identity does not have any bearing on the act of specifying subject domains. Therefore, it becomes necessary for language users to anchor upon certain domain-specific lexical items to recognize a text that belongs to a specific text domain. The eventual findings of this study confirm that texts belonging to different subject domains in the Bangla news text corpus clearly differ on the parameters of lexical load, lexical choice, lexical clustering, and lexical collocation. In fact, based on these parameters, along with some statistical calculations, it is possible to classify mass media texts into different types to mark their relation with regard to the domains to which they should actually belong. The advantage of this analysis lies in the proper identification of the linguistic factors, which will give language users a better insight into the method they employ in text comprehension, as well as construct a systemic frame for designing a text identification strategy for language learners. The availability of a huge amount of Bangla media text data is useful for achieving accurate conclusions with a certain amount of reliability and authenticity. This kind of corpus-based analysis is quite relevant for a resource-poor language like Bangla, as no attempt has ever been made to understand how the structure and texture of Bangla mass media texts vary due to certain linguistic and extra-linguistic constraints that are actively operational in specific text domains. Since mass media language is assumed to be the most 'recent representation' of the actual use of the language, this study is expected to show how the Bangla news texts reflect the thoughts of the society and how they leave a strong impact on the thought process of the speech community.
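
The contrast between the common lexical stock and domain-specific words can be illustrated with a small word-frequency sketch. The three English snippets below are invented stand-ins for the Bangla corpus, and `split_lexicon` is a hypothetical helper, not the authors' procedure.

```python
from collections import Counter

def split_lexicon(domain_texts):
    """Separate words shared by every domain from words unique to one domain.

    domain_texts maps a domain name to its raw text. Returns (common, specific),
    where common is a set of words and specific maps each domain to a
    {word: frequency} dict of words that occur in no other domain.
    """
    counts = {d: Counter(t.lower().split()) for d, t in domain_texts.items()}
    common = set.intersection(*(set(c) for c in counts.values()))
    specific = {}
    for domain, freq in counts.items():
        others = [counts[o] for o in counts if o != domain]
        specific[domain] = {
            w: n for w, n in freq.items() if all(w not in o for o in others)
        }
    return common, specific

# Invented toy texts; the real study used roughly one million words of Bangla.
domains = {
    "political": "the minister said the election result is final",
    "cultural": "the festival said goodbye with music and dance",
    "social": "the community said the school needs support",
}
common, specific = split_lexicon(domains)
```

In this toy data the shared words ("the", "said") say nothing about the domain, while words such as "election" or "festival" act as the decisive indicators the abstract describes.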

Keywords: Bangla, corpus, discourse, domains, lexical choice, mass media, register, variation

Procedia PDF Downloads 154
29295 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation

Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves

Abstract:

Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.
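
The bi-gram step mentioned above can be sketched in a few lines of Python. The forum posts are invented examples, the tokenisation is deliberately crude, and `bigram_counts` is a hypothetical helper; the study's own tooling is not described in the abstract.

```python
import re
from collections import Counter

def bigram_counts(posts):
    """Count adjacent word pairs (bi-grams) across a list of text posts."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z]+", post.lower())
        # Pair each token with its successor; pairs never span two posts.
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Invented posts standing in for scraped forum text.
posts = [
    "Feeding pigs twice a day works well",
    "What are you feeding pigs over winter",
]
top = bigram_counts(posts).most_common(1)
```

Frequent bi-grams such as ("feeding", "pigs") give a quick first view of recurring themes before a heavier method like topic modelling is applied.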

Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP

Procedia PDF Downloads 67
29294 Off-Topic Text Detection System Using a Hybrid Model

Authors: Usama Shahid

Abstract:

Be it written documents, news columns, or students' essays, verifying the content can be a time-consuming task. Apart from spelling and grammar mistakes, the proofreader is also supposed to verify whether the content included in the essay or document is relevant. Irrelevant content in a document or essay is referred to as off-topic text, and in this paper we will address the problem of detecting off-topic text in a document using machine learning techniques. Our study aims to identify off-topic content in a document using an Echo State Network model, and we will also compare the results with other models. The previous study used Convolutional Neural Networks and TF-IDF to detect off-topic text. We will rearrange the existing datasets and apply new classifiers along with new word embeddings to existing and new datasets in order to compare the results with the previously existing CNN model.
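
To make the TF-IDF idea concrete, here is a minimal similarity baseline that flags sentences sharing little vocabulary with a prompt. It is not the Echo State Network or the CNN model discussed in the abstract; the smoothed IDF formula, the threshold of 0.1, and all names and sample texts are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}
    return [{w: tf * idf[w] for w, tf in Counter(toks).items()} for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def off_topic_flags(prompt, sentences, threshold=0.1):
    """Flag each sentence whose similarity to the prompt is below threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    vectors = tfidf_vectors([prompt] + sentences)
    return [cosine(vectors[0], v) < threshold for v in vectors[1:]]

# Invented prompt and sentences.
prompt = "effects of climate change on coastal cities"
sentences = [
    "rising seas threaten coastal cities as climate change accelerates",
    "my favourite pizza topping is mushrooms",
]
flags = off_topic_flags(prompt, sentences)
```

A baseline like this only captures word overlap, which is one reason learned models and word embeddings are attractive for the task.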

Keywords: off topic, text detection, echo state network, machine learning

Procedia PDF Downloads 56
29293 Evaluating 8D Reports Using Text-Mining

Authors: Benjamin Kuester, Bjoern Eilert, Malte Stonis, Ludger Overmeyer

Abstract:

Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.

Keywords: 8D report, complaint management, evaluation system, text-mining

Procedia PDF Downloads 280
29292 One Plus One is More than Two: Why Nurse Recruiters Need to Use Various Multivariate Techniques to Understand the Limitations of the Concept of Emotional Intelligence

Authors: Austyn Snowden

Abstract:

Aim: To examine the construct validity of the Trait Emotional Intelligence Questionnaire-Short Form (TEIQue-SF). Background: Emotional intelligence involves the identification and regulation of our own emotions and the emotions of others. It is therefore a potentially useful construct in the investigation of recruitment and retention in nursing, and many questionnaires have been constructed to measure it. Design: Secondary analysis of an existing dataset of responses to the TEIQue-SF using concurrent application of Rasch analysis and confirmatory factor analysis. Method: First-year undergraduate nursing and computing students completed the Trait Emotional Intelligence Questionnaire-Short Form. Responses were analysed by synthesising the results of Rasch analysis and confirmatory factor analysis.

Keywords: emotional intelligence, Rasch analysis, factor analysis, nurse recruiters

Procedia PDF Downloads 436