Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5

Topic Model Related Abstracts

5 Improving Topic Quality of Scripts by Using Scene Similarity Based Word Co-Occurrence

Authors: Seong-Bae Park, Yunseok Noh, Chang-Uk Kwak, Sun-Joong Kim


Scripts are one of the basic text resources for understanding broadcast content. Because broadcast media wield considerable influence over the public, tools for understanding broadcast content are increasingly needed. Topic modeling is a method for summarizing broadcast content from its scripts. Scripts generally describe content through directions and speech, and they also provide scene segments that can be treated as semantic units. A script can therefore be topic modeled by treating each scene segment as a document. However, because scripts consist mainly of speech, relatively few word co-occurrences are observed within scene segments, which inevitably degrades the quality of topics learned by statistical methods. To tackle this problem, we propose a learning method that uses additional word co-occurrence information obtained from scene similarities. The main idea is that knowing two or more texts are topically related helps in learning high-quality topics; conversely, high-quality topics make it possible to determine more accurately whether two texts are related. In this paper, we regard two scene segments as related if their topical similarity is high enough, and we consider words to co-occur if they appear together in topically related scene segments. In experiments, the proposed method generates higher-quality topics from Korean drama scripts than the baselines.

Keywords: broadcasting contents, scripts, text similarity, topic model
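The co-occurrence idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes per-scene topic proportions are already available, uses cosine similarity with a hypothetical threshold of 0.8 to decide which scenes are "related", and counts only the extra cross-scene word pairs. How these counts are fed back into topic learning is not specified here.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def related_scene_cooccurrence(scenes, theta, threshold=0.8):
    """Count word pairs that co-occur across topically related scenes.

    scenes: list of token lists, one per scene segment.
    theta: per-scene topic proportions, in the same order as scenes.
    threshold: hypothetical similarity cut-off (an assumption, not from the paper).
    """
    counts = Counter()
    for i, j in combinations(range(len(scenes)), 2):
        if cosine(theta[i], theta[j]) >= threshold:
            # Treat every word of scene i as co-occurring with every word of scene j.
            for w in set(scenes[i]):
                for v in set(scenes[j]):
                    if w != v:
                        counts[tuple(sorted((w, v)))] += 1
    return counts
```

For example, two scenes dominated by the same topic contribute cross-scene pairs such as `("coffee", "cup")`, while a scene about a different topic contributes none.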

4 Mining User-Generated Contents to Detect Service Failures with Topic Model

Authors: Sung Ho Ha, Kyung Bae Park


Online user-generated content (UGC) has significantly changed the way customers behave (e.g., shop, travel), and handling the overwhelming volume of diverse UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective at leveraging textual information to detect the problems or issues a business faces. In this paper, we apply Latent Dirichlet Allocation (LDA) text mining to a popular online review site dedicated to user complaints. We find that LDA detects customer complaints efficiently, and that further inspection with a visualization technique is effective for categorizing the problems or issues. Management can thus identify the issues at stake and prioritize them in a timely manner given limited resources. The findings provide managerial insights into how social media analytics can help firms maintain and improve their reputation management. Our interdisciplinary approach also yields several insights from applying machine learning techniques in the marketing research domain. On a broader technical note, this paper details how to implement LDA in R from beginning (data collection) to end (LDA analysis), since such instructions remain largely undocumented. In this regard, it should help lower the barrier for interdisciplinary researchers to conduct related research.

Keywords: text mining, visualization, latent Dirichlet allocation, topic model, R program, user-generated content

3 AIPM: An Integrator and Pull Request Matching Model in Github

Authors: Li Xu, Yan Zhang, Zhifang Liao, Yanbing Li, Xiaoping Fan, Jinsong Wu


Pull requests (PRs) are the primary way external contributors submit code on GitHub. PR review is an essential part of open source software development for maintaining software quality, and matching a new PR with an appropriate integrator makes PR review more effective. However, PR-integrator matching is currently done manually on GitHub. To reduce this cost, we present AIPM, a model that predicts highly relevant integrators for incoming PRs. AIPM uses a topic model to extract topics from PRs and builds a one-to-one correspondence between topics and integrators. It then selects the most suitable integrator according to the maximum entry of the PR's topic-document distribution. On average, AIPM reaches a precision of 60%, and in some projects it reaches 80%.

Keywords: topic model, pull request, integrator matching, GitHub, open source project
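The matching step described above reduces to an argmax over the PR's topic distribution followed by a lookup. A minimal sketch, assuming the topic-to-integrator mapping is already built (how AIPM constructs it is not reproduced here) and assuming top-1 precision is the fraction of PRs whose predicted integrator is correct; the integrator names are hypothetical.

```python
def match_integrator(pr_topic_dist, topic_to_integrator):
    """Return the integrator mapped to the PR's highest-probability topic."""
    best_topic = max(range(len(pr_topic_dist)), key=pr_topic_dist.__getitem__)
    return topic_to_integrator[best_topic]


def precision(predicted, actual):
    """Fraction of PRs whose predicted integrator matches the actual one."""
    if not predicted:
        return 0.0
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

With a hypothetical one-to-one mapping `["alice", "bob", "carol"]`, a PR whose topic distribution is `[0.1, 0.7, 0.2]` is routed to `"bob"`.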

2 Emotion Oriented Students' Opinioned Topic Detection for Course Reviews in Massive Open Online Course

Authors: Zhi Liu, Xian Peng, Monika Domanska, Lingyun Kang, Sannyuya Liu


Massive open online education has become increasingly popular among learners worldwide. A growing number of course reviews are being generated on Massive Open Online Course (MOOC) platforms, which offer an interactive feedback channel for learners to express opinions and feelings about their learning. These reviews typically contain subjective emotion and topic information about the courses. However, detecting these opinions manually is time-consuming. In this paper, we propose an emotion-oriented topic detection model to automatically detect the aspects on which students express opinions in course reviews. The known overall emotional orientation and the emotional words in each review are used to guide the joint probabilistic modeling of emotions and aspects in reviews. Experiments on real-life review data verify that the course-emotion-aspect distribution can be calculated to capture the most significant opinioned topics in each course unit. The proposed technique supports intelligent learning analytics, helping teachers improve their pedagogy and developers improve the user experience.

Keywords: Emotion recognition, topic model, Massive Open Online Course (MOOC), course reviews, topical aspects
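To make the "course-emotion-aspect distribution" concrete, the sketch below estimates p(aspect | course unit, emotion) by simple normalized counting and reports the top aspect for each (course unit, emotion) pair. This is a deliberate simplification, not the authors' joint probabilistic model: it assumes each review has already been assigned a course unit, an emotion label, and an aspect, and all labels shown are hypothetical.

```python
from collections import Counter, defaultdict


def course_emotion_aspect(assignments):
    """Estimate p(aspect | course, emotion) from (course, emotion, aspect) tuples."""
    counts = defaultdict(Counter)
    for course, emotion, aspect in assignments:
        counts[(course, emotion)][aspect] += 1
    dist = {}
    for key, c in counts.items():
        total = sum(c.values())
        dist[key] = {aspect: n / total for aspect, n in c.items()}
    return dist


def top_opinioned_aspects(dist, n=1):
    """Return the n most probable aspects for each (course, emotion) pair."""
    return {key: sorted(p, key=p.get, reverse=True)[:n] for key, p in dist.items()}
```

Grouping by both course unit and emotion is what lets negative-emotion aspects (e.g. pacing) surface separately from positive ones within the same unit.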

1 Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis

Authors: Haiyi Zhang, Sidi Yang


Twitter is a microblogging platform where millions of users share their attitudes, views, and opinions every day. A probabilistic Latent Dirichlet Allocation (LDA) topic model is an effective, computationally efficient way to discern the most popular topics in a large set of tweets. Sentiment analysis provides an effective method for identifying the emotions and sentiments in each tweet and an efficient way to summarize the results clearly. The primary goal of this paper is to explore text mining, extracting and analyzing useful information from unstructured text with two approaches, LDA topic modeling and sentiment analysis, applied to English plain-text Twitter data. These two methods allow people to mine data more effectively and efficiently. LDA topic modeling and sentiment analysis can also be applied to provide insights in business and scientific fields.

Keywords: Text Mining, sentiment analysis, Twitter, topic model
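The sentiment-analysis half of this approach can be illustrated with a minimal lexicon-based scorer that labels each tweet and summarizes the label counts. The paper's actual sentiment method is not described in the abstract; the tokenizer and the tiny `LEXICON` below are hypothetical stand-ins, and a real study would use an established sentiment lexicon or classifier.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "bad": -1, "slow": -1}


def tweet_sentiment(text, lexicon=LEXICON):
    """Label a tweet by the sign of its summed lexicon scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize_sentiment(tweets):
    """Count tweets per sentiment label."""
    return Counter(tweet_sentiment(t) for t in tweets)
```

Running `summarize_sentiment` separately on the tweets assigned to each LDA topic is one way to combine the two methods into a per-topic sentiment summary.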
