Text Mining of Twitter Data Using a Latent Dirichlet Allocation Topic Model and Sentiment Analysis
Authors: Sidi Yang, Haiyi Zhang
Abstract:
Twitter is a microblogging platform where millions of users share their attitudes, views, and opinions every day. A probabilistic Latent Dirichlet Allocation (LDA) topic model offers a computationally efficient way to analyze a large set of tweets and identify the most popular topics in them. Sentiment analysis complements this by showing the emotions and sentiments expressed in each tweet and by summarizing the results in a form that is easy to understand. The primary goal of this paper is to explore text mining, that is, extracting and analyzing useful information from unstructured text, using two approaches, LDA topic modelling and sentiment analysis, applied to plain-text Twitter data in English. Together, these two methods allow data to be mined more effectively and efficiently. LDA topic modelling and sentiment analysis can also be applied to provide insights in business and scientific fields.
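As a rough illustration of the lexicon-based sentiment step described in the abstract, the following minimal Python sketch counts sentiment-bearing words in each tweet and then aggregates the counts over a collection. It is not the authors' implementation: the tiny `TOY_LEXICON`, the tokenizer, and the function names `score_tweet` and `summarize` are all assumptions made for this example, and a real analysis would rely on a published word-emotion lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; these hand-picked words
# stand in for a published word-emotion association lexicon.
TOY_LEXICON = {
    "love": "positive",
    "great": "positive",
    "fun": "positive",
    "boring": "negative",
    "hate": "negative",
    "bad": "negative",
}


def tokenize(tweet):
    """Lowercase a tweet and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def score_tweet(tweet, lexicon=TOY_LEXICON):
    """Count how many tokens in one tweet carry each sentiment label."""
    counts = Counter()
    for token in tokenize(tweet):
        label = lexicon.get(token)
        if label is not None:
            counts[label] += 1
    return counts


def summarize(tweets, lexicon=TOY_LEXICON):
    """Add up the per-tweet counts into totals for the whole collection."""
    totals = Counter()
    for tweet in tweets:
        totals.update(score_tweet(tweet, lexicon))
    return totals


if __name__ == "__main__":
    # Made-up sample tweets, not data from the paper.
    sample = [
        "I love this movie, great fun!",
        "So boring and bad",
        "Watching it tonight",
    ]
    for tweet in sample:
        print(tweet, dict(score_tweet(tweet)))
    print("totals:", dict(summarize(sample)))
```

Exact-match lookup keeps the sketch short, but it misses inflected forms such as "loved" and ignores negation, so it should be read only as a picture of the per-tweet scoring and corpus-level summary, not as a usable classifier.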
Keywords: Text mining, Twitter, topic model, sentiment analysis.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1317350