Sentiment Analysis of Fake Health News Using Naive Bayes Classification Models

Danielle Shackley; Yetunde Folajimi

Abstract:

As more people turn to the internet for health-related information, the risk of encountering false, inaccurate, or dangerous information grows. Sentiment analysis is a natural language processing technique that assigns polarity scores to text, ranging from negative through neutral to positive. In this research, we evaluate the contribution of a sentiment analysis feature added to fake health news classification models. The dataset consists of existing, reliably labeled health article headlines, supplemented with COVID-19 health information collected from social media sources. We began with data preprocessing and tested several vectorization methods, including Count and TF-IDF vectorization. We implemented three Naive Bayes classifier models: Bernoulli, Multinomial, and Complement. To measure the contribution of the sentiment analysis feature on the dataset, we first built benchmark Naive Bayes classification models without sentiment analysis, then reproduced the same models with the feature added. We evaluated the models using precision and accuracy scores. The initial Bernoulli model achieved 90% precision and 75.2% accuracy, while the model supplemented with sentiment labels achieved 90.4% precision with accuracy unchanged at 75.2%. Our results show that adding sentiment analysis did not improve model precision by a wide margin: there was no evidence of improvement in accuracy, while the Complement model showed a 1.9% improvement in precision. Future work could replicate the experimental process, substituting a deep learning neural network model for Naive Bayes.

Keywords: Sentiment analysis, Naive Bayes model, natural language processing, topic analysis, fake health news classification model.
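
The comparison described in the abstract (benchmark Naive Bayes models versus the same models with a sentiment feature added) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The headlines, labels, and word lists are invented for the example, and the `sentiment_label` helper is a hypothetical stand-in for a real sentiment analyzer such as VADER. The scikit-learn classes are real, but the way the sentiment label is encoded and appended to the text features is an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score
from sklearn.naive_bayes import BernoulliNB, ComplementNB, MultinomialNB

# Toy headlines invented for illustration only; 1 = fake, 0 = reliable.
train_texts = [
    "miracle herb cures cancer overnight",
    "amazing secret remedy doctors hate",
    "garlic water destroys virus instantly",
    "deadly vaccine hidden danger exposed",
    "study finds exercise lowers blood pressure",
    "trial shows new drug reduces hospital stays",
    "health agency updates mask guidance",
    "researchers report modest benefit from statins",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_texts = [
    "secret miracle remedy cures virus",
    "study shows drug lowers blood pressure",
    "deadly danger hidden in hospital masks",
    "agency reports new vaccine trial",
]
test_labels = [1, 0, 1, 0]

# Hypothetical word lists standing in for a real sentiment lexicon.
POSITIVE = {"miracle", "amazing", "cures", "benefit"}
NEGATIVE = {"deadly", "danger", "hate", "destroys"}
SENTIMENT_ORDER = ["negative", "neutral", "positive"]


def sentiment_label(text):
    """Return 'negative', 'neutral', or 'positive' from the toy word lists."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_features(texts):
    """One-hot encode the sentiment label as three non-negative columns."""
    rows = [[float(sentiment_label(t) == name) for name in SENTIMENT_ORDER]
            for t in texts]
    return csr_matrix(np.array(rows))


def evaluate(model_cls, with_sentiment):
    """Fit one Naive Bayes variant and return (precision, accuracy)."""
    vectorizer = CountVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    if with_sentiment:
        # Append the sentiment columns to the bag-of-words matrix.
        X_train = hstack([X_train, sentiment_features(train_texts)]).tocsr()
        X_test = hstack([X_test, sentiment_features(test_texts)]).tocsr()
    model = model_cls().fit(X_train, train_labels)
    predictions = model.predict(X_test)
    precision = precision_score(test_labels, predictions, zero_division=0)
    accuracy = accuracy_score(test_labels, predictions)
    return precision, accuracy


results = {}
for model_cls in (BernoulliNB, MultinomialNB, ComplementNB):
    for with_sentiment in (False, True):
        scores = evaluate(model_cls, with_sentiment)
        results[(model_cls.__name__, with_sentiment)] = scores
        print(model_cls.__name__, "sentiment" if with_sentiment else "benchmark",
              "precision=%.3f accuracy=%.3f" % scores)
```

The sentiment label is one-hot encoded here because scikit-learn's Multinomial and Complement Naive Bayes models reject negative feature values, so a signed polarity score could not be appended directly. Swapping `CountVectorizer` for `TfidfVectorizer` would give the TF-IDF variant mentioned in the abstract. Scores from a toy dataset this small are not meaningful and are not comparable to the reported results.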
