Fake Account Detection in Twitter Based on Minimum Weighted Feature set

Ahmed El Azab; Amira M. Idrees; Mahmoud A. Mahmoud; Hesham Hefny



Abstract:

Social networking sites such as Twitter and Facebook attract over 500 million users across the world. For those users, social life and even practical life have become intertwined with these sites, and this interaction has had a lasting effect on their lives. Accordingly, social networking sites have become one of the main channels for the rapid dissemination of all kinds of information during real-time events. This popularity has led to several problems, including the possibility of exposing users to incorrect information through fake accounts, which results in the spread of malicious content during live events. This situation can cause serious real-world damage to society in general, including citizens, business entities, and others. In this paper, we present a classification method for detecting fake accounts on Twitter. The study determines a minimized set of the main factors that influence the detection of fake accounts on Twitter, and then applies the determined factors using different classification techniques. The results of these techniques are compared, and the algorithm with the most accurate results is selected. The study has also been compared with recent research in the same area, and this comparison supports the accuracy of the proposed approach. We claim that this study can be applied continuously on the Twitter social network to detect fake accounts automatically; moreover, it can be applied to other social networking sites such as Facebook with minor changes that depend on the nature of the social network, which are discussed in this paper.
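
The pipeline outlined in the abstract (weight the candidate features, keep a minimal subset, then compare classifiers by accuracy) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the weighting measure or the classifiers, so information gain, a majority-class baseline, and a 1-nearest-neighbour rule are used here as stand-ins, and the feature names and the eight example accounts are entirely hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: binary profile features for eight accounts.
ROWS = [
    {"default_profile_image": 1, "has_bio": 0, "high_following_ratio": 1, "name_has_digits": 1},
    {"default_profile_image": 1, "has_bio": 0, "high_following_ratio": 1, "name_has_digits": 0},
    {"default_profile_image": 1, "has_bio": 0, "high_following_ratio": 0, "name_has_digits": 1},
    {"default_profile_image": 0, "has_bio": 0, "high_following_ratio": 1, "name_has_digits": 0},
    {"default_profile_image": 0, "has_bio": 1, "high_following_ratio": 0, "name_has_digits": 1},
    {"default_profile_image": 0, "has_bio": 1, "high_following_ratio": 0, "name_has_digits": 0},
    {"default_profile_image": 0, "has_bio": 1, "high_following_ratio": 1, "name_has_digits": 1},
    {"default_profile_image": 0, "has_bio": 1, "high_following_ratio": 0, "name_has_digits": 0},
]
LABELS = ["fake", "fake", "fake", "fake", "real", "real", "real", "real"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the data on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def select_features(rows, labels, k):
    """Weight every feature (assumed measure: information gain) and keep the top k."""
    weights = {name: information_gain(rows, labels, name) for name in rows[0]}
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k], weights


def majority_classifier(train_rows, train_labels):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


def nearest_neighbour_classifier(train_rows, train_labels):
    """1-nearest neighbour under Hamming distance; ties go to the earliest row."""
    def predict(row):
        def distance(pair):
            return sum(row[name] != pair[0][name] for name in row)
        return min(zip(train_rows, train_labels), key=distance)[1]
    return predict


def leave_one_out_accuracy(make_classifier, rows, labels):
    """Train on all rows but one, test on the held-out row, and average."""
    correct = 0
    for i in range(len(rows)):
        predict = make_classifier(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(rows[i]) == labels[i]
    return correct / len(rows)


CLASSIFIERS = {
    "majority baseline": majority_classifier,
    "1-nearest neighbour": nearest_neighbour_classifier,
}

# Step 1: weight the features and keep a minimal subset.
selected, weights = select_features(ROWS, LABELS, k=2)
reduced = [{name: row[name] for name in selected} for row in ROWS]

# Step 2: run each classifier on the reduced features and keep the most accurate.
scores = {name: leave_one_out_accuracy(make, reduced, LABELS)
          for name, make in CLASSIFIERS.items()}
best = max(scores, key=scores.get)
print("selected features:", selected)
print("accuracy per classifier:", scores)
print("most accurate:", best)
```

On real data, the subset size `k` and the evaluation protocol would need to be chosen with care; leave-one-out is used here only because the toy sample is so small.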

Keywords: Fake account detection, classification algorithms, Twitter account analysis, feature-based techniques.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1110582

