Extracting Attributes for Twitter Hashtag Communities

Authors: Ashwaq Alsulami, Jianhua Shao


Organisations often need to understand discussions on social media, such as which topics are trending and what characterises the people taking part. A number of approaches have been proposed to extract attributes that characterise a discussion group. However, these approaches are largely based on supervised learning and therefore require a large amount of labelled data. In this paper, we propose an approach that does not require labelled data but instead relies on lexical sources to detect meaningful attributes for online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.

Keywords: Attributed community, attribute detection, community, social network.
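
The abstract does not spell out how lexical sources are used, so the following is only an illustrative sketch of the general idea of lexicon-based attribute detection without labelled data, not the authors' method. The toy lexicon, the attribute labels, the function names and the `min_share` threshold are all assumptions introduced for this example; a real lexical source would be far larger.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not from the paper):
# attribute label -> terms taken to indicate that attribute.
LEXICON = {
    "occupation:health": {"nurse", "doctor", "clinic", "patient"},
    "occupation:education": {"teacher", "student", "lecture", "school"},
    "interest:sport": {"football", "match", "goal", "league"},
}


def tokenize(text):
    """Lowercase a tweet and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def community_attributes(tweets, lexicon=LEXICON, min_share=0.3):
    """Return attributes matched by at least `min_share` of the tweets.

    A tweet counts towards an attribute if it contains any lexicon term
    for that attribute. No labelled training data is involved.
    """
    total = len(tweets)
    if total == 0:
        return {}
    hits = Counter()
    for tweet in tweets:
        tokens = set(tokenize(tweet))
        for attribute, terms in lexicon.items():
            if tokens & terms:
                hits[attribute] += 1
    return {a: c / total for a, c in hits.items() if c / total >= min_share}


# Hypothetical tweets from one hashtag community.
tweets = [
    "Long shift at the clinic today #nhs",
    "Every nurse deserves better pay #nhs",
    "Watching the match tonight #nhs",
    "Proud to be a doctor #nhs",
]
result = community_attributes(tweets)
print(result)
```

In this sketch, three of the four tweets contain a health-related term, so only that attribute passes the 30% threshold. The threshold trades coverage against noise and would need tuning on real data.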


