Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal


With advances in technology, the expression of opinions has shifted to the digital world. Politics, one of the most active topics in opinion mining research, has been combined with behavior analysis to determine the affiliation expressed in texts, which is the subject of this paper. This study aims to classify the text of news articles and blogs as either Republican or Democrat using the minimum number of features. An initial set of 68 features, 64 of which were Linguistic Inquiry and Word Count (LIWC) features, was tested with 14 benchmark classification algorithms. In later experiments, the dimensionality of the feature vector was reduced using 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers, when used with the “SVM” and “IGR” feature selection algorithms, performed best, reaching up to 82.5% accuracy on the given dataset. Further tests on a single feature and on the linguistic feature sets showed similar results. The feature “Function”, an aggregate feature of the linguistic category, was found to be the most differentiating of the 68 features, classifying articles as either Republican or Democrat with an accuracy of 81%.
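As an illustration of how a feature selection step of this kind can rank features, the following is a minimal sketch that scores each numeric feature by its information gain ratio. It assumes that “IGR” stands for information gain ratio, which the abstract does not spell out. The function names, the midpoint-threshold search, the feature name `word_count`, and all data values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Information gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def rank_features(rows, labels):
    """Rank features by their best gain ratio over midpoint thresholds."""
    scores = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        distinct = sorted(set(values))
        thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
        scores[name] = max((gain_ratio(values, labels, t) for t in thresholds), default=0.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data for illustration only; these are NOT values from the paper.
rows = [
    {"function": 52.0, "word_count": 410},
    {"function": 54.5, "word_count": 380},
    {"function": 47.0, "word_count": 400},
    {"function": 45.5, "word_count": 390},
]
labels = ["Democrat", "Democrat", "Republican", "Republican"]

ranking = rank_features(rows, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data the hypothetical `function` column separates the two classes perfectly, so it ranks first. Keeping only the top-ranked features before training a classifier is one way to obtain a reduced feature vector of the kind the abstract describes.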

Keywords: Machine Learning, Politics, Feature selection, LIWC


