Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
This research develops an algorithm that classifies news articles from the automobile industry according to the competitive actions they entail, using Text Mining (TM) methods. To determine how the data should be preprocessed, several preprocessing pipelines are prepared so that each algorithm can be paired with the pipeline that fits it best. The pipelines are tested with nine classification algorithms spanning regression, support vector machines, and neural networks. Preliminary testing to identify the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines: Logistic Regression (LR) and Artificial Neural Network (ANN). These two algorithms are then optimized further by testing several parameters of each. The best result is achieved with the ANN: the final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. After three classes that created noise are removed, the final algorithm reaches an accuracy of 94%.
Keywords: Artificial neural network, competitive dynamics, logistic regression, text classification, text mining.
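The abstract reports accuracy, precision, recall, and F1 for a multi-class problem but does not state how the per-class values were averaged. The following is a minimal sketch, assuming macro averaging (the unweighted mean of the per-class scores), of how these four figures could be computed from predictions. The function name `macro_scores` and the class labels are hypothetical and do not come from the paper. Under this assumption, the macro F1 can fall below both the macro precision and the macro recall, which would be consistent with the reported F1 of 0.76.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: macro averaging, i.e. each class counts equally
    regardless of how many documents it contains.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    def safe_div(num, den):
        # Avoid division by zero for classes that are never predicted.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    accuracy = safe_div(sum(tp.values()), len(y_true))
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels for two competitive-action classes (illustration only).
y_true = ["pricing", "pricing", "pricing", "pricing", "launch", "launch"]
y_pred = ["pricing", "pricing", "pricing", "launch", "launch", "launch"]

acc, prec, rec, f1 = macro_scores(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

On this toy data the macro F1 (about 0.829) is lower than both the macro precision (about 0.833) and the macro recall (0.875), because F1 is computed per class before averaging rather than from the averaged precision and recall.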