Application of Data Mining Techniques for Tourism Knowledge Discovery
Five implementations of three data mining classification techniques were applied experimentally to extract important insights from tourism data. The aim was to identify the best-performing algorithm among those compared for tourism knowledge discovery. The knowledge discovery from data (KDD) process served as the process model, and 10-fold cross-validation was used for testing. Various data preprocessing activities were performed to obtain the final dataset for model building. Classification models of the selected algorithms were built under different scenarios on the preprocessed dataset. The best-performing algorithm on the tourism dataset was Random Forest (76%) before information-gain-based attribute selection and J48 (C4.5) (75%) after selecting the attributes most relevant to the class (target) attribute. In terms of model-building time, attribute selection improved the efficiency of all algorithms. The Artificial Neural Network (multilayer perceptron) showed the highest improvement (90%). The rules extracted from the decision tree model are presented; they reveal intricate, non-trivial knowledge that simple statistical analysis would not uncover, even though the classification algorithms achieved only mediocre accuracy.
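As an illustration of the information-gain-based attribute selection mentioned above, the following minimal sketch ranks nominal attributes by their information gain with respect to the class attribute, i.e. H(class) minus H(class | attribute). It is not the authors' implementation: the function names (`entropy`, `information_gain`, `rank_attributes`) and the toy records are hypothetical and are not drawn from the paper's tourism dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each attribute value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    names = rows[0].keys()
    gains = {a: information_gain([r[a] for r in rows], labels) for a in names}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records; not taken from the paper's tourism dataset.
rows = [
    {"purpose": "leisure", "season": "summer"},
    {"purpose": "leisure", "season": "winter"},
    {"purpose": "business", "season": "summer"},
    {"purpose": "business", "season": "winter"},
]
labels = ["long", "long", "short", "short"]  # hypothetical class: length of stay

ranking = rank_attributes(rows, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data, `purpose` fully determines the class while `season` carries no information about it, so `purpose` ranks first. Keeping only the top-ranked attributes before model building is the kind of selection step the abstract describes, and it would typically reduce training time because each model sees fewer inputs.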
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1129277