Analysis of Web User Identification Methods
Abstract: Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, and web navigation prediction. However, raw log files are not directly usable; they have to be preprocessed to transform them into a format suitable for different data mining tasks. One of the key issues in the preprocessing phase is identifying web users. Identifying users from web log files is not straightforward, so various methods have been developed. Several difficulties have to be overcome, such as client-side caching and changing or shared IP addresses. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third one is our novel approach, which uses a complex cookie-based method to identify web users. Furthermore, we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method, we developed an implementation called the Web Activity Tracking (WAT) system, which aims at a more precise distinction of web users based on log data. We present statistical analysis produced by WAT on real data about the behavior of Hungarian web users, together with a comprehensive analysis and comparison of the three methods.
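As a rough illustration of why user identification from log files is not straightforward, the following Python sketch groups requests under two keys. The abstract does not name the two commonly used methods, so this sketch assumes, purely for illustration, that they resemble two widely used heuristics: keying on the client IP address alone, and keying on the IP address together with the user agent string. The log pattern (Combined Log Format), the sample log lines, and the function names are all hypothetical; none of this is the WAT implementation.

```python
import re
from collections import defaultdict

# Assumed input: Combined Log Format lines, i.e.
# host ident authuser [date] "request" status bytes "referer" "user-agent"
# The referer and user-agent fields are optional so plain Common Log Format also parses.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def identify_users(lines, use_agent=False):
    """Group requests by IP address, or by (IP address, user agent) if use_agent is True."""
    users = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines
        key = (record["ip"], record["agent"]) if use_agent else (record["ip"],)
        users[key].append(record["request"])
    return dict(users)


# Hypothetical sample: two different browsers share one IP address (e.g. behind a proxy).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2007:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows)"',
    '192.0.2.10 - - [10/Oct/2007:13:56:01 +0200] "GET /news.html HTTP/1.1" 200 1500 "-" "Opera/9.0 (X11)"',
    '198.51.100.7 - - [10/Oct/2007:13:57:12 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows)"',
]

by_ip = identify_users(SAMPLE_LOG)
by_ip_agent = identify_users(SAMPLE_LOG, use_agent=True)
print(len(by_ip), len(by_ip_agent))  # prints: 2 3
```

On the sample data, the IP-only key merges the two browsers behind the shared address into one user, while adding the user agent separates them. Neither key handles changing IP addresses or requests hidden by client-side caching, which are the difficulties the abstract points to as motivation for a cookie-based method.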
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1328332