Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Haider A Ramadhan; Khalil Shihab

Abstract:

Many intelligent software agents use clustering techniques to group similar access patterns of Web users into high-level themes that express the users' intentions and interests. However, such techniques have mostly focused on one salient feature of the Web documents visited by the user, namely the extracted keywords. Their main aim has been to find an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we consider both the keyword threshold and the similarity threshold in order to generate clusters with concentrated themes, and hence build a sounder model of user behavior. The purpose of this paper is twofold: to use distance-based clustering methods to recognize overall themes from the proxy log file, and to suggest efficient cut-off levels for the keyword and similarity thresholds that tend to produce clusters with better focus and a more manageable size.
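To make the two thresholds concrete, the following is a minimal sketch and not the authors' implementation. It assumes each visited page has already been reduced to a ranked keyword list, uses Jaccard overlap as the similarity measure, and uses a simple single-pass (leader) clustering scheme; the paper's actual distance measure and clustering method are not given in this abstract. The names `cluster_pages`, `keyword_threshold`, `similarity_threshold`, and the sample `pages` data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets, in the range [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, keyword_threshold, similarity_threshold):
    """Group pages into themes with a single-pass (leader) scheme.

    pages: dict mapping a page id to its keywords, ranked most important first
           (assumed to be extracted beforehand from the proxy log).
    keyword_threshold: how many top keywords to keep per page.
    similarity_threshold: minimum Jaccard similarity needed to join a cluster.
    """
    clusters = []  # each cluster: {"keywords": set, "members": list of page ids}
    for page_id, ranked_keywords in pages.items():
        keywords = set(ranked_keywords[:keyword_threshold])

        # Find the existing cluster whose keyword set is most similar.
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(keywords, cluster["keywords"])
            if sim > best_sim:
                best, best_sim = cluster, sim

        if best is not None and best_sim >= similarity_threshold:
            best["members"].append(page_id)
            best["keywords"] |= keywords  # the theme grows with its members
        else:
            clusters.append({"keywords": set(keywords), "members": [page_id]})
    return clusters


# Hypothetical sample data: three pages with ranked keywords.
pages = {
    "p1": ["python", "pandas", "dataframe", "tutorial"],
    "p2": ["python", "pandas", "plot", "chart"],
    "p3": ["football", "league", "scores", "python"],
}

themes = cluster_pages(pages, keyword_threshold=3, similarity_threshold=0.3)
for theme in themes:
    print(theme["members"], sorted(theme["keywords"]))
```

In this sketch, raising `similarity_threshold` tends to produce more, smaller clusters, while raising `keyword_threshold` lets less important keywords influence the match. This is the kind of trade-off between focus and cluster size that the abstract describes.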

Keywords: Data mining, knowledge discovery, clustering, data analysis, Web log analysis, theme-based searching.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1060549
