Web Log Mining by an Improved AprioriAll Algorithm
Authors: Wang Tong, He Pi-lian
Abstract:
This paper discusses the feasibility and importance of applying data mining to Web logs and points out some problems with conventional search engines. It then presents an improved algorithm based on the original AprioriAll algorithm, which has been widely used in Web log mining. The new algorithm takes the User ID property into account at every step of generating the candidate set and at every database scan, using it to decide whether an item in the candidate set should be placed in the large set that is used to generate the next candidate set. Meanwhile, to reduce the number of database scans, the new algorithm applies the Apriori property to limit the size of each candidate set as soon as it is produced. Test results show that the improved algorithm has lower time and space complexity, suppresses noise better, and fits within memory capacity.
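The general idea of counting candidate sequences per user and pruning candidates with the Apriori property can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `is_subsequence` and `frequent_sequences`, the `(user_id, page)` log format, and the choice to count support as the number of distinct users are all assumptions made for the example.

```python
from collections import defaultdict


def is_subsequence(candidate, sequence):
    """Return True if the candidate's pages occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(page in it for page in candidate)


def frequent_sequences(log, min_support):
    """Find page sequences supported by at least `min_support` distinct users.

    `log` is a list of (user_id, page) records in time order (assumed format).
    Returns a dict mapping each large sequence (a tuple) to its user count.
    """
    # Group visits by user ID so that each user counts at most once per candidate.
    sessions = defaultdict(list)
    for user_id, page in log:
        sessions[user_id].append(page)

    # Level 1 candidates: every single page seen in the log.
    candidates = {(page,) for pages in sessions.values() for page in pages}
    result = {}
    while candidates:
        # One pass over the grouped data: count supporting users per candidate.
        large = {}
        for cand in candidates:
            support = sum(1 for pages in sessions.values() if is_subsequence(cand, pages))
            if support >= min_support:
                large[cand] = support
        result.update(large)

        # Join step: extend a with the last page of b when they overlap on k-1 pages.
        joined = {a + (b[-1],) for a in large for b in large if a[1:] == b[:-1]}
        # Prune step (Apriori property): drop a candidate as soon as it is produced
        # if any subsequence obtained by removing one page is not large.
        candidates = {
            c for c in joined
            if all(c[:i] + c[i + 1:] in large for i in range(len(c)))
        }
    return result
```

In this sketch, pruning happens before the next scan, so candidates that cannot be large are never counted. A page visited by only one user falls below `min_support` at level 1 and never contributes to longer candidates.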
Keywords: Candidate Sets Pruning, Data Mining, Improved Algorithm, Noise Restrain, Web Log
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075232