A Comparative Study of Page Ranking Algorithms for Information Retrieval

Authors: Ashutosh Kumar Singh, Ravi Kumar P

Abstract:

This paper introduces Web mining and its categories, describes Web structure mining in detail, and explores the data structure used by the Web. It then discusses and compares page-ranking algorithms used for information retrieval: PageRank (PR), Weighted PageRank (WPR), HITS (Hyperlink-Induced Topic Search), DistanceRank, and DirichletRank. PageRank values are calculated with the PageRank and Weighted PageRank algorithms for a given hyperlink structure. A simulation program is developed for the PageRank algorithm because, among these algorithms, PageRank is the only one implemented in a search engine (Google). The outputs are shown in table and chart form.
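To illustrate the kind of calculation the abstract describes, here is a minimal sketch of an iterative PageRank computation. It assumes the commonly cited formulation PR(u) = (1 - d) + d * sum(PR(v) / N_v) over the pages v that link to u, where N_v is the number of outgoing links of v and d is a damping factor. The three-page graph, the function name `pagerank`, and the parameter defaults are hypothetical and are not taken from the paper's simulation program or its hyperlink structure.

```python
def pagerank(links, d=0.85, tol=1e-8, max_iter=200):
    """Iteratively compute PageRank values.

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link and every
    link target is itself a key of `links` (no dangling pages).
    """
    pages = sorted(links)

    # Build the reverse map: for each page, which pages link to it.
    inbound = {p: [] for p in pages}
    for src, targets in links.items():
        for t in targets:
            inbound[t].append(src)

    # Start every page at rank 1.0 and iterate until the values settle.
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {
            p: (1 - d) + d * sum(pr[q] / len(links[q]) for q in inbound[p])
            for p in pages
        }
        delta = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr


# Hypothetical hyperlink structure: A links to B and C, B to C, C to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for page, rank in pagerank(example_links).items():
    print(page, round(rank, 4))
```

With this unnormalized form and no dangling pages, the ranks sum to the number of pages, so a rank above 1.0 indicates an above-average page. In this example C ranks highest because it receives links from both A and B.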

Keywords: Web Mining, Web Structure, Web Graph, Link Analysis, PageRank, Weighted PageRank, HITS, DistanceRank, DirichletRank.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1058595

