Personalization of Web Search Using Web Page Clustering Technique

Authors: Pradeep M. Patil, Amol Bapuso Rajmane, Prakash J. Kulkarni


The Information Retrieval community faces the problem of representing Web search results effectively. Organizing search results into clusters makes it easier for users to browse them quickly. For ambiguous queries, traditional search engines organize results into clusters, with each cluster representing one meaning of the query. The clusters are obtained from the topical similarity of the retrieved results, but results can be entirely dissimilar in topic and still correspond to the same meaning of the query. People search is also one of the most common tasks on the Web today, yet when a particular person's name is queried, search engines return web pages related to different people who share that name. This places the burden of disambiguating and collecting the pages relevant to a particular person on the user. In this paper, we develop an approach that clusters web pages according to their association with the different people, producing clusters that are based on generic entity search.
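
The idea of grouping pages by the person they refer to, rather than by topic, can be illustrated with a minimal sketch. It is not the method of this paper, whose details the abstract does not give. It assumes that each returned page has already been reduced to a set of extracted entities (co-occurring people, organizations, places) and that two pages refer to the same person when they share at least `min_shared` entities. The function name, the threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def cluster_pages_by_shared_entities(page_entities, min_shared=2):
    """Group pages whose entity sets overlap in at least `min_shared` entities.

    `page_entities` maps a page id to the set of entities found on that page.
    Pages are nodes of a graph, an edge joins two pages with enough shared
    entities, and each connected component is returned as one cluster.
    """
    pages = list(page_entities)
    parent = {p: p for p in pages}  # union-find forest, one tree per cluster

    def find(p):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(pages, 2):
        if len(page_entities[a] & page_entities[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in pages:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())


# Made-up results for one ambiguous person-name query.
pages = {
    "page1": {"Example University", "databases", "coauthor A"},
    "page2": {"Example University", "coauthor A", "entity resolution"},
    "page3": {"guitar", "Nashville", "album"},
    "page4": {"Nashville", "album", "tour"},
    "page5": {"marathon", "Boston"},
}
clusters = cluster_pages_by_shared_entities(pages)
print(clusters)
```

With this sample data, the two academic pages fall into one cluster, the two musician pages into another, and the remaining page stays on its own, even though no topical similarity measure was used. A real system would need entity extraction and a more careful notion of connection strength than a fixed overlap count.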

Keywords: Information retrieval, clustering, entity resolution, graph-based disambiguation, web people search


