Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees

Doru Anastasiu Popescu; Dan Rădulescu

Abstract:

In this paper, we determine the similarity of two HTML web applications. Instead of using every web page of a site, we apply a genetic algorithm to select the most significant web pages of each application, and we compute the similarity value between the two applications from these pages only. Because only a reduced number of pages is compared, the algorithm is efficient, at the cost of returning an approximate similarity value rather than an exact one. Binary trees are used to store the tags of the significant pages. The algorithm was implemented in Java.
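The abstract does not give the paper's fitness function, page similarity formula or tree layout, so the following Java sketch only illustrates the general pipeline under stated assumptions. Each page's tags are kept in a plain (unbalanced) binary search tree, two pages are compared with the Jaccard index of their tag sets, the fitness of a set of k pages is assumed to be the number of distinct tags it covers, and the site value is assumed to be the mean best-match similarity between the selected pages. All names (`ApproxSiteSimilarity`, `TagTree`, `selectSignificant`, `siteSimilarity`) and GA parameters (population 20, mutation rate 0.2) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: similarity measure, fitness and GA parameters are assumptions, not the paper's. */
public class ApproxSiteSimilarity {
    /** Binary search tree holding the distinct tag names of one page. */
    static final class TagTree {
        private static final class Node { final String tag; Node left, right; Node(String t) { tag = t; } }
        private Node root;
        int size; // number of distinct tags stored
        void insert(String tag) { root = insert(root, tag); }
        private Node insert(Node n, String tag) {
            if (n == null) { size++; return new Node(tag); }
            int c = tag.compareTo(n.tag);
            if (c < 0) n.left = insert(n.left, tag); else if (c > 0) n.right = insert(n.right, tag);
            return n; // an equal tag is not stored twice
        }
        boolean contains(String tag) {
            Node n = root;
            while (n != null && !tag.equals(n.tag)) n = tag.compareTo(n.tag) < 0 ? n.left : n.right;
            return n != null;
        }
        // In-order traversal, so the tags come out sorted.
        List<String> tags() { List<String> out = new ArrayList<>(); collect(root, out); return out; }
        private void collect(Node n, List<String> out) {
            if (n != null) { collect(n.left, out); out.add(n.tag); collect(n.right, out); }
        }
    }

    static TagTree treeOf(String... tags) { TagTree t = new TagTree(); for (String s : tags) t.insert(s); return t; }

    /** Assumed page similarity: Jaccard index |A ∩ B| / |A ∪ B| of the two tag sets. */
    static double similarity(TagTree a, TagTree b) {
        int common = 0;
        for (String tag : a.tags()) if (b.contains(tag)) common++;
        int union = a.size + b.size - common;
        return union == 0 ? 1.0 : (double) common / union;
    }

    /** Assumed fitness: number of distinct tags covered by the chosen pages. */
    static int coverage(List<TagTree> pages, int[] chosen) {
        TagTree all = new TagTree();
        for (int i : chosen) for (String t : pages.get(i).tags()) all.insert(t);
        return all.size;
    }

    /** Crossover (first half of a, rest from b), then a point mutation with probability 0.2. */
    static int[] breed(int[] a, int[] b, int n, Random rnd) {
        Set<Integer> genes = new LinkedHashSet<>();
        for (int i = 0; i < a.length / 2; i++) genes.add(a[i]);
        for (int g : b) if (genes.size() < a.length) genes.add(g); // b's k distinct genes fill the gap
        int[] child = genes.stream().mapToInt(Integer::intValue).toArray();
        int g = rnd.nextInt(n); // mutation: swap in a page index not already present
        if (rnd.nextDouble() < 0.2 && !genes.contains(g)) child[rnd.nextInt(child.length)] = g;
        return child;
    }

    /** Minimal GA: a chromosome is k distinct page indices (requires k <= pages.size()). */
    static int[] selectSignificant(List<TagTree> pages, int k, int generations, long seed) {
        Random rnd = new Random(seed);
        int n = pages.size(), popSize = 20, elite = popSize / 2;
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        List<int[]> pop = new ArrayList<>();
        for (int p = 0; p < popSize; p++) { // random initial population
            Collections.shuffle(idx, rnd);
            pop.add(idx.subList(0, k).stream().mapToInt(Integer::intValue).toArray());
        }
        Comparator<int[]> fittestFirst = Comparator.comparingInt((int[] c) -> coverage(pages, c)).reversed();
        for (int gen = 0; gen < generations; gen++) {
            pop.sort(fittestFirst);
            List<int[]> next = new ArrayList<>(pop.subList(0, elite)); // elitism: keep the fittest half
            while (next.size() < popSize) // parents are drawn from the elite
                next.add(breed(next.get(rnd.nextInt(elite)), next.get(rnd.nextInt(elite)), n, rnd));
            pop = next;
        }
        pop.sort(fittestFirst);
        return pop.get(0);
    }

    /** Assumed site measure: mean best-match similarity from A's significant pages to B's. */
    static double siteSimilarity(List<TagTree> siteA, List<TagTree> siteB, int k) {
        int[] sa = selectSignificant(siteA, k, 30, 42L), sb = selectSignificant(siteB, k, 30, 42L);
        double total = 0;
        for (int i : sa) {
            double best = 0;
            for (int j : sb) best = Math.max(best, similarity(siteA.get(i), siteB.get(j)));
            total += best;
        }
        return total / k;
    }
}
```

For instance, `siteSimilarity(pages, pages, 2)` returns 1.0, because with the same input and seed both selections coincide and each significant page matches itself exactly. The best-match average is asymmetric; averaging the A-to-B and B-to-A values would be one way to make it symmetric.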

Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1124925
