Tool for Fast Detection of Java Code Snippets
Authors: Tomáš Bublík, Miroslav Virius
Abstract:
This paper presents general results on the Java source code snippet detection problem. We propose the tool which uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although that all these solutions are really fast, they compare just the constant static trees. Our solution offers to enter an input sample dynamically with the Scripthon language while preserving an acceptable speed. We used several optimizations to achieve very low number of comparisons during the matching algorithm.
Keywords: AST, Java, tree matching, Scripthon, source code recognition
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1096687
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1962References:
[1] Christophe-André M. Irniger, “Graph Matching - Filtering Databases of Graphs Using Machine Learning Techniques,” 2005, ISBN 1-58603- 557-6.
[2] B.D. McKay, “ractical graph isomorphism,” In Congressus Numerantium, volume 30, 1981, pages 45–87.
[3] J.R. Ullmann, “An algorithm for subgraph isomorphism,” Journal ofthe Association for Computing Machinery, 1976, pages 31-42.
[4] G. Levi, “A note on the derivation of maximal common subgraphs oftwo directed or undirected graphs,” Calcolo, 1972, pages 341-354.
[5] J. McGregor, “Backtrack search algorithms and the maximal common subgraph problem,” Software-Practice and Experience, 1982, pages 23- 34.
[6] G.A. Stephen, “String Searching Algorithms,” World Scientific, 1994.
[7] H. Bunke and K. Shearer, “A graph distance metric based on the maximalcommon subgraph,” Pattern Recognition Letters, 1998, pages 255-259.
[8] M.-L. Fernandez and G. Valiente, “A graph distance metric combining maximum common subgraph and minimum common supergraph,” Pattern Recognition Letters, 2001, pages 753-758.
[9] W.D. Wallis, P. Shoubridge, M. Kraetz, and D. Ray, “Graph distances using graph union,” Pattern Recognition Letters, May 2001, pages 701- 704.
[10] A. Sanfeliu and K.S. Fu, “A distance measure between attributed relationalgraphs for pattern recognition,” IEEE Transactions on Systems, Man, and Cybernetics, 1983, 353-363.
[11] W.H. Tsai and K.S. Fu, “Error-correcting isomorphisms of attributedrelational graphs for pattern recognition.” IEEE Transactions on Systems, Man, and Cybernetics, 1979, pages 757-768.
[12] M.A. Eshera and K.S. Fu, “A graph distance measure for image analysis,” IEEE Transactions on Systems, Man, and Cybernetics, 1984, pages 398-408.
[13] E.K. Wong, “Three-dimensional object recognition by attributed graphs,” In H. Bunke and A. Sanfeliu, editors, Syntactic and Structural Pattern Recognition- Theory and Applications, World Scientific, 1990, pages 381-414.
[14] R. Wilson and E.R. Hancock, “Levenshtein distance for graph spectral features,” In J. Kittler, M. Petrou, and M. Nixon, editors, Proc. 17th Int. Conference on Pattern Recognition, volume 2, 2004, pages 489-492.
[15] A. Robles-Kelly and E.R. Hancock, “Graph edit distance from spectral seriation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, pages 365-378.
[16] T. Caelli and S. Kosinov, “Inexact graph matching using eigensubspace projection clustering,” Int. Journal of Pattern Recognition and Artificial Intelligence, 2004, pages 329-355.
[17] T. Caelli and S. Kosinov, “An eigenspace projection clustering method for inexact graph matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, pages 515-519.
[18] Ch. K. Roy, J. R. Cordy, and R. Koschke. 2009. “Comparison and evaluation of code clone detection techniques and tools: A qualitative approach,” Sci. Comput. Program. 74, 7 (May 2009), pp. 470-495.
[19] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier. 1998. “Clone Detection Using Abstract Syntax Trees,” in Proceedings of the International Conference on Software Maintenance (ICSM '98). IEEE Computer Society, Washington, DC, USA, pp. 368-377.
[20] V. Wahler, D. Seipel, J. W. von Gudenberg, and G. Fischer, "Clone detection in source code by frequent itemset techniques," In SCAM, 2004.
[21] R. Koschke, R.Falke, P. Frenzel, “Clone Detection Using Abstract Syntax Suffix Trees, Reverse Engineering,” 2006. WCRE '06. 13th Working Conference on, ISBN 0-7695-2719-1, 2006, pages 253-262.
[22] Tomáš Bublík., Miroslav Virius.: “New language for searching Java code snippets,” in: ITAT 2012. Proc. of the 12th national conference ITAT. diar, Sep 17 – 21 2012. Pavol Jozef Safrik University in Kosice. p. 35 – 40.
[23] J. T. Yao and M. Zhang. 2004. “A Fast Tree Pattern Matching Algorithm for XML Query,” in Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI ’04). IEEE Computer Society, Washington, DC, USA, pp. 235-241.
[24] G. Valiente, "Algorithms on Trees and Graphs," Springer, ISBN 3540435506, 2002, page 170.
[25] R. Shamir, D. Tsur, "Faster Subtree Isomorphism," In Journal of Algorithms, Volume 33 Issue 2, 1999, pages 267-280, doi:10.1006/jagm.1999.104