Syntax Sensitive and Language Independent Detection of Code Clones

Authors: Kazuaki Maeda


This paper proposes a new technique for detecting code clones from both the lexical and the syntactic point of view, based on the PALEX source code representation. PALEX code contains the recorded parsing actions together with lexical formatting information, including white space and comments. The list of parsing actions (shift, reduce, and reading a token) is recorded during compilation and is available once the compiler finishes analyzing the source code. The proposed technique combines the advantages of a syntax-sensitive approach with language independence.
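To make the idea of comparing recorded parsing actions concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical three-rule toy grammar (`stmt: ID = expr ;`, `expr: expr OP term | term`, `term: ID | NUM`) and a hand-written shift-reduce loop, where a real setup would use a generated parser. It folds "reading a token" into the shift action and discards white space rather than storing it as PALEX does. The names `tokenize`, `record_actions`, and `longest_shared_run` are illustrative. The matching step uses Python's `difflib.SequenceMatcher` as a stand-in, since the abstract does not specify the comparison algorithm.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy lexer: numbers, identifiers, and single-character symbols.
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<SYM>\S))")


def tokenize(src):
    """Yield token kinds only; spellings and white space are discarded."""
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        pos = m.end()
        if m.lastgroup == "SYM":
            sym = m.group("SYM")
            yield "OP" if sym in "+-*/" else sym
        else:
            yield m.lastgroup


def record_actions(src):
    """Parse assignment statements and return the list of parsing actions."""
    actions, stack = [], []
    for kind in tokenize(src):
        actions.append(("shift", kind))
        stack.append(kind)
        while True:  # apply reductions until none fits
            if stack[-1] in ("ID", "NUM") and len(stack) > 1:
                stack[-1] = "term"
                actions.append(("reduce", "term"))
            elif stack[-3:] == ["expr", "OP", "term"]:
                stack[-3:] = ["expr"]
                actions.append(("reduce", "expr: expr OP term"))
            elif stack[-1] == "term":
                stack[-1] = "expr"
                actions.append(("reduce", "expr: term"))
            elif stack == ["ID", "=", "expr", ";"]:
                stack.clear()  # statement accepted; start the next one
                actions.append(("reduce", "stmt"))
                break
            else:
                break
    return actions


def longest_shared_run(a, b):
    """Return the longest contiguous run of actions common to a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


if __name__ == "__main__":
    first = record_actions("total = price + tax;")
    second = record_actions("sum   =  a + b ;")  # renamed and re-spaced copy
    print(first == second)
    print(longest_shared_run(first, record_actions("sum = a;")))
```

Because the comparison runs on parsing actions rather than on characters, the two fragments that differ only in identifier names and spacing yield identical action lists. Only the grammar-specific recorder would need to change to support another language.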

Keywords: Code Clones, Source Code Representation, XML, Parser, Parser Generator


