A Multilanguage Source Code Retrieval System Using Structural-Semantic Fingerprints

Authors: Mohamed Amine Ouddan, Hassane Essafi

Abstract:

Source code retrieval is of immense importance in the software engineering field. The complex tasks of retrieving and extracting information from source code documents are vital in the development cycle of large software systems. The two main subtasks that result from these activities are code duplication prevention and plagiarism detection. In this paper, we propose a source code retrieval system based on a two-level fingerprint representation that captures, respectively, the structural and the semantic information within a source code. A sequence alignment technique is applied to these fingerprints in order to quantify the similarity between source code portions. The specific purpose of the system is to detect plagiarism and duplicated code between programs written in different programming languages belonging to the same class, such as C, C++, Java and C#. These four languages are supported by the current version of the system, which is designed so that it can be easily adapted to any programming language.
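
To make the idea of aligning fingerprints concrete, the following is a minimal sketch rather than the system described in the paper. The abstract does not say which alignment algorithm is used, so the sketch assumes a Smith-Waterman-style local alignment with illustrative scoring values. The token names (`FUNC`, `LOOP`, and so on), the function names and the normalization are all hypothetical, standing in for a structural fingerprint from which language-specific syntax has already been abstracted away.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score of two token sequences.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def similarity(a, b, match=2):
    """Score normalized by the best score the shorter sequence could reach."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shortest)


# Hypothetical fingerprints: one token per syntactic construct.
java_fp = ["FUNC", "LOOP", "IF", "ASSIGN", "CALL", "RETURN"]
csharp_fp = ["FUNC", "LOOP", "IF", "ASSIGN", "ASSIGN", "CALL", "RETURN"]
unrelated_fp = ["FUNC", "CALL", "CALL", "RETURN"]

if __name__ == "__main__":
    print(similarity(java_fp, csharp_fp))
    print(similarity(java_fp, unrelated_fp))
```

Because both programs are reduced to the same token vocabulary before comparison, a fragment copied from Java into C# with one extra statement still aligns almost completely, while an unrelated fragment aligns only over a short common run.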

Keywords: Source code retrieval, plagiarism detection, clone detection, sequence alignment.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1079118

