A Comparison and Analysis of Name Matching Algorithms

Authors: Chakkrit Snae

Abstract:

Names are important in many societies, even in technologically oriented ones that use, for example, ID systems to identify individual people. Surnames are especially important because they are used in many processes, such as identifying people and genealogical research. On the other hand, name variation can be a major problem when identifying or searching for people, e.g. in web search or security applications. Name matching presumes a priori that two differing recorded names, written in one alphabet, reflect either the same phonetic identity or a transcription error made in copying a previously recorded name. We add to this the assumption that the two names refer to the same person. This paper describes name variations and gives a basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names, which can be used to further reduce mismatches in record linkage and name search. The implementation contains algorithms for computing a range of fuzzy matches based on different types of algorithms, e.g. composite and hybrid methods, allowing us to test and measure the algorithms for accuracy. NYSIIS, LIG2 and Phonex have been shown to perform well and to provide sufficient flexibility to be included in the linkage/matching process for optimising name searching.
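
The two sources of variation named above, phonetic identity and transcription error, can be illustrated with a minimal sketch. It is not the paper's implementation and does not reproduce NYSIIS, LIG2 or Phonex. It assumes the classic American Soundex code for the phonetic side and Levenshtein edit distance for the spelling side. The function names and the 0.8 threshold are hypothetical choices made for illustration only.

```python
# Illustrative sketch only: Soundex (phonetic) plus Levenshtein (spelling).
# Names and the default threshold below are assumptions, not from the paper.

_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" for no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and Y map to "" and reset prev
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_candidate_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hybrid rule: same phonetic code, or spelling similarity above threshold."""
    code_a, code_b = soundex(a), soundex(b)
    if code_a and code_a == code_b:
        return True
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(soundex("Robert"), soundex("Rupert"))   # R163 R163
    print(levenshtein("Smith", "Smyth"))          # 1
    print(is_candidate_match("Smith", "Jones"))   # False
```

A rule of this kind only proposes candidate variants. Whether two matched names refer to the same person still has to be decided by the wider linkage process.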

Keywords: Data mining, name matching algorithm, nominal data, searching system.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1071312

