Identity Verification Using k-NN Classifiers and Autistic Genetic Data
Authors: Fuad M. Alkoot
DNA data have been used in forensics for decades. Current research, however, explores DNA as a biometric modality for identity verification, with the goal of improving the speed of identification. We use gene data originally collected for autism detection to determine whether, and how accurately, such data can serve identification applications. In particular, we aim to find out whether our data preprocessing technique yields data that are useful as a biometric identification tool. We experiment with the nearest neighbor classifier to identify subjects. Results show that the optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and a standard deviation of 1. The classification rate remains close to optimal at higher noise standard deviations, up to 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).
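The evaluation protocol described above (enroll one feature vector per subject, corrupt the test copies with zero-mean Gaussian noise, identify each by its nearest neighbor) can be sketched roughly as follows. This is a minimal illustration only: the data are synthetic random vectors rather than the preprocessed gene data used in the paper, the classifier is a plain Euclidean 1-NN, and the function names, subject count, and feature count are all assumptions made for the example, so the rates it prints are not expected to match the reported results.

```python
import numpy as np


def nearest_neighbor_identify(enrolled, probes):
    """For each probe row, return the index of the closest enrolled row.

    Plain 1-NN with squared Euclidean distance (an assumption; the paper's
    exact distance measure and value of k are not given in the abstract).
    """
    # Pairwise differences via broadcasting: (n_probes, n_enrolled, n_features)
    diffs = probes[:, None, :] - enrolled[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    return np.argmin(dists, axis=1)


def identification_rate(enrolled, noise_std, rng):
    """Fraction of noisy probes matched back to the correct subject."""
    # Test set = enrolled vectors corrupted by N(0, noise_std^2) noise
    probes = enrolled + rng.normal(0.0, noise_std, size=enrolled.shape)
    predicted = nearest_neighbor_identify(enrolled, probes)
    return float(np.mean(predicted == np.arange(len(enrolled))))


rng = np.random.default_rng(0)
# Hypothetical stand-in for the gene data: 50 subjects, 200 features each
enrolled = rng.normal(0.0, 10.0, size=(50, 200))

for noise_std in (0.0, 1.0, 3.0):
    rate = identification_rate(enrolled, noise_std, rng)
    print(f"noise std = {noise_std}: identification rate = {rate:.2f}")
```

Sweeping `noise_std` over a wider range shows where identification starts to degrade for a given feature spread, which is the kind of robustness curve the abstract summarizes.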
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1131904