A Novel Prediction Method for Tag SNP Selection using Genetic Algorithm based on KNN
Single nucleotide polymorphisms (SNPs) hold much promise as a basis for disease-gene association. However, research is limited by the cost of genotyping the tremendous number of SNPs. Therefore, it is important to identify a small subset of informative SNPs, the so-called tag SNPs. This subset consists of selected SNPs of the genotypes, and accurately represents the rest of the SNPs. Furthermore, an effective evaluation method is needed to evaluate prediction accuracy of a set of tag SNPs. In this paper, a genetic algorithm (GA) is applied to tag SNP problems, and the K-nearest neighbor (K-NN) serves as a prediction method of tag SNP selection. The experimental data used was taken from the HapMap project; it consists of genotype data rather than haplotype data. The proposed method consistently identified tag SNPs with considerably better prediction accuracy than methods from the literature. At the same time, the number of tag SNPs identified was smaller than the number of tag SNPs in the other methods. The run time of the proposed method was much shorter than the run time of the SVM/STSA method when the same accuracy was reached.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1075336Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1474
 D. Brinza and A. Zelikovsky, "2SNP: scalable phasing based on 2-SNP haplotypes," Bioinformatics, vol. 22, pp. 371-3, Feb 1 2006.
 S. Buch, C. Schafmayer, H. Volzke, C. Becker, A. Franke, H. von Eller-Eberstein, C. Kluck, I. Bassmann, M. Brosch, F. Lammert, J. F. Miquel, F. Nervi, M. Wittig, D. Rosskopf, B. Timm, C. Holl, M. Seeger, A. ElSharawy, T. Lu, J. Egberts, F. Fandrich, U. R. Folsch, M. Krawczak, S. Schreiber, P. Nurnberg, J. Tepel, and J. Hampe, "A genome-wide association scan identifies the hepatic cholesterol transporter ABCG8 as a susceptibility factor for human gallstone disease," Nat Genet, vol. 39, pp. 995-9, Aug 2007.
 B. W. Zanke, C. M. Greenwood, J. Rangrej, R. Kustra, A. Tenesa, S. M. Farrington, J. Prendergast, S. Olschwang, T. Chiang, E. Crowdy, V. Ferretti, P. Laflamme, S. Sundararajan, S. Roumy, J. F. Olivier, F. Robidoux, R. Sladek, A. Montpetit, P. Campbell, S. Bezieau, A. M. O'Shea, G. Zogopoulos, M. Cotterchio, P. Newcomb, J. McLaughlin, B. Younghusband, R. Green, J. Green, M. E. Porteous, H. Campbell, H. Blanche, M. Sahbatou, E. Tubacher, C. Bonaiti-Pellie, B. Buecher, E. Riboli, S. Kury, S. J. Chanock, J. Potter, G. Thomas, S. Gallinger, T. J. Hudson, and M. G. Dunlop, "Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24," Nat Genet, vol. 39, pp. 989-94, Aug 2007.
 Y. J. Yoo, J. Tang, R. A. Kaslow, and K. Zhang, "Haplotype inference for present absent genotype data using previously identified haplotypes and haplotype patterns." vol. 23: Oxford Univ Press, 2007, p. 2399.
 O. Delaneau, C. Coulonges, P. Y. Boelle, G. Nelson, J. L. Spadoni, and J. F. Zagury, "ISHAPE: new rapid and accurate software for haplotyping," BMC Bioinformatics, vol. 8, p. 205, 2007.
 S. B. Gabriel, S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, S. N. Liu-Cordero, C. Rotimi, A. Adeyemo, R. Cooper, R. Ward, E. S. Lander, M. J. Daly, and D. Altshuler, "The structure of haplotype blocks in the human genome," Science, vol. 296, pp. 2225-9, Jun 21 2002.
 M. J. Daly, J. D. Rioux, S. F. Schaffner, T. J. Hudson, and E. S. Lander, "High-resolution haplotype structure in the human genome," Nat Genet, vol. 29, pp. 229-32, Oct 2001.
 N. Patil, A. J. Berno, D. A. Hinds, W. A. Barrett, J. M. Doshi, C. R. Hacker, C. R. Kautzer, D. H. Lee, C. Marjoribanks, D. P. McDonough, B. T. Nguyen, M. C. Norris, J. B. Sheehan, N. Shen, D. Stern, R. P. Stokowski, D. J. Thomas, M. O. Trulson, K. R. Vyas, K. A. Frazer, S. P. Fodor, and D. R. Cox, "Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21," Science, vol. 294, pp. 1719-23, Nov 23 2001.
 X. Ke and L. R. Cardon, "Efficient selective screening of haplotype tag SNPs," Bioinformatics, vol. 19, pp. 287-8, Jan 22 2003.
 K. Zhang, M. Deng, T. Chen, M. S. Waterman, and F. Sun, "A dynamic programming algorithm for haplotype block partitioning," Proc Natl Acad Sci U S A, vol. 99, pp. 7335-9, May 28 2002.
 K. Zhang and L. Jin, "HaploBlockFinder: haplotype block analyses," Bioinformatics, vol. 19, pp. 1300-1, Jul 1 2003.
 G. C. Johnson, L. Esposito, B. J. Barratt, A. N. Smith, J. Heward, G. Di Genova, H. Ueda, H. J. Cordell, I. A. Eaves, F. Dudbridge, R. C. Twells, F. Payne, W. Hughes, S. Nutland, H. Stevens, P. Carr, E. Tuomilehto-Wolf, J. Tuomilehto, S. C. Gough, D. G. Clayton, and J. A. Todd, "Haplotype tagging for the identification of common disease genes," Nat Genet, vol. 29, pp. 233-7, Oct 2001.
 T. U. M. Phuong, Z. Lin, and R. B. Altman, "CHOOSING SNPs USING FEATURE SELECTION." vol. 4, 2006, pp. 241-257.
 V. Bafna, B. V. Halldorsson, R. Schwartz, A. G. Clark, and S. Istrail, "Haplotypes and informative SNP selection algorithms: don't block out information," ACM New York, NY, USA, 2003, pp. 19-27.
 B. V. Halldorsson, V. Bafna, R. Lippert, R. Schwartz, F. M. De La Vega, A. G. Clark, and S. Istrail, "Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies," Genome Res, vol. 14, pp. 1633-40, Aug 2004.
E. Halperin, G. Kimmel, and R. Shamir, "Tag SNP selection in genotype data for maximizing SNP prediction accuracy," Bioinformatics, vol. 21 Suppl 1, pp. i195-203, Jun 2005.
P. H. Lee and H. Shatkay, "BNTagger: improved tagging SNP selection using Bayesian networks," Bioinformatics, vol. 22, pp. e211-9, Jul 15 2006.
 Z. Liu, S. Lin, and M. Tan, "Genome-wide tagging SNPs with entropy-based Monte Carlo method," J Comput Biol, vol. 13, pp. 1606-14, Nov 2006.
 C. S. Carlson, M. A. Eberle, M. J. Rieder, Q. Yi, L. Kruglyak, and D. A. Nickerson, "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium," Am J Hum Genet, vol. 74, pp. 106-20, Jan 2004.
 K. Zhang, Z. Qin, T. Chen, J. S. Liu, M. S. Waterman, and F. Sun, "HapBlock: haplotype block partitioning and tag SNP selection software using a set of dynamic programming algorithms," Bioinformatics, vol. 21, pp. 131-4, Jan 1 2005.
 J. He and A. Zelikovsky, "MLR-tagging: informative SNP selection for unphased genotypes based on multiple linear regression," Bioinformatics, vol. 22, pp. 2558-61, Oct 15 2006.
 J. He and A. Zelikovsky, "Informative SNP selection methods based on SNP prediction," IEEE Trans Nanobioscience, vol. 6, pp. 60-7, Mar 2007.
 K. Zhang, T. Chen, M. S. Waterman, and F. Sun, "A Set of Dynamic Programming Algorithms for Haplotype Block Partitioning and Tag SNP Selection via Haplotype Data or Genotype Data," pp. 1-26.
 B. Devlin and N. Risch, "A comparison of linkage disequilibrium measures for fine-scale mapping," Genomics, vol. 29, pp. 311-22, Sep 20 1995.
 H. I. Avi-Itzhak, X. Su, and F. M. De La Vega, "Selection of minimum subsets of single nucleotide polymorphisms to capture haplotype block diversity," Pac Symp Biocomput, pp. 466-77, 2003.
 J. H. Holland, Adaptation in natural and artificial systems: MIT Press Cambridge, MA, USA, 1992.
 E. Fix and J. Hodges, "Discriminatory Analysis-Nonparametric Discrimination: Consistency Properties," Storming Media, 1951.
 D. E. Goldberg and K. Deb, "A comparative analysis of selection schemes used in genetic algorithms." vol. 1, 1991, pp. 69-93.
 G. A. Thorisson, A. V. Smith, L. Krishnan, and L. D. Stein, "The International HapMap Project Web site," Genome Res, vol. 15, pp. 1592-3, Nov 2005.
 J. He, K. Westbrooks, and A. Zelikovsky, "Linear reduction method for predictive and informative tag SNP selection," Int J Bioinform Res Appl, vol. 1, pp. 249-60, 2005.