Weighted k-Nearest-Neighbor Techniques for High Throughput Screening Data

K. Kozak, M. Kozak, K. Stapor

Abstract:

The k-nearest neighbors (kNN) method is a simple but effective classification technique. In this paper we present an extended version of it for chemical compounds used in High Throughput Screening, in which the distances to the nearest neighbors are taken into account. Our algorithm uses kernel weight functions to guide the assignment of activity in screening data. The proposed kernel weight function aims to combine properties of the graphical structure and the molecular descriptors of screening compounds. We apply the modified kNN method to several experimental data sets from biological screens. The experimental results confirm the effectiveness of the proposed method.
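To make the idea of distance-weighted voting concrete, the following is a minimal sketch of a kernel-weighted kNN classifier. It is not the authors' method: the Gaussian kernel, the Euclidean distance on plain numeric descriptor vectors, and the normalization by the distance to the (k+1)-th neighbor are all assumptions chosen for illustration, and the names `gaussian_kernel` and `weighted_knn_predict` are hypothetical. The kernel described in the abstract additionally uses the graphical structure of the compounds, which is not modeled here.

```python
import math
from collections import defaultdict


def gaussian_kernel(d):
    """Assumed weight function: decays smoothly as the normalized distance d grows."""
    return math.exp(-0.5 * d * d)


def weighted_knn_predict(train_x, train_y, query, k=3, kernel=gaussian_kernel):
    """Classify `query` by a kernel-weighted vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training compound,
    # sorted from nearest to farthest.
    dists = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    neighbors = dists[:k]

    # Normalize by the distance to the (k+1)-th neighbor when it exists,
    # so the weights do not depend on the scale of the descriptors.
    scale = dists[k][0] if len(dists) > k else neighbors[-1][0]
    scale = scale or 1.0  # guard against division by zero

    # Each neighbor votes for its own class with a weight given by the kernel.
    votes = defaultdict(float)
    for d, label in neighbors:
        votes[label] += kernel(d / scale)
    return max(votes, key=votes.get)


# Toy descriptor vectors and activity labels (made up for illustration).
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.9, 1.2)]
train_y = ["inactive", "inactive", "active", "active", "active"]
prediction = weighted_knn_predict(train_x, train_y, (0.2, 0.1), k=3)
```

In this sketch a closer neighbor contributes more to the vote than a distant one, whereas plain kNN counts every one of the k neighbors equally. Any other non-negative, decreasing function of distance could be passed as `kernel`.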

Keywords: biological screening, kernel methods, KNN, QSAR

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062064
