Visualization and Indexing of Spectral Databases
Authors: Tibor Kulcsar, Gabor Sarossy, Gabor Bereznai, Robert Auer, Janos Abonyi
Abstract:
On-line (near-infrared) spectroscopy is widely used to support the operation of complex process systems. Information extracted from spectral databases can be used to estimate unmeasured product properties and to monitor the operation of the process. These techniques are based on finding similar spectra with nearest-neighbor algorithms and distance-based search methods. Searching for nearest neighbors in the spectral space is computationally demanding: the cost grows with both the number of points in the discrete spectrum and the number of samples in the database. An indexing scheme can be used to reduce the calculation time. The main idea presented in this paper is to combine indexing and visualization techniques to reduce the computational requirements of estimation algorithms by providing a two-dimensional index that can also be used to visualize the structure of the spectral database. This 2D visualization of the spectral database not only supports the application of distance- and similarity-based techniques but also enables the use of advanced clustering and prediction algorithms based on the Delaunay tessellation of the mapped spectral space. Consequently, prediction need not be carried out in the high-dimensional space but can also be based on the mapped space. The results illustrate that the proposed method is able to segment (cluster) spectral databases and detect outliers that are not suitable for instance-based learning algorithms.
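The abstract does not say which mapping produces the two-dimensional index. The sketch below assumes a PCA projection onto the first two principal components (computed with NumPy's SVD) purely as a stand-in. It contrasts a brute-force k-NN property estimate in the full spectral space with an indexed variant that shortlists candidates by distance in the 2D map and re-ranks only those candidates with full-spectrum distances. The function names, the shortlist size `n_candidates`, and the synthetic two-band spectra are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def knn_estimate(spectra, props, query, k=5):
    """Brute-force k-NN: average property of the k spectra closest to `query`.

    Every query scans all N samples over all D wavelengths, i.e. O(N * D) work.
    """
    dist = np.linalg.norm(spectra - query, axis=1)  # Euclidean distance to each sample
    idx = np.argsort(dist)[:k]
    return props[idx].mean(), idx


def fit_pca_2d(spectra):
    """Hypothetical 2D index: first two principal components of the spectra."""
    mean = spectra.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:2]


def project(x, mean, components):
    """Map spectra (or a single spectrum) onto the 2D index coordinates."""
    return (x - mean) @ components.T


def indexed_knn_estimate(spectra, props, coords, mean, components, query,
                         k=5, n_candidates=50):
    """Shortlist candidates by distance in the 2D map (2 values per sample),
    then re-rank only the shortlist with full-spectrum distances
    (O(n_candidates * D) instead of O(N * D))."""
    q2 = project(query, mean, components)
    cand = np.argsort(np.linalg.norm(coords - q2, axis=1))[:n_candidates]
    dist = np.linalg.norm(spectra[cand] - query, axis=1)
    idx = cand[np.argsort(dist)[:k]]
    return props[idx].mean(), idx


# Synthetic "spectra": two Gaussian bands mixed in a ratio that plays the
# role of the unmeasured product property, plus measurement noise.
rng = np.random.default_rng(0)
n_samples, n_points = 500, 200
grid = np.linspace(0.0, 1.0, n_points)
band1 = np.exp(-((grid - 0.3) ** 2) / 0.005)
band2 = np.exp(-((grid - 0.7) ** 2) / 0.005)
ratio = rng.uniform(0.0, 1.0, n_samples)
spectra = np.outer(ratio, band1) + np.outer(1.0 - ratio, band2)
spectra += rng.normal(0.0, 0.01, spectra.shape)

mean, comps = fit_pca_2d(spectra)
coords = project(spectra, mean, comps)  # 2D index, computed once offline

query = spectra[0]  # for brevity the query is a database sample (it finds itself)
full_est, full_idx = knn_estimate(spectra, ratio, query)
fast_est, fast_idx = indexed_knn_estimate(spectra, ratio, coords, mean, comps, query)
print(full_est, fast_est)
```

Full-spectrum distances are computed for only `n_candidates` samples in the indexed variant; a spatial structure such as a grid or k-d tree over the 2D coordinates would avoid even the linear scan of the map. Because the 2D map can distort distances, the shortlist may miss true neighbors, so `n_candidates` trades speed against exactness.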
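The abstract mentions prediction based on the Delaunay tessellation of the mapped space without stating the prediction rule. One plausible rule, assumed here, is linear interpolation of the property inside the triangle that contains the query, weighted by the query's barycentric coordinates. The sketch relies on `scipy.spatial.Delaunay` (its `find_simplex`, `transform` and `simplices` members); `delaunay_predict` and the synthetic 2D coordinates are hypothetical stand-ins for the mapped database.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_predict(coords, props, queries):
    """Piecewise-linear prediction over the Delaunay triangulation of a 2D map.

    Each query is estimated from the three vertices of the triangle that
    contains it, weighted by its barycentric coordinates. Queries outside the
    convex hull of the map get NaN (no supporting samples).
    """
    tri = Delaunay(coords)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    simplex = tri.find_simplex(queries)  # -1 marks queries outside the hull
    pred = np.full(len(queries), np.nan)
    for i, (q, s) in enumerate(zip(queries, simplex)):
        if s < 0:
            continue
        t = tri.transform[s]                   # (3, 2): inverse matrix rows, then offset
        b = t[:2] @ (q - t[2])                 # first two barycentric coordinates
        weights = np.append(b, 1.0 - b.sum())  # all three weights sum to 1
        pred[i] = weights @ props[tri.simplices[s]]
    return pred


# Synthetic 2D coordinates standing in for mapped spectra, with a property
# that varies linearly over the map (so the interpolation should be exact).
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 1.0, size=(200, 2))
props = 2.0 * coords[:, 0] + 3.0 * coords[:, 1]

pred = delaunay_predict(coords, props, [[0.5, 0.5], [2.0, 2.0]])
print(pred)  # first value is approximately 2.5, second is nan (outside the hull)
```

SciPy's `scipy.interpolate.LinearNDInterpolator` performs the same barycentric interpolation; the explicit loop only makes the weighting visible. Returning NaN for queries outside the hull is one simple convention for flagging spectra the database does not cover, not necessarily the outlier criterion used in the paper.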
Keywords: indexing high-dimensional databases, dimensionality reduction, clustering, similarity, k-NN algorithm.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1062634