Observations about the Principal Components Analysis and Data Clustering Techniques in the Study of Medical Data

Authors: Cristina G. Dascâlu, Corina Dima Cozma, Elena Carmen Cotrutz


The statistical analysis of medical data often requires special techniques because of the particularities of these data. Principal components analysis and data clustering are two statistical data mining methods that are very useful in the medical field: the first as a way to reduce the number of studied parameters, and the second as a way to analyze the connections between a diagnosis and the data about the patient's condition. In this paper we investigate the implications of a specific data analysis technique: data clustering preceded by a selection of the most relevant parameters, made using principal components analysis. Our assumption was that applying principal components analysis before data clustering, in order to select and classify only the most relevant parameters, would improve the clustering accuracy. The practical results showed the opposite: the clustering accuracy decreases by a percentage approximately equal to the percentage of information loss reported by the principal components analysis.

Keywords: Data clustering, medical data, principal components analysis.
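
The comparison described in the abstract (clustering on all parameters versus clustering on a reduced set of principal components) can be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the paper: the data are synthetic rather than medical, k-means stands in for the clustering method (the abstract does not name one), the known group labels play the role of the diagnosis, and all function names (`pca_reduce`, `kmeans`, `clustering_accuracy`) are hypothetical helpers written for this example.

```python
import itertools
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its first n_components principal components.

    Returns the projected data and the fraction of total variance retained.
    """
    Xc = X - X.mean(axis=0)  # center each parameter
    # SVD of the centered data: rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # variance share of each component
    scores = Xc @ Vt[:n_components].T
    return scores, float(explained[:n_components].sum())


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def clustering_accuracy(y_true, y_pred, k):
    """Best agreement between clusters and known groups over all relabelings."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array(perm)[y_pred]  # rename cluster j to perm[j]
        best = max(best, float(np.mean(mapped == y_true)))
    return best


# Synthetic stand-in for patient data (assumption): 2 groups, 6 parameters,
# with the groups differing only in the first two parameters.
rng = np.random.default_rng(42)
n_per_group, n_params, k = 100, 6, 2
y = np.repeat(np.arange(k), n_per_group)
X = rng.normal(size=(k * n_per_group, n_params))
X[y == 1, :2] += 3.0

acc_full = clustering_accuracy(y, kmeans(X, k), k)

X_red, retained = pca_reduce(X, n_components=2)
acc_pca = clustering_accuracy(y, kmeans(X_red, k), k)

print(f"accuracy on all parameters:   {acc_full:.3f}")
print(f"accuracy after PCA:           {acc_pca:.3f}")
print(f"information loss from PCA:    {1.0 - retained:.3f}")
```

The quantity `1.0 - retained` corresponds to the "information loss" mentioned in the abstract, so the drop from `acc_full` to `acc_pca` can be set against it. Whether the two are close on a given data set is an empirical question; this sketch does not reproduce the paper's results.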

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1085331


