Clustering of Variables Based On a Probabilistic Approach Defined on the Hypersphere

Paulo Gomes; Adelaide Figueiredo

Abstract:

We consider n individuals described by p standardized variables, which are represented by points on the surface of the unit hypersphere $S^{n-1}$. For a given choice of the n individuals, we suppose that the set of observable variables comes from a mixture of bipolar Watson distributions defined on the hypersphere. The EM and Dynamic Clusters algorithms are used to identify this mixture. We obtain estimates of the parameters of each Watson component and, from them, a partition of the set of variables into homogeneous groups. Additionally, we present a factor analysis model in which the unobservable factors are the maximum likelihood estimators of the Watson directional parameters, that is, the first principal component of the data matrix associated with each previously identified group. This alternative model yields directly interpretable solutions (simple structure) and avoids factor rotations.
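
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard assignments and a common concentration parameter across components, in which case assigning a variable to its most likely bipolar Watson component reduces to maximizing the squared inner product with the component axis, and the maximum likelihood axis of a group is the dominant eigenvector of that group's scatter matrix. The function names (`to_sphere`, `watson_axis`, `cluster_variables`) and the `init_labels` parameter are hypothetical.

```python
import numpy as np

def to_sphere(X):
    """Center each column of X (n x p) and scale it to unit Euclidean norm.
    Assumes no column is constant (a constant column would divide by zero)."""
    Z = X - X.mean(axis=0)
    return Z / np.linalg.norm(Z, axis=0)

def watson_axis(V):
    """Dominant eigenvector of V V^T for unit-norm columns V (n x m),
    obtained as the leading left singular vector of V."""
    U, _, _ = np.linalg.svd(V, full_matrices=False)
    return U[:, 0]

def cluster_variables(X, k, n_iter=50, seed=0, init_labels=None):
    """Alternate between estimating one axis per group and reassigning
    each variable to the axis with the largest squared inner product."""
    rng = np.random.default_rng(seed)
    V = to_sphere(X)
    p = V.shape[1]
    if init_labels is None:
        labels = rng.integers(k, size=p)  # random starting partition
    else:
        labels = np.asarray(init_labels).copy()
    for _ in range(n_iter):
        axes = []
        for j in range(k):
            members = V[:, labels == j]
            if members.shape[1] == 0:
                # Empty group: re-seed it with one randomly chosen variable.
                members = V[:, [rng.integers(p)]]
            axes.append(watson_axis(members))
        axes = np.column_stack(axes)  # n x k matrix of group axes
        new_labels = np.argmax((axes.T @ V) ** 2, axis=0)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels, axes
```

Because the criterion uses squared inner products, a variable and its negative fall in the same group, which matches the axial (sign-free) nature of the bipolar Watson distribution. The sketch can stop in a local optimum, so several starting partitions may be worth trying.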

Keywords: Dynamic Clusters algorithm, EM algorithm, Factor analysis model, Hierarchical Clustering, Watson distribution.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1088416
