Categorical Missing Data Imputation Using Fuzzy Neural Networks with Numerical and Categorical Inputs
Input feature vectors are often incomplete, and methods for handling this problem have been studied for a long time. A common procedure is to replace each missing value with an imputed value. This paper presents a method for imputing missing categorical data from both numerical and categorical variables. The imputations are based on Simpson's fuzzy min-max neural networks, whose input variables for learning and classification are purely numerical. The proposed method extends the input to categorical variables by introducing new fuzzy sets, a new operation, and a new architecture. The procedure is tested on opinion poll data and compared with other methods.
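As a rough illustration of the numerical-only starting point, the sketch below implements the hyperbox membership function commonly attributed to Simpson's fuzzy min-max classifier and uses it to pick a category for a record whose categorical value is missing. It is a minimal sketch under stated assumptions, not the method proposed in this paper: the categorical extension (new fuzzy sets, new operation, new architecture) is not described in the abstract and is not shown. The function names, the sensitivity value `gamma=4.0`, the example hyperboxes, and the labels are all hypothetical.

```python
import numpy as np

def hyperbox_membership(a, v, w, gamma=4.0):
    """Degree to which point a (scaled to [0, 1]^n) belongs to the
    hyperbox with min point v and max point w.
    gamma (assumed value) controls how fast membership decays outside the box."""
    a, v, w = (np.asarray(x, dtype=float) for x in (a, v, w))
    n = a.size
    # Penalty for exceeding the max point in each dimension.
    above = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, a - w)))
    # Penalty for falling below the min point in each dimension.
    below = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, v - a)))
    return float((above + below).sum() / (2 * n))

def impute_category(a, boxes):
    """Return the label of the hyperbox with the highest membership for a.
    boxes is a list of (v, w, label) tuples (hypothetical structure)."""
    return max(boxes, key=lambda b: hyperbox_membership(a, b[0], b[1]))[2]

# Hypothetical hyperboxes, each tied to one category of the missing variable.
boxes = [
    ([0.1, 0.1], [0.4, 0.4], "A"),
    ([0.6, 0.6], [0.9, 0.9], "B"),
]
imputed = impute_category([0.7, 0.8], boxes)
```

Here a point inside a hyperbox receives membership 1, and membership decreases with distance outside it, so the imputed category is the one whose hyperbox the numerical inputs fit best.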
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1333518