Neural Network Imputation in Complex Survey Design
Authors: Safaa R. Amer
Abstract:
Missing data pose many challenges for analysis. Under a complex survey design, researchers must account for the sampling design, in addition to the missing data, to achieve useful inferences. Methods for incorporating sampling weights into neural network imputation were investigated to account for complex survey designs. A variance estimate that accounts for both the imputation uncertainty and the sampling design under neural network imputation is provided. A simulation study was conducted to compare estimation results based on complete case analysis, multiple imputation using Markov chain Monte Carlo, and neural network imputation. Furthermore, a public-use dataset was used as an example to illustrate neural network imputation under a complex survey design.
Keywords: Complex survey, estimate, imputation, neural networks, variance.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1083891