A Testbed for the Experiments Performed in Missing Value Treatments
Authors: Dias de J. C. Lilian, Lobato M. F. Fábio, de Santana L. Ádamo
Abstract:
The occurrence of missing values in databases is a serious problem for data mining tasks, degrading both data quality and the accuracy of analyses. The area lacks standardized experiments for treating missing values, which makes it difficult to evaluate and compare results across studies because common parameters are not used. This paper proposes a testbed intended to facilitate the implementation of experiments and to provide unbiased parameters, using available datasets and suitable performance metrics, in order to optimize the evaluation and comparison of state-of-the-art missing value treatments.
Keywords: Data imputation, data mining, missing values treatment, testbed.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1086525