Performance Evaluation of Data Mining Techniques for Predicting Software Reliability
Authors: Pradeep Kumar, Abdul Wahid
Abstract:
Accurate software reliability prediction not only enables developers to improve the quality of software but also provides useful information that helps them plan valuable resources. This paper examines the performance of three well-known data mining techniques (CART, TreeNet and Random Forest) for predicting software reliability. We evaluate and compare the performance of the proposed models with that of a Cascade Correlation Neural Network (CCNN) using sixteen empirical databases from the Data and Analysis Center for Software. The goal of our study is to help project managers concentrate their testing efforts on minimizing software failures in order to improve the reliability of software systems. Two performance measures, Normalized Root Mean Squared Error (NRMSE) and Mean Absolute Error (MAE), show that the CART model is more accurate than the models built using Random Forest, TreeNet and CCNN on all datasets used in our study. Finally, we conclude that such methods can help in reliability prediction using real-life failure datasets.
Keywords: Classification, Cascade Correlation Neural Network, Random Forest, Software reliability, TreeNet.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1112003
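The two performance measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the exact NRMSE formula, so the normalization used here (square root of the sum of squared errors divided by the sum of squared actual values) is an assumption, and the function names and sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average of |actual - predicted|."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nrmse(actual, predicted):
    """Normalized Root Mean Squared Error.

    Assumed normalization (not confirmed by the abstract):
    sqrt(sum((a - p)^2) / sum(a^2)).
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    squared_actual = sum(a ** 2 for a in actual)
    if squared_actual == 0:
        raise ValueError("actual values must not all be zero")
    return math.sqrt(squared_error / squared_actual)


if __name__ == "__main__":
    # Hypothetical observed and predicted values, for illustration only.
    observed = [3.0, 4.0]
    forecast = [3.0, 2.0]
    print("MAE:  ", mae(observed, forecast))
    print("NRMSE:", nrmse(observed, forecast))
```

Lower values of both measures indicate a closer fit, which is the sense in which the abstract ranks CART ahead of the other models.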