Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Rajvir Kaur; Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advances in Information and Communication Technologies, the healthcare industry is transitioning from a clinician-oriented to a technology-oriented model. Many people around the world die of cancer because the disease is not diagnosed at an early stage. Computational methods in the form of Machine Learning (ML) are now used to develop automated decision support systems that can diagnose cancer in a timely manner and with high confidence. This paper presents a comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The classifiers compared are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation uses the standard metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results show that ANN achieved the highest accuracy (99.4%) on the breast cancer dataset. On the cervical cancer dataset, the Ensemble (Bagged Tree) technique achieved higher accuracy (93.1%) than the other classifiers.
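The abstract names the four evaluation metrics but does not state their formulas. As a minimal sketch, assuming the usual textbook definitions for a binary task (malignant vs. benign), they can be computed from the confusion counts as below. The function name `binary_metrics`, the `BinaryMetrics` container and the toy labels are illustrative assumptions, not taken from the paper, and the numbers produced are unrelated to the reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute Precision, Recall, F1-score and Accuracy for binary labels.

    Hypothetical helper using the standard definitions:
      P = TP / (TP + FP), R = TP / (TP + FN),
      F1 = 2PR / (P + R), Accuracy = (TP + TN) / N.
    Ratios with a zero denominator are reported as 0.0 (an assumption).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return BinaryMetrics(precision, recall, f1, accuracy)


# Toy labels (invented for illustration): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting Precision, Recall and F1 alongside Accuracy matters when the classes are imbalanced, since a classifier that rarely predicts the positive class can still score a high accuracy.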

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1316355
