Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy

Fahd Sabry Esmail; M. Badr Senousy; Mohamed Ragaie

Abstract:

In recent years, the use of technology to help detect diseases has grown rapidly. For example, DNA microarrays allow us, for the first time, to obtain a "global" view of the cell. This technology has great potential to provide accurate medical diagnoses and to help find the right treatment and cure for many diseases. Various classification algorithms can be applied to such microarray datasets to devise methods that predict the occurrence of leukemia. In this study, we compared the classification accuracy and response time of eleven decision tree methods and six rule classifier methods using five performance criteria. The experimental results show that, among the tree classifiers, Random Tree produces the best results and also takes the least time to build a model. Among the classification rule algorithms, the nearest-neighbor-like algorithm (NNge) performs best, combining high accuracy with the least time to build a model.

Keywords: Data mining, classification techniques, decision tree, classification rule, leukemia diseases, microarray data.
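
The comparison described in the abstract ranks classifiers by accuracy and by the time needed to build a model. A minimal sketch of such a harness is shown below. It is an illustration under stated assumptions, not the authors' experimental setup: the two classifiers (`MajorityClassifier`, `NearestNeighbor`) are simple stand-ins rather than Random Tree or NNge, the helper `evaluate` is hypothetical, and the tiny dataset with its "ALL"/"AML" labels is made up for demonstration.

```python
import time


class MajorityClassifier:
    """Baseline stand-in: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, x):
        return self.label


class NearestNeighbor:
    """Stand-in 1-nearest-neighbor classifier (squared Euclidean distance)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in self.X]
        return self.y[dists.index(min(dists))]


def evaluate(model, X_train, y_train, X_test, y_test):
    """Return (accuracy, build_seconds) for one classifier."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    build_seconds = time.perf_counter() - start
    correct = sum(model.predict(x) == t for x, t in zip(X_test, y_test))
    return correct / len(y_test), build_seconds


# Hypothetical toy data: two made-up expression values per sample.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 0.9]]
y_train = ["ALL", "ALL", "ALL", "AML", "AML"]
X_test = [[0.15, 0.15], [0.85, 0.85]]
y_test = ["ALL", "AML"]

candidates = [("majority", MajorityClassifier()), ("1-NN", NearestNeighbor())]
results = {
    name: evaluate(model, X_train, y_train, X_test, y_test)
    for name, model in candidates
}

for name, (accuracy, seconds) in results.items():
    print(f"{name}: accuracy={accuracy:.2f}, build_time={seconds:.6f}s")
```

Reporting accuracy and build time side by side, as here, makes it possible to prefer a faster classifier when two methods are equally accurate. On real microarray data, a single train/test split would normally be replaced by cross-validation.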

Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1124005
