Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has advanced considerably in recent years, driven by the spread of the internet, which generates a tremendous volume of data every day, and by technological advances that facilitate the analysis of these data. Classification, a subdomain of data mining, determines the group to which each instance in a given dataset belongs, assigning data to classes according to desired criteria. A classification technique is generally either statistical or based on machine learning, and each type has its own limits. Because data are becoming increasingly heterogeneous, current classification techniques encounter many difficulties. This paper defines new measure functions to quantify the resemblance between instances and combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed most common classification techniques, with an F-measure exceeding 97% on the Iris dataset.
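For readers unfamiliar with the reported metric, the F-measure of a class is the harmonic mean of its precision and recall, which reduces to 2·TP / (2·TP + FP + FN). The sketch below is a minimal illustration, not the paper's evaluation code: the function names are hypothetical, the macro-averaging over classes is an assumption (the abstract does not state which averaging was used), and the labels are toy values rather than results from the paper.

```python
from collections import Counter


def f_measure_per_class(y_true, y_pred):
    """Return {label: F-measure} for two equal-length label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F-measures (assumed averaging)."""
    scores = f_measure_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels for illustration only; these are NOT the paper's predictions.
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

if __name__ == "__main__":
    print(f_measure_per_class(y_true, y_pred))
    print(macro_f_measure(y_true, y_pred))
```

In this toy case one versicolor instance is mislabeled as virginica, which lowers the recall of versicolor and the precision of virginica, so both classes score below 1 while setosa stays at 1.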
Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1316071