{"title":"Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise \u2014 In the Case of Critical Dataset Size \u2014","authors":"Tetsuro Saeki, Yuichi Kato, Shoutarou Mizuno","volume":102,"journal":"International Journal of Computer and Information Engineering","pagesStart":1483,"pagesEnd":1489,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/10001804","abstract":"
STRIM (Statistical Test Rule Induction Method) has been proposed as a method for effectively inducing if-then rules from a decision table, which is regarded as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments in which the rules are specified in advance, and by comparison with conventional methods. However, scope for future development remains before STRIM can be applied to the analysis of real-world datasets. The first requirement is to determine the size of the dataset needed for inducing true rules, since finding statistically significant rules is the core of the method. The second is to examine the capacity for rule induction from datasets whose attribute values are contaminated by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment, utilizing the critical dataset size derived in the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to real-world data.","references":"[1] Z. Pawlak: Rough sets, Internat. J. Inform. Comput. Sci., Vol. 11, No. 5, pp. 341-356 (1982).\r\n[2] A. Skowron and C. M. Rauser: The discernibility matrix and functions in information systems, In: R. S\u0142owi\u0144ski (ed.), Intelligent Decision Support, Handbook of Application and Advances of Rough Set Theory, Kluwer Academic Publishers, pp. 331-362 (1992).\r\n[3] Y. G. Bao, X. Y. Du, M. G. Deng and N. Ishii: An efficient method for computing all reducts, Transactions of the Japanese Society for Artificial Intelligence, Vol. 19, No. 3, pp. 166-173 (2004).\r\n[4] J. W. Grzymala-Busse: LERS \u2014 A system for learning from examples based on rough sets. In: Intelligent Decision Support. 
Handbook of Applications and Advances of the Rough Sets Theory, ed. by R. S\u0142owi\u0144ski, Kluwer Academic Publishers, pp. 3-18 (1992).\r\n[5] W. Ziarko: Variable precision rough set model, Journal of Computer and System Science, Vol. 46, pp. 39-59 (1993).\r\n[6] N. Shan and W. Ziarko: Data-based acquisition and incremental modification of classification rules, Computational Intelligence, Vol. 11, No. 2, pp. 357-370 (1995).\r\n[7] T. Nishimura, Y. Kato and T. Saeki: Studies on an effective algorithm to reduce the decision matrix, RSFDGrC 2011, LNAI Vol. 8743, pp. 240-243 (2011).\r\n[8] T. Matsubayashi, Y. Kato and T. Saeki: A new rule induction method from a decision table using a statistical test, In: T. Li et al. (Eds.): RSKT 2012, LNAI 7414, pp. 81-90, Springer, Heidelberg (2012).\r\n[9] J. W. Grzymala-Busse and W. J. Grzymala-Busse: Handling missing attribute values, O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., Springer, pp. 33-49 (2010).\r\n[10] J. W. Grzymala-Busse: MLEM2; A new algorithm for rule induction from imperfect data, Proceedings of 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 243-250 (2002).\r\n[11] R. E. Walpole, R. H. Myers, S. L. Myers, K. Ye: Probability and Statistics for Engineers and Scientists, Eighth edition, Pearson Prentice Hall, pp. 187-191 (2007).","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 102, 2015"}