Combining ILP with Semi-supervised Learning for Web Page Categorization
Authors: Nuanwan Soonthornphisaj, Boonserm Kijsirikul
Abstract:
This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) for Web page classification. We apply inductive logic programming (ILP) as the strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner to boost the performance of ICT's weak learner. We compare the results with supervised Naive Bayes, a well-known algorithm for text classification. We also compare the performance of our learning algorithm with two other semi-supervised learning algorithms, Co-Training and EM. The experimental results show that the ICT algorithm outperforms those algorithms and that the performance of the weak learner can be enhanced by the ILP system.
Keywords: Inductive Logic Programming, Semi-supervised Learning, Web Page Categorization
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1058149