Analysis of Relation between Unlabeled and Labeled Data to Self-Taught Learning Performance

Authors: Ekachai Phaisangittisagul, Rapeepol Chongprachawat

Abstract:

Obtaining labeled data for supervised learning is often difficult and expensive, so the trained model tends to overfit due to the small number of training examples. As a result, some researchers have focused on using unlabeled data, which need not follow the same generative distribution as the labeled data, to construct high-level features that improve performance on supervised learning tasks. In this paper, we investigate the impact of the relationship between unlabeled and labeled data on classification performance. Specifically, we apply several sets of unlabeled data, each with a different degree of relation to the labeled data, to a handwritten digit classification task based on the MNIST dataset. Our experimental results show that the higher the degree of relation between the unlabeled and labeled data, the better the classification performance. Although unlabeled data drawn from a generative distribution completely different from that of the labeled data yields the lowest classification performance, the performance is still high. This broadens the applicability of supervised learning algorithms by leveraging unsupervised learning.
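
The abstract does not spell out the pipeline, but the approach it describes (learn features from unlabeled images with an autoencoder, then train a supervised classifier on the encoded labeled images) can be sketched as below. This is a minimal illustration, not the paper's exact method: it assumes a single-hidden-layer sigmoid autoencoder and a logistic-regression classifier, uses scikit-learn's built-in 8x8 digits as a lightweight stand-in for MNIST, and all hyperparameters are illustrative.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values into [0, 1]

# Pretend half the data is unlabeled (images only) and half is labeled.
X_unlab, X_lab, _, y_lab = train_test_split(X, y, test_size=0.5, random_state=0)

# --- Tiny one-hidden-layer autoencoder trained by batch gradient descent ---
n_in, n_hid = X.shape[1], 32
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, n = 0.5, len(X_unlab)
for epoch in range(200):
    H = sigmoid(X_unlab @ W1 + b1)   # encoder: high-level features
    X_hat = sigmoid(H @ W2 + b2)     # decoder: reconstruction
    err = X_hat - X_unlab            # gradient of squared reconstruction error
    dZ2 = err * X_hat * (1 - X_hat)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ dZ2) / n;       b2 -= lr * dZ2.mean(axis=0)
    W1 -= lr * (X_unlab.T @ dZ1) / n; b1 -= lr * dZ1.mean(axis=0)

# Encode the labeled data with the learned features, then classify.
def encode(X):
    return sigmoid(X @ W1 + b1)

Xtr, Xte, ytr, yte = train_test_split(encode(X_lab), y_lab, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
print("accuracy on encoded features:", clf.score(Xte, yte))

In the paper's setting, the degree of relation would be varied by choosing which unlabeled images feed the autoencoder; in this sketch the unlabeled/labeled split is simply random, so the two sets share the same distribution.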

Keywords: Autoencoder, high-level feature, MNIST dataset, self-taught learning, supervised learning.

Digital Object Identifier (DOI): https://doi.org/10.5281/zenodo.1333895

