Adaptive Few-Shot Deep Metric Learning
Authors: Wentian Shi, Daming Shi, Maysam Orouskhani, Feng Tian
Abstract:
Most prevalent deep learning methods require large amounts of training data, whereas few-shot learning aims to learn a model from limited data without extensive retraining. In this paper, we present a loss function based on triplet loss for solving few-shot problems with metric-based learning. Instead of empirically setting the margin distance in the triplet loss to a constant, we propose an adaptive margin distance strategy that obtains an appropriate margin distance automatically. We implement the strategy in a deep siamese network for deep metric embedding, using an optimization approach that penalizes the worst case and rewards the best. Our experiments on image recognition and on a co-segmentation model demonstrate that the proposed triplet loss with adaptive margin distance can significantly improve performance.
Keywords: Few-shot learning, triplet network, adaptive margin, deep learning.
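To make the role of the margin concrete, the following is a minimal Python sketch of the standard triplet loss, max(0, d(a, p) - d(a, n) + margin), together with one possible way to derive the margin from the data instead of fixing it by hand. The abstract does not give the authors' adaptive rule, so the function adaptive_margin below is a hypothetical stand-in (the gap between the mean anchor-negative and mean anchor-positive distances in a batch), and all function names are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


def adaptive_margin(triplets):
    """Hypothetical adaptive rule (NOT the paper's strategy).

    Sets the margin to the gap between the mean anchor-negative distance
    and the mean anchor-positive distance over the batch, floored at zero.
    """
    d_ap = [euclidean(a, p) for a, p, _ in triplets]
    d_an = [euclidean(a, n) for a, _, n in triplets]
    gap = sum(d_an) / len(d_an) - sum(d_ap) / len(d_ap)
    return max(0.0, gap)


def batch_triplet_loss(triplets, margin=None):
    """Mean triplet loss over a batch.

    If margin is None, the hypothetical adaptive rule above is used;
    otherwise the given constant margin is applied to every triplet.
    """
    if margin is None:
        margin = adaptive_margin(triplets)
    losses = [triplet_loss(a, p, n, margin) for a, p, n in triplets]
    return sum(losses) / len(losses)
```

Calling batch_triplet_loss(triplets, margin=0.2) reproduces the usual constant-margin setup, while omitting the margin argument switches to the batch-derived value, so the two variants can be compared on the same triplets.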