Multi-Objective Optimal Threshold Selection for Similarity Functions in Siamese Networks for Semantic Textual Similarity Tasks
Authors: Kriuk Boris, Kriuk Fedor
Abstract:
This paper presents a comparative study of fundamental similarity functions for Siamese networks in semantic textual similarity (STS) tasks. We evaluate several similarity functions on the STS Benchmark dataset, analyzing their performance and stability. Additionally, we propose a multi-objective approach to optimal threshold selection. Our findings provide insight into the effectiveness of different similarity functions and offer a straightforward method for optimizing threshold selection, contributing to the advancement of Siamese network architectures in STS applications.
Keywords: Siamese networks, Semantic textual similarity, Similarity functions, STS Benchmark dataset, Threshold selection.
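The abstract does not specify which similarity functions or objectives the study uses, so the following minimal Python sketch is only an illustration of the general idea: scoring Siamese embedding pairs with cosine similarity and sweeping candidate thresholds to maximize a weighted combination of F1 and accuracy as one possible multi-objective criterion. All function names, objectives, and weights here are assumptions for illustration, not the paper's actual method.

```python
# Illustrative sketch (not the paper's method): multi-objective threshold
# selection for a similarity score produced by a Siamese encoder.
# Assumed choices: cosine similarity, F1 + accuracy scalarized by a weight.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of sentence embeddings."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_norm * b_norm, axis=1)


def f1_and_accuracy(pred: np.ndarray, gold: np.ndarray):
    """F1 (positive class) and accuracy for binary predictions."""
    tp = np.sum((pred == 1) & (gold == 1))
    fp = np.sum((pred == 1) & (gold == 0))
    fn = np.sum((pred == 0) & (gold == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = float(np.mean(pred == gold))
    return f1, accuracy


def select_threshold(scores: np.ndarray, gold: np.ndarray, w_f1: float = 0.5) -> float:
    """Sweep thresholds and keep the one maximizing w_f1 * F1 + (1 - w_f1) * accuracy,
    i.e. one simple scalarization of a two-objective trade-off."""
    best_t, best_val = 0.0, -np.inf
    for t in np.linspace(scores.min(), scores.max(), 200):
        pred = (scores >= t).astype(int)
        f1, acc = f1_and_accuracy(pred, gold)
        val = w_f1 * f1 + (1.0 - w_f1) * acc
        if val > best_val:
            best_t, best_val = t, val
    return best_t


# Toy usage: random vectors stand in for Siamese encoder outputs on a dev set.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(100, 64)), rng.normal(size=(100, 64))
gold = rng.integers(0, 2, size=100)
scores = cosine_similarity(emb_a, emb_b)
print("selected threshold:", select_threshold(scores, gold))
```

In practice the weight on each objective, the set of candidate thresholds, and the objectives themselves would be chosen on a validation split; the sketch only shows the mechanics of trading off two metrics when picking a single decision threshold.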