Enhancing Word Meaning Retrieval Using FastText and NLP Techniques
Authors: Sankalp Devanand, Prateek Agasimani, V. S. Shamith, Rohith Neeraje
Abstract:
Machine translation has advanced significantly in recent years, but translating between languages with distinct linguistic characteristics, such as English and Sanskrit, remains a challenging task. This research presents a dedicated English-to-Sanskrit machine translation model that aims to bridge the linguistic and cultural gap between the two languages. It proposes a method for improving word meaning retrieval that combines several natural language processing (NLP) techniques, including FastText embeddings. The methodology comprises data preparation, part-of-speech tagging, dictionary lookup, and transliteration. The study also describes an implementation of the interpreter pattern and uses a word similarity task to assess the quality of the word embeddings. The experimental results indicate that the proposed approach improves the efficacy, accuracy, and adaptability of word meaning retrieval. The model's performance is evaluated by comparing its output against that of existing machine translation systems, using quantitative metrics that include BLEU score, METEOR score, and Jaccard similarity.
Keywords: Machine translation, English to Sanskrit, natural language processing, word meaning retrieval, FastText embeddings.
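As a rough illustration of two measures named in the abstract, the sketch below computes Jaccard similarity between a system output and a reference, and the cosine similarity typically used to score word pairs in an embedding-based word similarity task. It is a minimal sketch under stated assumptions, not the authors' implementation: whitespace tokenisation, the function names, and the sample strings are all hypothetical, and the vectors stand in for embeddings that a trained FastText model would supply.

```python
import math


def jaccard_similarity(candidate: str, reference: str) -> float:
    """Jaccard similarity between the token sets of two strings.

    Assumes whitespace tokenisation; the paper does not specify its tokeniser.
    """
    cand_tokens = set(candidate.split())
    ref_tokens = set(reference.split())
    if not cand_tokens and not ref_tokens:
        return 1.0  # convention chosen here: two empty outputs count as identical
    return len(cand_tokens & ref_tokens) / len(cand_tokens | ref_tokens)


def cosine_similarity(u, v) -> float:
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; 0.0 is a convention chosen here
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Made-up transliterated strings, used only to exercise the function.
    system_output = "ramah vanam gacchati"
    reference = "ramah grham gacchati"
    # Two shared tokens out of four distinct tokens overall.
    print(jaccard_similarity(system_output, reference))  # 0.5

    # Toy 2-dimensional vectors standing in for word embeddings.
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Set-based Jaccard similarity ignores word order and repeated tokens, which is one reason it is usually reported alongside n-gram metrics such as BLEU and METEOR rather than on its own.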