Fine-Tuned Transformers for Translating Multi-Dialect Texts to Modern Standard Arabic

Authors: Tahar Alimi, Rahma Boujelbane, Wiem Derouich, Lamia Hadrich Belguith

Abstract:

Machine translation of low-resource languages such as Arabic is challenging. Despite the emergence of sophisticated models based on recent deep learning techniques, namely transfer learning and transformers, existing models still fail to produce acceptable translations of Arabic dialects (AD), because these dialects have no official status. In this paper, we present a machine translation model designed to translate multi-dialect Arabic content into Modern Standard Arabic (MSA), leveraging both new and existing parallel resources. The model achieved the best results for both the Levantine and Maghrebi dialects, with a BLEU score of 64.99.
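
For readers unfamiliar with the metric, BLEU scores a system output by its clipped n-gram overlap with a reference translation, scaled by a brevity penalty. The following is a minimal sketch of corpus-level BLEU, assuming whitespace tokenization, one reference per sentence, and no smoothing. The abstract does not state the evaluation setup actually used, so the function names and these choices are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumptions (not taken from the paper): whitespace tokenization,
    uniform weights over n-gram orders 1..max_n, and no smoothing.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection clips each hypothesis n-gram count
            # by the number of times it occurs in the reference.
            matched[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            total[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, a zero precision at any order gives BLEU = 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty lowers the score of outputs shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Toy sentences for illustration only; not data from the paper.
    system_output = ["the cat sat on the mat today"]
    reference = ["the cat sat on the mat"]
    print(round(corpus_bleu(system_output, reference), 2))
```

An output identical to its reference scores 100 under this sketch. Published scores are usually computed with a standard tool and a fixed tokenization, so values from this simplified version are not directly comparable to the 64.99 reported here.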

Keywords: Arabic translation, dialect translation, fine-tune, MSA translation, transformer, translation.

