Search results for: neural machine translation (NMT)
4670 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform
Authors: Khadija Refouh
Abstract:
Culture bound-expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation NMT has recently achieved considerable development in the quality of translation that outperformed previous traditional translation systems in many language pairs. Neural machine translation NMT is an Artificial Intelligence AI and deep neural networks applied to language processing. Despite this development, there remain some serious challenges that face neural machine translation NMT when translating culture bounded-expressions, especially for low resources language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French are likely to be more accurate than translating them from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” to “يا له من طقس سيء! تمطر القطط والكلاب” into the target language Arabic which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.” where Google Translate Application used the accurate French corresponding idioms. This paper aims to perform NMT experiments towards better translation of opaque idioms using high quality clean multilingual corpus. This Corpus will be collected analytically from human generated idiom translation. AutoML translation, a Google Neural Machine Translation Platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using Bilingual Evaluation Understudy Score BLEU. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the Blue Score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms
Procedia PDF Downloads 1484669 Improving Machine Learning Translation of Hausa Using Named Entity Recognition
Authors: Aishatu Ibrahim Birma, Aminu Tukur, Abdulkarim Abbass Gora
Abstract:
Machine translation plays a vital role in the Field of Natural Language Processing (NLP), breaking down language barriers and enabling communication across diverse communities. In the context of Hausa, a widely spoken language in West Africa, mainly in Nigeria, effective translation systems are essential for enabling seamless communication and promoting cultural exchange. However, due to the unique linguistic characteristics of Hausa, accurate translation remains a challenging task. The research proposes an approach to improving the machine learning translation of Hausa by integrating Named Entity Recognition (NER) techniques. Named entities, such as person names, locations, organizations, and dates, are critical components of a language's structure and meaning. Incorporating NER into the translation process can enhance the quality and accuracy of translations by preserving the integrity of named entities and also maintaining consistency in translating entities (e.g., proper names), and addressing the cultural references specific to Hausa. The NER will be incorporated into Neural Machine Translation (NMT) for the Hausa to English Translation.Keywords: machine translation, natural language processing (NLP), named entity recognition (NER), neural machine translation (NMT)
Procedia PDF Downloads 424668 Perception and Implementation of Machine Translation Applications by the Iranian English Translators
Authors: Abdul Amir Hazbavi
Abstract:
The present study is an attempt to provide a relatively comprehensive preview of the Iranian English translators’ perception on Machine Translation. Furthermore, the study tries to shed light on the status of implementation of Machine Translation among the Iranian English Translators. To reach the aforementioned objectives, the Localization Industry Standards Association’s questioner for measuring perceptions with regard to the adoption of a technology innovation was adapted and used to investigate three parameter among the participants of the study, namely familiarity with Machine Translation, general perception on Machine Translation and implementation of Machine Translation systems in translation tasks. The participants of the study were 224 last-year undergraduate Iranian students of English translation at 10 universities across the country. The study revealed a very low level of adoption and a very high level of willingness to get familiar with and learn about Machine Translation, as well as a positive perception of and attitude toward Machine Translation by the Iranian English translators.Keywords: translation technology, machine translation, perception, implementation
Procedia PDF Downloads 5224667 Chinese Undergraduates’ Trust in And Usage of Machine Translation: A Survey
Authors: Bi Zhao
Abstract:
Neural network technology has greatly improved the output of machine translation in terms of both fluency and accuracy, which greatly increases its appeal for young users. The present exploratory study aims to find out how the Chinese undergraduates perceive and use machine translation in their daily life. A survey is conducted to collect data from 100 undergraduate students from multiple Chinese universities and with varied academic backgrounds, including arts, business, science, engineering, and medicine. The survey questions inquire about their use (including frequency, scenarios, purposes, and preferences) of and attitudes (including trust, quality assessment, justifications, and ethics) toward machine translation. Interviews and tasks of evaluating machine translation output are also employed in combination with the survey on a sample of selected respondents. The results indicate that Chinese undergraduate students use machine translation on a daily basis for a wide range of purposes in academic, communicative, and entertainment scenarios. Most of them have preferred machine translation tools, but the availability of machine translation tools within a certain scenario, such as the embedded machine translation tool on the webpage, is also the determining factor in their choice. The results also reveal that despite the reportedly limited trust in the accuracy of machine translation output, most students lack the ability to critically analyze and evaluate such output. Furthermore, the evidence is revealed of the inadequate awareness of ethical responsibility as machine translation users among Chinese undergraduate students.Keywords: Chinese undergraduates, machine translation, trust, usage
Procedia PDF Downloads 1384666 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof
Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba
Abstract:
In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof
Procedia PDF Downloads 1444665 Statistical Comparison of Machine and Manual Translation: A Corpus-Based Study of Gone with the Wind
Authors: Yanmeng Liu
Abstract:
This article analyzes and compares the linguistic differences between machine translation and manual translation, through a case study of the book Gone with the Wind. As an important carrier of human feeling and thinking, the literature translation poses a huge difficulty for machine translation, and it is supposed to expose distinct translation features apart from manual translation. In order to display linguistic features objectively, tentative uses of computerized and statistical evidence to the systematic investigation of large scale translation corpora by using quantitative methods have been deployed. This study compiles bilingual corpus with four versions of Chinese translations of the book Gone with the Wind, namely, Piao by Chunhai Fan, Piao by Huairen Huang, translations by Google Translation and Baidu Translation. After processing the corpus with the software of Stanford Segmenter, Stanford Postagger, and AntConc, etc., the study analyzes linguistic data and answers the following questions: 1. How does the machine translation differ from manual translation linguistically? 2. Why do these deviances happen? This paper combines translation study with the knowledge of corpus linguistics, and concretes divergent linguistic dimensions in translated text analysis, in order to present linguistic deviances in manual and machine translation. Consequently, this study provides a more accurate and more fine-grained understanding of machine translation products, and it also proposes several suggestions for machine translation development in the future.Keywords: corpus-based analysis, linguistic deviances, machine translation, statistical evidence
Procedia PDF Downloads 1424664 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System
Authors: Benyamin Ahmadnia, Javier Serrano
Abstract:
In this paper we compare two different approaches for translating from Persian to Spanish, as a language pair with scarce parallel corpus. The first approach involves direct transfer using an statistical machine translation system, which is available for this language pair. The second approach involves translation through English, as a pivot language, which has more translation resources and more advanced translation systems available. The results show that, it is possible to achieve better translation quality using English as a pivot language in either approach outperforms direct translation from Persian to Spanish. Our best result is the pivot system which scores higher than direct translation by (1.12) BLEU points.Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus
Procedia PDF Downloads 4864663 An Analysis of Machine Translation: Instagram Translation vs Human Translation on the Perspective Translation Quality
Authors: Aulia Fitri
Abstract:
This aims to seek which part of the linguistics with the common mistakes occurred between Instagram translation and human translation. Instagram is a social media account that is widely used by people in the world. Everyone with the Instagram account can consume the captions and pictures that are shared by their friends, celebrity, and public figures across countries. Instagram provides the machine translation under its caption space that will assist users to understand the language of their non-native. The researcher takes samples from an Indonesian public figure whereas the account is followed by many followers. The public figure tries to help her followers from other countries understand her posts by putting up the English version after the Indonesian version. However, the research on Instagram account has not been done yet even though the account is widely used by the worldwide society. There are 20 samples that will be analysed on the perspective of translation quality and linguistics tools. As the MT, Instagram tends to give a literal translation without regarding the topic meant. On the other hand, the human translation tends to exaggerate the translation which leads a different meaning in English. This is an interesting study to discuss when the human nature and robotic-system influence the translation result.Keywords: human translation, machine translation (MT), translation quality, linguistic tool
Procedia PDF Downloads 3194662 End-to-End Spanish-English Sequence Learning Translation Model
Authors: Vidhu Mitha Goutham, Ruma Mukherjee
Abstract:
The low availability of well-trained, unlimited, dynamic-access models for specific languages makes it hard for corporate users to adopt quick translation techniques and incorporate them into product solutions. As translation tasks increasingly require a dynamic sequence learning curve; stable, cost-free opensource models are scarce. We survey and compare current translation techniques and propose a modified sequence to sequence model repurposed with attention techniques. Sequence learning using an encoder-decoder model is now paving the path for higher precision levels in translation. Using a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder background, we use Fairseq tools to produce an end-to-end bilingually trained Spanish-English machine translation model including source language detection. We acquire competitive results using a duo-lingo-corpus trained model to provide for prospective, ready-made plug-in use for compound sentences and document translations. Our model serves a decent system for large, organizational data translation needs. While acknowledging its shortcomings and future scope, it also identifies itself as a well-optimized deep neural network model and solution.Keywords: attention, encoder-decoder, Fairseq, Seq2Seq, Spanish, translation
Procedia PDF Downloads 1744661 Development of a French to Yorùbá Machine Translation System
Authors: Benjamen Nathaniel, Eludiora Safiriyu Ijiyemi, Egume Oneme Lucky
Abstract:
A review on machine translation systems shows that a lot of computational artefacts has been carried out to translate written or spoken texts from a source language to Yorùbá language through Machine Translation systems. However, there are no work on French to Yorùbá language machine translation system; hence, the study investigated the process involved in the translation of French-to-Yorùbá language equivalent with the view to adopting a rule- based MT approach to build a Machine Translation framework from simple sentences administered through questionnaire. Articles and relevant textbooks were reviewed with key speakers of both languages interviewed to find out the processes involved in the translation of French language and their equivalent in Yorùbálanguage simple sentences using home domain terminologies. Achieving this, a model was formulated using phrase grammar structure, re-write rule, parse tree, automata theory- based techniques, designed and implemented respectively with unified modeling language (UML) and python programming language. Analysing the result, it was observed when carrying out the result that, the Machine Translation system performed 18.45% above Experimental Subject Respondent and 2.7% below Linguistics Expert when analysed with word orthography, sentence syntax and semantic correctness of the sentences. And, when compared with Google Machine Translation system, it was noticed that the developed system performed better on lexicons of the target language.Keywords: machine translation (MT), rule-based, French language, Yoru`ba´ language
Procedia PDF Downloads 764660 Knowledge Required for Avoiding Lexical Errors at Machine Translation
Authors: Yukiko Sasaki Alam
Abstract:
This research aims at finding out the causes that led to wrong lexical selections in machine translation (MT) rather than categorizing lexical errors, which has been a main practice in error analysis. By manually examining and analyzing lexical errors outputted by a MT system, it suggests what knowledge would help the system reduce lexical errors.Keywords: machine translation, error analysis, lexical errors, evaluation
Procedia PDF Downloads 3364659 Fine-Tuned Transformers for Translating Multi-Dialect Texts to Modern Standard Arabic
Authors: Tahar Alimi, Rahma Boujebane, Wiem Derouich, Lamia Hadrich Belguith
Abstract:
Machine translation task of low-resourced languages such as Arabic is a challenging task. Despite the appearance of sophisticated models based on the latest deep learning techniques, namely the transfer learning and transformers, all models prove incapable of carrying out an acceptable translation, which includes Arabic Dialects (AD), because they do not have official status. In this paper, we present a machine translation model designed to translate Arabic multidialectal content into Modern Standard Arabic (MSA), leveraging both new and existing parallel resources. The latter achieved the best results for both Levantine and Maghrebi dialects with a BLEU score of 64.99.Keywords: Arabic translation, dialect translation, fine-tune, MSA translation, transformer, translation
Procedia PDF Downloads 594658 The Effect of Using Computer-Assisted Translation Tools on the Translation of Collocations
Authors: Hassan Mahdi
Abstract:
The integration of computer-assisted translation (CAT) tools in translation creates several opportunities for translators. However, this integration is not useful in all types of English structures. This study aims at examining the impact of using CAT tools in translating collocations. Seventy students of English as a foreign language participated in this study. The participants were divided into three groups (i.e., CAT tools group, Machine Translation group, and the control group). The comparison of the results obtained from the translation output of the three groups demonstrated the improvement of translation using CAT tools. The results indicated that the participants who used CAT tools outscored the participants who used MT, and in turn, both groups outscored the control group who did not use any type of technology in translation. In addition, there was a significant difference in the use of CAT for translation different types of collocations. The results also indicated that CAT tools were more effective in translation fixed and medium-strength collocations than weak collocations. Finally, the results showed that CAT tools were effective in translation collocations in both types of languages (i.e. target language or source language). The study suggests some guidelines for translators to use CAT tools.Keywords: machine translation, computer-assisted translation, collocations, technology
Procedia PDF Downloads 1924657 An Experience of Translating an Excerpt from Sophie Adonon’s Echos de Femmes from French to English, Using Reverso.
Authors: Michael Ngongeh Mombe
Abstract:
This Paper seeks to investigate an assertion made by some colleagues that there is no need paying a human translator to translate their literary texts, that there are softwares such as Reverso that can be used to do the translation. The main objective of this study is to examine the veracity of this assertion using Reverso to translate a literary text without any post-editing by a human translator. The work is based on two theories: Skopos and Communicative theories of translation. The work is a documentary research where data were collected from published documents in libraries, on the internet and from the translation produced by Reverso. We made a comparative text analyses of both source and target texts in a bid to highlight the weaknesses and strengths of the software. Findings of this work revealed that those who advocate the use of only Machine translation do so in ignorance of the translation mistakes usually made by the software. From the review of all the 268 segments of translation, we found out that the translation produced by Reverso is fraught with errors. We therefore recommend the use of human translators to either do the translation of their literary texts or revise the translation produced by machine to conform to the skopos of the work. This paper is based on Reverso translation. Similar works in the near future will be based on the other translation softwares to determine their weaknesses and strengths.Keywords: machine translation, human translator, Reverso, literary text
Procedia PDF Downloads 934656 Comparing Deep Architectures for Selecting Optimal Machine Translation
Authors: Despoina Mouratidis, Katia Lida Kermanidis
Abstract:
Machine translation (MT) is a very important task in Natural Language Processing (NLP). MT evaluation is crucial in MT development, as it constitutes the means to assess the success of an MT system, and also helps improve its performance. Several methods have been proposed for the evaluation of (MT) systems. Some of the most popular ones in automatic MT evaluation are score-based, such as the BLEU score, and others are based on lexical similarity or syntactic similarity between the MT outputs and the reference involving higher-level information like part of speech tagging (POS). This paper presents a language-independent machine learning framework for classifying pairwise translations. This framework uses vector representations of two machine-produced translations, one from a statistical machine translation model (SMT) and one from a neural machine translation model (NMT). The vector representations consist of automatically extracted word embeddings and string-like language-independent features. These vector representations used as an input to a multi-layer neural network (NN) that models the similarity between each MT output and the reference, as well as between the two MT outputs. To evaluate the proposed approach, a professional translation and a "ground-truth" annotation are used. The parallel corpora used are English-Greek (EN-GR) and English-Italian (EN-IT), in the educational domain and of informal genres (video lecture subtitles, course forum text, etc.) that are difficult to be reliably translated. They have tested three basic deep learning (DL) architectures to this schema: (i) fully-connected dense, (ii) Convolutional Neural Network (CNN), and (iii) Long Short-Term Memory (LSTM). Experiments show that all tested architectures achieved better results when compared against those of some of the well-known basic approaches, such as Random Forest (RF) and Support Vector Machine (SVM). Better accuracy results are obtained when LSTM layers are used in our schema. In terms of a balance between the results, better accuracy results are obtained when dense layers are used. The reason for this is that the model correctly classifies more sentences of the minority class (SMT). For a more integrated analysis of the accuracy results, a qualitative linguistic analysis is carried out. In this context, problems have been identified about some figures of speech, as the metaphors, or about certain linguistic phenomena, such as per etymology: paronyms. It is quite interesting to find out why all the classifiers led to worse accuracy results in Italian as compared to Greek, taking into account that the linguistic features employed are language independent.Keywords: machine learning, machine translation evaluation, neural network architecture, pairwise classification
Procedia PDF Downloads 1314655 Design of Neural Predictor for Vibration Analysis of Drilling Machine
Authors: İkbal Eski
Abstract:
This investigation is researched on design of robust neural network predictors for analyzing vibration effects on moving parts of a drilling machine. Moreover, the research is divided two parts; first part is experimental investigation, second part is simulation analysis with neural networks. Therefore, a real time the drilling machine is used to vibrations during working conditions. The measured real vibration parameters are analyzed with proposed neural network. As results: Simulation approaches show that Radial Basis Neural Network has good performance to adapt real time parameters of the drilling machine.Keywords: artificial neural network, vibration analyses, drilling machine, robust
Procedia PDF Downloads 3914654 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques
Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel
Abstract:
Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis
Procedia PDF Downloads 7124653 Study of Syntactic Errors for Deep Parsing at Machine Translation
Authors: Yukiko Sasaki Alam, Shahid Alam
Abstract:
Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.Keywords: syntactic parsing, error analysis, machine translation, deep parsing
Procedia PDF Downloads 5594652 Possibilities and Challenges of Using Machine Translation in Foreign Language Education
Authors: Miho Yamashita
Abstract:
In recent years, there have been attempts to introduce Machine Translation (MT) into foreign language teaching, especially in writing instructions. This is because the performance of neural machine translation has improved dramatically since 2016, and some university instructors started to introduce MT translations to their students as a "good model" to learn from. However, MT is still not perfect, and there are many incorrect translations. In order to translate the intended text into a foreign language, it is necessary to edit the original manuscript written in the native language (pre-edit) and revise the translated foreign language text (post-edit). The latter is considered especially difficult for users without a high proficiency level of foreign language. Therefore, the author allowed her students to use MT in her writing class in one of the private universities in Japan and investigated 1) how groups of students with different English proficiency levels revised MT translations when translating Japanese manuscripts into English and 2) whether the post-edit process differed when the students revised alone or in pairs. The results showed that in 1), certain non-post-edited grammatical errors were found regardless of their proficiency levels, indicating the need for teacher intervention, and in 2), more appropriate corrections were found in pairs, and their frequent use of a dictionary was also observed. In this presentation, the author will discuss how MT writing instruction can be integrated effectively in an aim to achieve multimodal foreign language education.Keywords: machine translation, writing instruction, pre-edit, post-edit
Procedia PDF Downloads 614651 English Grammatical Errors of Arabic Sentence Translations Done by Machine Translations
Authors: Muhammad Fathurridho
Abstract:
Grammar as a rule used by every language to be understood by everyone is always related to syntax and morphology. Arabic grammar is different with another languages’ grammars. It has more rules and difficulties. This paper aims to investigate and describe the English grammatical errors of machine translation systems in translating Arabic sentences, including declarative, exclamation, imperative, and interrogative sentences, specifically in year 2018 which can be supported with artificial intelligence’s role. The Arabic sample sentences which are divided into two; verbal and nominal sentence of several Arabic published texts will be examined as the source language samples. The translated sentences done by several popular online machine translation systems, including Google Translate, Microsoft Bing, Babylon, Facebook, Hellotalk, Worldlingo, Yandex Translate, and Tradukka Translate are the material objects of this research. Descriptive method that will be taken to finish this research will show the grammatical errors of English target language, and classify them. The conclusion of this paper has showed that the grammatical errors of machine translation results are varied and generally classified into morphological, syntactical, and semantic errors in all type of Arabic words (Noun, Verb, and Particle), and it will be one of the evaluations for machine translation’s providers to correct them in order to improve their understandable results.Keywords: Arabic, Arabic-English translation, machine translation, grammatical errors
Procedia PDF Downloads 1544650 Literary Translation Human vs Machine: An Essay about Online Translation
Authors: F. L. Bernardo, R. A. S. Zacarias
Abstract:
The ways to translate are manifold since textual genres undergoing translations are diverse. In this essay, our goal is to give special attention to the literary genre and to the online translation tool Google Translate (GT), widely used either by nonprofessionals or by scholars, in order to show evidence of the indispensability of human wit in a good translation. Our study has its basis on a literary review of prominent authors, with emphasis on translation categories. Also highlighting the issue of polysemous literary translation, we aim to shed light on the translator’s craft and the fallible nature of online translation. To better illustrate these principles, the methodology consisted on performing a comparative analysis involving the original text Moll Flanders by Daniel Defoe in English to its online translation given by GT and to a translation into Brazilian Portuguese performed by a human. We proceeded to identifying and analyzing the degrees of textual equivalence according to the following categories: volume, levels and order. The results have attested the unsuitability in a translation done by a computer connected to the World Wide Web.Keywords: Google Translator, human translation, literary translation, Moll Flanders
Procedia PDF Downloads 6494649 Efficiency of Google Translate and Bing Translator in Translating Persian-to-English Texts
Authors: Samad Sajjadi
Abstract:
Machine translation is a new subject increasingly being used by academic writers, especially students and researchers whose native language is not English. There are numerous studies conducted on machine translation, but few investigations have assessed the accuracy of machine translation from Persian to English at lexical, semantic, and syntactic levels. Using Groves and Mundt’s (2015) Model of error taxonomy, the current study evaluated Persian-to-English translations produced by two famous online translators, Google Translate and Bing Translator. A total of 240 texts were randomly selected from different academic fields (law, literature, medicine, and mass media), and 60 texts were considered for each domain. All texts were rendered by the two translation systems and then by four human translators. All statistical analyses were applied using SPSS. The results indicated that Google translations were more accurate than the translations produced by the Bing Translator, especially in the domains of medicine (lexis: 186 vs. 225; semantic: 44 vs. 48; syntactic: 148 vs. 264 errors) and mass media (lexis: 118 vs. 149; semantic: 25 vs. 32; syntactic: 110 vs. 220 errors), respectively. Nonetheless, both machines are reasonably accurate in Persian-to-English translation of lexicons and syntactic structures, particularly from mass media and medical texts.Keywords: machine translations, accuracy, human translation, efficiency
Procedia PDF Downloads 744648 Learning to Translate by Learning to Communicate to an Entailment Classifier
Authors: Szymon Rutkowski, Tomasz Korbak
Abstract:
We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning
Procedia PDF Downloads 1274647 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian
Authors: Sanja Seljan, Ivan Dunđer
Abstract:
The paper presents combined automatic speech recognition (ASR) for English and machine translation (MT) for English and Croatian in the domain of business correspondence. The first part presents results of training the ASR commercial system on two English data sets, enriched by error analysis. The second part presents results of machine translation performed by online tool Google Translate for English and Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.Keywords: automatic machine translation, integrated language technologies, quality evaluation, speech recognition
Procedia PDF Downloads 4834646 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics
Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman
Abstract:
Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation
Procedia PDF Downloads 3574645 How Is a Machine-Translated Literary Text Organized in Coherence? An Analysis Based upon Theme-Rheme Structure
Abstract:
With the ultimate goal to automatically generate translated texts with high quality, machine translation has made tremendous improvements. However, its translations of literary works are still plagued with problems in coherence, esp. the translation between distant language pairs. One of the causes of the problems is probably the lack of linguistic knowledge to be incorporated into the training of machine translation systems. In order to enable readers to better understand the problems of machine translation in coherence, to seek out the potential knowledge to be incorporated, and thus to improve the quality of machine translation products, this study applies Theme-Rheme structure to examine how a machine-translated literary text is organized and developed in terms of coherence. Theme-Rheme structure in Systemic Functional Linguistics is a useful tool for analysis of textual coherence. Theme is the departure point of a clause and Rheme is the rest of the clause. In a text, as Themes and Rhemes may be connected with each other in meaning, they form thematic and rhematic progressions throughout the text. Based on this structure, we can look into how a text is organized and developed in terms of coherence. Methodologically, we chose Chinese and English as the language pair to be studied. Specifically, we built a comparable corpus with two modes of English translations, viz. machine translation (MT) and human translation (HT) of one Chinese literary source text. The translated texts were annotated with Themes, Rhemes and their progressions throughout the texts. The annotated texts were analyzed from two respects, the different types of Themes functioning differently in achieving coherence, and the different types of thematic and rhematic progressions functioning differently in constructing texts. By analyzing and contrasting the two modes of translations, it is found that compared with the HT, 1) the MT features “pseudo-coherence”, with lots of ill-connected fragments of information using “and”; 2) the MT system produces a static and less interconnected text that reads like a list; these two points, in turn, lead to the less coherent organization and development of the MT than that of the HT; 3) novel to traditional and previous studies, Rhemes do contribute to textual connection and coherence though less than Themes do and thus are worthy of notice in further studies. Hence, the findings suggest that Theme-Rheme structure be applied to measuring and assessing the coherence of machine translation, to being incorporated into the training of the machine translation system, and Rheme be taken into account when studying the textual coherence of both MT and HT.Keywords: coherence, corpus-based, literary translation, machine translation, Theme-Rheme structure
Procedia PDF Downloads 2054644 Degree in Translation and Years of Professional Experience: Predictors of Translation Quality
Authors: Mohsen Varzande
Abstract:
Translators’ professional and academic characteristics may directly influence their translation quality. The present study aimed at investigating whether translators’ degree in translation and years of professional experience predict their translation quality. Following a causal-comparative study, a sample of one hundred professional translators was selected using purposive sampling method. The participants were divided into two groups each containing individuals with and without a degree in translation, respectively. The participants were asked to translate a paragraph to assess their translation quality. For data analysis, appropriate statistical procedures including correlation and regression were used. Results showed that both degree in translation and years of professional experience significantly predict translation quality. Also, the interaction of translators’ years of professional experience and degree in translation significantly affect their translation quality. An implication could be that besides providing translators with academic knowledge and theories, practical training in translation is necessary as a prerequisite for a competent translator.Keywords: translation, degree in translation, translation quality, professional experience
Procedia PDF Downloads 4294643 Enhancing Word Meaning Retrieval Using FastText and Natural Language Processing Techniques
Authors: Sankalp Devanand, Prateek Agasimani, Shamith V. S., Rohith Neeraje
Abstract:
Machine translation has witnessed significant advancements in recent years, but the translation of languages with distinct linguistic characteristics, such as English and Sanskrit, remains a challenging task. This research presents the development of a dedicated English-to-Sanskrit machine translation model, aiming to bridge the linguistic and cultural gap between these two languages. Using a variety of natural language processing (NLP) approaches, including FastText embeddings, this research proposes a thorough method to improve word meaning retrieval. Data preparation, part-of-speech tagging, dictionary searches, and transliteration are all included in the methodology. The study also addresses the implementation of an interpreter pattern and uses a word similarity task to assess the quality of word embeddings. The experimental outcomes show how the suggested approach may be used to enhance word meaning retrieval tasks with greater efficacy, accuracy, and adaptability. Evaluation of the model's performance is conducted through rigorous testing, comparing its output against existing machine translation systems. The assessment includes quantitative metrics such as BLEU scores, METEOR scores, Jaccard Similarity, etc.Keywords: machine translation, English to Sanskrit, natural language processing, word meaning retrieval, fastText embeddings
Procedia PDF Downloads 434642 Overview of Resources and Tools to Bridge Language Barriers Provided by the European Union
Authors: Barbara Heinisch, Mikael Snaprud
Abstract:
A common, well understood language is crucial in critical situations like landing a plane. For e-Government solutions, a clear and common language is needed to allow users to successfully complete transactions online. Misunderstandings here may not risk a safe landing but can cause delays, resubmissions and drive costs. This holds also true for higher education, where misunderstandings can also arise due to inconsistent use of terminology. Thus, language barriers are a societal challenge that needs to be tackled. The major means to bridge language barriers is translation. However, achieving high-quality translation and making texts understandable and accessible require certain framework conditions. Therefore, the EU and individual projects take (strategic) actions. These actions include the identification, collection, processing, re-use and development of language resources. These language resources may be used for the development of machine translation systems and the provision of (public) services including higher education. This paper outlines some of the existing resources and indicate directions for further development to increase the quality and usage of these resources.Keywords: language resources, machine translation, terminology, translation
Procedia PDF Downloads 3184641 Machine Translation Analysis of Chinese Dish Names
Authors: Xinyu Zhang, Olga Torres-Hostench
Abstract:
This article presents a comparative study evaluating and comparing the quality of machine translation (MT) output of Chinese gastronomy nomenclature. Chinese gastronomic culture is experiencing an increased international acknowledgment nowadays. The nomenclature of Chinese gastronomy not only reflects a specific aspect of culture, but it is related to other areas of society such as philosophy, traditional medicine, etc. Chinese dish names are composed of several types of cultural references, such as ingredients, colors, flavors, culinary techniques, cooking utensils, toponyms, anthroponyms, metaphors, historical tales, among others. These cultural references act as one of the biggest difficulties in translation, in which the use of translation techniques is usually required. Regarding the lack of Chinese food-related translation studies, especially in Chinese-Spanish translation, and the current massive use of MT, the quality of the MT output of Chinese dish names is questioned. Fifty Chinese dish names with different types of cultural components were selected in order to complete this study. First, all of these dish names were translated by three different MT tools (Google Translate, Baidu Translate and Bing Translator). Second, a questionnaire was designed and completed by 12 Chinese online users (Chinese graduates of a Hispanic Philology major) in order to find out user preferences regarding the collected MT output. Finally, human translation techniques were observed and analyzed to identify what translation techniques would be observed more often in the preferred MT proposals. The result reveals that the MT output of the Chinese gastronomy nomenclature is not of high quality. It would be recommended not to trust the MT in occasions like restaurant menus, TV culinary shows, etc. However, the MT output could be used as an aid for tourists to have a general idea of a dish (the main ingredients, for example). Literal translation turned out to be the most observed technique, followed by borrowing, generalization and adaptation, while amplification, particularization and transposition were infrequently observed. Possibly because that the MT engines at present are limited to relate equivalent terms and offer literal translations without taking into account the whole context meaning of the dish name, which is essential to the application of those less observed techniques. This could give insight into the post-editing of the Chinese dish name translation. By observing and analyzing translation techniques in the proposals of the machine translators, the post-editors could better decide which techniques to apply in each case so as to correct mistakes and improve the quality of the translation.Keywords: Chinese dish names, cultural references, machine translation, translation techniques
Procedia PDF Downloads 136