Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 18

Search results for: machine translation

18 Machine Translation Analysis of Chinese Dish Names

Authors: Xinyu Zhang, Olga Torres-Hostench

Abstract:

This article presents a comparative study evaluating and comparing the quality of machine translation (MT) output of Chinese gastronomy nomenclature. Chinese gastronomic culture is experiencing an increased international acknowledgment nowadays. The nomenclature of Chinese gastronomy not only reflects a specific aspect of culture, but it is related to other areas of society such as philosophy, traditional medicine, etc. Chinese dish names are composed of several types of cultural references, such as ingredients, colors, flavors, culinary techniques, cooking utensils, toponyms, anthroponyms, metaphors, historical tales, among others. These cultural references act as one of the biggest difficulties in translation, in which the use of translation techniques is usually required. Regarding the lack of Chinese food-related translation studies, especially in Chinese-Spanish translation, and the current massive use of MT, the quality of the MT output of Chinese dish names is questioned. Fifty Chinese dish names with different types of cultural components were selected in order to complete this study. First, all of these dish names were translated by three different MT tools (Google Translate, Baidu Translate and Bing Translator). Second, a questionnaire was designed and completed by 12 Chinese online users (Chinese graduates of a Hispanic Philology major) in order to find out user preferences regarding the collected MT output. Finally, human translation techniques were observed and analyzed to identify what translation techniques would be observed more often in the preferred MT proposals. The result reveals that the MT output of the Chinese gastronomy nomenclature is not of high quality. It would be recommended not to trust the MT in occasions like restaurant menus, TV culinary shows, etc. However, the MT output could be used as an aid for tourists to have a general idea of a dish (the main ingredients, for example). Literal translation turned out to be the most observed technique, followed by borrowing, generalization and adaptation, while amplification, particularization and transposition were infrequently observed. Possibly because that the MT engines at present are limited to relate equivalent terms and offer literal translations without taking into account the whole context meaning of the dish name, which is essential to the application of those less observed techniques. This could give insight into the post-editing of the Chinese dish name translation. By observing and analyzing translation techniques in the proposals of the machine translators, the post-editors could better decide which techniques to apply in each case so as to correct mistakes and improve the quality of the translation.

Keywords: Chinese dish names, cultural references, machine translation, translation techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 372
17 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: Cross-language analysis, machine learning, machine translation, sentiment analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 881
16 Knowledge Required for Avoiding Lexical Errors at Machine Translation

Authors: Yukiko Sasaki Alam

Abstract:

This research aims at finding out the causes that led to wrong lexical selections in machine translation (MT) rather than categorizing lexical errors, which has been a main practice in error analysis. By manually examining and analyzing lexical errors outputted by a MT system, it suggests what knowledge would help the system reduce lexical errors.

Keywords: Error analysis, causes of errors, machine translation, outputs evaluation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 915
15 Study of Syntactic Errors for Deep Parsing at Machine Translation

Authors: Yukiko Sasaki Alam, Shahid Alam

Abstract:

Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.

Keywords: Machine translation, error analysis, syntactic errors, knowledge required for parsing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 635
14 Contextual Distribution for Textual Alignment

Authors: Yuri Bizzoni, Marianne Reboul

Abstract:

Our program compares French and Italian translations of Homer’s Odyssey, from the XVIth to the XXth century. We focus on the third point, showing how distributional semantics systems can be used both to improve alignment between different French translations as well as between the Greek text and a French translation. Although we focus on French examples, the techniques we display are completely language independent.

Keywords: Translation studies, machine translation, computational linguistics, distributional semantics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 467
13 Perception and Implementation of Machine Translation Applications by the Iranian English Translators

Authors: Abdul Amir Hazbavi

Abstract:

The present study is an attempt to provide a relatively comprehensive preview of the Iranian English translators’ perception on Machine Translation. Furthermore, the study tries to shed light on the status of implementation of Machine Translation among the Iranian English Translators. To reach the aforementioned objectives, the Localization Industry Standards Association’s questioner for measuring perceptions with regard to the adoption of a technology innovation was adapted and used to investigate the perception and implementation of Machine Translation applications by the Iranian English language translators. The participants of the study were 224 last-year undergraduate Iranian students of English translation at 10 universities across the country. The study revealed a very low level of adoption and a very high level of willingness to get familiar with and learn about Machine Translation, as well as a positive perception of and attitude toward Machine Translation by the Iranian English translators.

Keywords: Translation Technology, Machine Translation, Perception and Implementation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1880
12 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer

Abstract:

The paper presents combined automatic speech recognition (ASR) of English and machine translation (MT) for English and Croatian and Croatian-English language pairs in the domain of business correspondence. The first part presents results of training the ASR commercial system on English data sets, enriched by error analysis. The second part presents results of machine translation performed by free online tool for English and Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.

Keywords: Automatic machine translation, integrated language technologies, quality evaluation, speech recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2384
11 Development of Circulating Support Environment of Multilingual Medical Communication using Parallel Texts for Foreign Patients

Authors: Mai Miyabe, Taku Fukushima, Takashi Yoshino, Aguri Shigeno

Abstract:

The need for multilingual communication in Japan has increased due to an increase in the number of foreigners in the country. When people communicate in their nonnative language, the differences in language prevent mutual understanding among the communicating individuals. In the medical field, communication between the hospital staff and patients is a serious problem. Currently, medical translators accompany patients to medical care facilities, and the demand for medical translators is increasing. However, medical translators cannot necessarily provide support, especially in cases in which round-the-clock support is required or in case of emergencies. The medical field has high expectations from information technology. Hence, a system that supports accurate multilingual communication is required. Despite recent advances in machine translation technology, it is very difficult to obtain highly accurate translations. We have developed a support system called M3 for multilingual medical reception. M3 provides support functions that aid foreign patients in the following respects: conversation, questionnaires, reception procedures, and hospital navigation; it also has a Q&A function. Users can operate M3 using a touch screen and receive text-based support. In addition, M3 uses accurate translation tools called parallel texts to facilitate reliable communication through conversations between the hospital staff and the patients. However, if there is no parallel text that expresses what users want to communicate, the users cannot communicate. In this study, we have developed a circulating support environment for multilingual medical communication using parallel texts. The proposed environment can circulate necessary parallel texts through the following procedure: (1) a user provides feedback about the necessary parallel texts, following which (2) these parallel texts are created and evaluated.

Keywords: multilingual medical communication, parallel texts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1174
10 Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Authors: Asif Ekbal, Sivaji Bandyopadhyay

Abstract:

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2785
9 A Thai to English Machine Translation System Using Thai LFG Tree Structure as Interlingua

Authors: Tawee Chimsuk, Surapong Auwatanamongkol

Abstract:

Machine Translation (MT) between the Thai and English languages has been a challenging research topic in natural language processing. Most research has been done on English to Thai machine translation, but not the other way around. This paper presents a Thai to English Machine Translation System that translates a Thai sentence into interlingua of a Thai LFG tree using LFG grammar and a bottom up parser. The Thai LFG tree is then transformed into the corresponding English LFG tree by pattern matching and node transformation. Finally, an equivalent English sentence is created using structural information prescribed by the English LFG tree. Based on results of experiments designed to evaluate the performance of the proposed system, it can be stated that the system has been proven to be effective in providing a useful translation from Thai to English.

Keywords: Interlingua, LFG grammar, Machine translation, Pattern matching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1883
8 A New Model of English-Vietnamese Bilingual Information Retrieval System

Authors: Chinh Trong Nguyen, Dang Tuan Nguyen

Abstract:

In this paper, we propose a new model of English- Vietnamese bilingual Information Retrieval system. Although there are so many CLIR systems had been researched and built, the accuracy of searching results in different languages that the CLIR system supports still need to improve, especially in finding bilingual documents. The problems identified in this paper are the limitation of machine translation-s result and the extra large collections of document to be found. So we try to establish a different model to overcome these problems.

Keywords: Bilingual Information Retrieval, Cross-lingual Information Retrieval, Bilingual Web sites.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1278
7 Performance Analysis of MT Evaluation Measures and Test Suites

Authors: Yao Jian-Min, Lv Qiang, Zhang Jing

Abstract:

Many measures have been proposed for machine translation evaluation (MTE) while little research has been done on the performance of MTE methods. This paper is an effort for MTE performance analysis. A general frame is proposed for the description of the MTE measure and the test suite, including whether the automatic measure is consistent with human evaluation, whether different results from various measures or test suites are consistent, whether the content of the test suite is suitable for performance evaluation, the degree of difficulty of the test suite and its influence on the MTE, the relationship of MTE result significance and the size of the test suite, etc. For a better clarification of the frame, several experiment results are analyzed relating human evaluation, BLEU evaluation, and typological MTE. A visualization method is introduced for better presentation of the results. The study aims for aid in construction of test suite and method selection in MTE practice.

Keywords: Machine translation, natural language processing, visualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1204
6 Distributional Semantics Approach to Thai Word Sense Disambiguation

Authors: Sunee Pongpinigpinyo, Wanchai Rivepiboon

Abstract:

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely  /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.

Keywords: Distributional semantics, Latent Semantic Indexing, natural language processing, Polysemous words, unsupervisedlearning, Word Sense Disambiguation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1420
5 Estimating Word Translation Probabilities for Thai – English Machine Translation using EM Algorithm

Authors: Chutchada Nusai, Yoshimi Suzuki, Haruaki Yamazaki

Abstract:

Selecting the word translation from a set of target language words, one that conveys the correct sense of source word and makes more fluent target language output, is one of core problems in machine translation. In this paper we compare the 3 methods of estimating word translation probabilities for selecting the translation word in Thai – English Machine Translation. The 3 methods are (1) Method based on frequency of word translation, (2) Method based on collocation of word translation, and (3) Method based on Expectation Maximization (EM) algorithm. For evaluation we used Thai – English parallel sentences generated by NECTEC. The method based on EM algorithm is the best method in comparison to the other methods and gives the satisfying results.

Keywords: Machine translation, EM algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1310
4 Extracting Multiword Expressions in Machine Translation from English to Urdu using Relational Data Approach

Authors: Kashif Bilal, Uzair Muhammad, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation, (hereafter in this document referred to as the "MT") faces a lot of complex problems from its origination. Extracting multiword expressions is also one of the complex problems in MT. Finding multiword expressions during translating a sentence from English into Urdu, through existing solutions, takes a lot of time and occupies system resources. We have designed a simple relational data approach, in which we simply set a bit in dictionary (database) for multiword, to find and handle multiword expression. This approach handles multiword efficiently.

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS (stands for Parts of Speech) Tagging forUrdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1971
3 AGHAZ : An Expert System Based approach for the Translation of English to Urdu

Authors: Uzair Muhammad, Kashif Bilal, Atif Khan, M. Nasir Khan

Abstract:

Machine Translation (MT 3) of English text to its Urdu equivalent is a difficult challenge. Lot of attempts has been made, but a few limited solutions are provided till now. We present a direct approach, using an expert system to translate English text into its equivalent Urdu, using The Unicode Standard, Version 4.0 (ISBN 0-321-18578-1) Range: 0600–06FF. The expert system works with a knowledge base that contains grammatical patterns of English and Urdu, as well as a tense and gender-aware dictionary of Urdu words (with their English equivalents).

Keywords: Machine Translation, Multiword Expressions, Urdulanguage processing, POS12 Tagging for Urdu, Expert Systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1934
2 Continuous Text Translation Using Text Modeling in the Thetos System

Authors: Nina Suszczanska, Przemyslaw Szmal, Slawomir Kulikow

Abstract:

In the paper a method of modeling text for Polish is discussed. The method is aimed at transforming continuous input text into a text consisting of sentences in so called canonical form, whose characteristic is, among others, a complete structure as well as no anaphora or ellipses. The transformation is lossless as to the content of text being transformed. The modeling method has been worked out for the needs of the Thetos system, which translates Polish written texts into the Polish sign language. We believe that the method can be also used in various applications that deal with the natural language, e.g. in a text summary generator for Polish.

Keywords: anaphora, machine translation, NLP, sign language, text syntax.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1280
1 Determining the Gender of Korean Names for Pronoun Generation

Authors: Seong-Bae Park, Hee-Geun Yoon

Abstract:

It is an important task in Korean-English machine translation to classify the gender of names correctly. When a sentence is composed of two or more clauses and only one subject is given as a proper noun, it is important to find the gender of the proper noun for correct translation of the sentence. This is because a singular pronoun has a gender in English while it does not in Korean. Thus, in Korean-English machine translation, the gender of a proper noun should be determined. More generally, this task can be expanded into the classification of the general Korean names. This paper proposes a statistical method for this problem. By considering a name as just a sequence of syllables, it is possible to get a statistics for each name from a collection of names. An evaluation of the proposed method yields the improvement in accuracy over the simple looking-up of the collection. While the accuracy of the looking-up method is 64.11%, that of the proposed method is 81.49%. This implies that the proposed method is more plausible for the gender classification of the Korean names.

Keywords: machine translation, natural language processing, gender of proper nouns, statistical method

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1972