Search results for: machine language
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6255

Search results for: machine language

6255 Development of a French to Yorùbá Machine Translation System

Authors: Benjamen Nathaniel, Eludiora Safiriyu Ijiyemi, Egume Oneme Lucky

Abstract:

A review on machine translation systems shows that a lot of computational artefacts has been carried out to translate written or spoken texts from a source language to Yorùbá language through Machine Translation systems. However, there are no work on French to Yorùbá language machine translation system; hence, the study investigated the process involved in the translation of French-to-Yorùbá language equivalent with the view to adopting a rule- based MT approach to build a Machine Translation framework from simple sentences administered through questionnaire. Articles and relevant textbooks were reviewed with key speakers of both languages interviewed to find out the processes involved in the translation of French language and their equivalent in Yorùbálanguage simple sentences using home domain terminologies. Achieving this, a model was formulated using phrase grammar structure, re-write rule, parse tree, automata theory- based techniques, designed and implemented respectively with unified modeling language (UML) and python programming language. Analysing the result, it was observed when carrying out the result that, the Machine Translation system performed 18.45% above Experimental Subject Respondent and 2.7% below Linguistics Expert when analysed with word orthography, sentence syntax and semantic correctness of the sentences. And, when compared with Google Machine Translation system, it was noticed that the developed system performed better on lexicons of the target language.

Keywords: machine translation (MT), rule-based, French language, Yoru`ba´ language

Procedia PDF Downloads 55
6254 Sentiment Analysis: Comparative Analysis of Multilingual Sentiment and Opinion Classification Techniques

Authors: Sannikumar Patel, Brian Nolan, Markus Hofmann, Philip Owende, Kunjan Patel

Abstract:

Sentiment analysis and opinion mining have become emerging topics of research in recent years but most of the work is focused on data in the English language. A comprehensive research and analysis are essential which considers multiple languages, machine translation techniques, and different classifiers. This paper presents, a comparative analysis of different approaches for multilingual sentiment analysis. These approaches are divided into two parts: one using classification of text without language translation and second using the translation of testing data to a target language, such as English, before classification. The presented research and results are useful for understanding whether machine translation should be used for multilingual sentiment analysis or building language specific sentiment classification systems is a better approach. The effects of language translation techniques, features, and accuracy of various classifiers for multilingual sentiment analysis is also discussed in this study.

Keywords: cross-language analysis, machine learning, machine translation, sentiment analysis

Procedia PDF Downloads 700
6253 Direct Translation vs. Pivot Language Translation for Persian-Spanish Low-Resourced Statistical Machine Translation System

Authors: Benyamin Ahmadnia, Javier Serrano

Abstract:

In this paper we compare two different approaches for translating from Persian to Spanish, as a language pair with scarce parallel corpus. The first approach involves direct transfer using an statistical machine translation system, which is available for this language pair. The second approach involves translation through English, as a pivot language, which has more translation resources and more advanced translation systems available. The results show that, it is possible to achieve better translation quality using English as a pivot language in either approach outperforms direct translation from Persian to Spanish. Our best result is the pivot system which scores higher than direct translation by (1.12) BLEU points.

Keywords: statistical machine translation, direct translation approach, pivot language translation approach, parallel corpus

Procedia PDF Downloads 476
6252 Improving Machine Learning Translation of Hausa Using Named Entity Recognition

Authors: Aishatu Ibrahim Birma, Aminu Tukur, Abdulkarim Abbass Gora

Abstract:

Machine translation plays a vital role in the Field of Natural Language Processing (NLP), breaking down language barriers and enabling communication across diverse communities. In the context of Hausa, a widely spoken language in West Africa, mainly in Nigeria, effective translation systems are essential for enabling seamless communication and promoting cultural exchange. However, due to the unique linguistic characteristics of Hausa, accurate translation remains a challenging task. The research proposes an approach to improving the machine learning translation of Hausa by integrating Named Entity Recognition (NER) techniques. Named entities, such as person names, locations, organizations, and dates, are critical components of a language's structure and meaning. Incorporating NER into the translation process can enhance the quality and accuracy of translations by preserving the integrity of named entities and also maintaining consistency in translating entities (e.g., proper names), and addressing the cultural references specific to Hausa. The NER will be incorporated into Neural Machine Translation (NMT) for the Hausa to English Translation.

Keywords: machine translation, natural language processing (NLP), named entity recognition (NER), neural machine translation (NMT)

Procedia PDF Downloads 26
6251 Corpus-Based Neural Machine Translation: Empirical Study Multilingual Corpus for Machine Translation of Opaque Idioms - Cloud AutoML Platform

Authors: Khadija Refouh

Abstract:

Culture bound-expressions have been a bottleneck for Natural Language Processing (NLP) and comprehension, especially in the case of machine translation (MT). In the last decade, the field of machine translation has greatly advanced. Neural machine translation NMT has recently achieved considerable development in the quality of translation that outperformed previous traditional translation systems in many language pairs. Neural machine translation NMT is an Artificial Intelligence AI and deep neural networks applied to language processing. Despite this development, there remain some serious challenges that face neural machine translation NMT when translating culture bounded-expressions, especially for low resources language pairs such as Arabic-English and Arabic-French, which is not the case with well-established language pairs such as English-French. Machine translation of opaque idioms from English into French are likely to be more accurate than translating them from English into Arabic. For example, Google Translate Application translated the sentence “What a bad weather! It runs cats and dogs.” to “يا له من طقس سيء! تمطر القطط والكلاب” into the target language Arabic which is an inaccurate literal translation. The translation of the same sentence into the target language French was “Quel mauvais temps! Il pleut des cordes.” where Google Translate Application used the accurate French corresponding idioms. This paper aims to perform NMT experiments towards better translation of opaque idioms using high quality clean multilingual corpus. This Corpus will be collected analytically from human generated idiom translation. AutoML translation, a Google Neural Machine Translation Platform, is used as a custom translation model to improve the translation of opaque idioms. The automatic evaluation of the custom model will be compared to the Google NMT using Bilingual Evaluation Understudy Score BLEU. BLEU is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Human evaluation is integrated to test the reliability of the Blue Score. The researcher will examine syntactical, lexical, and semantic features using Halliday's functional theory.

Keywords: multilingual corpora, natural language processing (NLP), neural machine translation (NMT), opaque idioms

Procedia PDF Downloads 134
6250 Possibilities and Challenges of Using Machine Translation in Foreign Language Education

Authors: Miho Yamashita

Abstract:

In recent years, there have been attempts to introduce Machine Translation (MT) into foreign language teaching, especially in writing instructions. This is because the performance of neural machine translation has improved dramatically since 2016, and some university instructors started to introduce MT translations to their students as a "good model" to learn from. However, MT is still not perfect, and there are many incorrect translations. In order to translate the intended text into a foreign language, it is necessary to edit the original manuscript written in the native language (pre-edit) and revise the translated foreign language text (post-edit). The latter is considered especially difficult for users without a high proficiency level of foreign language. Therefore, the author allowed her students to use MT in her writing class in one of the private universities in Japan and investigated 1) how groups of students with different English proficiency levels revised MT translations when translating Japanese manuscripts into English and 2) whether the post-edit process differed when the students revised alone or in pairs. The results showed that in 1), certain non-post-edited grammatical errors were found regardless of their proficiency levels, indicating the need for teacher intervention, and in 2), more appropriate corrections were found in pairs, and their frequent use of a dictionary was also observed. In this presentation, the author will discuss how MT writing instruction can be integrated effectively in an aim to achieve multimodal foreign language education.

Keywords: machine translation, writing instruction, pre-edit, post-edit

Procedia PDF Downloads 51
6249 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate Speech and Abusive language on social media is difficult to detect, usually, it is detected after it becomes viral in cyberspace, of course, it is too late for prevention. An early detection system that has a fairly good accuracy is needed so that it can reduce conflicts that occur in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find an early detection model on Twitter social media using machine learning that has high accuracy from several machine learning methods studied. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector machine with genetic algorithm (SVM-GA), Nave Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection model, and presented it in the form of a graph of the accuracy of the six algorithms developed based on the Indonesian-language Twitter dataset, and concluded the best model with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 118
6248 A Supervised Approach for Word Sense Disambiguation Based on Arabic Diacritics

Authors: Alaa Alrakaf, Sk. Md. Mizanur Rahman

Abstract:

Since the last two decades’ Arabic natural language processing (ANLP) has become increasingly much more important. One of the key issues related to ANLP is ambiguity. In Arabic language different pronunciation of one word may have a different meaning. Furthermore, ambiguity also has an impact on the effectiveness and efficiency of Machine Translation (MT). The issue of ambiguity has limited the usefulness and accuracy of the translation from Arabic to English. The lack of Arabic resources makes ambiguity problem more complicated. Additionally, the orthographic level of representation cannot specify the exact meaning of the word. This paper looked at the diacritics of Arabic language and used them to disambiguate a word. The proposed approach of word sense disambiguation used Diacritizer application to Diacritize Arabic text then found the most accurate sense of an ambiguous word using Naïve Bayes Classifier. Our Experimental study proves that using Arabic Diacritics with Naïve Bayes Classifier enhances the accuracy of choosing the appropriate sense by 23% and also decreases the ambiguity in machine translation.

Keywords: Arabic natural language processing, machine learning, machine translation, Naive bayes classifier, word sense disambiguation

Procedia PDF Downloads 345
6247 Overview of Resources and Tools to Bridge Language Barriers Provided by the European Union

Authors: Barbara Heinisch, Mikael Snaprud

Abstract:

A common, well understood language is crucial in critical situations like landing a plane. For e-Government solutions, a clear and common language is needed to allow users to successfully complete transactions online. Misunderstandings here may not risk a safe landing but can cause delays, resubmissions and drive costs. This holds also true for higher education, where misunderstandings can also arise due to inconsistent use of terminology. Thus, language barriers are a societal challenge that needs to be tackled. The major means to bridge language barriers is translation. However, achieving high-quality translation and making texts understandable and accessible require certain framework conditions. Therefore, the EU and individual projects take (strategic) actions. These actions include the identification, collection, processing, re-use and development of language resources. These language resources may be used for the development of machine translation systems and the provision of (public) services including higher education. This paper outlines some of the existing resources and indicate directions for further development to increase the quality and usage of these resources.

Keywords: language resources, machine translation, terminology, translation

Procedia PDF Downloads 308
6246 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still in the early stage as this is a relatively new phenomenon in the interest raised by society. Machine learning helps to solve complex problems and to build AI systems nowadays and especially in those cases where we have tacit knowledge or the knowledge that is not known. We used machine learning algorithms and for identification of fake news; we applied three classifiers; Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification is not completely correct in fake news detection because classification methods are not specialized for fake news. With the integration of machine learning and text-based processing, we can detect fake news and build classifiers that can classify the news data. Text classification mainly focuses on extracting various features of text and after that incorporating those features into classification. The big challenge in this area is the lack of an efficient way to differentiate between fake and non-fake due to the unavailability of corpora. We applied three different machine learning classifiers on two publicly available datasets. Experimental analysis based on the existing dataset indicates a very encouraging and improved performance.

Keywords: fake news detection, natural language processing, machine learning, classification techniques.

Procedia PDF Downloads 153
6245 Automated Detection of Women Dehumanization in English Text

Authors: Maha Wiss, Wael Khreich

Abstract:

Animals, objects, foods, plants, and other non-human terms are commonly used as a source of metaphors to describe females in formal and slang language. Comparing women to non-human items not only reflects cultural views that might conceptualize women as subordinates or in a lower position than humans, yet it conveys this degradation to the listeners. Moreover, the dehumanizing representation of females in the language normalizes the derogation and even encourages sexism and aggressiveness against women. Although dehumanization has been a popular research topic for decades, according to our knowledge, no studies have linked women's dehumanizing language to the machine learning field. Therefore, we introduce our research work as one of the first attempts to create a tool for the automated detection of the dehumanizing depiction of females in English texts. We also present the first labeled dataset on the charted topic, which is used for training supervised machine learning algorithms to build an accurate classification model. The importance of this work is that it accomplishes the first step toward mitigating dehumanizing language against females.

Keywords: gender bias, machine learning, NLP, women dehumanization

Procedia PDF Downloads 70
6244 English Grammatical Errors of Arabic Sentence Translations Done by Machine Translations

Authors: Muhammad Fathurridho

Abstract:

Grammar as a rule used by every language to be understood by everyone is always related to syntax and morphology. Arabic grammar is different with another languages’ grammars. It has more rules and difficulties. This paper aims to investigate and describe the English grammatical errors of machine translation systems in translating Arabic sentences, including declarative, exclamation, imperative, and interrogative sentences, specifically in year 2018 which can be supported with artificial intelligence’s role. The Arabic sample sentences which are divided into two; verbal and nominal sentence of several Arabic published texts will be examined as the source language samples. The translated sentences done by several popular online machine translation systems, including Google Translate, Microsoft Bing, Babylon, Facebook, Hellotalk, Worldlingo, Yandex Translate, and Tradukka Translate are the material objects of this research. Descriptive method that will be taken to finish this research will show the grammatical errors of English target language, and classify them. The conclusion of this paper has showed that the grammatical errors of machine translation results are varied and generally classified into morphological, syntactical, and semantic errors in all type of Arabic words (Noun, Verb, and Particle), and it will be one of the evaluations for machine translation’s providers to correct them in order to improve their understandable results.

Keywords: Arabic, Arabic-English translation, machine translation, grammatical errors

Procedia PDF Downloads 144
6243 Literacy in First and Second Language: Implication for Language Education

Authors: Inuwa Danladi Bawa

Abstract:

One of the challenges of African states in the development of education in the past and the present is the problem of literacy. Literacy in the first language is seen as a strong base for the development of second language; they are mostly the language of education. Language development is an offshoot of language planning; so the need to develop literacy in both first and second language affects language education and predicts the extent of achievement of the entire education sector. The need to balance literacy acquisition in first language for good conditioning the acquisition of second language is paramount. Likely constraints that includes; non-standardization, underdeveloped and undeveloped first languages are among many. Solutions to some of these include the development of materials and use of the stages and levels of literacy acquisition. This is with believed that a child writes well in second language if he has literacy in the first language.

Keywords: first language, second language, literacy, english language, linguistics

Procedia PDF Downloads 423
6242 How Unicode Glyphs Revolutionized the Way We Communicate

Authors: Levi Corallo

Abstract:

Typed language made by humans on computers and cell phones has made a significant distinction from previous modes of written language exchanges. While acronyms remain one of the most predominant markings of typed language, another and perhaps more recent revolution in the way humans communicate has been with the use of symbols or glyphs, primarily Emojis—globally introduced on the iPhone keyboard by Apple in 2008. This paper seeks to analyze the use of symbols in typed communication from both a linguistic and machine learning perspective. The Unicode system will be explored and methods of encoding will be juxtaposed with the current machine and human perception. Topics in how typed symbol usage exists in conversation will be explored as well as topics across current research methods dealing with Emojis like sentiment analysis, predictive text models, and so on. This study proposes that sequential analysis is a significant feature for analyzing unicode characters in a corpus with machine learning. Current models that are trying to learn or translate the meaning of Emojis should be starting to learn using bi- and tri-grams of Emoji, as well as observing the relationship between combinations of different Emoji in tandem. The sociolinguistics of an entire new vernacular of language referred to here as ‘typed language’ will also be delineated across my analysis with unicode glyphs from both a semantic and technical perspective.

Keywords: unicode, text symbols, emojis, glyphs, communication

Procedia PDF Downloads 186
6241 A Pilot Study to Investigate the Use of Machine Translation Post-Editing Training for Foreign Language Learning

Authors: Hong Zhang

Abstract:

The main purpose of this study is to show that machine translation (MT) post-editing (PE) training can help our Chinese students learn Spanish as a second language. Our hypothesis is that they might make better use of it by learning PE skills specific for foreign language learning. We have developed PE training materials based on the data collected in a previous study. Training material included the special error types of the output of MT and the error types that our Chinese students studying Spanish could not detect in the experiment last year. This year we performed a pilot study in order to evaluate the PE training materials effectiveness and to what extent PE training helps Chinese students who study the Spanish language. We used screen recording to record these moments and made note of every action done by the students. Participants were speakers of Chinese with intermediate knowledge of Spanish. They were divided into two groups: Group A performed PE training and Group B did not. We prepared a Chinese text for both groups, and participants translated it by themselves (human translation), and then used Google Translate to translate the text and asked them to post-edit the raw MT output. Comparing the results of PE test, Group A could identify and correct the errors faster than Group B students, Group A did especially better in omission, word order, part of speech, terminology, mistranslation, official names, and formal register. From the results of this study, we can see that PE training can help Chinese students learn Spanish as a second language. In the future, we could focus on the students’ struggles during their Spanish studies and complete the PE training materials to teach Chinese students learning Spanish with machine translation.

Keywords: machine translation, post-editing, post-editing training, Chinese, Spanish, foreign language learning

Procedia PDF Downloads 137
6240 Transportation Language Register as One of Language Community

Authors: Diyah Atiek Mustikawati

Abstract:

Language register refers to a variety of a language used for particular purpose or in a particular social setting. Language register also means as a concept of adapting one’s use of language to conform to standards or tradition in a given professional or social situation. This descriptive study tends to discuss about the form of language register in transportation aspect, factors, also the function of use it. Mostly, language register in transportation aspect uses short sentences in form of informal register. The factor caused language register used are speaker, word choice, background of language. The functions of language register in transportations aspect are to make communication between crew easily, also to keep safety when they were in bad condition. Transportation language register developed naturally as one of variety of language used.

Keywords: language register, language variety, communication, transportation

Procedia PDF Downloads 463
6239 Technique and Use of Machine Readable Dictionary: In Special Reference to Hindi-Marathi Machine Translation

Authors: Milind Patil

Abstract:

Present paper is a discussion on Hindi-Marathi Morphological Analysis and generating rules for Machine Translation on the basis of Machine Readable Dictionary (MRD). This used Transformative Generative Grammar (TGG) rules to design the MRD. As per TGG rules, the suffix of a particular root word is based on its Tense, Aspect, Modality and Voice. That's why the suffix is very important for the word meanings (or root meanings). The Hindi and Marathi Language both have relation with Indo-Aryan language family. Both have been derived from Sanskrit language and their script is 'Devnagari'. But there are lots of differences in terms of semantics and grammatical level too. In Marathi, there are three genders, but in Hindi only two (Masculine and Feminine), the Natural gender is absent in Hindi. Likewise other grammatical categories also differ in their level of use. For MRD the suffixes (or Morpheme) are of particular root word for GNP (Gender, Number and Person) are based on its natural phenomena. A particular Suffix and Morphine change as per the need of person, number and gender. The design of MRD also based on this format. In first, Person, Number, Gender and Tense are key points than root words and suffix of particular Person, Number Gender (PNG). After that the inferences are drawn on the basis of rules that is (V.stem) (Pre.T/Past.T) (x) + (Aux-Pre.T) (x) → (V.Stem.) + (SP.TM) (X).

Keywords: MRD, TGG, stem, morph, morpheme, suffix, PNG, TAM&V, root

Procedia PDF Downloads 312
6238 A Text Classification Approach Based on Natural Language Processing and Machine Learning Techniques

Authors: Rim Messaoudi, Nogaye-Gueye Gning, François Azelart

Abstract:

Automatic text classification applies mostly natural language processing (NLP) and other AI-guided techniques to automatically classify text in a faster and more accurate manner. This paper discusses the subject of using predictive maintenance to manage incident tickets inside the sociality. It focuses on proposing a tool that treats and analyses comments and notes written by administrators after resolving an incident ticket. The goal here is to increase the quality of these comments. Additionally, this tool is based on NLP and machine learning techniques to realize the textual analytics of the extracted data. This approach was tested using real data taken from the French National Railways (SNCF) company and was given a high-quality result.

Keywords: machine learning, text classification, NLP techniques, semantic representation

Procedia PDF Downloads 84
6237 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian

Authors: Sanja Seljan, Ivan Dunđer

Abstract:

The paper presents combined automatic speech recognition (ASR) for English and machine translation (MT) for English and Croatian in the domain of business correspondence. The first part presents results of training the ASR commercial system on two English data sets, enriched by error analysis. The second part presents results of machine translation performed by online tool Google Translate for English and Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.

Keywords: automatic machine translation, integrated language technologies, quality evaluation, speech recognition

Procedia PDF Downloads 471
6236 Enhancing Word Meaning Retrieval Using FastText and Natural Language Processing Techniques

Authors: Sankalp Devanand, Prateek Agasimani, Shamith V. S., Rohith Neeraje

Abstract:

Machine translation has witnessed significant advancements in recent years, but the translation of languages with distinct linguistic characteristics, such as English and Sanskrit, remains a challenging task. This research presents the development of a dedicated English-to-Sanskrit machine translation model, aiming to bridge the linguistic and cultural gap between these two languages. Using a variety of natural language processing (NLP) approaches, including FastText embeddings, this research proposes a thorough method to improve word meaning retrieval. Data preparation, part-of-speech tagging, dictionary searches, and transliteration are all included in the methodology. The study also addresses the implementation of an interpreter pattern and uses a word similarity task to assess the quality of word embeddings. The experimental outcomes show how the suggested approach may be used to enhance word meaning retrieval tasks with greater efficacy, accuracy, and adaptability. Evaluation of the model's performance is conducted through rigorous testing, comparing its output against existing machine translation systems. The assessment includes quantitative metrics such as BLEU scores, METEOR scores, Jaccard Similarity, etc.

Keywords: machine translation, English to Sanskrit, natural language processing, word meaning retrieval, fastText embeddings

Procedia PDF Downloads 30
6235 Study of Syntactic Errors for Deep Parsing at Machine Translation

Authors: Yukiko Sasaki Alam, Shahid Alam

Abstract:

Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.

Keywords: syntactic parsing, error analysis, machine translation, deep parsing

Procedia PDF Downloads 539
6234 Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms

Authors: Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna

Abstract:

In this paper, we present a low cost design for a smart glove that can perform sign language recognition to assist the speech impaired people. Specifically, we have designed and developed an Assistive Hand Gesture Interpreter that recognizes hand movements relevant to the American Sign Language (ASL) and translates them into text for display on a Thin-Film-Transistor Liquid Crystal Display (TFT LCD) screen as well as synthetic speech. Linear Bayes Classifiers and Multilayer Neural Networks have been used to classify 11 feature vectors obtained from the sensors on the glove into one of the 27 ASL alphabets and a predefined gesture for space. Three types of features are used; bending using six bend sensors, orientation in three dimensions using accelerometers and contacts at vital points using contact sensors. To gauge the performance of the presented design, the training database was prepared using five volunteers. The accuracy of the current version on the prepared dataset was found to be up to 99.3% for target user. The solution combines electronics, e-textile technology, sensor technology, embedded system and machine learning techniques to build a low cost wearable glove that is scrupulous, elegant and portable.

Keywords: American sign language, assistive hand gesture interpreter, human-machine interface, machine learning, sensing glove

Procedia PDF Downloads 283
6233 A Novel Combined Finger Counting and Finite State Machine Technique for ASL Translation Using Kinect

Authors: Rania Ahmed Kadry Abdel Gawad Birry, Mohamed El-Habrouk

Abstract:

This paper presents a brief survey of the techniques used for sign language recognition along with the types of sensors used to perform the task. It presents a modified method for identification of an isolated sign language gesture using Microsoft Kinect with the OpenNI framework. It presents the way of extracting robust features from the depth image provided by Microsoft Kinect and the OpenNI interface and to use them in creating a robust and accurate gesture recognition system, for the purpose of ASL translation. The Prime Sense’s Natural Interaction Technology for End-user - NITE™ - was also used in the C++ implementation of the system. The algorithm presents a simple finger counting algorithm for static signs as well as directional Finite State Machine (FSM) description of the hand motion in order to help in translating a sign language gesture. This includes both letters and numbers performed by a user, which in-turn may be used as an input for voice pronunciation systems.

Keywords: American sign language, finger counting, hand tracking, Microsoft Kinect

Procedia PDF Downloads 284
6232 Predictive Analysis of Chest X-rays Using NLP and Large Language Models with the Indiana University Dataset and Random Forest Classifier

Authors: Azita Ramezani, Ghazal Mashhadiagha, Bahareh Sanabakhsh

Abstract:

This study researches the combination of Random. Forest classifiers with large language models (LLMs) and natural language processing (NLP) to improve diagnostic accuracy in chest X-ray analysis using the Indiana University dataset. Utilizing advanced NLP techniques, the research preprocesses textual data from radiological reports to extract key features, which are then merged with image-derived data. This improved dataset is analyzed with Random Forest classifiers to predict specific clinical results, focusing on the identification of health issues and the estimation of case urgency. The findings reveal that the combination of NLP, LLMs, and machine learning not only increases diagnostic precision but also reliability, especially in quickly identifying critical conditions. Achieving an accuracy of 99.35%, the model shows significant advancements over conventional diagnostic techniques. The results emphasize the large potential of machine learning in medical imaging, suggesting that these technologies could greatly enhance clinician judgment and patient outcomes by offering quicker and more precise diagnostic approximations.

Keywords: natural language processing (NLP), large language models (LLMs), random forest classifier, chest x-ray analysis, medical imaging, diagnostic accuracy, indiana university dataset, machine learning in healthcare, predictive modeling, clinical decision support systems

Procedia PDF Downloads 27
6231 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari

Abstract:

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Keywords: Arabic language acquisition and learning, natural language processing, morphological analyzer, part-of-speech

Procedia PDF Downloads 142
6230 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 99
6229 Neural Machine Translation for Low-Resource African Languages: Benchmarking State-of-the-Art Transformer for Wolof

Authors: Cheikh Bamba Dione, Alla Lo, Elhadji Mamadou Nguer, Siley O. Ba

Abstract:

In this paper, we propose two neural machine translation (NMT) systems (French-to-Wolof and Wolof-to-French) based on sequence-to-sequence with attention and transformer architectures. We trained our models on a parallel French-Wolof corpus of about 83k sentence pairs. Because of the low-resource setting, we experimented with advanced methods for handling data sparsity, including subword segmentation, back translation, and the copied corpus method. We evaluate the models using the BLEU score and find that transformer outperforms the classic seq2seq model in all settings, in addition to being less sensitive to noise. In general, the best scores are achieved when training the models on word-level-based units. For subword-level models, using back translation proves to be slightly beneficial in low-resource (WO) to high-resource (FR) language translation for the transformer (but not for the seq2seq) models. A slight improvement can also be observed when injecting copied monolingual text in the target language. Moreover, combining the copied method data with back translation leads to a substantial improvement of the translation quality.

Keywords: backtranslation, low-resource language, neural machine translation, sequence-to-sequence, transformer, Wolof

Procedia PDF Downloads 136
6228 Learning to Translate by Learning to Communicate to an Entailment Classifier

Authors: Szymon Rutkowski, Tomasz Korbak

Abstract:

We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, it lacks psychological plausibility of learning procedure: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for classifier’s performance on determining entailment between sentences translated by the translator to disciple’s native language. Translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, there is a number of improvements we introduce. While prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality. It has been shown that models trained recognizing textual entailment produce high-quality general-purpose sentence embeddings transferrable to other tasks. We use stanford natural language inference (SNLI) dataset as well as its analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of translator’s policy optimization and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning based zero-shot machine translation.

Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning

Procedia PDF Downloads 117
6227 Enhancing Code Security with AI-Powered Vulnerability Detection

Authors: Zzibu, Mark, Brian

Abstract:

As software systems become increasingly complex, ensuring code security is a growing concern. Traditional vulnerability detection methods often rely on manual code reviews or static analysis tools, which can be time-consuming and prone to errors. This paper presents a distinct approach to enhancing code security by leveraging artificial intelligence (AI) and machine learning (ML) techniques. Our proposed system utilizes a combination of natural language processing (NLP) and deep learning algorithms to identify and classify vulnerabilities in real-world codebases. By analyzing vast amounts of open-source code data, our AI-powered tool learns to recognize patterns and anomalies indicative of security weaknesses. We evaluated our system on a dataset of over 10,000 open-source projects, achieving an accuracy rate of 92% in detecting known vulnerabilities. Furthermore, our tool identified previously unknown vulnerabilities in popular libraries and frameworks, demonstrating its potential for improving software security.

Keywords: AI, machine language, cord security, machine leaning

Procedia PDF Downloads 0
6226 Enhancing English Language Learning through Learners Cultural Background

Authors: A. Attahiru, Rabi Abdullahi Danjuma, Fatima Bint

Abstract:

Language and culture are two concepts which are closely related that one affects the other. This paper attempts to examine the definition of language and culture by discussing the relationship between them. The paper further presents some instructional strategies for the teaching of language and culture as well as the influence of culture on language. It also looks at its implication to language education and finally some recommendation and conclusion were drawn.

Keywords: culture, language, relationship, strategies, teaching

Procedia PDF Downloads 396