Search results for: Nostratic languages
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 196

Search results for: Nostratic languages

46 Semi-Automatic Analyzer to Detect Authorial Intentions in Scientific Documents

Authors: Kanso Hassan, Elhore Ali, Soule-dupuy Chantal, Tazi Said

Abstract:

Information Retrieval has the objective of studying models and the realization of systems allowing a user to find the relevant documents adapted to his need of information. The information search is a problem which remains difficult because the difficulty in the representing and to treat the natural languages such as polysemia. Intentional Structures promise to be a new paradigm to extend the existing documents structures and to enhance the different phases of documents process such as creation, editing, search and retrieval. The intention recognition of the author-s of texts can reduce the largeness of this problem. In this article, we present intentions recognition system is based on a semi-automatic method of extraction the intentional information starting from a corpus of text. This system is also able to update the ontology of intentions for the enrichment of the knowledge base containing all possible intentions of a domain. This approach uses the construction of a semi-formal ontology which considered as the conceptualization of the intentional information contained in a text. An experiments on scientific publications in the field of computer science was considered to validate this approach.

Keywords: Information research, text analyzes, intentionalstructure, segmentation, ontology, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1601
45 The Contribution of Translation to Arabic and Islamic Civilization during the Golden Age: 661-1258

Authors: Smail Hadj Mahammed

Abstract:

Translation is not merely a process of conveying the meaning from one particular language into another to overcome language barriers and ensure a good understanding; it is also a work of civilization and progress. Without the translation of Greek, Indian and Persian works, Arabic and Islamic Civilization would not have taken off, and without the translations of Arabic works into Latin, and then into European languages, the scientific and technological revolution of the modern world would not have taken place. In this context, the present paper seeks to investigate how the translation movement contributed to the Arabic and Islamic Civilizations during the Golden Age. The paper consists of three major parts: the first part provides a brief historical overview of the translation movement during the golden age, which witnessed two important eras: the Umayyad and Abbasid eras. The second part shows the main reasons why translation was a prominent cultural activity during the Golden Age and why it gained great interest from the Arabs. The last part highlights the constructive contribution of translation to the Arabic and Islamic Civilization during the period (661–1258). The results demonstrate that Arabic translation movement during the Golden Age had significantly assisted in enriching the Arabic and Islamic civilizations considering the major and important scientific works of old Greek, Indian and Persian civilizations which had been absorbed.

Keywords: Arabic and Islamic civilization, contribution, golden age, translation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 136
44 Object-Oriented Programming Strategies in C# for Power Conscious System

Authors: Kayun Chantarasathaporn, Chonawat Srisa-an

Abstract:

Low power consumption is a major constraint for battery-powered system like computer notebook or PDA. In the past, specialists usually designed both specific optimized equipments and codes to relief this concern. Doing like this could work for quite a long time, however, in this era, there is another significant restraint, the time to market. To be able to serve along the power constraint while can launch products in shorter production period, objectoriented programming (OOP) has stepped in to this field. Though everyone knows that OOP has quite much more overhead than assembly and procedural languages, development trend still heads to this new world, which contradicts with the target of low power consumption. Most of the prior power related software researches reported that OOP consumed much resource, however, as industry had to accept it due to business reasons, up to now, no papers yet had mentioned about how to choose the best OOP practice in this power limited boundary. This article is the pioneer that tries to specify and propose the optimized strategy in writing OOP software under energy concerned environment, based on quantitative real results. The language chosen for studying is C# based on .NET Framework 2.0 which is one of the trendy OOP development environments. The recommendation gotten from this research would be a good roadmap that can help developers in coding that well balances between time to market and time of battery.

Keywords: Low power consumption, object oriented programming, power conscious system, software.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1870
43 Porul: Option Generation and Selection and Scoring Algorithms for a Tamil Flash Card Game

Authors: Anitha Narasimhan, Aarthy Anandan, Madhan Karky, C. N. Subalalitha

Abstract:

Games can be the excellent tools for teaching a language. There are few e-learning games in Indian languages like word scrabble, cross word, quiz games etc., which were developed mainly for educational purposes. This paper proposes a Tamil word game called, “Porul”, which focuses on education as well as on players’ thinking and decision-making skills. Porul is a multiple choice based quiz game, in which the players attempt to answer questions correctly from the given multiple options that are generated using a unique algorithm called the Option Selection algorithm which explores the semantics of the question in various dimensions namely, synonym, rhyme and Universal Networking Language semantic category. This kind of semantic exploration of the question not only increases the complexity of the game but also makes it more interesting. The paper also proposes a Scoring Algorithm which allots a score based on the popularity score of the question word. The proposed game has been tested using 20,000 Tamil words.

Keywords: Porul game, Tamil word game, option selection, flash card, scoring, algorithm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1118
42 Interactive Chinese Character Learning System though Pictograph Evolution

Authors: J.H. Low, C.O. Wong, E.J. Han, K.R Kim K.C. Jung, H.K. Yang

Abstract:

This paper proposes an Interactive Chinese Character Learning System (ICCLS) based on pictorial evolution as an edutainment concept in computer-based learning of language. The advantage of the language origination itself is taken as a learning platform due to the complexity in Chinese language as compared to other types of languages. Users especially children enjoy more by utilize this learning system because they are able to memories the Chinese Character easily and understand more of the origin of the Chinese character under pleasurable learning environment, compares to traditional approach which children need to rote learning Chinese Character under un-pleasurable environment. Skeletonization is used as the representation of Chinese character and object with an animated pictograph evolution to facilitate the learning of the language. Shortest skeleton path matching technique is employed for fast and accurate matching in our implementation. User is required to either write a word or draw a simple 2D object in the input panel and the matched word and object will be displayed as well as the pictograph evolution to instill learning. The target of computer-based learning system is for pre-school children between 4 to 6 years old to learn Chinese characters in a flexible and entertaining manner besides utilizing visual and mind mapping strategy as learning methodology.

Keywords: Computer-based learning, Chinese character, pictograph evolution, skeletonization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1865
41 The Development and Future of Hong Kong Typography

Authors: Amic G. Ho

Abstract:

Language usage and typography in Hong Kong are unique, as can be seen clearly on the streets of the city. In contrast to many other parts of the world, where there is only one language, in Hong Kong many signs and billboards display two languages: Chinese and English. The language usage on signage, fonts and types used, and the designs in magazines and advertisements all demonstrate the unique features of Hong Kong typographic design, which reflect the multicultural nature of Hong Kong society. This study is the first step in investigating the nature and development of Hong Kong typography. The preliminary research explored how the historical development of Hong Kong is reflected in its unique typography. Following a review of historical development, a quantitative study was designed: Local Hong Kong participants were invited to provide input on what makes the Hong Kong typographic style unique. Their input was collected and analyzed. This provided us with information about the characteristic criteria and features of Hong Kong typography, as recognized by the local people. The most significant typographic designs in Hong Kong were then investigated and the influence of Chinese and other cultures on Hong Kong typography was assessed. The research results provide an indication to local designers on how they can strengthen local design outcomes and promote the values and culture of their mother town.

Keywords: Typography, Hong Kong, historical developments, multiple cultures.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1531
40 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes

Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari

Abstract:

The Arabic language is one of the most important languages. Learning it is so important for many people around the world because of its religious and economic importance and the real challenge lies in practicing it without grammatical or syntactical mistakes. This research focused on detecting and correcting the syntactic mistakes of Arabic syntax according to their position in the sentence and focused on two of the main syntactical rules in Arabic: Dual and Plural. It analyzes each sentence in the text, using Stanford CoreNLP morphological analyzer and machine-learning approach in order to detect the syntactical mistakes and then correct it. A prototype of the proposed system was implemented and evaluated. It uses support vector machine (SVM) algorithm to detect Arabic grammatical errors and correct them using the rule-based approach. The prototype system has a far accuracy 81%. In general, it shows a set of useful grammatical suggestions that the user may forget about while writing due to lack of familiarity with grammar or as a result of the speed of writing such as alerting the user when using a plural term to indicate one person.

Keywords: Arabic Language acquisition and learning, natural language processing, morphological analyzer, part-of-speech.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 953
39 Understanding Barriers to Sports Participation as a Means of Achieving Sustainable Development in Michael Otedola College of Primary Education

Authors: Osifeko Olalekan Remigious, Osifeko Christiana Osikorede, Folarin Bolanle Eunice, Olugbenga Adebola Shodiya

Abstract:

During these difficult economic times, nations are looking for ways to improve their finances, preserve the environment as well as the socio-political climate and educational institutions, which are needed to increase their economy and preserve their sustainable development. Sport is one of the ways through which sustainable development can be achieved. The purpose of this study was to examine and understanding barriers to participation in sport. A total of 1,025 students were purposively selected from five schools (School of Arts and Social Sciences, School of Languages, School of Education, School of Sciences and School of Vocational and Technical Education) in Michael Otedola College of Primary Education (MOCPED). A questionnaire, with a tested reliability coefficient of 0.71, was used for data collection. The collected data were subjected to the descriptive survey research design. The findings showed that sports facilities, funding and lecture schedules were significant barriers to sports participation. It was recommended that sports facilities be provided by the Lagos State government.

Keywords: MOCPED sports, sustainable development, sports participation, state government.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 770
38 Particular Features of the First Romanian Multilingual Dictionaries

Authors: Mihaela Mocanu

Abstract:

The Romanian multilingual dictionaries – also named polyglot, plurilingual or polylingual dictionaries, have known a slow yet constant development starting with the end of the 17th century, when the first such work is attested, to the present time, when we witness a considerable increase of the number of polyglot dictionaries, especially the terminological ones. This paper aims at analyzing the context in which the first Romanian multilingual dictionaries were issued, as well as and the organization and structure particularities of the first lexicographic works of this type. The irretrievable loss of some of these works as well as the partial conservation of others renders the attempt to retrace the beginnings of Romanian lexicography extremely difficult. The research methodology is part of a descriptive and analytical approach based on two types of sources, subject to contrastive analysis: the notes made by the initiators of lexicographic projects and the testimonies of their contemporaries, respectively, along with the specialized studies regarding the history of the old Romanian lexicography. The analysis of the contents has indicated that these dictionaries lacked a scientific apparatus in the true sense of the phrase, failed to obey unitary organizational criteria, being limited, most of the times, to mere inventories of words, where the Romanian term was assigned its correspondent in other languages. Motivated by practical reasons, the first multilingual dictionaries were aimed at the clerics their purpose being to ensure the translators’ fidelity towards the original religious texts, regarded as sacred.

Keywords: Language, multilingual dictionary, Romanian lexicography, terminology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1363
37 Deep Learning Based, End-to-End Metaphor Detection in Greek with Recurrent and Convolutional Neural Networks

Authors: Konstantinos Perifanos, Eirini Florou, Dionysis Goutsos

Abstract:

This paper presents and benchmarks a number of end-to-end Deep Learning based models for metaphor detection in Greek. We combine Convolutional Neural Networks and Recurrent Neural Networks with representation learning to bear on the metaphor detection problem for the Greek language. The models presented achieve exceptional accuracy scores, significantly improving the previous state-of-the-art results, which had already achieved accuracy 0.82. Furthermore, no special preprocessing, feature engineering or linguistic knowledge is used in this work. The methods presented achieve accuracy of 0.92 and F-score 0.92 with Convolutional Neural Networks (CNNs) and bidirectional Long Short Term Memory networks (LSTMs). Comparable results of 0.91 accuracy and 0.91 F-score are also achieved with bidirectional Gated Recurrent Units (GRUs) and Convolutional Recurrent Neural Nets (CRNNs). The models are trained and evaluated only on the basis of training tuples, the related sentences and their labels. The outcome is a state-of-the-art collection of metaphor detection models, trained on limited labelled resources, which can be extended to other languages and similar tasks.

Keywords: Metaphor detection, deep learning, representation learning, embeddings.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 475
36 The Interplay of Locus of Control, Academic Achievement, and Biological Variables among Iranian Online EFL Learners

Authors: Azizeh Chalak, Niloufar Nasri

Abstract:

Students' academic achievement, along with the effects of different variables, has been a serious concern of educators since long ago. This study was an attempt to investigate the interplay of Locus of Control (LOC), academic achievement and biological variables among Iranian online EFL Learners. The participants of the study included 100 students of different age groups and genders studying English online at Iran Language Institute (ILI), Isfahan, Iran. The instrument used was Trice Academic LOC questionnaire which identifies orientations of internality or externality. The participants' Grade Point Averages (GPAs) were used as the measure of their academic achievement. A series of independent samples ttests were performed on the data. The results of the study showed that (a) there were no significant differences between male and female participants in LOC orientation, (b) there was no relationship between LOC and academic achievement among internal males and females, (c) external females were better achievers than external males, (d) and the age had no significant relationship with LOC and academic achievement. It can be concluded that the social, cultural patterns of genders have changed. This study might help sociologists and psychologists as well as applied linguists in that they reflect the recent social changes and their effects on the LOC and their consequent implications in teaching languages.

Keywords: Academic achievement, biological variables, Iranian online EFL learners, locus of control.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2207
35 E-Education in Multicultural Setting: The Success of Mobile Learning

Authors: Subramaniam Chandran

Abstract:

This paper explains how mobile learning assures sustainable e-education for multicultural group of students. This paper reports the impact of mobile learning on distance education in multicultural environment. The emergence of learning technologies through CD, internet, and mobile is increasingly adopted by distance institutes for quick delivery and cost-effective purposes. Their sustainability is conditioned by the structure of learners as well as the teaching community. The experimental study was conducted among the distant learners of Vinayaka Missions University located at Salem in India. Students were drawn from multicultural environment based on different languages, religions, class and communities. During the mobile learning sessions, the students, who are divided on language, religion, class and community, were dominated by play impulse rather than study anxiety or cultural inhibitions. This study confirmed that mobile learning improved the performance of the students despite their division based on region, language or culture. In other words, technology was able to transcend the relative deprivation in the multicultural groups. It also confirms sustainable e-education through mobile learning and cost-effective system of instruction. Mobile learning appropriates the self-motivation and play impulse of the young learners in providing sustainable e-education to multicultural social groups of students.

Keywords: E-Education, mobile learning, multiculturalism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2002
34 An Analysis of Language Borrowing among Algerian University Students Using Online Facebook Conversations

Authors: Messaouda Annab

Abstract:

The rapid development of technology has led to an important context in which different languages and structures are used in the same conversations. This paper investigates the practice of language borrowing within social media platform, namely, Facebook among Algerian Vernacular Arabic (AVA) students. In other words, this study will explore how Algerian students have incorporated lexical English borrowing in their online conversations. This paper will examine the relationships between language, culture and identity among a multilingual group. The main objective is to determine the cultural and linguistic functions that borrowing fulfills in social media and to explain the possible factors underlying English borrowing. The nature of the study entails the use of an online research method that includes ten online Facebook conversations in the form of private messages collected from Bachelor and Masters Algerian students recruited from the English department at the University of Oum El-Bouaghi. The analysis of data revealed that social media platform provided the users with opportunities to shift from one language to another. This practice was noticed in students’ online conversations. English borrowing was the most relevant language performance in accordance with Arabic which is the mother tongue of the chosen sample. The analysis has assumed that participants are skilled in more than one language.

Keywords: Borrowing, language performance, linguistic background, social media.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1231
33 Advances in Artificial Intelligence Using Speech Recognition

Authors: Khaled M. Alhawiti

Abstract:

This research study aims to present a retrospective study about speech recognition systems and artificial intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner. This research intends to present the illustration of recent technological advancements, which are associated with artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were developed by the researchers. Some of the most prominent statistical models include acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.

Keywords: Speech recognition, acoustic phonetic, artificial intelligence, Hidden Markov Models (HMM), statistical models of speech recognition, human machine performance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7901
32 Formal Analysis of a Public-Key Algorithm

Authors: Markus Kaiser, Johannes Buchmann

Abstract:

In this article, a formal specification and verification of the Rabin public-key scheme in a formal proof system is presented. The idea is to use the two views of cryptographic verification: the computational approach relying on the vocabulary of probability theory and complexity theory and the formal approach based on ideas and techniques from logic and programming languages. A major objective of this article is the presentation of the first computer-proved implementation of the Rabin public-key scheme in Isabelle/HOL. Moreover, we explicate a (computer-proven) formalization of correctness as well as a computer verification of security properties using a straight-forward computation model in Isabelle/HOL. The analysis uses a given database to prove formal properties of our implemented functions with computer support. The main task in designing a practical formalization of correctness as well as efficient computer proofs of security properties is to cope with the complexity of cryptographic proving. We reduce this complexity by exploring a light-weight formalization that enables both appropriate formal definitions as well as efficient formal proofs. Consequently, we get reliable proofs with a minimal error rate augmenting the used database, what provides a formal basis for more computer proof constructions in this area.

Keywords: public-key encryption, Rabin public-key scheme, formalproof system, higher-order logic, formal verification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1492
31 End-to-End Spanish-English Sequence Learning Translation Model

Authors: Vidhu Mitha Goutham, Ruma Mukherjee

Abstract:

The low availability of well-trained, unlimited, dynamic-access models for specific languages makes it hard for corporate users to adopt quick translation techniques and incorporate them into product solutions. As translation tasks increasingly require a dynamic sequence learning curve; stable, cost-free opensource models are scarce. We survey and compare current translation techniques and propose a modified sequence to sequence model repurposed with attention techniques. Sequence learning using an encoder-decoder model is now paving the path for higher precision levels in translation. Using a Convolutional Neural Network (CNN) encoder and a Recurrent Neural Network (RNN) decoder background, we use Fairseq tools to produce an end-to-end bilingually trained Spanish-English machine translation model including source language detection. We acquire competitive results using a duo-lingo-corpus trained model to provide for prospective, ready-made plug-in use for compound sentences and document translations. Our model serves a decent system for large, organizational data translation needs. While acknowledging its shortcomings and future scope, it also identifies itself as a well-optimized deep neural network model and solution.

Keywords: Attention, encoder-decoder, Fairseq, Seq2Seq, Spanish, translation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 415
30 Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Authors: Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag

Abstract:

Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocation of words, attached to specific concepts (e.g. genetic-algorithms and decisiontrees are terms associated to the concept “Machine Learning" ). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by experimenting the ranking function learned from an English corpus in Biology, onto a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approaches compared to the state-of-the-art Support Vector Machine (SVM).

Keywords: Text-mining, Terminology Extraction, Evolutionary algorithm, ROC Curve.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1618
29 The Effectiveness of Implementing Interactive Training for Teaching Kazakh Language

Authors: Samal Abzhanova, Saule Mussabekova

Abstract:

Today, a new system of education is being created in Kazakhstan in order to develop the system of education and to satisfy the world class standards. For this purpose, there have been established new requirements and responsibilities to the instructors. Students should not be limited with providing only theoretical knowledge. Also, they should be encouraged to be competitive, to think creatively and critically. Moreover, students should be able to implement these skills into practice. These issues could be resolved through the permanent improvement of teaching methods. Therefore, a specialist who teaches the languages should use up-to-date methods and introduce new technologies. The result of the investigation suggests that an interactive teaching method is one of the new technologies in this field. This paper aims to provide information about implementing new technologies in the process of teaching language. The paper will discuss about necessity of introducing innovative technologies and the techniques of organizing interactive lessons. At the same time, the structure of the interactive lesson, conditions, principles, discussions, small group works and role-playing games will be considered. Interactive methods are carried out with the help of several types of activities, such as working in a team (with two or more group of people), playing situational or role-playing games, working with different sources of information, discussions, presentations, creative works and learning through solving situational tasks and etc.

Keywords: Games, interactive learning, Kazakh language, teaching methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1387
28 EGCL: An Extended G-Code Language with Flow Control, Functions and Mnemonic Variables

Authors: Oscar E. Ruiz, S. Arroyave, J. F. Cardona

Abstract:

In the context of computer numerical control (CNC) and computer aided manufacturing (CAM), the capabilities of programming languages such as symbolic and intuitive programming, program portability and geometrical portfolio have special importance. They allow to save time and to avoid errors during part programming and permit code re-usage. Our updated literature review indicates that the current state of art presents voids in parametric programming, program portability and programming flexibility. In response to this situation, this article presents a compiler implementation for EGCL (Extended G-code Language), a new, enriched CNC programming language which allows the use of descriptive variable names, geometrical functions and flow-control statements (if-then-else, while). Our compiler produces low-level generic, elementary ISO-compliant Gcode, thus allowing for flexibility in the choice of the executing CNC machine and in portability. Our results show that readable variable names and flow control statements allow a simplified and intuitive part programming and permit re-usage of the programs. Future work includes allowing the programmer to define own functions in terms of EGCL, in contrast to the current status of having them as library built-in functions.

Keywords: CNC Programming, Compiler, G-code Language, Numerically Controlled Machine-Tools.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2576
27 JaCoText: A Pretrained Model for Java Code-Text Generation

Authors: Jessica Lòpez Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri

Abstract:

Pretrained transformer-based models have shown high performance in natural language generation task. However, a new wave of interest has surged: automatic programming language generation. This task consists of translating natural language instructions to a programming code. Despite the fact that well-known pretrained models on language generation have achieved good performance in learning programming languages, effort is still needed in automatic code generation. In this paper, we introduce JaCoText, a model based on Transformers neural network. It aims to generate java source code from natural language text. JaCoText leverages advantages of both natural language and code generation models. More specifically, we study some findings from the state of the art and use them to (1) initialize our model from powerful pretrained models, (2) explore additional pretraining on our java dataset, (3) carry out experiments combining the unimodal and bimodal data in the training, and (4) scale the input and output length during the fine-tuning of the model. Conducted experiments on CONCODE dataset show that JaCoText achieves new state-of-the-art results.

Keywords: Java code generation, Natural Language Processing, Sequence-to-sequence Models, Transformers Neural Networks.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 735
26 Semantic Modeling of Management Information: Enabling Automatic Reasoning on DMTF-CIM

Authors: Fernando Alonso, Rafael Fernandez, Sonia Frutos, Javier Soriano

Abstract:

CIM is the standard formalism for modeling management information developed by the Distributed Management Task Force (DMTF) in the context of its WBEM proposal, designed to provide a conceptual view of the managed environment. In this paper, we propose the inclusion of formal knowledge representation techniques, based on Description Logics (DLs) and the Web Ontology Language (OWL), in CIM-based conceptual modeling, and then we examine the benefits of such a decision. The proposal is specified as a CIM metamodel level mapping to a highly expressive subset of DLs capable of capturing all the semantics of the models. The paper shows how the proposed mapping can be used for automatic reasoning about the management information models, as a design aid, by means of new-generation CASE tools, thanks to the use of state-of-the-art automatic reasoning systems that support the proposed logic and use algorithms that are sound and complete with respect to the semantics. Such a CASE tool framework has been developed by the authors and its architecture is also introduced. The proposed formalization is not only useful at design time, but also at run time through the use of rational autonomous agents, in response to a need recently recognized by the DMTF.

Keywords: CIM, Knowledge-based Information Models, Ontology Languages, OWL, Description Logics, Integrated Network Management, Intelligent Agents, Automatic Reasoning Techniques.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1693
25 Article 5 (3) of the Brussels I Regulation and Its Applicability in the Case of Intellectual Property Rights Infringement on the Internet

Authors: Nataliya Hitsevich

Abstract:

Article 5(3) of the Brussels I Regulation provides that a person domiciled in a Member State may be sued in another Member State in matters relating to tort, delict or quasi-delict, in the courts for the place where the harmful events occurred or may occur. For a number of years Article 5 (3) of the Brussels I Regulation has been at the centre of the debate regarding the intellectual property rights infringement over the Internet. Nothing has been done to adapt the provisions relating to non-internet cases of infringement of intellectual property rights to the context of the Internet. The author’s findings indicate that in the case of intellectual property rights infringement on the Internet, the plaintiff has the option to sue either: the court of the Member State of the event giving rise to the damage: where the publisher of the newspaper is established; the court of the Member State where the damage occurred: where defamatory article is distributed. However, it must be admitted that whilst infringement over the Internet has some similarity to multi-State defamation by means of newspapers, the position is not entirely analogous due to the cross-border nature of the Internet. A simple example which may appropriately illustrate its contentious nature is a defamatory statement published on a website accessible in different Member States, and available in different languages. Therefore, we need to answer the question: how these traditional jurisdictional rules apply in the case of intellectual property rights infringement over the Internet? Should these traditional jurisdictional rules be modified?

Keywords: Intellectual property rights, infringement, Internet, jurisdiction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4476
24 The Folksongs of Jharkhand: An Intangible Cultural Heritage of Tribal India

Authors: Walter Beck

Abstract:

Jharkhand is newly constituted 28th State in the eastern part of India which is known for the oldest settlement of the indigenous people. In the State of Jharkhand in which broadly three language family are found namely, Austric, Dravidian, and Indo-European. Ex-Mundari, kharia, Ho Santali come from the Austric Language family. Kurukh, Malto under Dravidian language family and Nagpuri Khorta etc. under Indo-European language family. There are 32 Indigenous Communities identified as Scheduled Tribe in the State of Jharkhand. Santhal, Munda, Kahria, Ho and Oraons are some of the major Tribe of the Jharkhand state. Jharkhand has a Rich Cultural heritage which includes Folk art, folklore, Folk Dance, Folk Music, Folk Songs for which diversity can been seen from place to place, season to season and all traditional Culture and practices. The languages as well as the songs are vulnerable to dominant culture and hence needed to be protected. The collection and documentation of these songs in their natural setting adds significant contribution to the conservation and propagation of the cultural elements. This paper reflects to bring out the Originality of the Collected Songs from remote areas of the plateau of Sothern Jharkhand as a rich intangible Cultural heritage of the Country. The research was done through participatory observation. In this research project more than 100 songs which were never documented before.

Keywords: Cultural heritage, India, Indigenous people, songs.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2105
23 Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines

Authors: Mona Soliman Habib

Abstract:

This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. A simple machine learning approach is used that eliminates prior language knowledge such as part-of-speech or noun phrase tagging thereby allowing for its applicability across languages. No domain-specific knowledge is included. The accuracy measures achieved are comparable to those obtained using more complex approaches, which constitutes a motivation to investigate ways to improve the scalability of multiclass SVM in order to make the solution more practical and useable. Improving training time of multi-class SVM would make support vector machines a more viable and practical machine learning solution for real-world problems with large datasets. An initial prototype results in great improvement of the training time at the expense of memory requirements.

Keywords: Named entity recognition, support vector machines, language independence, bioinformatics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1645
22 Digital Twin of Real Electrical Distribution System with Real Time Recursive Load Flow Calculation and State Estimation

Authors: Anosh Arshad Sundhu, Francesco Giordano, Giacomo Della Croce, Maurizio Arnone

Abstract:

Digital Twin (DT) is a technology that generates a virtual representation of a physical system or process, enabling real-time monitoring, analysis, and simulation. DT of an Electrical Distribution System (EDS) can perform online analysis by integrating the static and real-time data in order to show the current grid status and predictions about the future status to the Distribution System Operator (DSO), producers and consumers. DT technology for EDS also offers the opportunity to DSO to test hypothetical scenarios. This paper discusses the development of a DT of an EDS by Smart Grid Controller (SGC) application, which is developed using open-source libraries and languages. The developed application can be integrated with Supervisory Control and Data Acquisition System (SCADA) of any EDS for creating the DT. The paper shows the performance of developed tools inside the application, tested on real EDS for grid observability, Smart Recursive Load Flow (SRLF) calculation and state estimation of loads in MV feeders.

Keywords: Digital Twin, Distribution System Operator, Electrical Distribution System, Smart Grid Controller, Supervisory Control and Data Acquisition System, Smart Recursive Load Flow.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 157
21 A Corpus-Based Analysis on Code-Mixing Features in Mandarin-English Bilingual Children in Singapore

Authors: Xunan Huang, Caicai Zhang

Abstract:

This paper investigated the code-mixing features in Mandarin-English bilingual children in Singapore. First, it examined whether the code-mixing rate was different in Mandarin Chinese and English contexts. Second, it explored the syntactic categories of code-mixing in Singapore bilingual children. Moreover, this study investigated whether morphological information was preserved when inserting syntactic components into the matrix language. Data are derived from the Singapore Bilingual Corpus, in which the recordings and transcriptions of sixty English-Mandarin 5-to-6-year-old children were preserved for analysis. Results indicated that the rate of code-mixing was asymmetrical in the two language contexts, with the rate being significantly higher in the Mandarin context than that in the English context. The asymmetry is related to language dominance in that children are more likely to code-mix when using their nondominant language. Concerning the syntactic categories of code-mixing words in the Singaporean bilingual children, we found that noun-mixing, verb-mixing, and adjective-mixing are the three most frequently used categories in code-mixing in the Mandarin context. This pattern mirrors the syntactic categories of code-mixing in the Cantonese context in Cantonese-English bilingual children, and the general trend observed in lexical borrowing. Third, our results also indicated that English vocabularies that carry morphological information are embedded in bare forms in the Mandarin context. These findings shed light upon how bilingual children take advantage of the two languages in mixed utterances in a bilingual environment.

Keywords: Code-mixing, Mandarin Chinese, English, bilingual children.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1053
20 Co-Articulation between Consonant and Vowel in Cantonese Syllables

Authors: Wai-Sum Lee

Abstract:

This study investigates C-V and V-C co-articulation in Cantonese monosyllables of the CV, VC or CVC structure, with C = one of the three stop consonants [p, t, k] and V = one of the three corner vowels [i, a, u]. Five repetitions of each test syllable on a randomized list were elicited from Cantonese young adult speakers in their early-20s. A research tool, EMA AG500, was used to record the synchronized audio signals and articulatory data at three different locations of the tongue – tongue tip, tongue middle, and tongue back – and the positions of the upper and lower lips during the test syllables. The main findings based on the articulatory data collected from two male Cantonese speakers are as follows: (i) For the syllable-initial [p-], strong co-articulation is observed when [p-] preceding the high vowel [i] or [u], but not the low vowel [a]. As for the syllable-final [-p], it is strongly co-articulated with the preceding vowel, even when the vowel is [a]. (ii) The co-articulation between the initial [t-] and the following vowel of any type is weak. In the syllable-final position, the degree of co-articulatory resistance of [-t] is also large when following the vowel [u], but [-t] is largely co-articulated with the preceding vowel when the vowel is [i] or [a]. (iii) The strength of co-articulation differs when the initial [k-] precedes the different types of vowel. A stronger co-articulation between [k-] and [i] than between [k-] and [u], and the strength of co-articulation is much reduced between [k-] and [a]. However, in the syllable-final position, there is strong co-articulation between [-k] and the preceding vowel [a]. (iv) Among the three types of stop consonants in the syllable-initial position, the decreasing degree of co-articulatory resistance (CR) is [t-] > [k-] > [p-], and the degree of CR is reduced during all three types of stop in the syllable-final position. In general, the data on co-articulation between consonant and vowel in the Cantonese monosyllables are similar to those in other languages reported in previous studies.

Keywords: Cantonese, co-articulation, consonant, vowel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1081
19 Context Detection in Spreadsheets Based on Automatically Inferred Table Schema

Authors: Alexander Wachtel, Michael T. Franzen, Walter F. Tichy

Abstract:

Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. It enables end users to program their own devices and extend the functionality of the existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, the end users can query the data using natural language. During the evaluation 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also Excel without ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM, by using natural language for end user software engineering, to overcome the present bottleneck of professional developers.

Keywords: Natural language processing, end user development; natural language interfaces, human computer interaction, data recognition, dialog systems, spreadsheet.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1078
18 The Role of Ideophones: Phonological and Morphological Characteristics in Literature

Authors: Cristina Bahón Arnaiz

Abstract:

Many Asian languages, such as Korean and Japanese, are well-known for their wide use of sound symbolic words or ideophones. This is a very particular characteristic which enriches its lexicon hugely. Ideophones are a class of sound symbolic words that utilize sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut or vowel gradation and consonant mutation. In the case of Korean, there are light vowels and dark vowels. Depending on the type of vowel that is used, the meaning will slightly change. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, which is reduplication and it carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature as they enhance the meaning of what is trying to be expressed with incredible semantic detail, expressiveness, and rhythm. The following study will analyze the ideophones used in a single paragraph of a Korean novel, which add incredible yet subtle detail to the meaning of the words, and advance the expressiveness and rhythm of the text. The results from analyzing one paragraph from a novel, after presenting the phonological and morphological characteristics of Korean ideophones, will evidence the important role that ideophones play in literature. 

Keywords: Ideophones, mimetic words, phonomimes, phenomimes, psychomimes, sound symbolism.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1053
17 The Effect of Iconic and Beat Gestures on Memory Recall in Greek’s First and Second Language

Authors: Eleni Ioanna Levantinou

Abstract:

Gestures play a major role in comprehension and memory recall due to the fact that aid the efficient channel of the meaning and support listeners’ comprehension and memory. In the present study, the assistance of two kinds of gestures (iconic and beat gestures) is tested in regards to memory and recall. The hypothesis investigated here is whether or not iconic and beat gestures provide assistance in memory and recall in Greek and in Greek speakers’ second language. Two groups of participants were formed, one comprising Greeks that reside in Athens and one with Greeks that reside in Copenhagen. Three kinds of stimuli were used: A video with words accompanied with iconic gestures, a video with words accompanied with beat gestures and a video with words alone. The languages used are Greek and English. The words in the English videos were spoken by a native English speaker and by a Greek speaker talking English. The reason for this is that when it comes to beat gestures that serve a meta-cognitive function and are generated according to the intonation of a language, prosody plays a major role. Thus, participants that have different influences in prosody may generate different results from rhythmic gestures. Memory recall was assessed by asking the participants to try to remember as many words as they could after viewing each video. Results show that iconic gestures provide significant assistance in memory and recall in Greek and in English whether they are produced by a native or a second language speaker. In the case of beat gestures though, the findings indicate that beat gestures may not play such a significant role in Greek language. As far as intonation is concerned, a significant difference was not found in the case of beat gestures produced by a native English speaker and by a Greek speaker talking English.

Keywords: First language, gestures, memory, second language acquisition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1231