Search results for: parallel corpora
1283 Learning to Translate by Learning to Communicate to an Entailment Classifier
Authors: Szymon Rutkowski, Tomasz Korbak
Abstract:
We present a reinforcement-learning-based method of training neural machine translation models without parallel corpora. The standard encoder-decoder approach to machine translation suffers from two problems we aim to address. First, it needs parallel corpora, which are scarce, especially for low-resource languages. Second, its learning procedure lacks psychological plausibility: learning a foreign language is about learning to communicate useful information, not merely learning to transduce from one language’s 'encoding' to another. We instead pose the problem of learning to translate as learning a policy in a communication game between two agents: the translator and the classifier. The classifier is trained beforehand on a natural language inference task (determining the entailment relation between a premise and a hypothesis) in the target language. The translator produces a sequence of actions that correspond to generating translations of both the hypothesis and premise, which are then passed to the classifier. The translator is rewarded for the classifier’s performance on determining entailment between sentences translated by the translator to the disciple’s native language. The translator’s performance thus reflects its ability to communicate useful information to the classifier. In effect, we train a machine translation model without the need for parallel corpora altogether. While similar reinforcement learning formulations for zero-shot translation were proposed before, we introduce a number of improvements. Whereas prior research aimed at grounding the translation task in the physical world by evaluating agents on an image captioning task, we found that using a linguistic task is more sample-efficient. Natural language inference (also known as recognizing textual entailment) captures semantic properties of sentence pairs that are poorly correlated with semantic similarity, thus enforcing basic understanding of the role played by compositionality.
It has been shown that models trained to recognize textual entailment produce high-quality, general-purpose sentence embeddings that transfer to other tasks. We use the Stanford Natural Language Inference (SNLI) dataset as well as analogous datasets for French (XNLI) and Polish (CDSCorpus). Textual entailment corpora can be obtained relatively easily for any language, which makes our approach more extensible to low-resource languages than traditional approaches based on parallel corpora. We evaluated a number of reinforcement learning algorithms (including policy gradients and actor-critic) to solve the problem of optimizing the translator’s policy and found that our attempts yield some promising improvements over previous approaches to reinforcement-learning-based zero-shot machine translation.
Keywords: agent-based language learning, low-resource translation, natural language inference, neural machine translation, reinforcement learning
Procedia PDF Downloads 128
1282 Adjectives in Academic Discourse: A Comparative Study of Research Articles
Authors: Beata Grymska
Abstract:
Research on academic discourse generally focuses on lexical bundles, epistemic modality markers, or interactions between writers and readers. Following the research into the written forms of the academic community, this study concentrates on adjectives in research articles. The study investigates the distribution of adjectives in research articles in two academic disciplines: linguistics and medicine. It is corpus-based in design and draws on 100 linguistics and 100 medical research articles, all written in English. The first aim of the study is to compare the distribution of adjectives between the two corpora and across the four main parts of articles: IMRD (Introduction, Methods, Results, and Discussion). The second aim is to see if the two corpora share common core adjectives, e.g., different, important, specific, and if there are discipline-specific adjectives. A further part of the paper elaborates on adjective use in the corpora together with examples. The results indicate that the two corpora do not differ in the distribution of adjectives to a great extent. The occurrences of the most frequently used adjectives depend on the academic discipline of the research articles. The concluding part reflects upon the role of adjectives in academic discourse and also presents how corpora can be helpful in composing academic texts.
Keywords: academic discourse, academic texts, adjectives, corpus analysis, research articles
Procedia PDF Downloads 191
1281 The Contribution of Corpora to the Investigation of Cross-Linguistic Equivalence in Phraseology: A Contrastive Analysis of Russian and Italian Idioms
Authors: Federica Floridi
Abstract:
The long tradition of contrastive idiom research has essentially focused on three domains: the comparison of structural types of idioms (e.g. verbal idioms, idioms with noun-phrase structure, etc.), the description of idioms belonging to the same thematic groups (Sachgruppen), and the identification of different types of cross-linguistic equivalents (i.e. full equivalents, partial equivalents, phraseological parallels, non-equivalents). The diastratic, diachronic and diatopic aspects of the compared idioms, as well as their syntactic, pragmatic and semantic properties, have been largely ignored. Corpora (both monolingual and parallel) give the opportunity to investigate the actual use of correlating idioms in authentic texts of L1 and L2. Adopting the corpus-based approach, it is possible to draw attention to the frequency of occurrence of idioms, their syntactic embedding, their potential syntactic transformations (e.g., nominalization, passivization, relativization, etc.), their combinatorial possibilities, the variations of their lexical structure, and their connotations in terms of stylistic markedness or register. This paper aims to present the results of a contrastive analysis of Russian and Italian idioms referring to the concepts of ‘beginning’ and ‘end’, which was carried out using the Russian National Corpus and the ‘La Repubblica’ corpus. Beyond the digital corpora, bilingual dictionaries, like Skvorcova - Majzel’, Dobrovol’skaja, Kovalev, Čerdanceva, as well as monolingual resources, have been consulted. The study has shown that many of the idioms that have traditionally been indicated as cross-linguistic equivalents in bilingual dictionaries cannot be considered correspondents. The findings demonstrate that even those idioms that are formally identical in Russian and Italian and are presumably derived from the same source (e.g., conceptual metaphor, Bible, classical mythology, world literature) exhibit differences regarding usage.
The ultimate purpose of this article is to highlight that it is necessary to review and improve the existing bilingual dictionaries in light of the empirical data collected in corpora. The materials gathered in this research can contribute to this end.
Keywords: corpora, cross-linguistic equivalence, idioms, Italian, Russian
Procedia PDF Downloads 149
1280 Applied Linguistics: Language, Corpora, and Technology
Authors: M. Imran
Abstract:
This research explores the intersections of applied linguistics, corpus linguistics, translation, and technology, aiming to present innovative cross-disciplinary tools and frameworks. It highlights significant contributions to language, corpora, and technology within applied linguistics, which deepen our understanding of these domains and provide practical resources for scholars, educators, and translators. By showcasing these advancements, the study seeks to enhance collaboration and application in language-related fields. The significance of applied linguistics is underscored by some of the research highlighted here, which presents pedagogical perspectives that could enhance instruction and the learning outcomes of students at all academic levels as well as translation trainees. From an instructional standpoint, researchers provided useful data from language studies with classroom applications.
Keywords: linguistics, language, corpora, technology
Procedia PDF Downloads 20
1279 Damage Strain Analysis of Parallel Fiber Eutectic
Authors: Jian Zheng, Xinhua Ni, Xiequan Liu
Abstract:
Based on the isotropy of parallel fiber eutectic, the no-damage strain field in parallel fiber eutectic is obtained from its flexibility tensor. To account for the damage behavior of parallel fiber eutectic, damage variables are introduced to determine its strain field. The damage strains in the matrix, interphase, and fiber of parallel fiber eutectic are quantitatively analyzed. Results show that damage strains are associated not only with the fiber volume fraction of parallel fiber eutectic, but also with the degree of damage.
Keywords: damage strain, initial strain, fiber volume fraction, parallel fiber eutectic
Procedia PDF Downloads 578
1278 Using Corpora in Semantic Studies of English Adjectives
Authors: Oxana Lukoshus
Abstract:
The methods of corpus linguistics, a well-established field of research, are being increasingly applied in cognitive linguistics. Corpus data are especially useful for quantitative studies of grammatical and other aspects of language. The main objective of this paper is to demonstrate how present-day corpora can be applied in semantic studies in general and in semantic studies of adjectives in particular. Polysemantic adjectives have been the subject of numerous studies, but most of them have been based on dictionaries. Undoubtedly, dictionaries are one of the basic data sources, but only at the initial steps of research. The author usually starts with an analysis of the lexicographic data and then comes up with a hypothesis. In the present research, three polysemantic synonyms (true, loyal, faithful) were analyzed in terms of differences and similarities in their semantic structure. A corpus-based approach to the study of these adjectives involves the following. After the analysis of the dictionary data, two corpora were consulted to study the distributional patterns of the words under study: the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). These corpora are continually updated and contain thousands of examples of the words under research, which makes them a useful and convenient data source. For the purpose of this study there were no special needs regarding genre, mode or time of the texts included in the corpora. Out of the range of possibilities offered by corpus-analysis software (e.g. word lists, statistics of word frequencies, etc.), the most useful tool for the semantic analysis was extracting a list of co-occurrences for the given search words. Searching by lemmas, e.g. true, true to, and grouping the results by lemmas proved to be the most efficient corpus feature for the adjectives under study.
Following the search process, the corpora provided a list of co-occurrences, which were then analyzed and classified. Not every co-occurrence was relevant for the analysis. For example, sentences like ‘An enormous sense of responsibility to protect the minds and hearts of the faithful from incursions by the state was perceived to be the basic duty of the church leaders’ or ‘“True,” said Phoebe, “but I'd probably get to be a Union Official immediately”’ were left out: in the first example, the faithful is a substantivized adjective, and in the second, true is used alone with no other parts of speech. The subsequent analysis of the corpus data provided the basis for the distribution groups of the adjectives under study, which were then investigated with the help of a semantic experiment. To sum up, the corpus-based approach has proved to be a powerful, reliable and convenient tool for obtaining data for further semantic study.
Keywords: corpora, corpus-based approach, polysemantic adjectives, semantic studies
Procedia PDF Downloads 315
1277 Parallel PRBS Generation and Parallel BER Tester for 8-Gbps On-chip Interconnection Testing
Authors: Zhao Bin, Yan Dan Lei
Abstract:
In this paper, a multi-pattern parallel pseudo-random binary sequence (PRBS) generator and a dedicated parallel bit error rate (BER) tester are proposed for 8-Gbps on-chip interconnection testing. A unique full-parallel PRBS checker is also proposed. The proposed design, together with custom-designed high-speed parallel-to-serial and serial-to-parallel circuits, will be used to test different on-chip interconnection transceivers. The design is implemented in TSMC 28nm CMOS technology with a working voltage of 1.0 V. The serial-to-parallel ratio is 8:1, so the parallel PRBS generator and BER tester can run at a lower speed.
Keywords: PRBS, BER, high speed, generator
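As a rough illustration of the idea behind PRBS generation and BER checking, the sketch below produces a PRBS7 sequence bit-serially and packs it into 8-bit words, as an 8:1 serialiser would consume them. The polynomial x^7 + x^6 + 1, the seed, and all function names are assumptions chosen for illustration; the abstract does not state which patterns or circuit architecture the authors use, and a hardware design would compute all 8 bits of a word in a single clock cycle rather than looping.

```python
def prbs7_bits(seed=0x7F, count=127):
    """Yield `count` bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1, assumed)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    for _ in range(count):
        # Feedback is the XOR of taps 7 and 6 (bits 6 and 5 of the state).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def parallel_words(bits, width=8):
    """Pack a serial bit stream into `width`-bit words, first bit as MSB."""
    word, n = 0, 0
    for b in bits:
        word = (word << 1) | b
        n += 1
        if n == width:
            yield word
            word, n = 0, 0


def count_bit_errors(sent_words, received_words, width=8):
    """Count differing bits between two word streams (the BER numerator)."""
    mask = (1 << width) - 1
    return sum(bin((s ^ r) & mask).count("1")
               for s, r in zip(sent_words, received_words))
```

Dividing the result of `count_bit_errors` by the total number of bits compared gives an estimate of the bit error rate.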
Procedia PDF Downloads 763
1276 Discourse Markers in Chinese University Students and Native English Speakers: A Corpus-Based Study
Authors: Dan Xie
Abstract:
The use of discourse markers (DMs) can play a crucial role in representing discourse interaction and pragmatic competence. Learners’ use of DMs and differences between native speakers (NSs) and non-native speakers (NNSs) in the use of various DMs have been the focus of considerable research attention. However, some commonly used DMs, such as you know, have not received as much attention in comparative studies, especially in the Chinese context. This study analyses data in two corpora (COLSEC and Spoken BNC 2014 (14-25)) to investigate how Chinese learners differ from NSs in their use of the DM you know and its functions in speech. The results show that there is a significant difference between the two corpora in terms of the frequency of use of you know. In terms of the functions of you know, the study shows that six functions can all be present in both corpora, although there are significant differences between the five functional dimensions, especially in introducing a claim linked to the prior discourse and highlighting particular points in the discourse. The study hopes to show empirically how Chinese learners and NSs use DMs differently.
Keywords: you know, discourse marker, native speaker, Chinese learner
Procedia PDF Downloads 81
1275 Query in Grammatical Forms and Corpus Error Analysis
Authors: Katerina Florou
Abstract:
Two decades after the term "learner corpora" was coined for collections of texts created by foreign or second language learners across various language contexts, and some years after the suggestion to incorporate "focusing on form" within a Task-Based Learning framework, this study aims to explore how learner corpora, whether annotated with errors or not, can facilitate a focus on form in an educational setting. It has been argued that analyzing linguistic form enables students to delve into language and gain an understanding of different facets of the foreign language. The same objective applies when analyzing learner corpora, whether marked with errors or in their raw state, but in this scenario the emphasis lies on identifying incorrect forms. Teachers should aim to address errors or gaps in the students' second language knowledge while they engage in a task. Building on this recommendation, we compared the written output of two student groups: the first group (G1) carried out the focusing on form phase by studying a specific aspect of the Italian language, namely the past participle, through examples from native speakers and grammar rules; the second group (G2) focused on form by scrutinizing their own errors and comparing them with analogous examples from a native speaker corpus. In order to test our hypothesis, we created four learner corpora. The first two were generated during the task phase, with one representing each group of students, while the remaining two were produced as a follow-up activity at the end of the lesson. The results of the first comparison indicated that students' exposure to their own errors can enhance their grasp of a grammatical element. The study is in its second stage and more results are to be announced.
Keywords: Corpus interlanguage analysis, task based learning, Italian language as F1, learner corpora
Procedia PDF Downloads 54
1274 A Parallel Implementation of k-Means in MATLAB
Authors: Dimitris Varsamis, Christos Talagkozis, Alkiviadis Tsimpiris, Paris Mastorocostas
Abstract:
The aim of this work is the parallel implementation of k-means in MATLAB in order to reduce the execution time. Specifically, a new MATLAB function for the serial k-means algorithm is developed, which meets all the requirements for conversion to a MATLAB function with parallel computations. Additionally, two different variants for the definition of initial values are presented. Subsequently, the parallel approach is presented. Finally, performance tests of the computation times with respect to the numbers of features and classes are illustrated.
Keywords: K-means algorithm, clustering, parallel computations, Matlab
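To make the serial-to-parallel conversion concrete, here is a minimal sketch of Lloyd's k-means in which the assignment step is split into independent chunks. It is written in Python rather than MATLAB, and the chunking scheme, the thread pool, and the function names are illustrative assumptions, not the authors' implementation. Threads in CPython will not give a real speed-up for this arithmetic; the point is only to show which part of the algorithm decomposes cleanly.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def assign_chunk(args):
    """Assignment step for one chunk of points; chunks are independent."""
    chunk, centroids = args
    return [nearest(p, centroids) for p in chunk]


def kmeans(points, centroids, workers=2, max_iter=100):
    """Lloyd's k-means with the assignment step split across workers."""
    centroids = [tuple(c) for c in centroids]
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    labels = []
    for _ in range(max_iter):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(assign_chunk, [(c, centroids) for c in chunks]))
        labels = [lab for part in parts for lab in part]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

The initial centroids are passed in explicitly, which leaves room for different initialisation variants such as those the abstract mentions.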
Procedia PDF Downloads 385
1273 A Survey on Constraint Solving Approaches Using Parallel Architectures
Authors: Nebras Gharbi, Itebeddine Ghorbel
Abstract:
In recent years, with advances in multicore computing, the constraint programming community has tried to benefit from the capacity of new machines and make the best use of them through several parallel schemes for constraint solving. In this paper, we survey the approaches proposed for solving Constraint Satisfaction Problems on parallel architectures. These approaches use a parallel architecture in different ways: the problem itself can be solved differently by several solvers, or it can be split across solvers.
Keywords: constraint programming, parallel programming, constraint satisfaction problem, speed-up
Procedia PDF Downloads 320
1272 Frequency of the English Phrasal Verbs Used by Iranian Learners as a Reference to the Style of Writing Adopted by the Learners
Authors: Hamzeh Mazaherylaghab, Mehrangiz Vahabian, Seyyedeh Zahra Asghari
Abstract:
The present study initially focused on the frequency of phrasal verbs used by Iranian learners of English. The results were then compared with findings from native-speaker corpora. After the extraction of phrasal verbs from learner and native-speaker corpora, the findings were analysed. The results showed that Iranian learners avoided using phrasal verbs in many cases. Some of the findings proved to be significant. It was also found that, in many cases, the learners used the single-word counterparts of the avoided phrasal verbs to compensate for their lack of knowledge. Semantic complexity and the lack of an L1 counterpart may have been the main reasons for avoidance, but despite the avoidance phenomenon, the learners displayed a tendency to use many other phrasal verbs, which may have been due to the increase in the number of multi-word verbs in Persian. The overall scores confirmed that the language produced by the learners shows signs of a more formal style than that of native speakers of English, with fewer phrasal verbs and more formal single-word verbs used instead.
Keywords: corpus, corpora, LOCNESS, phrasal verbs, single-word verb
Procedia PDF Downloads 203
1271 The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English
Authors: Ana Elvira Ojanguren Lopez, Javier Martin Arista
Abstract:
The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English and to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. It presents the lemmatiser Norna, which has been implemented in FileMaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words.
In its present state, the lemmatiser Norna can automatically assign a lemma to around 80% of textual forms by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation emphasise the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is left for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed.
Keywords: corpus linguistics, historical linguistics, Old English, parallel corpus
Procedia PDF Downloads 213
1270 Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine
Authors: C. Brierley, H. El-Farahaty, A. Farhan
Abstract:
The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the United States is now attuned to the potential for Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse legal interpretations, legal discourse, the comprehensibility of legal texts, conflict resolution, and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis, since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’ as in ‘human dignity’.
Keywords: Arabic constitution, corpus-based legal linguistics, human rights, parallel Arabic-English legal corpora
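For readers unfamiliar with the logDice statistic, it scores a collocation as 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of node and collocate and f_x, f_y are their individual frequencies. The sketch below ranks candidate collocates with this formula; the function names are hypothetical and the counts are invented toy values, not figures from the corpus described in the abstract.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(node_freq, candidates):
    """Sort (collocate, f_xy, f_y) tuples by descending logDice score."""
    scored = [(word, log_dice(f_xy, node_freq, f_y))
              for word, f_xy, f_y in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy counts, for illustration only.
toy_candidates = [
    ("human", 40, 120),   # collocate, co-occurrences with node, own frequency
    ("the", 45, 9000),
    ("freedom", 5, 60),
]
ranking = rank_collocates(node_freq=50, candidates=toy_candidates)
```

Because the score depends only on the three frequencies and not on corpus size, a very frequent function word such as "the" ranks below a less frequent but more strongly associated collocate in this toy example.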
Procedia PDF Downloads 183
1269 The Vision Based Parallel Robot Control
Abstract:
In this paper, we describe the control strategy of a high-speed parallel robot system with an EtherCAT network. This work deals with a parallel robot system under centralized control on a real-time operating system such as Windows TwinCAT3. Most of the control scheme and algorithm is implemented on the master platform on the PC, while the input and output interface is ported to the slave side. Data of 1000 bytes is transferred in at most 20 µs. EtherCAT is a very high-speed and stable industrial network. The control strategy with EtherCAT is very useful and robust in an Ethernet network environment. The developed parallel robot is controlled by a pre-designed nonlinear controller for 6G/0.43 cycle time pick-and-place motion tracking. The experiment validates the controller design.
Keywords: parallel robot control, EtherCAT, nonlinear control, parallel robot inverse kinematic
Procedia PDF Downloads 571
1268 Flowing Online Vehicle GPS Data Clustering Using a New Parallel K-Means Algorithm
Authors: Orhun Vural, Oguz Bayat, Rustu Akay, Osman N. Ucan
Abstract:
This study presents a new parallel approach to clustering GPS data. It is evaluated by comparing the execution times of various clustering algorithms on GPS data. The paper proposes a neighborhood-based parallel K-means algorithm to speed up clustering. The proposed parallelization approach assumes that each GPS data point represents a vehicle and that vehicles close to each other communicate after they are clustered. The approach has been examined on continuously changing GPS data of different sizes and compared with the serial K-means algorithm and other serial clustering algorithms. The results demonstrate that the proposed parallel K-means algorithm works much faster than the other clustering algorithms.
Keywords: parallel k-means algorithm, parallel clustering, clustering algorithms, clustering on flowing data
Procedia PDF Downloads 222
1267 Parallel 2-Opt Local Search on GPU
Authors: Wen-Bao Qiao, Jean-Charles Créput
Abstract:
To accelerate the solution of large-scale traveling salesman problems (TSP), a parallel 2-opt local search algorithm with a simple implementation based on the Graphics Processing Unit (GPU) is presented and tested in this paper. The parallel scheme is based on the technique of data decomposition: K processors are dynamically assigned along the integral tour to perform 2-opt local optimization on K edges simultaneously on independent sub-tours, where K can be user-defined or be a function of the input size N. We implement this algorithm with a doubly linked list on the GPU. The implementation requires only O(N) memory. We compare this parallel 2-opt local optimization against sequential exhaustive 2-opt search along the integral tour on TSP instances from TSPLIB with more than 10000 cities.
Keywords: parallel 2-opt, double links, large scale TSP, GPU
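For context, the basic 2-opt move that such a scheme parallelizes can be sketched sequentially as follows. This is a plain array-based version written for illustration; it is not the doubly-linked-list GPU implementation described in the abstract, and the function names are hypothetical.

```python
import math


def tour_length(tour, coords):
    """Total length of the closed tour visiting `coords` in `tour` order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))


def two_opt(tour, coords):
    """Repeat improving 2-opt moves until no edge exchange shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # these two edges share a city
                a, b = coords[tour[i]], coords[tour[i + 1]]
                c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
                # Gain from replacing edges (a,b),(c,d) with (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    # Reversing the segment between the edges applies the move.
                    tour[i + 1:j + 1] = tour[i + 1:j + 1][::-1]
                    improved = True
    return tour
```

Each candidate pair of edges is evaluated independently of the others, which is what makes it natural to hand different edges to different processors, provided the segments they reverse do not overlap.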
Procedia PDF Downloads 628
1266 Parallelization by Domain Decomposition for 1-D Sugarcane Equation with Message Passing Interface
Authors: Ewedafe Simon Uzezi
Abstract:
In this paper, we present a method based on Domain Decomposition (DD) for parallelizing the 1-D Sugarcane Equation on a Master-Slave parallel platform using the Message Passing Interface (MPI). The 1-D Sugarcane Equation was discretized using an explicit method, which requires evaluating the temporal and spatial distribution of temperature. This platform gives better predictions of the effects of temperature distribution in the sugarcane problem. The work reports parallel overheads with overlapping communication, and communication across parallel computers, with numerical results for different block sizes and scalability. Performance improvement strategies from the DD on various mesh sizes were compared experimentally, and the parallel results show speedup and efficiency for the parallel algorithm design.
Keywords: sugarcane, parallelization, explicit method, domain decomposition, MPI
Procedia PDF Downloads 25
1265 Designing a Corpus Database to Enhance the Learning of Old English Language
Authors: Raquel Mateo Mendaza, Carmen Novo Urraca
Abstract:
The current paper presents the elaboration of a corpus database that aligns two different corpora in order to simplify the search for information both for researchers and students of Old English. This database comprises the information contained in two main reference corpora, namely the Dictionary of Old English Corpus (DOEC), compiled at the University of Toronto, and the York-Toronto-Helsinki Parsed Corpus of Old English (YCOE). The first one provides information on all surviving texts written in the Old English language. The latter offers the syntactic and morphological annotation of several texts included in the DOEC. Although both corpora are closely related, as the YCOE includes the DOE source text identifier, the main problem detected is that there is no alignment of texts that allows for the search of whole fragments to be further analysed in terms of morphology and syntax. The database proposed in this paper gathers all this information and presents it in a simple, more accessible, visual, and educational way. The alignment of fragments has been done automatically. However, some problems have emerged during the creation process, particularly related to the lack of correspondence in the division of fragments. For this reason, it has been necessary to revise all the entries manually to obtain a reliable, high-quality product and to carefully indicate the gaps encountered in these corpora. All in all, this database contains more than 60,000 entries corresponding to the DOE fragments annotated by the YCOE. The main strength of the resulting product is its research and teaching implications in the study of Old English. The use of this database will help researchers and students in the study of different aspects of the language, such as inflectional morphology, syntactic behaviour of given words, or translation studies, among others.
When a word or fragment is searched, the annotated information on morphology and syntax is displayed automatically, automating and speeding up the search for data.
Keywords: alignment, corpus database, morphosyntactic analysis, Old English
Procedia PDF Downloads 134
1264 Corpora in Secondary Schools Training Courses for English as a Foreign Language Teachers
Authors: Francesca Perri
Abstract:
This paper describes a proposal for a teachers’ training course focused on the introduction of corpora in the EFL (English as a foreign language) didactics of some Italian secondary schools. The training course is conceived as a part of a TEDD participant’s five-month internship. TEDD (Technologies for Education: diversity and devices) is an advanced course held by the Department of Engineering and Information Technology at the University of Trento, Italy. Its main aim is to train a selected, heterogeneous group of graduates to engage with the complex interdependence between education and technology in modern society. The educational approach draws on a plural coexistence of various theories, such as socio-constructivism, constructionism, project-based learning and connectivism. The TEDD educational model stands as the main reference source for the design of a formative course for EFL teachers, drawing on the digitalization of didactics and the creation of interactive learning materials for L2 intermediate students. The training course lasts ten hours, organized into five sessions. In the first part (first and second session), a series of guided and semi-guided activities leads participants to familiarize themselves with corpora through the use of a digital tools kit. Then, during the second part, participants are specifically involved in the realization of a ML (Mistakes Laboratory), where they create, develop and share digital activities according to their teaching goals with the use of corpora, supported by the digital facilitator. The training course takes place in an ICT laboratory where the teachers work either individually or in pairs, with a computer connected to a wi-fi connection, while the digital facilitator shares inputs, materials and digital assistance simultaneously on a whiteboard and on a digital platform where participants interact and work together both synchronically and diachronically.
The adoption of good ICT practices is a fundamental step in promoting the introduction and use of Corpus Linguistics in EFL teaching and learning processes. Dealing with corpora not only promotes L2 learners’ critical thinking and orientation, as opposed to wild browsing, when they are looking for ready-made translations or language usage samples, but also entails becoming confident with digital tools and activities. The paper will explain the reasons, limits and resources of the pedagogical approach adopted to engage EFL teachers with the use of corpora in their didactics through the promotion of digital practices.
Keywords: digital didactics, education, language learning, teacher training
Procedia PDF Downloads 155
1263 Exploring the Use of Adverbs in Two Young Learners Written Corpora
Authors: Chrysanthi S. Tiliakou, Katerina T. Frantzi
Abstract:
Writing has always been considered a most demanding skill for English as a Foreign Language learners as well as for native speakers. Novice foreign language writers are asked to handle a limited range of vocabulary to produce writing tasks at lower levels. Adverbs are parts of speech that are not used extensively in the early stages of English as a Foreign Language writing. An additional problem with learning new adverbs is that, besides learning their meanings, learners are expected to acquire the proper placement of adverbs in a sentence. The use of adverbs is important as they add “expressive richness to one’s message”. By exploring the patterns of use of adverbs, researchers and educators can identify the types of adverbs that appear more taxing for young learners, or whose placement puzzles novice English as a Foreign Language writers, and focus on teaching them. To this end, the study examines the use of adverbs in two written corpora of young learners of English at A1–A2 levels and determines the types of adverbs used, their frequencies, problems in their use, and whether there is any differentiation between levels. The AntConc concordancing tool was used for the Greek Learner Corpus, and the Corpuscle concordancing tool for the Norwegian Corpus. The research found that the normalized frequencies of the adverbs used in the A1–A2 level Greek Learner Corpus are similar to the frequencies of the same adverbs in the Norwegian Learner Corpus.
Keywords: learner corpora, young learners, writing, use of adverbs
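Comparing adverb use across two corpora of different sizes relies on normalized rather than raw frequencies. The sketch below shows the usual calculation (occurrences per fixed number of tokens). The counts, corpus sizes and the base of 1,000 tokens are hypothetical values chosen for illustration; they are not taken from the study.

```python
def normalized_frequency(raw_count: int, corpus_size: int, base: int = 1000) -> float:
    """Return occurrences per `base` tokens, so corpora of different sizes can be compared."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size


# Hypothetical figures, for illustration only (not data from the study).
greek_per_1000 = normalized_frequency(raw_count=45, corpus_size=30_000)
norwegian_per_1000 = normalized_frequency(raw_count=60, corpus_size=40_000)

print(greek_per_1000, norwegian_per_1000)
```

In this made-up case the raw counts differ (45 versus 60) while the normalized frequencies coincide, which is the kind of comparison that normalization makes possible.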
Procedia PDF Downloads 93
1262 Dynamic Analysis of Offshore 2-HUS/U Parallel Platform
Authors: Xie Kefeng, Zhang He
Abstract:
To meet the stability and control demands of small offshore floating platforms, a 2-HUS/U parallel mechanism is presented as an offshore platform. The inverse kinematics is obtained from the institutional constraint equation, and the dynamic model of the offshore 2-HUS/U parallel platform is derived using the Lagrangian method for rigid bodies. The equivalent moment of inertia, damping and driving force/torque variation of the offshore 2-HUS/U parallel platform are analyzed. A numerical example shows that, for a parallel platform with a given motion, the system’s equivalent inertia changes by up to 1.25 times. During the movement of the platform, these quantities change dramatically with the system configuration and have coupling characteristics. The maximum equivalent drive torque is 800 N. At the same time, the curve of the platform’s driving force/torque is smooth and has good sine features. The control system needs to be adjusted according to the kinetic equation during stability and control, and the analysis provides a basis for the optimization of the control system.
Keywords: 2-HUS/U platform, dynamics, Lagrange, parallel platform
Procedia PDF Downloads 345
1261 Designing a Robust Controller for a 6 Linkage Robot
Authors: G. Khamooshian
Abstract:
One of the main issues in applying serial and parallel mechanisms is how to control them. The control of this mechanism and similar mechanisms has long attracted the attention of scholars. On the other hand, modeling the behavior of the system is difficult due to the large number of its parameters, and it leads to complex equations that are difficult to solve and, eventually, a system that is difficult to control. In this paper, a six-linkage robot is presented that could be used in different areas such as medical robots. Using these robots requires robust control. In this paper, the system equations are first found, and then the system conversion function is written. A new controller has been designed for this robot, which could also be used in other parallel robots. Parallel robots are important in robotics because of their stability, so methods for controlling them are important, and a robust controller makes particular sense for parallel robots.
Keywords: 3-RRS, 6 linkage, parallel robot, control
Procedia PDF Downloads 160
1260 Parallel Querying of Distributed Ontologies with Shared Vocabulary
Authors: Sharjeel Aslam, Vassil Vassilev, Karim Ouazzane
Abstract:
Ontologies and various semantic repositories have become a convenient approach for implementing model-driven architectures of distributed systems on the Web. SPARQL is the standard query language for querying such repositories. However, although SPARQL is a well-established standard for querying semantic repositories in RDF and OWL format, and there are commonly used APIs which support it, like Jena for Java, a parallel option is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of a parallel query engine capable of dealing with distributed RDF ontologies which share a common vocabulary. It has been implemented in Java and, to validate the algorithms, has been applied to the problem of organizing virtual exhibitions on the Web.
Keywords: distributed ontologies, parallel querying, semantic indexing, shared vocabulary, SPARQL
Procedia PDF Downloads 205
1259 English for Academic and Specific Purposes: A Corpus-Informed Approach to Designing Vocabulary Teaching Materials
Authors: Said Ahmed Zohairy
Abstract:
Significant shifts in the theory and practice of teaching vocabulary affect teachers’ decisions about the design of learning materials. Relevant literature supports teaching specialised, authentic, and multi-word lexical items rather than focusing on single-word vocabulary lists. Corpora, collections of texts stored in a database, present a reliable source of teaching and learning materials. Although corpus-informed studies have provided guidance for teachers to identify useful language chunks and phraseological units, there is a scarcity of literature discussing the use of corpora in teaching English for academic and specific purposes (EASP). The aim of this study is to improve teaching practices and provide a description of the pedagogical choices and procedures of an EASP tutor in an attempt to offer guidance for novice corpus users. It draws on the researcher’s experience of utilising corpus linguistic tools to design vocabulary learning activities without focusing on students’ learning outcomes. Hence, it adopts a self-study research methodology which is based on five methodological components suggested by other self-study researchers. The findings of the study indicate that designing specialised and corpus-informed vocabulary learning activities could be challenging for teachers, as they require technical knowledge of how to navigate corpora and utilise corpus analysis tools. Findings also include a description of the researcher’s approach to building and analysing a specialised corpus for the benefit of novice corpus users, who should then be able to start their own journey of designing corpus-based activities.
Keywords: corpora, corpus linguistics, corpus-informed, English for academic and specific purposes, agribusiness, vocabulary, phraseological units, materials design
Procedia PDF Downloads 27
1258 A Corpus Output Error Analysis of Chinese L2 Learners From America, Myanmar, and Singapore
Authors: Qiao-Yu Warren Cai
Abstract:
Due to the rise of big data, building corpora and using them to analyze Chinese L2 learners’ language output has become a trend. Various empirical studies have been conducted using Chinese corpora built by different academic institutes. However, most of the research analyzed the data in the Chinese corpora using corpus-based qualitative content analysis with descriptive statistics. Descriptive statistics can be used to summarize the subjects or samples that the research has actually measured and to describe the numerical data, but the collected data cannot be generalized to the population. Comte, a French positivist, argued in the 19th century that human beings’ knowledge, whether the discipline is humanistic and social science or natural science, should be verified in a scientific way to construct a universal theory to explain the truth and human beings’ behaviors. Inferential statistics, which can judge the probability that a difference observed between groups is dependable rather than caused by chance (Free Geography Notes, 2015) and can infer from the subjects or examples how the population might think or behave, is just the right method to support Comte’s argument in the field of TCSOL. Inferential statistics is also a core of quantitative research, but little research has been conducted by combining corpora with inferential statistics. Little research analyzes the differences in Chinese L2 learners’ corpus output errors using one-way ANOVA, so previous findings are limited when it comes to inferring the population’s Chinese errors from the given samples’ Chinese corpora. To fill this knowledge gap in the professional development of Taiwanese TCSOL, the present study aims to utilize one-way ANOVA to analyze corpus output errors of Chinese L2 learners from America, Myanmar, and Singapore.
The results show that no significant difference exists in ‘shì (是) sentence’ and word order errors, but compared with Americans and Singaporeans, learners from Myanmar are significantly more prone to ‘sentence blends.’ Based on the above results, the present study provides an instructional approach and contributes to further exploration of how Chinese L2 learners can have (and use) learning strategies to lower errors.
Keywords: Chinese corpus, error analysis, one-way analysis of variance, Chinese L2 learners, Americans, Myanmar, Singaporeans
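The one-way ANOVA named in this abstract compares the variation between group means with the variation inside each group. The following is a minimal sketch of the F statistic only, written without third-party libraries. The three lists are hypothetical per-learner error counts invented for illustration; they are not data from the study, and the group labels are assumptions.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples, one list per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical error counts per learner (illustrative only).
group_a = [1, 2, 3]
group_b = [2, 3, 4]
group_c = [5, 6, 7]

f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(f_stat, df_b, df_w)
```

Turning F into a p-value requires the F distribution with (df_between, df_within) degrees of freedom, which a statistics package would normally supply; this sketch stops at the statistic and does not handle the degenerate case of zero within-group variation.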
Procedia PDF Downloads 107
1257 Parallel Coordinates on a Spiral Surface for Visualizing High-Dimensional Data
Authors: Chris Suma, Yingcai Xiao
Abstract:
This paper presents Parallel Coordinates on a Spiral Surface (PCoSS), a parallel-coordinate-based interactive visualization method for high-dimensional data, and a test implementation of the method. Plots generated by the test system are compared with those generated by XDAT, a software package implementing traditional parallel coordinates. Traditional parallel coordinate plots can be cluttered when the number of data points is large or when the dimensionality of the data is high. PCoSS plots display multivariate data on a 3D spiral surface and allow users to see the whole picture of high-dimensional data with less clutter. Taking advantage of the 3D display environment in PCoSS, users can further reduce clutter by zooming into an axis of interest for a closer view, or by moving vantage points and reorienting the viewing angle to obtain a desired view of the plots.
Keywords: human computer interaction, parallel coordinates, spiral surface, visualization
Procedia PDF Downloads 14
1256 Parallel Random Number Generation for the Modern Supercomputer Architectures
Authors: Roman Snytsar
Abstract:
Pseudo-random numbers are often used in scientific computing, for example in Monte Carlo simulations or quantum-inspired optimization. Requirements for a parallel random number generator running in a modern multi-core vector environment are more stringent than those for sequential random number generators. As well as passing the usual quality tests, the output of the parallel random number generator must be verifiable and reproducible throughout the concurrent execution. We propose a family of vectorized Permuted Congruential Generators. Implementations are available for multiple modern vector computer architectures. Besides demonstrating good single-core performance, the generators scale easily across many processor cores and multiple distributed nodes. We provide performance and parallel speedup analysis and comparisons between the implementations.
Keywords: pseudo-random numbers, quantum optimization, SIMD, parallel computing
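To illustrate the reproducibility requirement, here is a minimal scalar sketch of one well-known member of the Permuted Congruential Generator family (PCG32 with the XSH-RR output permutation). It is not the vectorized implementation proposed in the abstract. Giving each worker its own stream id is an assumption made here to show how concurrent workers can draw independent, repeatable sequences; the class and parameter names are illustrative.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULTIPLIER = 6364136223846793005  # 64-bit LCG multiplier used by PCG32


class PCG32:
    """Scalar PCG32 (XSH-RR): 64-bit LCG state, 32-bit permuted output."""

    def __init__(self, seed: int, stream: int):
        # The increment must be odd; each distinct stream id selects a distinct sequence.
        self.inc = ((stream << 1) | 1) & MASK64
        self.state = 0
        self.next_u32()
        self.state = (self.state + seed) & MASK64
        self.next_u32()

    def next_u32(self) -> int:
        old = self.state
        # Advance the underlying linear congruential generator.
        self.state = (old * MULTIPLIER + self.inc) & MASK64
        # Output permutation: xorshift the high bits, then rotate by the top 5 bits.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


def draw(seed: int, stream: int, count: int) -> list:
    """Return the first `count` outputs for a given (seed, stream) pair."""
    rng = PCG32(seed, stream)
    return [rng.next_u32() for _ in range(count)]


# Same seed and stream: identical output, regardless of which worker runs it or when.
worker0_first_run = draw(seed=42, stream=0, count=8)
worker0_second_run = draw(seed=42, stream=0, count=8)
# Same seed, different stream: a different sequence for another worker.
worker1 = draw(seed=42, stream=1, count=8)
print(worker0_first_run == worker0_second_run, worker0_first_run != worker1)
```

Because each sequence depends only on the (seed, stream) pair, a run can be replayed and checked independently of how the work was scheduled across cores.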
Procedia PDF Downloads 120
1255 Classification Rule Discovery by Using Parallel Ant Colony Optimization
Authors: Waseem Shahzad, Ayesha Tahir Khan, Hamid Hussain Awan
Abstract:
The Ant-Miner algorithm, which belongs to the family of ACO algorithms, is used to extract knowledge from data in the form of rules. A variant of the Ant-Miner algorithm named cAnt-MinerPB is used to generate a list of rules using the Pittsburgh approach in order to maintain the interaction among the rules that are generated. In this paper, we propose a parallel Ant-MinerPB in which the ant colony optimization algorithm runs in parallel. In this technique, a data set is divided vertically (i.e., by attributes) into different subsets. These subsets are created based on the correlation among attributes using Mutual Information (MI). Rules are generated in parallel and then merged to form a final list of rules. The results show that the proposed technique achieves higher accuracy than the original cAnt-MinerPB and that the execution time is also reduced.
Keywords: ant colony optimization, parallel Ant-MinerPB, vertical partitioning, classification rule discovery
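The vertical partitioning step depends on measuring how strongly attributes are related through mutual information. The sketch below computes MI, in bits, between two discrete attribute columns from their empirical frequencies. The toy columns and their names are hypothetical, and the abstract does not say how the MI scores are turned into subsets, so that grouping step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length discrete columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum over observed value pairs: p(x, y) * log2( p(x, y) / (p(x) * p(y)) ).
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Hypothetical attribute columns (illustrative only).
outlook = ["sunny", "sunny", "rain", "rain"]
outlook_copy = ["sunny", "sunny", "rain", "rain"]  # fully dependent on `outlook`
windy = ["yes", "no", "yes", "no"]                 # independent of `outlook` in this sample

mi_dependent = mutual_information(outlook, outlook_copy)
mi_independent = mutual_information(outlook, windy)
print(mi_dependent, mi_independent)
```

Attribute pairs with high MI carry overlapping information, while pairs with MI near zero are close to independent in the sample; a partitioning scheme can use such scores to decide which attributes belong in the same subset.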
Procedia PDF Downloads 296
1254 Pushing the Boundary of Parallel Tractability for Ontology Materialization via Boolean Circuits
Authors: Zhangquan Zhou, Guilin Qi
Abstract:
Materialization is an important reasoning service for applications built on the Web Ontology Language (OWL). To make materialization efficient in practice, current research focuses on deciding the tractability of an ontology language and designing parallel reasoning algorithms. However, some well-known large-scale ontologies, such as YAGO, have been shown to have good performance for parallel reasoning, even though they are expressed in ontology languages that are not parallelly tractable, i.e., the reasoning is inherently sequential in the worst case. This motivates us to study the problem of parallel tractability of ontology materialization from a theoretical perspective. That is, we aim to identify the ontologies for which materialization is parallelly tractable, i.e., in the complexity class NC. Since NC is defined in terms of Boolean circuits, which are widely used to investigate parallel computing problems, we first transform the problem of materialization into the evaluation of Boolean circuits, and then study the problem of parallel tractability based on circuits. In this work, we focus on datalog rewritable ontology languages. We use Boolean circuits to identify two classes of datalog rewritable ontologies (called parallelly tractable classes) such that materialization over them is parallelly tractable. We further investigate the parallel tractability of materialization of a datalog rewritable OWL fragment, DHL (Description Horn Logic). Based on the above results, we analyze real-world datasets and show that many ontologies expressed in DHL belong to the parallelly tractable classes.
Keywords: ontology materialization, parallel reasoning, datalog, Boolean circuit
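Materialization means computing every fact that follows from the given facts and rules. As a rough illustration, the sketch below runs a naive sequential fixpoint for a two-rule datalog program (transitive closure). It is not the circuit construction studied in the abstract; the predicate names `edge` and `path` and the example facts are hypothetical.

```python
def materialize_paths(edges):
    """Naive fixpoint for the datalog program:
        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    Returns (all derivable path facts, number of rounds that derived new facts).
    """
    edges = set(edges)
    path = set(edges)  # first rule: every edge is a path
    rounds = 0
    while True:
        # Second rule: join the current path facts with the edge facts.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path, rounds
        path |= new_facts
        rounds += 1


# Hypothetical facts: a chain a -> b -> c -> d.
facts = {("a", "b"), ("b", "c"), ("c", "d")}
closure, rounds = materialize_paths(facts)
print(sorted(closure), rounds)
```

Within one round the individual rule applications do not depend on each other, so they could be evaluated concurrently, whereas each round needs the facts of the previous one. For a chain such as this, the number of rounds grows with the chain length under this linear rule, which gives some intuition for why worst-case sequential depth is the quantity of interest.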
Procedia PDF Downloads 271