Search results for: Corpus of Spoken Lithuanian
128 Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development
Authors: L. Kamandulytė-Merfeldienė
Abstract:
The paper deals with the main methodological issues of the Corpus of Spoken Lithuanian, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus involved three main stages: data collection, transcription of the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT transcription format of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a steady increase in studies of spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation in inflectional paradigms, the distribution of fillers, the syntactic functions of adjectives, and the mean length of utterances.
Keywords: CHILDES, Corpus of Spoken Lithuanian, grammatical annotation, grammatical disambiguation, lexicon, Lithuanian.
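To illustrate how a CHAT-formatted transcript supports counts like those mentioned above (part-of-speech distribution, mean length of utterance), here is a minimal Python sketch. It assumes a simplified CHAT subset: main tiers start with `*` plus a speaker code, a `%mor` dependent tier holds space-separated `pos|lemma` tokens, and the terminators `.`, `?`, `!` are not counted as words. The sample utterances and their tags are hypothetical, not taken from the corpus.

```python
from collections import Counter

TERMINATORS = {".", "?", "!"}

def parse_chat(lines):
    """Return (utterances, mor_tiers) from a simplified CHAT transcript."""
    utterances, mor_tiers = [], []
    for line in lines:
        if line.startswith("*"):
            # Main tier, e.g. "*SPK:\tword word ."
            _, _, text = line.partition(":")
            utterances.append([w for w in text.split() if w not in TERMINATORS])
        elif line.startswith("%mor:"):
            text = line[len("%mor:"):]
            mor_tiers.append([t for t in text.split() if t not in TERMINATORS])
    return utterances, mor_tiers

def mean_length_of_utterance(utterances):
    """MLU in words: total word tokens divided by the number of utterances."""
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)

def pos_distribution(mor_tiers):
    """Count the part-of-speech prefix (text before '|') of each %mor token."""
    return Counter(tok.split("|", 1)[0] for tier in mor_tiers for tok in tier)

# Hypothetical two-utterance sample with invented tags.
sample = [
    "*SPK:\taš einu namo .",
    "%mor:\tpro|aš v|eiti adv|namo .",
    "*SPK:\tgerai .",
    "%mor:\tadv|gerai .",
]
utts, mors = parse_chat(sample)
print(mean_length_of_utterance(utts))  # 2.0
print(pos_distribution(mors))
```

Real CHAT files also carry headers, retracings, and other codes that this sketch ignores.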
127 Knowledge Acquisition for the Construction of an Evolving Ontology: Application to Augmented Surgery
Authors: Nora Taleb, Sellami Mokhtar, Michel Simonet
Abstract:
This work concerns the evolution and maintenance of an ontological resource in relation to the evolution of the text corpus from which it was built. The knowledge contained in a text corpus, especially in dynamic domains, evolves continuously. When the corpus changes, the domain ontology must evolve accordingly. Most methods manage ontology evolution independently of the corpus from which the ontology is built; moreover, they treat evolution only as a process of knowledge addition and do not consider other kinds of knowledge change. We propose a methodology for managing an ontology built from a text corpus that evolves over time, while preserving the consistency and persistence of the ontology. Our methodology is driven by the changes made to the corpus to reflect the evolution of the domain under consideration, augmented surgery in our case. In this context, the results of text mining techniques, together with a slightly modified version of the ARCHONTE method, are used to support the evolution process.
Keywords: Corpus, Evolution, Ontology
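The key point above is that corpus changes can remove knowledge, not only add it. The following Python sketch is a deliberately simplified illustration, not the ARCHONTE method or the authors' text-mining pipeline: it assumes that any term reaching a document-frequency threshold is a candidate concept, and it compares two corpus versions to propose both addition and removal operations. The threshold and the toy corpora are assumptions.

```python
from collections import Counter

def candidate_concepts(corpus, min_df=2):
    """Terms occurring in at least `min_df` documents (a crude stand-in for text mining)."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))
    return {term for term, n in df.items() if n >= min_df}

def propose_changes(old_corpus, new_corpus, min_df=2):
    """Diff the candidate concepts of two corpus versions into change operations."""
    old = candidate_concepts(old_corpus, min_df)
    new = candidate_concepts(new_corpus, min_df)
    return {
        "add": sorted(new - old),     # newly salient terms
        "remove": sorted(old - new),  # terms the corpus no longer supports
        "keep": sorted(old & new),
    }

old_corpus = ["robot assisted surgery", "robot arm calibration",
              "manual suture", "manual retraction"]
new_corpus = ["robot assisted surgery", "augmented reality overlay",
              "augmented reality robot"]
print(propose_changes(old_corpus, new_corpus))
```

A real system would validate each proposed removal against the ontology's consistency constraints before applying it.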
126 Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: 'Reddit'
Authors: Yasmeen Bassas, Sandra Kuebler, Allen Riddell
Abstract:
Native Language Identification (NLI) is one of the growing subfields of Natural Language Processing (NLP). The NLI task is mainly concerned with predicting an author's native language from their writing in a second language. In this paper, we investigate the performance of two types of features, content-based and content-independent, when they are evaluated on a different corpus (social media data from Reddit). In this NLI task, the models are trained on one corpus (TOEFL) and then evaluated on data from an external corpus (Reddit). Three classifiers are used: a baseline, a linear SVM, and Logistic Regression. Results show that content-based features are more accurate and robust than content-independent ones, both within a corpus and across corpora.
Keywords: NLI, NLP, content-based features, content independent features, social media corpus, ML.
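The abstract does not define the two feature families. This sketch assumes one common reading: content-based features are word n-grams that capture topic, and content-independent features are function-word frequencies that are largely topic-neutral. The Python code below extracts both from a text. The small function-word list is illustrative only, and no classifier or TOEFL/Reddit data is involved.

```python
import re
from collections import Counter

# Illustrative, deliberately tiny function-word list (an assumption).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "it"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def content_features(text, n=2):
    """Word unigrams plus word n-grams: topic-sensitive, content-based features."""
    tokens = tokenize(text)
    feats = Counter(tokens)
    feats.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats

def content_independent_features(text):
    """Relative frequency of each function word: largely topic-neutral."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty input
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return {w: counts[w] / total for w in sorted(FUNCTION_WORDS)}

post = "The exam is in the morning and it is hard."
print(content_features(post).most_common(3))
print(content_independent_features(post)["the"])
```

In a cross-corpus setup, the same extractors would be fitted on the training corpus and applied unchanged to the external corpus.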
125 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English
Authors: Valdênia Carvalho e Almeida
Abstract:
The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in Applied Linguistics (AL) journals. The corpus, totaling more than one million words, contains texts published from 2014 to 2018 in three journals: Language Learning and Technology, English for Academic Purposes, and TESOL Quarterly. A corpus-based analysis was carried out to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they were associated with, the semantic preferences of each adjective were determined. The AL corpus analysis was then compared with an investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in the use of the adjectives in research articles from both areas. The results show that some preferences seem to be closely related not only to the academic genre of the texts but also to the specific disciplinary domain and, to a lesser extent, to the research context of each journal. This research illustrates how Corpus Linguistics can contribute to exploring the concept of semantic preference in more detail, given the complex nature of the phenomenon.
Keywords: Applied linguistics, corpus linguistics, chemistry, research article, semantic preference.
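Semantic preference is read off concordance lines and collocates. Here is a minimal Python sketch of that first step, assuming plain-text input and simple letter-only tokenization. It yields key-word-in-context (KWIC) lines for a node adjective and counts the words that immediately follow it, which an analyst would then group into semantic sets. The sentences are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def kwic(tokens, node, width=3):
    """Yield (left context, node, right context) for every occurrence of `node`."""
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield left, tok, right

def right_collocates(tokens, node):
    """Count words immediately to the right of `node` (e.g. nouns an adjective modifies)."""
    return Counter(tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == node)

text = ("These results show a significant effect. We found no significant "
        "difference between groups, but a significant effect of task type.")
tokens = tokenize(text)
for left, node, right in kwic(tokens, "significant"):
    print(f"{left:>25} [{node}] {right}")
print(right_collocates(tokens, "significant"))
```

Because tokenization ignores sentence boundaries here, contexts can span sentences; a fuller tool would respect them.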
124 Specialized Translation Teaching Strategies: A Corpus-Based Approach
Authors: Yingying Ding
Abstract:
This study presents a methodology for specialized translation, with the objective of helping teachers improve their strategies for teaching translation. To acquire the skills needed to translate specialized texts, students need to become familiar with the semantic and syntactic features of source and target texts. The aim of our study is to use a corpus-based approach in teaching specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese-Italian comparable corpus consisting of 50 economic contracts from the food domain. With the help of AntConc, we propose to compile this comparable corpus for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from a comparable corpus when teaching specialized translation from Italian into Chinese and, through examples of passive sentences, how students could learn to apply different strategies for translating voice appropriately.
Keywords: Corpus-based approach, translation teaching, specialized translation.
123 Comparison of Parameterization Methods in Recognizing Spoken Arabic Digits
Authors: Ali Ganoun
Abstract:
This paper proposes an evaluation of sound parameterization methods for recognizing spoken Arabic words, namely the digits from zero to nine. Each isolated spoken word is represented by a single template based on a specific recognition feature, and recognition is based on the Euclidean distance from those templates. The performance analysis is based on four parameterization features: Burg spectrum analysis, Walsh spectrum analysis, Thomson multitaper spectrum analysis, and Mel Frequency Cepstral Coefficients (MFCC). The main aim of this paper is to compare, analyze, and discuss the outcomes of spoken Arabic digit recognition systems based on the selected features. The results confirm that MFCC features are a very promising method for recognizing spoken Arabic digits.
Keywords: Speech Recognition, Spectrum Analysis, Burg Spectrum, Walsh Spectrum Analysis, Thomson Multitaper Spectrum, MFCC.
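The recognition scheme above (one template per digit, classification by Euclidean distance) can be sketched in Python as a nearest-template classifier. Feature extraction (Burg, Walsh, multitaper, or MFCC) is out of scope here. The short fixed-length vectors are hypothetical stand-ins for per-word feature templates, and only three of the ten digits are shown.

```python
import math

def euclidean(a, b):
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(features, templates):
    """Return the label whose template is closest to `features`."""
    return min(templates, key=lambda label: euclidean(features, templates[label]))

# Hypothetical 3-dimensional templates for three of the ten digits.
templates = {
    "sifr": [1.0, 0.2, -0.5],
    "wahid": [0.1, 1.1, 0.3],
    "ithnan": [-0.8, 0.4, 0.9],
}
print(recognize([0.0, 1.0, 0.2], templates))  # wahid
```

Real feature sequences vary in length between utterances, so they would first be reduced to a fixed-length vector (or aligned, e.g. with dynamic time warping) before this comparison.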
122 Corporate Cautionary Statement: A Genre of Professional Communication
Authors: Chie Urawa
Abstract:
Cautionary statements, or disclaimers, in corporate annual reports need to be carefully designed, because clear cautionary statements may protect a company in the event of legal disputes but may also undermine positive impressions. This study compares the language of cautionary statements using two corpora, Sony's cautionary statement corpus (S-corpus) and Panasonic's cautionary statement corpus (P-corpus), illustrating the differences and similarities in the use of meaningful cautionary statements and critically analyzing why practitioners write them as they do. The findings describe distinct differences between the two companies in how they present risk factors and how they phrase the statements. The word 'ability' is used more for legal protection in the S-corpus, whereas the word 'possibility' is used more to convey a better impression in the P-corpus. The main similarities lie in the use of lexical words and pronouns and in nearly identical wording over eight years. The findings show how each company makes its statements unique in presenting risk factors, and they reveal the characteristics of a specific genre of professional communication. An important implication of this study is that a more comprehensive approach can be applied in other contexts and can be used by companies to reflect on their cautionary statements.
Keywords: Cautionary statements, corporate annual reports, corpus, risk factors.
121 A Corpus-Based Study on the Styles of Three Translators
Authors: Wang Yunhong
Abstract:
The present paper examines the different styles of three translators in their translations of the classical Chinese novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to investigate whether the three translations reveal stylistic differences and shifts, and if so, which ones. The findings show that the three translators demonstrate different styles in their word choices and sentence preferences, which implies that identifying recurrent textual patterns may be a basic step in investigating a translator's style.
Keywords: Corpus, lexical choices, sentence characteristics, style.
120 A Web-Based Self-Learning Grammar for Spoken Language Understanding
Authors: S. M. Biondi, V. Catania, R. Di Natale, A. R. Intilisano, D. Panno
Abstract:
One of the major goals of Spoken Dialog Systems (SDS) is to understand what the user says. In the SDS domain, the Spoken Language Understanding (SLU) module classifies user utterances by means of predefined conceptual knowledge. The SLU module can recognize only meanings that were previously included in its knowledge base. Because this knowledge is vast, storing the information is a very expensive process. Updating and managing the knowledge base are time-consuming and error-prone processes because of the rapidly growing number of entities such as proper nouns and domain-specific nouns. This paper proposes a solution to the problem of Named Entity Recognition (NER) applied to the SDS domain. The proposed solution attempts to automatically recognize the meaning associated with an utterance by using the PANKOW (Pattern-based Annotation through Knowledge On the Web) method at runtime. The proposed method extracts information from the Web to extend the knowledge of the SLU module and reduce development effort. In particular, the Google search engine is used to extract information from the Facebook social network.
Keywords: Spoken Dialog System, Spoken Language Understanding, Web Semantic, Named Entity Recognition.
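As we understand PANKOW, it instantiates lexico-syntactic patterns (Hearst-style phrases such as "X is a C" or "Cs such as X") for an entity and each candidate concept, queries a search engine for hit counts, and picks the concept with the highest aggregated count. The Python sketch below follows that idea but replaces web search with a hypothetical `hit_count` lookup over a hard-coded table, since no real search API is assumed. The patterns, concepts, and counts are all illustrative.

```python
# Hypothetical hit counts standing in for web search results.
FAKE_HITS = {
    '"Etna is a volcano"': 900,
    '"volcanoes such as Etna"': 400,
    '"Etna is a restaurant"': 50,
    '"restaurants such as Etna"': 5,
}

PATTERNS = [
    '"{entity} is a {concept}"',
    '"{plural} such as {entity}"',
]

def pluralize(word):
    # Naive English pluralization, sufficient for this toy example only.
    return word + "es" if word.endswith("o") else word + "s"

def hit_count(query):
    """Hypothetical stand-in for a search-engine hit count."""
    return FAKE_HITS.get(query, 0)

def classify_entity(entity, concepts):
    """Sum hit counts over all patterns; return the best concept and all scores."""
    scores = {}
    for concept in concepts:
        queries = [p.format(entity=entity, concept=concept, plural=pluralize(concept))
                   for p in PATTERNS]
        scores[concept] = sum(hit_count(q) for q in queries)
    best = max(scores, key=scores.get)
    return best, scores

print(classify_entity("Etna", ["volcano", "restaurant"]))
```

Running the lookup at runtime, as the abstract describes, means each unknown entity costs one query per pattern and concept, so caching results would be a natural design choice.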
119 Assamese Numeral Corpus for Speech Recognition using Cooperative ANN Architecture
Authors: Mousmita Sarma, Krishna Dutta, Kandarpa Kumar Sarma
Abstract:
A speech corpus is one of the major components of a speech processing system, where one of the primary requirements is to recognize an input sample. The quality and detail captured in the speech corpus directly affect recognition precision. The current work proposes a platform for speech corpus generation using an adaptive LMS filter and LPC cepstrum, as part of an ANN-based speech recognition system designed exclusively to recognize isolated numerals of Assamese, a major language of the north-eastern part of India. The work focuses on designing an optimal feature extraction block and a few ANN-based cooperative architectures so that the performance of the speech recognition system can be improved.
Keywords: Filter, Feature, LMS, LPC, Cepstrum, ANN.
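The adaptive LMS filter mentioned above updates its weights in proportion to the error: w ← w + μ·e[n]·x[n], where x[n] holds the most recent input samples and e[n] = d[n] − wᵀx[n]. Here is a minimal pure-Python sketch of that standard update. It is not the authors' configuration: the filter length, step size μ, and test signal are assumptions, and the example uses system identification (learning a known 2-tap filter) rather than speech, because convergence is then easy to check.

```python
import random

def lms_filter(x, d, num_taps=2, mu=0.1):
    """Standard LMS adaptive filter. Returns (outputs, errors, final weights)."""
    w = [0.0] * num_taps
    outputs, errors = [], []
    for n in range(len(x)):
        # Tap-input vector [x[n], x[n-1], ...], zero-padded at the start.
        taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, taps))
        e = d[n] - y
        # Weight update: w <- w + mu * e * x
        w = [wk + mu * e * xk for wk, xk in zip(w, taps)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, w

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2000)]
# Unknown system to identify: d[n] = 0.5 x[n] - 0.3 x[n-1]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
_, _, w = lms_filter(x, d)
print([round(v, 3) for v in w])
```

For noise reduction, d[n] would be the noisy signal and x[n] a noise reference, with the error e[n] serving as the cleaned output.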
118 Saudi Twitter Corpus for Sentiment Analysis
Authors: Adel Assiri, Ahmed Emam, Hmood Al-Dossari
Abstract:
Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have applied SA directly to Arabic, owing to the lack of a publicly available dataset for the language. This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated data set of 4,700 items for Saudi dialect sentiment analysis, with an agreement score of K = 0.807. Our next step is to extend this corpus and create a large-scale lexicon for the Saudi dialect from it.
Keywords: Arabic, Sentiment Analysis, Twitter, annotation.
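The abstract reports agreement as K = 0.807 without naming the statistic. Assuming it is Cohen's kappa for two annotators, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the chance agreement implied by each annotator's label distribution. Here is a minimal Python sketch with made-up sentiment labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Invented labels for six items.
a = ["pos", "neg", "neg", "pos", "neu", "neg"]
b = ["pos", "neg", "pos", "pos", "neu", "neg"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

Here p_o = 5/6 and p_e = 13/36, so κ = 17/23. With more than two annotators, Fleiss' kappa would be the usual generalization.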
117 Words of Peace in the Speeches of the Egyptian President, Abdulfattah El-Sisi: A Corpus-Based Study
Authors: Mohamed S. Negm, Waleed S. Mandour
Abstract:
The present study primarily investigates words of peace (lexemes of peace) in the formal speeches of the Egyptian president Abdulfattah El-Sisi over a two-year span, from 2018 to 2019. This paper attempts not only to shed light on the contextual use of the antonyms war and peace but also to conduct a quantitative analysis using current corpus linguistics methods. To this end, the researchers deployed a corpus-based approach in collecting, encoding, and processing 30 presidential speeches from this period (23,411 words and 25,541 tokens in total). Semantic fields and collocational networks are then identified and compared statistically. The results show a significant propensity to adopt the concept of peace, including its collocation network, textually and therefore ideationally, at the expense of the concept of war, which in most cases surfaces euphemistically through the noun conflict. The president has not justified the action of war with an honorable cause or a valid reason. So far, these results indicate that the Egyptian president holds a positive sociopolitical mindset, and they reveal fair dealing on emerging national and international issues.
Keywords: Corpus-assisted discourse studies, critical discourse analysis, collocation network, corpus linguistics.
116 Innovation Policy and Development of Creative Industries: Case Study of Lithuanian Animation Industry
Authors: Tomas Mitkus, Vaida Nedzinskaitė-Mitkė
Abstract:
The objective of this study is to identify and explore how adequate modern innovation support mechanisms are for developing creative industries. We argue that the current development and support strategy for creative industries, although it acknowledges the high correlation between innovation and creativity, does not seek to improve conditions for promoting systematic innovation development in the creative sector. Using the Lithuanian animation industry as a case study, this paper examines the contribution of innovation to creativity and, consequently, to the competitiveness of animation enterprises. The paper offers insights that contribute to theoretical and practical discussions of how creative companies build national and international competitiveness through innovation. The conclusions suggest that the development of creative industries could benefit greatly if policymakers implemented tools that encourage creative enterprises to invest in innovation development at a constant rate.
Keywords: Creative industries, animation, innovation, innovation policy, management.
115 OPEN_EmoRec_II: A Multimodal Corpus of Human-Computer Interaction
Authors: Stefanie Rukavina, Sascha Gruss, Steffen Walter, Holger Hoffmann, Harald C. Traue
Abstract:
OPEN_EmoRec_II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material, and in the second half during a human-computer interaction (HCI) realized with a Wizard-of-Oz design. The induced emotions are based on the dimensional theory of emotion (valence, arousal, and dominance). With these emotional sequences, recorded as multimodal data (facial reactions, speech, audio, and physiological reactions) in a naturalistic HCI environment, classification methods can be improved at the multimodal level. The database is the result of an HCI experiment in which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The now-available open corpus contains the following sensory signals: video, audio, physiology (SCL, respiration, BVP, EMG of the corrugator supercilii, EMG of the zygomaticus major), and facial reaction annotations.
Keywords: Open multimodal emotion corpus, annotated labels.
113 Sperm Production Rate, Gonadal and Extragonadal Sperm Reserves in the Sokoto Red (Maradi) Buck in a Tropical Environment
Authors: Immanuel I. Bitto, Thomas Agam
Abstract:
Twenty-eight healthy adult Maradi bucks were used to evaluate sperm production and sperm storage capacity in the breed. Daily sperm production (DSP) averaged 0.55 ± 0.05 × 10⁹, while daily sperm production per gram (DSP/g) was 1.37 ± 0.12 × 10⁷. Gonadal sperm reserve was 1.99 ± 0.18 × 10⁹, while the caput, upper corpus, and lower corpus reserves averaged 0.58 ± 0.04 × 10⁹, 0.36 ± 0.02 × 10⁹, and 0.33 ± 0.08 × 10⁹, respectively. The proximal cauda, mid cauda, distal cauda, and ductus deferens had values of 0.68 ± 0.10 × 10⁹, 1.23 ± 0.16 × 10⁹, 1.87 ± 0. × 10⁹, and 0.17 ± 0.05 × 10⁹, respectively. The relative contributions of the respective epididymal sections and the ductus deferens to the total extragonadal sperm reserves were 11.11%, 6.89%, 6.32%, 13.03%, 23.56%, 35.82%, and 3.26%, respectively. Gonadal sperm reserves were significantly higher (p < 0.05) than the caput, upper corpus, lower corpus, proximal cauda, and ductus deferens reserves. Gonadal reserves were, however, similar (p > 0.05) to the mid cauda and distal cauda epididymal reserves.
Keywords: Goats, Reserves, Sperm, Tropics
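The relative contributions reported above follow from dividing each section's reserve by the total extragonadal reserve, which is the sum of the six epididymal sections and the ductus deferens (5.22 × 10⁹). A short Python check using the reported means, in units of 10⁹ spermatozoa, reproduces the published percentages to within about 0.01 percentage points. The largest gap is the upper corpus: the computed 6.90% versus the reported 6.89%.

```python
# Reported mean reserves (x 10^9 spermatozoa) from the abstract.
reserves = {
    "caput": 0.58,
    "upper corpus": 0.36,
    "lower corpus": 0.33,
    "proximal cauda": 0.68,
    "mid cauda": 1.23,
    "distal cauda": 1.87,
    "ductus deferens": 0.17,
}
reported_pct = {
    "caput": 11.11, "upper corpus": 6.89, "lower corpus": 6.32,
    "proximal cauda": 13.03, "mid cauda": 23.56, "distal cauda": 35.82,
    "ductus deferens": 3.26,
}

total = sum(reserves.values())  # total extragonadal reserve, 5.22 x 10^9
for section, value in reserves.items():
    pct = 100 * value / total
    print(f"{section:16s} {pct:6.2f}%  (reported {reported_pct[section]:.2f}%)")
```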
112 Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis
Authors: Mohamed Ali KAMMOUN, Ahmed Ben HAMIDA
Abstract:
In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and treated as the acoustic reference to be respected, and many studies have accordingly addressed acoustic unit classification. This type of classification separates units according to their symbolic characteristics. Target cost and concatenation cost have classically been defined for unit selection. In corpus-based speech synthesis systems that use large text corpora, cost functions have been limited to a juxtaposition of symbolic criteria, and the acoustic information of units is not exploited in the definition of the target cost. In this paper, we take into account the phonetic information of units corresponding to their acoustic information. This is achieved by defining a probabilistic linguistic bigram model used for unit selection. The selected units are to be extracted from the English TIMIT corpus.
Keywords: Unit selection, Corpus-based Speech Synthesis, Bigram model
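One way to read the proposal is as a Viterbi search: each target position has several candidate units, and transitions between consecutive units are scored by bigram log-probabilities estimated from a corpus. The Python sketch below implements only that reading. The unit labels, add-one smoothing, and toy training sequences are assumptions, and no acoustic target cost or TIMIT data is involved.

```python
import math
from collections import Counter

def train_bigrams(sequences):
    """Count unigrams and bigrams over sequences of unit labels."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        uni.update(seq)
        bi.update(zip(seq, seq[1:]))
    return uni, bi

def log_prob(prev, cur, uni, bi, vocab_size):
    """Add-one smoothed log P(cur | prev)."""
    return math.log((bi[(prev, cur)] + 1) / (uni[prev] + vocab_size))

def select_units(candidates, uni, bi):
    """Viterbi over per-position candidate lists; returns the most probable unit path."""
    vocab_size = len(uni)
    # best[unit] = (log score of the best path ending in unit, that path)
    best = {unit: (0.0, [unit]) for unit in candidates[0]}
    for options in candidates[1:]:
        new_best = {}
        for cur in options:
            prev = max(best, key=lambda p: best[p][0] + log_prob(p, cur, uni, bi, vocab_size))
            score = best[prev][0] + log_prob(prev, cur, uni, bi, vocab_size)
            new_best[cur] = (score, best[prev][1] + [cur])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical unit inventory: "a1" and "a2" are two recorded variants of the same phone.
training = [["s", "a1", "t"], ["s", "a1", "n"], ["m", "a2", "t"]]
uni, bi = train_bigrams(training)
print(select_units([["s"], ["a1", "a2"], ["t"]], uni, bi))  # ['s', 'a1', 't']
```

In a full system, this bigram score would be combined with target and concatenation costs rather than used alone.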
111 Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context
Authors: Na Xiao
Abstract:
Animal metaphors have many different connotations, ranging from terms of affection to derogatory epithets, but gendered expressions using animal metaphors are often imbalanced. Generally, animal metaphors related to females tend to be negative. Little is known about the reasons for the negative expressions of female animal metaphors in Chinese contexts, and these have not yet been quantified. Grounded in conceptual metaphor theory, the study used the Modern Chinese Corpus of the Center for Chinese Linguistics at Peking University (CCL Corpus) as its database. It identified the variables influencing gender differences in animal metaphors that map onto humans in the Chinese context by observing the percentage of "tiger" metaphors. The study shows that tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, it also shows that the proportion of tiger metaphorical idioms related to women is very high. This finding can serve as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights into trends in other animal metaphors.
Keywords: Chinese, CCL Corpus, gender differences, metaphorical idioms, tigers.
110 Ultrasonic Assessment of Corpora Lutea and Plasma Progesterone Levels in Early Pregnant and Non Pregnant Cows
Authors: Abdurraouf Gaja, Salah Al-Dahash, Guru Solmon Raju, Chikara Kubota
Abstract:
Corpus luteum cross-sectional area (by ultrasonography) and plasma progesterone (by DELFIA) were measured in early pregnant and non-pregnant cows on day 14 and on days 20 to 23 post insemination. On day 14, corpus luteum cross-sectional area was 348.43 mm² in pregnant and 387.84 mm² in non-pregnant cows. On days 20 to 23, it ranged between 342.06 and 367.90 mm² in pregnant cows and between 193.85 and 270.69 mm² in non-pregnant cows. Plasma progesterone was 2.43 ng/ml in pregnant and 2.46 ng/ml in non-pregnant cows on day 14, while on days 20 to 23 it ranged between 2.47 and 2.84 ng/ml in pregnant cows and between 0.53 and 1.17 ng/ml in non-pregnant cows. Both luteal tissue areas and plasma progesterone levels differed highly significantly (P < 0.01) between pregnant and non-pregnant cows on days 20 to 23, but there were no significant differences on day 14. The correlation between CL cross-sectional area and plasma progesterone level was 0.4 in pregnant cows and 0.99 in non-pregnant cows. The study concludes that ultrasonic assessment of corpora lutea is a viable alternative to plasma progesterone determination for early pregnancy diagnosis in cows.
Keywords: Progesterone, ultrasonography, corpus luteum, pregnancy diagnosis, cow.
109 The Analysis of Regulation on Sustainability in Financial Sector in Lithuania
Authors: D. Kubiliute
Abstract:
The Republic of Lithuania is known as a trusted location for global business institutions, and it attracts investors with its competitive environment for financial service providers. Alongside the aspiration to offer a strongly results-oriented and innovation-driven environment for financial service providers, Lithuanian regulatory authorities consistently implement the European Union's high regulatory standards for financial activities, including sustainability-related disclosures. Since the European Union directed its policy towards a transition to a climate-neutral, green, competitive, and inclusive economy, additional regulatory requirements for financial market participants have been adopted: disclosure of sustainable activities, transparency, prevention of greenwashing, and others. The financial sector is one of the key factors in implementing the sustainability objectives of European Union policies and mitigating the negative effects of climate change: public funds are not enough to make a significant impact on sustainable investment, so directing public and private capital to green projects may help finance the necessary changes. The topic of the study is original and has not yet been widely analyzed in Lithuanian legal discourse. The study uses quantitative and qualitative methodologies and the principles of logical, systematic, and critical analysis; its aim is to reveal the problems of implementing sustainability regulation in the Lithuanian financial sector. Additional regulatory requirements could cause serious changes in financial business operations: companies have to dedicate additional funds, employees, and time to implementing these regulations. A lack of knowledge and data on how to implement the new regulatory requirements for sustainability reporting causes considerable uncertainty for financial market participants.
For some companies, this may even become a matter of business continuity. The study concludes that supervisory authorities should strike a balance between financial market needs and legal regulation.
Keywords: Financial, market participant, legal, regulation, sustainability.
108 Online Multilingual Dictionary Using Hamburg Notation for Avatar-Based Indian Sign Language Generation System
Authors: Sugandhi, Parteek Kumar, Sanmeet Kaur
Abstract:
Sign Language (SL) is used by deaf people and by others who cannot speak but can hear, or who have difficulty with spoken languages due to a disability. It is a visual-gestural language that uses one or both hands, the arms, the face, and the body to convey meanings and thoughts. An SL automation system provides an effective computer-based interface for communicating with hearing people. In this paper, an avatar-based dictionary is proposed for a text-to-Indian Sign Language (ISL) generation system. This work also reviews the SL corpora that have become available for various sign languages over the years. An ISL generation system requires a written form of SL, and several techniques exist for writing SL. The system uses the Hamburg Sign Language Notation System (HamNoSys) and the Signing Gesture Markup Language (SiGML) for ISL generation. It is developed in PHP, using Web Graphics Library (WebGL) technology for 3D avatar animation. A multilingual ISL dictionary is developed using HamNoSys for both English and Hindi. This dictionary serves as a database associating signs with words or phrases of a spoken language. It provides an admin panel interface for managing the dictionary, i.e., modifying, adding, or deleting words. Through this interface, HamNoSys notations can be created and stored in a database, and these notations can be converted manually into the corresponding SiGML files. The system takes a natural-language input sentence in English or Hindi and generates a 3D sign animation using an avatar. SL generation systems have potential applications in many domains, such as the healthcare sector, media, educational institutions, commercial sectors, and transportation services. This research will help researchers understand the various techniques used for writing SL and for building sign language generation systems.
Keywords: Avatar, dictionary, HamNoSys, hearing-impaired, Indian Sign Language, sign language.
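The dictionary described above maps words of a spoken language to HamNoSys notations, which are later converted to SiGML for the avatar. The Python sketch below shows only the data-structure side. It covers the admin operations (add, modify, delete) keyed by language and word, plus a lookup that turns an input sentence into a sequence of notations and reports words with no entry. The class name and the notation strings are hypothetical placeholders (real HamNoSys uses its own symbol set), and the original system is written in PHP, not Python.

```python
class SignDictionary:
    """Hypothetical multilingual word -> HamNoSys notation store."""

    def __init__(self):
        self._entries = {}  # (language, word) -> notation string

    def add(self, language, word, notation):
        key = (language, word.lower())
        if key in self._entries:
            raise KeyError(f"{word!r} already exists for {language}")
        self._entries[key] = notation

    def modify(self, language, word, notation):
        key = (language, word.lower())
        if key not in self._entries:
            raise KeyError(f"{word!r} not found for {language}")
        self._entries[key] = notation

    def delete(self, language, word):
        del self._entries[(language, word.lower())]  # KeyError if absent

    def translate(self, language, sentence):
        """Return (notations in word order, words missing from the dictionary)."""
        notations, missing = [], []
        for word in sentence.lower().split():
            notation = self._entries.get((language, word))
            if notation is None:
                missing.append(word)
            else:
                notations.append(notation)
        return notations, missing

d = SignDictionary()
d.add("en", "hello", "<HNS:hello>")  # placeholder, not real HamNoSys
d.add("en", "friend", "<HNS:friend>")
d.add("hi", "नमस्ते", "<HNS:hello>")
print(d.translate("en", "Hello my friend"))
```

Reporting missing words separately leaves the fallback, for example fingerspelling, to the caller.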
107 Redundancy in Malay Morphology: School Grammar versus Corpus Grammar
Authors: Zaharani Ahmad, Nor Hashimah Jalaluddin
Abstract:
The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar that is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that students informally acquire and master outside the classroom as native speakers of the language. The corpus grammar is described on the basis of actual use in naturally occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing a reduplicated nominal form that denotes plurality (e.g. murid-murid 'students', derived from murid 'student') and a modifier categorized as a quantifier (e.g. semua 'all', seluruh 'entire', and kebanyakan 'most') is not acceptable in the school grammar, because such a formation (e.g. semua murid-murid 'all the students', kebanyakan pelajar-pelajar 'most of the students') is claimed to be redundant, and redundancy is prohibited in that grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is strictly required for the message to be understood, so that if some information is omitted, the remaining information is still sufficient for the message to be comprehended. Thus, the correct construction is strictly either the reduplicated form (e.g. murid-murid 'students') or the quantifier plus the root (e.g. semua murid 'all the students'), so that the grammatical meaning of plurality is not repeated. Nevertheless, the so-called redundant form (e.g. kebanyakan pelajar-pelajar 'most of the students') is frequently used in the corpus grammar. This study shows that a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication, and combinations of both.
Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication, namely to give emphasis and to stress the importance of the information delivered by speakers or writers.
Keywords: Corpus grammar, morphology, redundancy, school grammar.
106 Monitoring Co-Creation: A Survey of Lithuanian Urban Communities
Authors: Aelita Skarzauskiene, Monika Maciuliene
Abstract:
In this paper, we conduct a systematic survey of urban communities in Lithuania to evaluate their potential to co-create collective intelligence, or “civic intelligence”, by applying the Digital Co-creation Index methodology, which includes various socio-technological indicators. Civic intelligence is a form of collective intelligence that refers to a group’s capacity to perceive societal problems and to address them effectively. The research focuses on evaluating diverse organizational designs that increase efficient collective performance. The project advanced the state of the art by evaluating the basic preconditions in urban communities through which collective intelligence is co-created in a systemic manner. The research subjects are “bottom-up”, digitally enabled urban platforms initiated by Lithuanian public organizations, civic movements or business entities. The web-based monitoring results, obtained by applying a social indices calculation methodology and Pearson correlation analysis, provided information about the potential and limits of the urban communities and about the changes that need to be implemented to overcome these limitations.
Keywords: Computer supported collaboration, co-creation, collective intelligence, socio-technological system, networked society.
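The abstract mentions Pearson correlation analysis over the monitoring indicators without giving details. As a minimal sketch, assuming each platform is scored on numeric indicators, the coefficient r between two indicators can be computed as below; the indicator names and values are hypothetical, not data from the study. Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient r between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two variance terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of five platforms on two indicators (illustrative only)
participation = [0.2, 0.4, 0.5, 0.7, 0.9]
index_score = [0.1, 0.3, 0.5, 0.6, 0.8]
print(round(pearson(participation, index_score), 3))
```

Values of r near +1 or -1 indicate a strong linear relationship between two indicators; r near 0 indicates none.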
105 Semantic Indexing Approach of a Corpora Based On Ontology
Authors: Mohammed Erritali
Abstract:
The centuries-long growth in the volume of text data, such as books and articles in libraries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstraction, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (a corpus) so that users can find the documents relevant to them, that is, the content that matches their information needs. This paper presents a new semantic indexing approach for a documentary corpus. The indexing process begins with a term-weighting phase that determines the importance of terms in the documents. A thesaurus such as WordNet is then used to move to the conceptual level. Each candidate concept is evaluated by determining how well it represents the document, that is, the importance of the concept relative to the other concepts of the document. Finally, the semantic index is constructed by attaching to each concept of the ontology the documents of the corpus in which that concept is found.
Keywords: Semantic, indexing, corpora, WordNet, ontology.
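The indexing pipeline described above (term weighting, lifting terms to concepts, scoring each concept within its document, then attaching documents to concepts) can be sketched as follows. This is a hedged illustration, not the author's method: TF-IDF stands in for the unspecified term-weighting scheme, a tiny hand-written `TERM_TO_CONCEPT` dictionary (hypothetical) stands in for WordNet, and a concept's share of the document's total concept weight stands in for its "level of representation".

```python
import math
from collections import Counter, defaultdict

# Hypothetical term -> concept mapping standing in for WordNet synsets.
TERM_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "physician", "physician": "physician",
}


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight each term of each document by (tf / doc length) * log(N / df)."""
    n = len(docs)
    df = Counter(t for terms in docs.values() for t in set(terms))
    weights: dict[str, dict[str, float]] = {}
    for doc_id, terms in docs.items():
        if not terms:
            weights[doc_id] = {}
            continue
        tf = Counter(terms)
        weights[doc_id] = {t: (c / len(terms)) * math.log(n / df[t]) for t, c in tf.items()}
    return weights


def semantic_index(docs: dict[str, list[str]], threshold: float = 0.0) -> dict[str, set[str]]:
    """Map each concept to the documents in which it is sufficiently represented."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, term_w in tfidf(docs).items():
        concept_w: dict[str, float] = defaultdict(float)
        # Lift term weights to the concept level by summing over synonymous terms
        for term, w in term_w.items():
            if term in TERM_TO_CONCEPT:
                concept_w[TERM_TO_CONCEPT[term]] += w
        total = sum(concept_w.values())
        for concept, w in concept_w.items():
            # Relative importance of this concept among the document's concepts
            if total > 0 and w / total > threshold:
                index[concept].add(doc_id)
    return dict(index)


docs = {
    "d1": ["car", "engine", "automobile"],
    "d2": ["doctor", "hospital"],
    "d3": ["truck", "road", "physician"],
}
print(semantic_index(docs))
```

Raising `threshold` keeps only concepts that dominate a document; with `threshold=0.6`, `d3` (split evenly between two concepts) drops out of both entries.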
104 Information Retrieval: A Comparative Study of Textual Indexing Using an Oriented Object Database (db4o) and the Inverted File
Authors: Mohammed Erritali
Abstract:
The centuries-long growth in the volume of text data, such as books and articles in libraries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstraction, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (a corpus) so that users can find the documents relevant to them, that is, the content that matches their information needs. Most information retrieval models index a corpus with a specific data structure called an "inverted file" or "reverse index". This inverted file collects information on all terms across the corpus documents, specifying the identifiers of the documents that contain each term, the frequency of each term in the documents of the corpus, the positions of the occurrences of the word... In this paper, we use an object-oriented database (db4o) instead of the inverted file; that is, instead of searching for a term in the inverted file, we search for it in the db4o database. The purpose of this work is to conduct a comparative study to determine whether object-oriented databases can compete with the inverted index in terms of access speed and resource consumption on a large volume of data.
Keywords: Information Retrieval, indexation, oriented object database (db4o), inverted file.
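For reference, the inverted-file baseline that the db4o database is compared against can be sketched as a plain in-memory mapping from each term to its postings (document identifier, frequency and positions). This is an illustrative assumption about the structure, not the paper's implementation: whitespace tokenization and lowercasing are simplifications, and the db4o side of the comparison is not shown.

```python
from collections import defaultdict


def build_inverted_file(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """term -> {doc_id: [positions]}; a term's frequency in a doc is len(positions)."""
    index: dict[str, dict[int, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


def lookup(index: dict[str, dict[int, list[int]]], term: str) -> list[tuple[int, int, list[int]]]:
    """Return (doc_id, frequency, positions) for every document containing term."""
    postings = index.get(term.lower(), {})
    return [(d, len(p), p) for d, p in sorted(postings.items())]


docs = {1: "the cat sat on the mat", 2: "the dog chased the cat"}
idx = build_inverted_file(docs)
print(lookup(idx, "cat"))  # [(1, 1, [1]), (2, 1, [4])]
```

A lookup is a single dictionary access followed by a walk over the postings, which is the access pattern the db4o alternative would have to match.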
103 Choice Experiment Approach on Evaluation of Non-Market Farming System Outputs: First Results from Lithuanian Case Study
Authors: A. Novikova, L. Rocchi, G. Startiene
Abstract:
Market and non-market outputs are produced jointly in agriculture. Their supply depends on the intensity and type of production. The role of agriculture as an economic activity and its effects are important for the Lithuanian case study, as agricultural land covers more than half of the country. Positive and negative externalities created by agriculture are not accounted for by the market. Therefore, specific techniques such as stated preference methods, in particular choice experiments (CE), are used to evaluate non-market outputs in agriculture. The main aim of this paper is to present the construction of the research path for evaluating non-market farming system outputs in Lithuania. Conventional and organic farming, covering crop production (including both cereal and industrial crops) and livestock production (including dairy and cattle), were selected. The CE method and a nested logit (NL) model were selected as appropriate for evaluating the non-market outputs of different farming systems in Lithuania. A pilot survey was conducted from October to November 2018 in order to test and improve the CE questionnaire. The results of the survey showed that the questionnaire was accepted and well understood by the respondents. The econometric modelling showed that the selected NL model could be used for the main survey. Residents' understanding of the differences between organic and conventional farming was identified, and residents were found to be more willing to choose organic farming than conventional farming.
Keywords: Choice experiments, farming system, Lithuania, market outputs, non-market outputs.
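The nested logit (NL) model selected in this study groups alternatives into nests and computes each choice probability as the product of a within-nest probability and a nest probability, linked by the nest's inclusive value (log-sum). The sketch below implements the standard two-level formula under hypothetical utilities and nest parameters; the paper's actual attributes, utility specification and estimates are not given in the abstract, and parameter estimation is not shown.

```python
import math


def nested_logit_probs(utilities: dict[str, dict[str, float]],
                       lambdas: dict[str, float]) -> dict[str, float]:
    """Choice probabilities of a two-level nested logit model.

    utilities: nest -> {alternative: systematic utility V} (alternative names unique)
    lambdas:   nest -> dissimilarity (log-sum) parameter, 0 < lambda <= 1
    """
    # Inclusive value of each nest: IV_k = log(sum_j exp(V_j / lambda_k))
    inclusive = {
        nest: math.log(sum(math.exp(v / lambdas[nest]) for v in alts.values()))
        for nest, alts in utilities.items()
    }
    # P(nest k) = exp(lambda_k * IV_k) / sum_l exp(lambda_l * IV_l)
    denom = sum(math.exp(lambdas[k] * iv) for k, iv in inclusive.items())
    probs: dict[str, float] = {}
    for nest, alts in utilities.items():
        lam = lambdas[nest]
        p_nest = math.exp(lam * inclusive[nest]) / denom
        within = sum(math.exp(v / lam) for v in alts.values())
        for alt, v in alts.items():
            # P(alt) = P(alt | nest) * P(nest)
            probs[alt] = math.exp(v / lam) / within * p_nest
    return probs


# Hypothetical utilities for two organic and two conventional options
u = {
    "organic": {"org_a": 1.2, "org_b": 0.9},
    "conventional": {"conv_a": 0.8, "conv_b": 0.5},
}
for alt, p in nested_logit_probs(u, {"organic": 0.5, "conventional": 0.5}).items():
    print(alt, round(p, 3))
```

When every lambda equals 1, the nests impose no extra correlation and the formula collapses to the ordinary multinomial logit, which is a convenient sanity check.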
102 Slovenian Text-to-Speech Synthesis for Speech User Interfaces
Authors: Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešić, Mario Žganec, Stanislav Gruden
Abstract:
The paper presents the design concept of a unit-selection text-to-speech synthesis system for the Slovenian language. Owing to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from server-based, carrier-grade voice portal applications and desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, the lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high-coverage text sentences. An experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with little redundancy. The adequacy of the spoken output was evaluated with several subjective tests, as recommended by the International Telecommunication Union (ITU).
Keywords: text-to-speech synthesis, prosody modeling, speech user interface.
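The greedy subset selection mentioned above can be sketched as the classic greedy set-cover heuristic: repeatedly add the sentence that covers the most units not yet covered, until nothing new can be gained. The coverage unit here is an assumption: character bigrams (from the hypothetical helper `units`) stand in for the phonetic units, such as diphones, that a real unit-selection system would count, and no sentence-length penalty is applied.

```python
def units(sentence: str, n: int = 2) -> set[str]:
    """Hypothetical coverage units: character n-grams standing in for diphones."""
    s = sentence.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def greedy_subset(sentences: list[str], n: int = 2) -> list[str]:
    """Repeatedly pick the sentence that adds the most not-yet-covered units."""
    remaining = list(sentences)
    covered: set[str] = set()
    chosen: list[str] = []
    # Everything the full corpus covers; the subset aims to reach the same coverage
    target = set().union(*(units(s, n) for s in sentences))
    while covered != target:
        # max() returns the first sentence with the largest gain
        best = max(remaining, key=lambda s: len(units(s, n) - covered))
        gain = units(best, n) - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen


print(greedy_subset(["abc", "ab", "cd", "bcd"]))  # ['abc', 'cd']
```

The greedy choice is not guaranteed to find the smallest covering subset, but it is simple, fast and usually compact, which matches the "simple and efficient" goal stated above.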
101 On Developing an Automatic Speech Recognition System for Standard Arabic Language
Authors: R. Walha, F. Drira, H. El-Abed, A. M. Alimi
Abstract:
Automatic Speech Recognition (ASR) for the Arabic language is a challenging task. This is mainly due to specific properties of the language, which confront researchers with multiple difficulties, such as insufficient linguistic resources and a very limited number of available transcribed Arabic speech corpora. In this paper, we are interested in the development of an HMM-based ASR system for the Standard Arabic (SA) language. Our fundamental research goal is to select the most appropriate acoustic parameters describing each audio frame, as well as the most appropriate acoustic models and speech recognition unit. To achieve this, we analyze the effect of varying the frame windowing (size and period), the number of acoustic parameters resulting from feature extraction methods traditionally used in ASR, the speech recognition unit, the number of Gaussians per HMM state and the number of embedded re-estimations of the Baum-Welch algorithm. To evaluate the proposed ASR system, a multi-speaker SA connected-digits corpus was collected, transcribed and used throughout all experiments. A further evaluation was conducted on a speaker-independent continuous SA speech corpus. The phoneme recognition rate is 94.02%, which is relatively high compared with another ASR system evaluated on the same corpus.
Keywords: ASR, HMM, acoustical analysis, acoustic modeling, Standard Arabic language
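The frame windowing varied in this study (frame size and frame period) can be illustrated with a short framing routine. The 25 ms window, 10 ms shift and Hamming window used below are common ASR defaults assumed for illustration, not the settings the authors selected; feature extraction (for example MFCCs) would then be applied to each frame.

```python
import math


def frame_signal(signal: list[float], sample_rate: int,
                 frame_ms: float = 25.0, shift_ms: float = 10.0) -> list[list[float]]:
    """Split a signal into overlapping Hamming-windowed frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000))  # frame size in samples
    hop = int(round(sample_rate * shift_ms / 1000))         # frame period in samples
    if len(signal) < frame_len:
        return []
    # Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (N - 1))
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    # Only full frames are kept; trailing samples that do not fill a frame are dropped
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [[signal[f * hop + i] * window[i] for i in range(frame_len)]
            for f in range(n_frames)]


# One second of a constant signal at 16 kHz -> 400-sample frames every 160 samples
frames = frame_signal([1.0] * 16000, 16000)
print(len(frames), len(frames[0]))  # 98 400
```

Shortening the shift increases the number of frames (and feature vectors) per second, while lengthening the window trades time resolution for frequency resolution; this is the trade-off such a study explores.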
100 Sentence Modality Recognition in French based on Prosody
Authors: Pavel Král, Jana Klečková, Christophe Cerisara
Abstract:
This paper deals with automatic sentence modality recognition in French. In this work, only prosodic features are considered. Sentences are classified according to the following three modalities: declarative, interrogative and exclamatory. This information will be used to animate a talking head for deaf and hearing-impaired children. We first statistically study a real radio corpus in order to assess the feasibility of automatically modeling sentence types. We then test two sets of prosodic features, as well as two different classifiers and their combination. We further focus our attention on question recognition, as this modality is certainly the most important one for the target application.
Keywords: Automatic sentence modality recognition (ASMR), fundamental frequency (F0), energy, modal corpus, prosody.
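The two prosodic cues named in the keywords, fundamental frequency (F0) and energy, can be extracted per frame as sketched below. This is a hedged illustration: a simple autocorrelation peak search with an assumed 75 to 400 Hz search range stands in for whatever pitch tracker the authors used, and the classifiers that consume these features are not shown.

```python
import math


def frame_energy(frame: list[float]) -> float:
    """Short-time energy: mean of the squared samples."""
    return sum(x * x for x in frame) / len(frame)


def estimate_f0(frame: list[float], sample_rate: int,
                f_min: float = 75.0, f_max: float = 400.0) -> float:
    """Autocorrelation pitch estimate in Hz (0.0 when no positive peak is found)."""
    min_lag = int(sample_rate / f_max)  # shortest period considered
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)  # longest period
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0


# A synthetic 200 Hz tone, 40 ms long at 16 kHz
sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(640)]
print(round(estimate_f0(tone, sr), 1), round(frame_energy(tone), 3))
```

Applied frame by frame, these functions yield F0 and energy contours; a rising F0 at the end of an utterance is a typical cue for questions that a classifier could learn from.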
99 Mining News Sites to Create Special Domain News Collections
Authors: David B. Bracewell, Fuji Ren, Shingo Kuroiwa
Abstract:
We present a method for creating special domain collections from news sites. The method requires only a single sample article as a seed. No prior corpus statistics are needed, and the method is applicable to multiple languages. We examine various similarity measures and the creation of document collections for English and Japanese. The main contributions are as follows. First, the algorithm can build special domain collections from as little as one sample document. Second, unlike other algorithms, it does not require a second “general” corpus to compute statistics. Third, in our testing, the algorithm outperformed others in creating collections made up of highly relevant articles.
Keywords: Information Retrieval, News, Special Domain Collections.
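As one example of the kind of similarity measure examined, the sketch below uses cosine similarity between term-frequency vectors to keep candidate articles that are close to a single seed article. The threshold value, the candidate identifiers and whitespace tokenization are assumptions; Japanese text in particular would need a word segmenter first, and the paper's own algorithm is not reproduced here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def collect(seed: str, candidates: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Keep candidate articles whose similarity to the seed exceeds threshold, best first."""
    seed_vec = Counter(seed.lower().split())
    scored = [(cosine(seed_vec, Counter(text.lower().split())), doc_id)
              for doc_id, text in candidates.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > threshold]


seed = "earthquake hits coastal city"
candidates = {
    "a": "earthquake hits city again",
    "b": "football team wins cup",
    "c": "coastal city prepares",
}
print(collect(seed, candidates))  # ['a', 'c']
```

Only the seed's own term counts are used, so no background ("general") corpus statistics are required, in line with the constraint stated in the abstract.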