Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 14

Search results for: adverbs

14 Exploring the Use of Adverbs in Two Young Learners Written Corpora

Authors: Chrysanthi S. Tiliakou, Katerina T. Frantzi

Abstract:

Writing has always been considered a highly demanding skill for learners of English as a Foreign Language (EFL) as well as for native speakers. At lower levels, novice foreign language writers are asked to produce writing tasks with a limited range of vocabulary. Adverbs are a part of speech that is not used extensively in the early stages of EFL writing. An additional problem with learning new adverbs is that, besides learning their meanings, learners are expected to acquire the proper placement of adverbs in a sentence. The use of adverbs is important because they add “expressive richness to one’s message”. By exploring patterns of adverb use, researchers and educators can identify the types of adverbs that are more taxing for young learners, or whose placement puzzles novice EFL writers, and focus on teaching them. To this end, the study examines the use of adverbs in two written corpora of young learners of English at A1-A2 levels and determines the types of adverbs used, their frequencies, problems in their use, and whether there is any differentiation between levels. The AntConc concordancing tool was used for the Greek Learner Corpus, and the Corpuscle concordancing tool for the Norwegian Corpus. The research found that the normalized frequencies of the adverbs used in the A1-A2 level Greek Learner Corpus are similar to the frequencies of the same adverbs in the Norwegian Learner Corpus.
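The comparison of normalized frequencies mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes a simple regex tokenizer and a rate per 1,000 running words; the function name `normalized_frequencies`, the sample sentence, and the normalization base are illustrative assumptions, not details reported by the authors (who used AntConc and Corpuscle).

```python
import re
from collections import Counter


def normalized_frequencies(text, targets, per=1000):
    """Return occurrences of each target word per `per` running words.

    Hypothetical helper: the tokenizer and the base of 1,000 words are
    assumptions made for this sketch, not the study's settings.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {word: 0.0 for word in targets}
    return {word: counts[word] * per / total for word in targets}


# Toy learner sentence (invented for illustration only).
sample = "She runs very very fast"
print(normalized_frequencies(sample, ["very", "often"]))
```

Normalizing by corpus size is what makes counts from two corpora of different lengths comparable; the raw counts alone would not be.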

Keywords: learner corpora, young learners, writing, use of adverbs

Procedia PDF Downloads 92

13 The Effect of Phonetics Factors in Interpretation of Japanese Degree Adverbs

Authors: Yan Lyu

Abstract:

Japanese degree adverbs can be interpreted in different ways, which makes them hard for learners of Japanese to comprehend. For instance, when ‘tyotto’ is used as a degree word, it may or may not be interpreted literally. In the sentence ‘Ano mise, tyotto oishi yo. zehi iku to ii yo.’, ‘tyotto’ can be interpreted contextually as expressing a high degree. Besides pragmatic factors, phonetic factors can also affect the interpretation of ‘tyotto’ in such cases. Concentrating on the pattern ‘tyotto + adjective’, the paper investigates the correlation between the interpretation of ‘tyotto’ and phonetic factors in specific contexts, based on a listening experiment via PRAAT. It also investigates how phonetic factors affect the interpretation of high degree adverbs, including ‘soutou’, ‘totemo’, ‘kanari’ and ‘sugoku’. In the experiment, Japanese speakers listened to sentences composed of degree adverbs and adjectives in different intonations and judged which degree the sentences expressed. Two conclusions can be drawn from the experiment results. Firstly, for adverbs expressing a high degree in the pattern ‘degree adverb + adjective’, a higher degree is expressed when either the degree adverb or the adjective, or both, is pronounced at a higher pitch. In addition, with the insertion of a geminate consonant and the extension of the vowel, the longer the duration of the degree adverb, the higher the degree expressed. Secondly, for ‘tyotto’, which expresses a low degree, the interpretation is influenced by both phonetic and contextual factors. Phonetically, three factors cause ‘tyotto’ to be interpreted as a common degree or a high degree: the high pitch of the modified adjective, the extended silence period of the geminate consonant, and the change in the intonation of ‘tyotto’.
In some contexts, such as comparison sentences, ‘tyotto’ tends to be interpreted literally as a low degree, no matter how ‘tyotto + adjective’ is pronounced.

Keywords: contextual interpretation, Japanese degree adverbs, phonetic interpretation, PRAAT

Procedia PDF Downloads 268

12 A Relational Approach to Adverb Use in Interactions

Authors: Guillaume P. Fernandez

Abstract:

Individual language use is a matter of choice in particular interactions. The paper proposes a conceptual and theoretical framework, with methodological considerations, for how language produced in dyadic relations is to be considered and situated in the larger social configuration within which the interaction is embedded. An integrated and comprehensive view is taken: social interactions are expected to be ruled by a normative context, defined by the chain of interdependences that structures the personal network. In this approach, discursive practices are not determined solely by the moment of production, in isolation from broader influences. Instead, the position of the individual and the dyad in the personal network influences discursive practices in a twofold manner: on the one hand, the network limits access to the linguistic resources available within it, and, on the other hand, the structure of the network influences the agency of the individual through the social control inherent to particular network characteristics. Concretely, we investigate how, and to what extent, ego is consistent from one interaction to another in his or her use of adverbs. To do so, social network analysis (SNA) methods are mobilized. Participants (N=130) are college students recruited in the French-speaking part of Switzerland. The personal network of significant others of each individual is created using name generators and edge interpreters, with a focus on social support and conflict. For the linguistic part, respondents were asked to record themselves with five of their close relations. From the recordings, we computed an average similarity score based on the adverbs used across interactions.
Two analyses are envisaged. First, OLS regressions including network-level measures, such as density and reciprocity, and individual-level measures, such as centralities, are performed to understand the determinants of linguistic similarity from one interaction to another. The second analysis considers each social tie as nested within ego networks. Multilevel models are performed to investigate how the different types of ties may influence the likelihood of using adverbs, controlling for structural properties of the personal network. Preliminary results suggest that the more cohesive the network, the less likely the individual is to change his or her manner of speaking, and that social support increases the use of adverbs in interactions. While promising results emerge, further research should consider a longitudinal approach to support claims of causality.
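The two quantities at the core of this design, an average adverb-similarity score per ego and the density of the personal network, can be sketched in Python as follows. The abstract does not say which similarity measure was used; Jaccard overlap between adverb sets is an assumption made here purely for illustration, and all function names and sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two adverb sets (assumed measure, not the authors')."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_similarity(interactions):
    """Mean pairwise similarity of the adverb sets ego used in each interaction."""
    pairs = list(combinations(interactions, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


def density(n_alters, ties):
    """Share of possible undirected alter-alter ties that are present."""
    possible = n_alters * (n_alters - 1) / 2
    return len(ties) / possible if possible else 0.0


# Invented example: adverbs used by one ego in three recorded interactions.
recordings = [{"really", "very"}, {"really"}, {"really", "very"}]
print(average_similarity(recordings))
print(density(4, [("a", "b"), ("b", "c"), ("c", "d")]))
```

Scores like these would then enter the regression as the dependent variable (similarity) and a network-level predictor (density).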

Keywords: personal network, adverbs, interactions, social influence

Procedia PDF Downloads 67

11 Written Grammatical Errors of Arabic as Second Language (ASL) Learners: An Evaluative Study

Authors: Sadeq Al Yaari, Fayza Al Hammadi, Ayman Al Yaari, Adham Al Yaari, Montaha Al Yaari, Aayah Al Yaari, Sajedah Al Yaari, Salah Al Yami

Abstract:

Background: In recent years, the number of non-native speakers of the Arabic language has exponentially increased. Aims: This analytical study aims to investigate written grammatical errors committed by Arabic as a Second Language (ASL) learners. More specifically, it explores the reasons behind committing these errors and their effects on the daily communication of ASL learners. Methods: Ten (10) ASL senior learners of the Arabic Language Institute (ALI), College of Arts, King Saud University (KSU), Riyadh, Kingdom of Saudi Arabia (KSA) were randomly selected in this study. The participants were asked to write paragraphs about themselves and then their written work was linguistically analyzed and evaluated by the researchers and some Arabic Language experts before it was statistically analyzed. Conclusions: Results outline that written grammatical errors of ASL learners are characterized by the misuse of many grammatical items. Mainly, these items are proper nouns (PN), common nouns (CN), main verbs (MV), adjectives (adj.), time adverbs (T. Adv.), manner adverbs (M. Adv.), objective pronouns (OP), and central determiners (C Det.) including demonstratives (Dem.) and articles (Artic.), pronouns (Pron.) and prepositions (Prep.).

Keywords: written, grammatical errors, Arabic, second language, non-native learners, analysis.

Procedia PDF Downloads 43

10 Spatial Deictics in Face-to-Face Communication: Findings in Baltic Languages

Authors: Gintare Judzentyte

Abstract:

The present research aims to discuss the semantics and pragmatics of spatial deictics (deictic adverbs of place and demonstrative pronouns) in the Baltic languages: spoken Lithuanian and spoken Latvian. The following objectives have been identified to achieve the aim: 1) to determine the usage of adverbs of place in spoken Lithuanian and Latvian and to verify their meanings in face-to-face communication; 2) to determine the usage of demonstrative pronouns in spoken Lithuanian and Latvian and to verify their meanings in face-to-face communication; 3) to compare the systems of the two spoken languages and to identify the main tendencies. As the meanings of demonstratives (adverbs of place and demonstrative pronouns) are context-bound, it is necessary to verify their usage in spontaneous interaction. Besides, deictic gestures play a very important role in face-to-face communication. Therefore, an experimental method is necessary to collect the data. Video material representing spoken Lithuanian and spoken Latvian was recorded by means of a qualitative interview (a semi-structured interview: empirical research is all about asking the right questions). The collected material was transcribed and evaluated taking into account several criteria: 1) physical distance (location of the referent, visual accessibility of the referent); 2) deictic gestures (the combination of language and gesture is especially characteristic of the exophoric use); 3) representation of mental spaces in physical space (a speaker sometimes wishes to mark something that is physically close as psychologically distant and vice versa). The analysis of the collected data revealed that, in face-to-face communication in spoken Lithuanian and spoken Latvian, the participants choose deictic adverbs of place instead of demonstrative pronouns to locate or identify entities in situations where demonstrative pronouns would be expected.
The analysis showed that visual accessibility of the referent is very important in face-to-face communication, but the main criterion when localizing objects and entities is the need for contrast: Lith. čia ‘here’, šis ‘this’, Latv. šeit ‘here’, šis ‘this’ usually identify distant entities and are used instead of distal demonstratives (Lith. ten ‘there’, tas ‘that’, Latv. tur ‘there’, tas ‘that’), because the referred objects/subjects contrast with more distant entities. Furthermore, the interlocutors in examples from spontaneous situated interaction usually extend their space and can refer to a ‘distal’ object/subject with a ‘proximal’ demonstrative based on a psychological choice. As the research on the spoken Baltic languages confirmed, the choice of spatial deictics in face-to-face communication is strongly affected by a complex of criteria. Although there are some main tendencies, the exact meaning of spatial deictics in the spoken Baltic languages is revealed, and is relevant, only in a certain context.

Keywords: Baltic languages, face-to-face communication, pragmatics, semantics, spatial deictics

Procedia PDF Downloads 289

9 Sentiment Classification Using Enhanced Contextual Valence Shifters

Authors: Vo Ngoc Phu, Phan Thi Tuoi

Abstract:

We have explored different methods of improving the accuracy of sentiment classification. The sentiment orientation of a document can be positive (+), negative (-), or neutral (0). We combine five dictionaries from [2, 3, 4, 5, 6] into a new one with 21,137 entries. The new dictionary contains many verbs, adverbs, phrases and idioms that are not in the five earlier ones. The paper shows that our proposed method, based on the combination of the Term-Counting method and the Enhanced Contextual Valence Shifters method, improves the accuracy of sentiment classification. The combined method achieves an accuracy of 68.984% on the testing dataset and 69.224% on the training dataset. All of these methods are implemented to classify reviews based on our new dictionary and the Internet Movie data set.
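To make the idea of combining term counting with contextual valence shifters concrete, here is a minimal Python sketch. It is not the authors' enhanced method: the tiny lexicon, the shifter lists, the multipliers, and the two-token look-back window are all assumptions chosen for illustration.

```python
# Hypothetical miniature lexicon; the paper's dictionary has 21,137 entries.
POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}
NEGATORS = {"not", "never"}                      # flip the polarity
INTENSIFIERS = {"very": 2.0, "slightly": 0.5}    # scale the polarity


def score(tokens, window=2):
    """Sum word polarities, adjusted by shifters in the preceding `window` tokens."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = float(POLARITY[tok])
        for prev in tokens[max(0, i - window):i]:
            if prev in NEGATORS:
                value = -value
            elif prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
        total += value
    return total


def classify(text):
    """Map the summed score to positive, negative, or neutral."""
    s = score(text.lower().split())
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


print(classify("the film was not very good"))
```

Plain term counting would label the example positive because of "good"; the shifter pass is what lets "not" and "very" change the outcome.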

Keywords: sentiment classification, sentiment orientation, valence shifters, contextual valence shifters, term counting

Procedia PDF Downloads 503

8 Pragmatic Discoursal Study of Hedging Constructions in English Language

Authors: Mohammed Hussein Ahmed, Bahar Mohammed Kareem

Abstract:

This study is a pragmatic discoursal study of hedging constructions in the English language. A hedge is a mitigating expression used to lessen the impact of a speaker’s utterance. Hedges may be adverbs, adjectives, or verbs, and may sometimes consist of clauses. The study aims to find out the extent to which speakers and participants in the discourse use hedging constructions during their conversations. It also aims to find out whether there are any significant differences in the frequency of the types and functions of hedging constructions employed by male and female speakers. It is hypothesized that hedging constructions are more frequent in English discourse than in other languages due to its formality, and that the frequency of the types and functions is influenced by the gender of the participants. To achieve the aims of the study, two types of procedures have been followed: theoretical and practical. The theoretical procedure consists of presenting a theoretical background of hedging, which includes its definitions, etymology and theories. The practical procedure consists of selecting a sample of texts and analyzing them according to an adopted model. A number of conclusions will be drawn based on the findings of the study.

Keywords: hedging, pragmatics, politeness, theoretical

Procedia PDF Downloads 587

7 On the Weightlessness of Vowel Lengthening: Insights from Arabic Dialect of Yemen and Contribution to Psychoneurolinguistics

Authors: Sadeq Al Yaari, Muhammad Alkhunayn, Montaha Al Yaari, Ayman Al Yaari, Aayah Al Yaari, Adham Al Yaari, Sajedah Al Yaari, Fatehi Eissa

Abstract:

Introduction: It is well established that lengthening (longer duration) is one of the correlates of lexical and phrasal prominence. However, it is unexplored whether the scope of vowel lengthening in the Arabic dialect of Yemen (ADY) differs between educated and uneducated speakers from different dialectal backgrounds. Specifically, the research aims to examine whether linguistic background acquired through different educational channels makes a difference in the speech of the speaker and how that is reflected in related psychoneurolinguistic impairments. Methods: For this purpose, we conducted an articulatory experiment in which a set of words from ADY was examined in the dialectal speech of one thousand seven hundred educated and uneducated Yemeni speakers aged 19-61 years who grew up in five regions of the country (northern, southern, eastern, western and central) and were accordingly assigned to five dialectal groups. A seven-minute video clip was shown to the participants, who were asked to spontaneously describe the scene they had just watched; the researchers then linguistically and statistically analyzed the recordings to weigh vowel lengthening in the speech of the participants. Results: The results show that vowels (monophthongs and diphthongs) are lengthened by all participants. Unexpectedly, educated and uneducated speakers from northern and central dialects lengthen vowels. Compared with uneducated speakers from the same dialect, educated speakers lengthen fewer vowels in their dialectal speech. Conclusions: These findings support the notion that extensive exposure to dialects on account of standard language can cause changes to the patterns of dialects themselves, and this can be seen in the speech of educated and uneducated speakers of these dialects.
Further research is needed to clarify the phonemic distinctive features and frequency of lengthening in other open class systems (i.e., nouns, adjectives, and adverbs). Phonetic and phonological report measures are needed as well as validation of existing measures for assessing phonemic vowel length in the Arabic population in general and Arabic individuals with voice, speech, and language impairments in particular.

Keywords: vowel lengthening, Arabic dialect of Yemen, phonetics, phonology, impairment, distinctive features

Procedia PDF Downloads 40

6 Narratives in Science as Covert Prestige Indicators

Authors: Zinaida Shelkovnikova

Abstract:

The language of science is changing to meet the demands of society. We argue that in the varied modern world there are important reasons for integrating narratives into scientific discourse. Scientists today face extremely rapid scientific development and progress, and the modern scientific community lives in conditions of tough competition. The integration of narratives into scientific discourse is thus a good way to convey scientific experience to different audiences and to express the covert prestige of the discourse. Narratives also form the identity of the persuasive narrator. Using the narrative approach to scientific discourse analysis, we reveal the sociocultural diversity of scientists. To attract an audience’s attention to scientific research, narratives should be integrated into the scientific discourse. Those who understand this consistent pattern are considered the leading scientists. Given that it is prestigious to be renowned and celebrated in science, writing narratives in science carries covert prestige. We define a science narrative as an intentional, consequent, coherent, event-based discourse or discourse fragment, which contains the author’s creativity and, in some cases, intrigue, and gives mostly qualitative information (compared with quantitative data) in order to provide maximum understanding of the research. Science narratives also allow effective argumentation and consequently construct the identity of the persuasive narrator. However, skill in creating appropriate scientific discourse reflects the level of prestige. In order to teach postgraduate students to be successful in English scientific writing and to gain prestige in the scientific community, we have defined the science narrative and outlined its main features and characteristics. Narratives contribute to the audience’s involvement with the narrator and his/her narration.
In general, the way in which a narrative is performed may result in (limited or greater) contact with the audience. To achieve this aim, authors use emotional fictional elements; descriptive elements (adjectives, adverbs, comparisons and so on); and the author’s evaluative elements. Thus, the features of science narrativity are the following: descriptive tools; the author’s evaluation; qualitative information that exceeds the quantitative data; facts that take on event status; understandability; accessibility; creativity; logic; intrigue; esthetic nature; fiction. To conclude, narratives function as covert prestige indicators of scientific discourse and shape the identity of the persuasive scientist.

Keywords: covert prestige, narrativity, scientific discourse, scientific narrative

Procedia PDF Downloads 399

5 Linguistic Analysis of Argumentation Structures in Georgian Political Speeches

Authors: Mariam Matiashvili

Abstract:

Argumentation is an integral part of our daily communication, whether formal or informal. Argumentative reasoning, techniques, and language tools are used both in personal conversations and in the business environment. Verbalization of opinions requires the use of extraordinary syntactic-pragmatic structural quantities - arguments that add credibility to the statement. The study of argumentative structures allows us to identify the linguistic features that make a text argumentative. Knowing what elements make up an argumentative text in a particular language helps the users of that language improve their skills. Also, natural language processing (NLP) has become especially relevant recently. In this context, one of the main emphases is on the computational processing of argumentative texts, which will enable the automatic recognition and analysis of large volumes of textual data. The research deals with the linguistic analysis of the argumentative structures of Georgian political speeches - particularly the linguistic structure, characteristics, and functions of the parts of the argumentative text: claims, support, and attack statements. The research aims to describe the linguistic cues that give a sentence a judgmental/controversial character and help to identify the reasoning parts of the argumentative text. The empirical data come from the Georgian Political Corpus, particularly TV debates. Consequently, the texts are of a dialogical nature, representing a discussion between two or more people (most often between a journalist and a politician).
The research uses the following approaches to identify and analyze the argumentative structures: (1) Lexical Classification and Analysis, which identifies the lexical items that are relevant in the process of creating argumentative texts and builds the lexicon of argumentation (groups of words gathered from a semantic point of view); (2) Grammatical Analysis and Classification, which is the grammatical analysis of the words and phrases identified on the basis of the argumentation lexicon; (3) Argumentation Schemas, which describes and identifies the argumentation schemes most likely to be used in Georgian political speeches. As a final step, we analyzed the relations between the above-mentioned components. For example, if an identified argument scheme is “Argument from Analogy”, the identified lexical items semantically express analogy too, and they are most likely adverbs in Georgian. As a result, we created a lexicon of the words that play a significant role in creating Georgian argumentative structures. Linguistic analysis has shown that verbs play a crucial role in creating argumentative structures.

Keywords: georgian, argumentation schemas, argumentation structures, argumentation lexicon

Procedia PDF Downloads 70

4 A Study Investigating Word Association Behaviour in People with Acquired Language and Communication Disorders

Authors: Angela Maria Fenu

Abstract:

The aim of this study was to better characterize the nature of word association responses in people with aphasia. The participants selected for the experimental group were 4 individuals with mild Broca’s aphasia. The control group consisted of 51 cognitively intact age- and gender-matched individuals. The participants were asked to perform a word association task in which they had to say the first word they thought of when hearing each cue. The cue words (n = 16) were Italian translations of the set of English cue words used in a published study. The participants from the experimental group were administered the word association test every two weeks for a period of two months, during which they received speech-language therapy. A combination of analytical approaches was used to measure the data. To analyse different patterns of word association responses in both groups, the nature of the relationship between the cue and the response was examined: responses were divided into five categories of association. To investigate the similarity between aphasic and non-aphasic subjects, the stereotypy of responses was examined. While certain stimulus words (nouns, adjectives) elicited responses from Broca’s aphasics that tended to resemble those made by non-aphasic subjects, others (adverbs, verbs) tended to elicit responses different from the ones given by normal subjects. This suggests that some mechanisms underlying certain types of associations are degraded in aphasic individuals, while others display little evidence of disruption. The high number of paradigmatic associations given in response to a noun or an adjective might imply that the mechanisms, largely semantic, underlying paradigmatic associations are relatively preserved in Broca’s aphasia, but it might also mean that some words are more easily processed depending on their grammatical class (nouns, adjectives). The most significant variation was noticed when the grammatical class of the cue word was an adverb.
Unlike the normal individuals, the experimental subjects gave the most idiosyncratic associations, which are often produced when the attempt to give a paradigmatic response fails. In turn, the failure to retrieve paradigmatic responses when the cue is an adverb might suggest that Broca’s aphasics are more sensitive to this grammatical class. The findings from this study suggest that, from research on word associations in people with aphasia, important data can arise concerning the specific lexical retrieval impairments that characterize the different types of aphasia and the various treatments that might positively influence the kinds of word association responses affected by language disruption.

Keywords: aphasia therapy, clinical linguistics, word-association behaviour, mental lexicon

Procedia PDF Downloads 88

3 Historical Development of Negative Emotive Intensifiers in Hungarian

Authors: Martina Katalin Szabó, Bernadett Lipóczi, Csenge Guba, István Uveges

Abstract:

In this study, an exhaustive analysis of the historical development of negative emotive intensifiers in the Hungarian language was carried out via NLP methods. Intensifiers are linguistic elements which modify or reinforce a variable character in the lexical unit they apply to. Therefore, intensifiers appear with other lexical items, such as adverbs, adjectives, verbs, and, infrequently, nouns. Due to the complexity of this phenomenon (a set of sociolinguistic, semantic, and historical aspects), there are many lexical items which can operate as intensifiers. Intensifiers are admittedly among the most rapidly changing elements of the language. From a linguistic point of view, a special group of intensifiers is particularly interesting: the so-called negative emotive intensifiers, which on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases may function as intensifiers (e.g., borzasztóan jó ’awfully good’, which means ’excellent’). Despite their special semantic features, negative emotive intensifiers have scarcely been examined in the literature on the basis of large historical corpora via NLP methods. In order to become better acquainted with trends over time concerning the intensifiers, we exhaustively analysed a specific historical corpus, namely the Magyar Történeti Szövegtár (Hungarian Historical Corpus). This corpus (containing 3 million text words) is a collection of texts of various genres and styles, produced between 1772 and 2010. Since the corpus consists of raw texts and does not contain any additional information about the language features of the data (such as stemming or morphological analysis), a large amount of manual work was required to process the data. Thus, based on a lexicon of negative emotive intensifiers compiled in a previous phase of the research, every occurrence of each intensifier was queried, and the results were stored in a separate data frame.
Then, basic linguistic processing (POS-tagging, lemmatization etc.) was carried out automatically with the ‘magyarlanc’ NLP-toolkit. Finally, the frequency and collocation features of all the negative emotive words were automatically analyzed in the corpus. The outcomes of the research revealed in detail how these words have undergone grammaticalization over time, i.e., they change from lexical elements to grammatical ones, and they slowly go through a delexicalization process (their negative content diminishes over time). What is more, the research also identified which negative emotive intensifiers are at the same stage of this process in the same time period. Taking a closer look at the different domains of the analysed corpus, it also became clear that, during this process, the importance of the pragmatic role increases: the newer use expresses the speaker's subjective, evaluative opinion at a certain level.
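The lexicon-driven querying and collocation counting described above can be sketched in Python. This is a simplified illustration only: it works on raw lower-cased tokens rather than on magyarlanc output, groups texts into fixed 50-year periods, and counts only the word immediately following each intensifier; the function name, the period width, and the sample sentences are assumptions, not details from the study.

```python
import re
from collections import Counter, defaultdict


def collocates_by_period(docs, intensifiers, span=50):
    """Count (intensifier, next word) pairs per time period.

    docs: iterable of (year, text) pairs.
    Returns {period_start_year: Counter({(intensifier, next_word): n})}.
    """
    result = defaultdict(Counter)
    for year, text in docs:
        period = year - year % span          # e.g. 1995 -> 1950 when span=50
        tokens = re.findall(r"\w+", text.lower())
        for word, following in zip(tokens, tokens[1:]):
            if word in intensifiers:
                result[period][(word, following)] += 1
    return result


# Invented sample documents, not taken from the Hungarian Historical Corpus.
docs = [
    (1790, "borzasztóan rossz idő"),
    (1995, "borzasztóan jó film"),
    (1998, "Borzasztóan jó volt"),
]
table = collocates_by_period(docs, {"borzasztóan"})
print(table[1950].most_common(1))
```

A shift over time from negative collocates to positive ones, as in this toy data, is the kind of pattern that would indicate delexicalization.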

Keywords: historical corpus analysis, historical linguistics, negative emotive intensifiers, semantic changes over time

Procedia PDF Downloads 233

2 Sentiment Analysis of Creative Tourism Experiences: The Case of Girona, Spain

Authors: Ariadna Gassiot, Raquel Camprubi, Lluis Coromina

Abstract:

Creative tourism involves the participation of tourists in the co-creation of their own experiences in a tourism destination. Consequently, creative tourists move from a passive behavior to an active behavior, and tourism destinations address this type of tourism by changing the scenario and making tourists learn and participate while they travel instead of merely offering tourism products and services to them. In creative tourism experiences, tourists are in close contact with locals and their culture. In destinations where culture (i.e. food, heritage, etc.) is the basis of their offer, such as Girona, Spain, tourism stakeholders must especially consider, analyze, and further foster the co-creation of authentic tourism experiences. They should focus on discovering more about these experiences, their main attributes, visitors’ opinions, etc. Creative tourists do not only participate while they travel around the world, but they also have an active post-travel behavior. They feel free to write about tourism experiences in different channels. User-generated content becomes crucial for any tourism destination when analyzing the market, making decisions, planning strategies, and when addressing issues, such as their reputation and performance. Sentiment analysis is a methodology used to automatically analyze semantic relationships and meanings in texts, so it is a way to extract tourists’ emotions and feelings. Tourists normally express their views and opinions regarding tourism products and services. They may express positive, neutral or negative feelings towards these products or services. For example, they may express anger, love, hate, sadness or joy towards tourism services and products. They may also express feelings through verbs, nouns, adverbs, adjectives, among others. Sentiment analysis may help tourism professionals in a range of areas, from marketing to customer service.
For example, sentiment analysis allows tourism stakeholders to forecast tourism expenditure and tourist arrivals, or to analyze tourists’ profiles. While there is an increasing presence of creativity in tourists’ experiences, there is also an increasing need to explore tourists’ expressions about these experiences. There is a need to know how they feel about participating in specific tourism activities. Thus, the main objective of this study is to analyze the meanings, emotions and feelings that tourists express about their creative experiences in Girona, Spain. To do so, sentiment analysis methodology is used. Results show the diversity of tourists who actively participate in tourism in Girona. Their opinions refer both to tangible aspects (e.g. food, museums, etc.) and to intangible aspects (e.g. friendliness, nightlife, etc.) of tourism experiences. Tourists express love, liking and other sentiments towards tourism products and services in Girona. This study can help tourism stakeholders understand tourists’ experiences and feelings. Consequently, they can offer more customized products and services and can efficiently involve tourists in the co-creation of their own tourism experiences.

Keywords: creative tourism, sentiment analysis, text mining, user-generated content

Procedia PDF Downloads 180

1 Construction and Analysis of Tamazight (Berber) Text Corpus

Authors: Zayd Khayi

Abstract:

This paper deals with the construction and analysis of the Tamazight text corpus. The grammatical structure of Tamazight remains poorly understood, and the lack of a comparative grammar leads to linguistic issues. To help fill this gap, if only in a small way, we constructed a diachronic corpus of the Tamazight language and developed a software tool. This work is devoted to constructing that tool to analyze the different aspects of Tamazight, with its different dialects used in North Africa, specifically in Morocco. It focuses on three Moroccan dialects: Tamazight, Tarifiyt, and Tachlhit. The Latin version was a good choice because of the many sources available in it. The corpus is based on the grammatical parameters and features of the language. The text collection contains more than 500 texts that cover a long historical period. It is free, and it will be useful for further investigations. The texts were converted into XML format for the purpose of standardization. The corpus contains more than 200,000 words. Based on linguistic rules and statistical methods, an original user interface and software prototype were developed by combining web design technologies and Python. The corpus provides users with the ability to distinguish easily between feminine/masculine nouns and verbs. The interface is available in three languages: TMZ, FR, and EN. The selected texts were not initially categorized; this work was done manually. Within corpus linguistics, there is currently no commonly accepted approach to the classification of texts. The texts are divided into ten categories. To describe and represent the texts in the corpus, we elaborated the XML structure according to the TEI recommendations. The search function returns the types of words searched for, such as feminine/masculine nouns and verbs. Nouns are divided into two parts.
The gender in the corpus has two forms. The neutral form of the word corresponds to the masculine, while the feminine is indicated by a double t-t affix (the prefix t- and the suffix -t), e.g., Tarbat (girl), Tamtut (woman), Taxamt (tent), and Tislit (bride). However, there are some words whose feminine form contains only the prefix t- and the suffix -a, e.g., Tasa (liver), tawja (family), and tarwa (progenitors). Generally, Tamazight masculine words have prefixes that distinguish them from other words, for instance 'a', 'u', 'i', e.g., Asklu (tree), udi (cheese), ighef (head). Verbs in the corpus in the first person singular and plural have the suffixes 'agh', 'ex', 'egh', e.g., 'ghrex' (I study), 'fegh' (I go out), 'nadagh' (I call). The program tool provides the following functions for this corpus: a list of all tokens; a list of unique words; lexical diversity; and various grammatical queries. To conclude, this corpus has so far focused on only a small group of parts of speech in the Tamazight language: verbs and nouns. Work is still in progress on adjectives, pronouns, adverbs and others.
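The affix rules and corpus statistics listed above can be sketched in Python, the language the abstract says the prototype uses. This is not the authors' tool: the function names are hypothetical, lexical diversity is assumed here to mean the type-token ratio, and the gender rule is a rough heuristic built only from the affixes named in the abstract.

```python
import re


def corpus_stats(text):
    """Return (all tokens, sorted unique words, type-token ratio).

    The type-token ratio is an assumed reading of "lexical diversity".
    """
    tokens = re.findall(r"\w+", text.lower())
    types = sorted(set(tokens))
    ttr = len(types) / len(tokens) if tokens else 0.0
    return tokens, types, ttr


def guess_gender(noun):
    """Heuristic gender guess from the affixes described in the abstract."""
    word = noun.lower()
    # Feminine: prefix t- combined with suffix -t or -a.
    if word.startswith("t") and word.endswith(("t", "a")):
        return "feminine"
    # Masculine: initial a-, u- or i-.
    if word[:1] in ("a", "u", "i"):
        return "masculine"
    return "unknown"


for noun in ["Tarbat", "Tasa", "Asklu", "udi", "ighef"]:
    print(noun, guess_gender(noun))
```

A rule of this kind will misclassify loanwords and irregular nouns, so in practice it would serve as a first pass before manual checking.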

Keywords: Tamazight (Berber) language, corpus linguistic, grammar rules, statistical methods

Procedia PDF Downloads 64