Search results for: corpus linguistics.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 107

Search results for: corpus linguistics.

77 A Study on Bilingual Semantic Processing: Category Effects and Age Effects

Authors: Lai Yi-Hsiu

Abstract:

The present study addressed the nature of bilingual semantic processing in Mandarin Chinese and Southern Min and examined category effects and age effects. Nineteen bilingual adults of Mandarin Chinese and Southern Min, nine monolingual seniors of Mandarin Chinese, and ten monolingual seniors of Southern Min in Taiwan individually completed two semantic tasks: Picture naming and category fluency tasks. The instruments for the naming task were sixty black-and-white pictures, including thirty-five object pictures and twenty-five action pictures. The category fluency task also consisted of two semantic categories – objects (or nouns) and actions (or verbs). The reaction time for each picture/question was additionally calculated and analyzed. Oral productions in Mandarin Chinese and in Southern Min were compared and discussed to examine the category effects and age effects. The results of the category fluency task indicated that the content of information of these seniors was comparatively deteriorated, and thus they produced a smaller number of semantic-lexical items. Significant group differences were also found in the reaction time results. Category effects were significant for both adults and seniors in the semantic fluency task. The findings of the present study will help characterize the nature of the bilingual semantic processing of adults and seniors, and contribute to the fields of contrastive and corpus linguistics.

Keywords: Bilingual semantic processing, aging, Mandarin Chinese, Southern Min.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1297
76 Analyzing Microblogs: Exploring the Psychology of Political Leanings

Authors: Meaghan Bowman

Abstract:

Microblogging has become increasingly popular for commenting on current events, spreading gossip, and encouraging individualism--which favors its low-context communication channel. These social media (SM) platforms allow users to express opinions while interacting with a wide range of populations. Hashtags allow immediate identification of like-minded individuals worldwide on a vast array of topics. The output of the analytic tool, Linguistic Inquiry and Word Count (LIWC)--a program that associates psychological meaning with the frequency of use of specific words--may suggest the nature of individuals’ internal states and general sentiments. When applied to groupings of SM posts unified by a hashtag, such information can be helpful to community leaders during periods in which the forming of public opinion happens in parallel with the unfolding of political, economic, or social events. This is especially true when outcomes stand to impact the well-being of the group. Here, we applied the online tools, Google Translate and the University of Texas’s LIWC, to a 90-posting sample from a corpus of Colombian Spanish microblogs. On translated disjoint sets, identified by hashtag as being authored by advocates of voting “No,” advocates voting “Yes,” and entities refraining from hashtag use, we observed the value of LIWC’s Tone feature as distinguishing among the categories and the word “peace,” as carrying particular significance, due to its frequency of use in the data.

Keywords: Colombia peace referendum, FARC, hashtags, linguistics, microblogging, social media.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 924
75 Distributional Semantics Approach to Thai Word Sense Disambiguation

Authors: Sunee Pongpinigpinyo, Wanchai Rivepiboon

Abstract:

Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many approach strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledgebased, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy that employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an information retrieval technique and unsupervised learning, to the task of Thai noun and verbal word sense disambiguation. The Latent Semantic Indexing has been shown to be efficient and effective for Information Retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely  /hua4/ and /kep1/ that are used as a representative of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.

Keywords: Distributional semantics, Latent Semantic Indexing, natural language processing, Polysemous words, unsupervisedlearning, Word Sense Disambiguation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1819
74 Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Authors: Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag

Abstract:

Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocation of words, attached to specific concepts (e.g. genetic-algorithms and decisiontrees are terms associated to the concept “Machine Learning" ). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by experimenting the ranking function learned from an English corpus in Biology, onto a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approaches compared to the state-of-the-art Support Vector Machine (SVM).

Keywords: Text-mining, Terminology Extraction, Evolutionary algorithm, ROC Curve.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1662
73 A Corpus-Based Analysis on Code-Mixing Features in Mandarin-English Bilingual Children in Singapore

Authors: Xunan Huang, Caicai Zhang

Abstract:

This paper investigated the code-mixing features in Mandarin-English bilingual children in Singapore. First, it examined whether the code-mixing rate was different in Mandarin Chinese and English contexts. Second, it explored the syntactic categories of code-mixing in Singapore bilingual children. Moreover, this study investigated whether morphological information was preserved when inserting syntactic components into the matrix language. Data are derived from the Singapore Bilingual Corpus, in which the recordings and transcriptions of sixty English-Mandarin 5-to-6-year-old children were preserved for analysis. Results indicated that the rate of code-mixing was asymmetrical in the two language contexts, with the rate being significantly higher in the Mandarin context than that in the English context. The asymmetry is related to language dominance in that children are more likely to code-mix when using their nondominant language. Concerning the syntactic categories of code-mixing words in the Singaporean bilingual children, we found that noun-mixing, verb-mixing, and adjective-mixing are the three most frequently used categories in code-mixing in the Mandarin context. This pattern mirrors the syntactic categories of code-mixing in the Cantonese context in Cantonese-English bilingual children, and the general trend observed in lexical borrowing. Third, our results also indicated that English vocabularies that carry morphological information are embedded in bare forms in the Mandarin context. These findings shed light upon how bilingual children take advantage of the two languages in mixed utterances in a bilingual environment.

Keywords: Code-mixing, Mandarin Chinese, English, bilingual children.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1124
72 Re-telling Goa's History: The Margin Narrative

Authors: Anna Beatriz Paula

Abstract:

This paper presents the first reflexions about Margaret Mascarenhas-s novel, “Skin", based on post-colonial critic perception of History and its agents. By doing so, this study will put light on a literary corpus of Indian Literatures: the Goan Literature whose cultural basis creates an unique historiographic metafiction conducted by different characters that one by one plays the narrator role.

Keywords: Goa, History, Literature, Metafiction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2133
71 A Corpus-Based Approach to Understanding Market Access in Fisheries and Aquaculture: A Systematic Literature Review

Authors: Cheryl Marie Cordeiro

Abstract:

Although fisheries and aquaculture studies might seem marginal to international business (IB) studies in general, fisheries and aquaculture IB (FAIB) management is currently facing increasing pressure to meet global demand and consumption for fish in the next coming decades. In part address to this challenge, the purpose of this systematic review of literature (SLR) study is to investigate the use of the term ‘market access’ in its context of use in the generic literature and business sector discourse, in comparison to the more specific literature and discourse in fisheries, aquaculture and seafood. This SLR aims to uncover the knowledge/interest gaps between the academic subject discourses and business sector practices. Corpus driven in methodology and using a triangulation method of three different text analysis software including AntConc, VOSviewer and Web of Science (WoS) analytics, the SLR results indicate a gap in conceptual knowledge and business practices in how ‘market access’ is conceived and used in the context of the pharmaceutical healthcare industry and FAIB research and practice. While it is acknowledged that the product orientation of different business sectors might differ, this SLR study works with the assumption that both business sectors are global in orientation. These business sectors are complex in their operations from product to market. This SLR suggests a conceptual model in understanding the challenges, the potential barriers as well as avenues for solutions to developing market access for FAIB.

Keywords: Market access, fisheries and aquaculture, international business, systematic literature review.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 954
70 Maya Semantic Technique: A Mathematical Technique Used to Determine Partial Semantics for Declarative Sentences

Authors: Marcia T. Mitchell

Abstract:

This research uses computational linguistics, an area of study that employs a computer to process natural language, and aims at discerning the patterns that exist in declarative sentences used in technical texts. The approach is mathematical, and the focus is on instructional texts found on web pages. The technique developed by the author and named the MAYA Semantic Technique is used here and organized into four stages. In the first stage, the parts of speech in each sentence are identified. In the second stage, the subject of the sentence is determined. In the third stage, MAYA performs a frequency analysis on the remaining words to determine the verb and its object. In the fourth stage, MAYA does statistical analysis to determine the content of the web page. The advantage of the MAYA Semantic Technique lies in its use of mathematical principles to represent grammatical operations which assist processing and accuracy if performed on unambiguous text. The MAYA Semantic Technique is part of a proposed architecture for an entire web-based intelligent tutoring system. On a sample set of sentences, partial semantics derived using the MAYA Semantic Technique were approximately 80% accurate. The system currently processes technical text in one domain, namely Cµ programming. In this domain all the keywords and programming concepts are known and understood.

Keywords: Natural language understanding, computational linguistics, knowledge representation, linguistic theories.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1673
69 Named Entity Recognition using Support Vector Machine: A Language Independent Approach

Authors: Asif Ekbal, Sivaji Bandyopadhyay

Abstract:

Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is now-a-days considered to be fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports about the development of a NER system for Bengali and Hindi using Support Vector Machine (SVM). Though this state of the art machine learning technique has been widely applied to NER in several well-studied languages, the use of this technique to Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with the variety of features that are helpful in predicting the four different named (NE) classes, such as Person name, Location name, Organization name and Miscellaneous name. We have used the annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes 1, defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL) 2. In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm in order to generate the lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as the features of SVM in order to improve the system performance. The NER system has been tested with the gold standard test sets of 35K, and 60K tokens for Bengali, and Hindi, respectively. Evaluation results have demonstrated the recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show the improvement in the f-score by 5.13% with the use of context patterns. Statistical analysis, ANOVA is also performed to compare the performance of the proposed NER system with that of the existing HMM based system for both the languages.

Keywords: Named Entity (NE), Named Entity Recognition (NER), Support Vector Machine (SVM), Bengali, Hindi.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3414
68 Analysis of Steles with Libyan Inscriptions of Grande Kabylia, Algeria

Authors: Samia Ait Ali Yahia

Abstract:

Several steles with Libyan inscriptions were discovered in Grande Kabylia (Algeria), but very few researchers were interested in these inscriptions. Our work is to list, if possible all these steles in order to do a descriptive study of the corpus. The steles analysis will be focused on the iconographic and epigraphic level and on the different forms of Libyan characters in order to highlight the alphabet used by the Grande Kabylia.

Keywords: Epigraphy, stele, Libyan inscription, Grande Kabylia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1093
67 The Role of Paraphrase in Interpreting Students’ Writing

Authors: Maya Lisa Aryanti, S. S. M. Hum

Abstract:

To improve students’ skill, writing is the most challenging skill to be developed. The reason is that besides helping the students to develop their skill, this activity also helps them to express themselves. This paper depicts how paraphrasing is very helpful to interpret students’ writing. Syntactic units, used tenses and meanings will indeed change once the writings were paraphrased. The objectives of this research are to reveal the inappropriate structure of syntactic units, to show what types of sentences the students often make, and to show how paraphrasing can help to infer the message. The methodology of this research is descriptive qualitative research. In addition, theories of linguistics are also included. This includes theory of Syntax to describe syntactic units and tenses and theory of Semantics to describe theories of meaning and how paraphrasing works. The theories of general linguistics, grammar and writing are also provided to support the theories of Syntax and Semantics. The results of this research are concerned with how the message is received in the end. The message written in the students’ essay is not clear because of the improper structure of syntactic units and use of incorrect of tenses. The students tend to use simple sentences, compound sentences and complex sentences with a few mistakes in their writing. In addition, they tend to create unnecessary phrases. The last point is that this research shows how paraphrase works to attain complete meaning of a sentence.

Keywords: Paraphrase, meanings, syntactic units and tenses.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1021
66 Effective Features for Disambiguation of Turkish Verbs

Authors: Zeynep Orhan, Zeynep Altan

Abstract:

This paper summarizes the results of some experiments for finding the effective features for disambiguation of Turkish verbs. Word sense disambiguation is a current area of investigation in which verbs have the dominant role. Generally verbs have more senses than the other types of words in the average and detecting these features for verbs may lead to some improvements for other word types. In this paper we have considered only the syntactical features that can be obtained from the corpus and tested by using some famous machine learning algorithms.

Keywords: Word sense disambiguation, feature selection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1750
65 A Multigranular Linguistic Additive Ratio Assessment Model in Group Decision Making

Authors: Wiem Daoud Ben Amor, Luis Martínez López, Jr., Hela Moalla Frikha

Abstract:

Most of the multi-criteria group decision making (MCGDM) problems dealing with qualitative criteria require consideration of the large background of expert information. It is common that experts have different degrees of knowledge for giving their alternative assessments according to criteria. So, it seems logical that they use different evaluation scales to express their judgment, i.e., multi granular linguistic scales. In this context, we propose the extension of the classical additive ratio assessment (ARAS) method to the case of a hierarchical linguistics term for managing multi granular linguistic scales in uncertain context where uncertainty is modeled by means in linguistic information. The proposed approach is called the extended hierarchical linguistics-ARAS method (ELH-ARAS). Within the ELH-ARAS approach, the decision maker (DMs) can diagnose the results (the ranking of the alternatives) in a decomposed style i.e., not only at one level of the hierarchy but also at the intermediate ones. Also, the developed approach allows a feedback transformation i.e., the collective final results of all experts are able to be transformed at any level of the extended linguistic hierarchy that each expert has previously used. Therefore, the ELH-ARAS technique makes it easier for decision-makers to understand the results. Finally, an MCGDM case study is given to illustrate the proposed approach.

Keywords: Additive ratio assessment, extended hierarchical linguistic, multi-criteria group decision making problems, multi granular linguistic contexts.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 368
64 Achieving Maximum Performance through the Practice of Entrepreneurial Ethics: Evidence from SMEs in Nigeria

Authors: S. B. Tende, H. L. Abubakar

Abstract:

It is acknowledged that small and medium enterprises (SMEs) may encounter different ethical issues and pressures that could affect the way in which they strategize or make decisions concerning the outcome of their business. Therefore, this research aimed at assessing entrepreneurial ethics in the business of SMEs in Nigeria. Secondary data were adopted as source of corpus for the analysis. The findings conclude that a sound entrepreneurial ethics system has a significant effect on the level of performance of SMEs in Nigeria. The Nigerian Government needs to provide both guiding and physical structures; as well as learning systems that could inculcate these entrepreneurial ethics.

Keywords: Entrepreneurial ethics, culture, performance, SME.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 973
63 Software Architectural Design Ontology

Authors: Muhammad Irfan Marwat, Sadaqat Jan, Syed Zafar Ali Shah

Abstract:

Software Architecture plays a key role in software development but absence of formal description of Software Architecture causes different impede in software development. To cope with these difficulties, ontology has been used as artifact. This paper proposes ontology for Software Architectural design based on IEEE model for architecture description and Kruchten 4+1 model for viewpoints classification. For categorization of style and views, ISO/IEC 42010 has been used. Corpus method has been used to evaluate ontology. The main aim of the proposed ontology is to classify and locate Software Architectural design information.

Keywords: Software Architecture Ontology, Semantic based Software Architecture, Software Architecture, Ontology, Software Engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4198
62 Contextual Distribution for Textual Alignment

Authors: Yuri Bizzoni, Marianne Reboul

Abstract:

Our program compares French and Italian translations of Homer’s Odyssey, from the XVIth to the XXth century. We focus on the third point, showing how distributional semantics systems can be used both to improve alignment between different French translations as well as between the Greek text and a French translation. Although we focus on French examples, the techniques we display are completely language independent.

Keywords: Translation studies, machine translation, computational linguistics, distributional semantics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1041
61 Part of Speech Tagging Using Statistical Approach for Nepali Text

Authors: Archit Yajnik

Abstract:

Part of Speech Tagging has always been a challenging task in the era of Natural Language Processing. This article presents POS tagging for Nepali text using Hidden Markov Model and Viterbi algorithm. From the Nepali text, annotated corpus training and testing data set are randomly separated. Both methods are employed on the data sets. Viterbi algorithm is found to be computationally faster and accurate as compared to HMM. The accuracy of 95.43% is achieved using Viterbi algorithm. Error analysis where the mismatches took place is elaborately discussed.

Keywords: Hidden Markov model, Viterbi algorithm, POS tagging, natural language processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1715
60 New Ways of Vocabulary Enlargement

Authors: T. Solonchak, S. Pesina

Abstract:

Lexical invariants, being a sort of stereotypes within the frames of ordinary consciousness, are created by the members of a language community as a result of uniform division of reality. The invariant meaning is formed in person’s mind gradually in the course of different actualizations of secondary meanings in various contexts. We understand lexical the invariant as abstract language essence containing a set of semantic components. In one of its configurations it is the basis or all or a number of the meanings making up the semantic structure of the word.

Keywords: Lexical invariant, invariant theories, polysemantic word, cognitive linguistics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2354
59 A Similarity Measure for Clustering and its Applications

Authors: Guadalupe J. Torres, Ram B. Basnet, Andrew H. Sung, Srinivas Mukkamala, Bernardete M. Ribeiro

Abstract:

This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even the same algorithm (K-means, for instance, with different initializations usually produce different results in clustering the same dataset). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest directed at comparing the similarity between various machine clusterings and human clustering of datasets. The similarity measure thus can be used to identify the best (in terms of most similar to human) clustering algorithm for a specific problem at hand. Experimental results pertaining to the text categorization problem of a Portuguese corpus (wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The significance and other potential applications of the proposed measure are discussed.

Keywords: Clustering Algorithms, Clustering Applications, Similarity Measures, Text Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1577
58 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1888
57 A System of Automatic Speech Recognition based on the Technique of Temporal Retiming

Authors: Samir Abdelhamid, Noureddine Bouguechal

Abstract:

We report in this paper the procedure of a system of automatic speech recognition based on techniques of the dynamic programming. The technique of temporal retiming is a technique used to synchronize between two forms to compare. We will see how this technique is adapted to the field of the automatic speech recognition. We will expose, in a first place, the theory of the function of retiming which is used to compare and to adjust an unknown form with a whole of forms of reference constituting the vocabulary of the application. Then we will give, in the second place, the various algorithms necessary to their implementation on machine. The algorithms which we will present were tested on part of the corpus of words in Arab language Arabdic-10 [4] and gave whole satisfaction. These algorithms are effective insofar as we apply them to the small ones or average vocabularies.

Keywords: Continuous speech recognition, temporal retiming, phonetic decoding, algorithms, vocal signal, dynamic programming.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1353
56 Questions in the School

Authors: Jana M. Havigerová, Jiří Haviger

Abstract:

Paper deals with the topic of questions as important components of information behavior in the school. By analyzing the Corpus Schola2010, the state of contemporary education in terms of questioning is proven unsatisfactory: 80% of the questions are asked by teachers; most of teacher-s questions are asked at the beginning of the first grade, than their number decreases and is settling down on 80±10 questions per lesson. The average number of questions within one lesson per one pupil is generally less than one whole question. The highest values are achieved in the first, sixth, eighth and tenth grade,, i.e. in the transition years in which pupils are moving into higher levels of education and every following year it declines. We can state Czech school do not support questioning and question skill of their pupils, thereby typical Czech schools are neglecting the development of thinking, reasoning and cooperation of their pupils.

Keywords: information behavior, questions, primary and secondary education, Czech Republic

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1465
55 Resolving Dependency Ambiguity of Subordinate Clauses using Support Vector Machines

Authors: Sang-Soo Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

In this paper, we propose a method of resolving dependency ambiguities of Korean subordinate clauses based on Support Vector Machines (SVMs). Dependency analysis of clauses is well known to be one of the most difficult tasks in parsing sentences, especially in Korean. In order to solve this problem, we assume that the dependency relation of Korean subordinate clauses is the dependency relation among verb phrase, verb and endings in the clauses. As a result, this problem is represented as a binary classification task. In order to apply SVMs to this problem, we selected two kinds of features: static and dynamic features. The experimental results on STEP2000 corpus show that our system achieves the accuracy of 73.5%.

Keywords: Dependency analysis, subordinate clauses, binaryclassification, support vector machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1600
54 Teaching Linguistic Humour Research Theories: Egyptian Higher Education EFL Literature Classes

Authors: O. F. Elkommos

Abstract:

“Humour studies” is an interdisciplinary research area that is relatively recent. It interests researchers from the disciplines of psychology, sociology, medicine, nursing, in the work place, gender studies, among others, and certainly teaching, language learning, linguistics, and literature. Linguistic theories of humour research are numerous; some of which are of interest to the present study. In spite of the fact that humour courses are now taught in universities around the world in the Egyptian context it is not included. The purpose of the present study is two-fold: to review the state of arts and to show how linguistic theories of humour can be possibly used as an art and craft of teaching and of learning in EFL literature classes. In the present study linguistic theories of humour were applied to selected literary texts to interpret humour as an intrinsic artistic communicative competence challenge. Humour in the area of linguistics was seen as a fifth component of communicative competence of the second language leaner. In literature it was studied as satire, irony, wit, or comedy. Linguistic theories of humour now describe its linguistic structure, mechanism, function, and linguistic deviance. Semantic Script Theory of Verbal Humor (SSTH), General Theory of Verbal Humor (GTVH), Audience Based Theory of Humor (ABTH), and their extensions and subcategories as well as the pragmatic perspective were employed in the analyses. This research analysed the linguistic semantic structure of humour, its mechanism, and how the audience reader (teacher or learner) becomes an interactive interpreter of the humour. This promotes humour competence together with the linguistic, social, cultural, and discourse communicative competence. Studying humour as part of the literary texts and the perception of its function in the work also brings its positive association in class for educational purposes. Humour is by default a provoking/laughter-generated device. Incongruity recognition, perception and resolving it, is a cognitive mastery. This cognitive process involves a humour experience that lightens up the classroom and the mind. It establishes connections necessary for the learning process. In this context the study examined selected narratives to exemplify the application of the theories. It is, therefore, recommended that the theories would be taught and applied to literary texts for a better understanding of the language. Students will then develop their language competence. Teachers in EFL/ESL classes will teach the theories, assist students apply them and interpret text and in the process will also use humour. This is thus easing students' acquisition of the second language, making the classroom an enjoyable, cheerful, self-assuring, and self-illuminating experience for both themselves and their students. It is further recommended that courses of humour research studies should become an integral part of higher education curricula in Egypt.

Keywords: ABTH, deviance, disjuncture, episodic, GTVH, humour competence, humour comprehension, humour in the classroom, humour in the literary texts, humour research linguistic theories, incongruity- resolution, isotopy-disjunction, jab line, longer text joke, narrative story line (macro-micro), punch line, six knowledge resource, SSTH, stacks, strands, teaching linguistics, teaching literature, TEFL, TESL.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1413
53 Growing Self Organising Map Based Exploratory Analysis of Text Data

Authors: Sumith Matharage, Damminda Alahakoon

Abstract:

Textual data plays an important role in the modern world. The possibilities of applying data mining techniques to uncover hidden information present in large volumes of text collections is immense. The Growing Self Organizing Map (GSOM) is a highly successful member of the Self Organising Map family and has been used as a clustering and visualisation tool across wide range of disciplines to discover hidden patterns present in the data. A comprehensive analysis of the GSOM’s capabilities as a text clustering and visualisation tool has so far not been published. These functionalities, namely map visualisation capabilities, automatic cluster identification and hierarchical clustering capabilities are presented in this paper and are further demonstrated with experiments on a benchmark text corpus.

Keywords: Text Clustering, Growing Self Organizing Map, Automatic Cluster Identification, Hierarchical Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1999
52 A New Time-Frequency Speech Analysis Approach Based On Adaptive Fourier Decomposition

Authors: Liming Zhang

Abstract:

In this paper, a new adaptive Fourier decomposition (AFD) based time-frequency speech analysis approach is proposed. Given the fact that the fundamental frequency of speech signals often undergo fluctuation, the classical short-time Fourier transform (STFT) based spectrogram analysis suffers from the difficulty of window size selection. AFD is a newly developed signal decomposition theory. It is designed to deal with time-varying non-stationary signals. Its outstanding characteristic is to provide instantaneous frequency for each decomposed component, so the time-frequency analysis becomes easier. Experiments are conducted based on the sample sentence in TIMIT Acoustic-Phonetic Continuous Speech Corpus. The results show that the AFD based time-frequency distribution outperforms the STFT based one.

Keywords: Adaptive fourier decomposition, instantaneous frequency, speech analysis, time-frequency distribution.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1728
51 Automatic Text Summarization

Authors: Mohamed Abdel Fattah, Fuji Ren

Abstract:

This work proposes an approach to address automatic text summarization. This approach is a trainable summarizer, which takes into account several features, including sentence position, positive keyword, negative keyword, sentence centrality, sentence resemblance to the title, sentence inclusion of name entity, sentence inclusion of numerical data, sentence relative length, Bushy path of the sentence and aggregated similarity for each sentence to generate summaries. First we investigate the effect of each sentence feature on the summarization task. Then we use all features score function to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. The proposed approach performance is measured at several compression rates on a data corpus composed of 100 English religious articles. The results of the proposed approach are promising.

Keywords: Automatic Summarization, Genetic Algorithm, Mathematical Regression, Text Features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2339
50 Extraction of Temporal Relation by the Creation of Historical Natural Disaster Archive

Authors: Suguru Yoshioka, Seiichi Tani, Seinosuke Toda

Abstract:

In historical science and social science, the influence of natural disaster upon society is a matter of great interest. In recent years, some archives are made through many hands for natural disasters, however it is inefficiency and waste. So, we suppose a computer system to create a historical natural disaster archive. As the target of this analysis, we consider newspaper articles. The news articles are considered to be typical examples that prescribe the temporal relations of affairs for natural disaster. In order to do this analysis, we identify the occurrences in newspaper articles by some index entries, considering the affairs which are specific to natural disasters, and show the temporal relation between natural disasters. We designed and implemented the automatic system of “extraction of the occurrences of natural disaster" and “temporal relation table for natural disaster."

Keywords: Database, digital library, corpus, historical natural disaster, temporal relation

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1406
49 Foreign Elements in the Methodologies of Usul Fiqh: Analyzing the Orientalist Thought

Authors: Ariyanti Mustapha

Abstract:

The development of Islamic jurisprudence since the first century of the hijra has fascinated many orientalists to explore the historiography of Islamic legislation. The practice of usul fiqh began during the lifetime of the Prophet Muhammad and was continued by the companions as the legal reasoning due to the absence of the legal injunction in the Qur’an and Sunnah. The orientalists propagated that the Roman and Jewish legislation were transplanted into Islamic jurisprudence and it was the primary reason for its progression. We used qualitative and comparative methods to analyze the orientalists’ views. Results showed that many erroneous facts were propagated by Goldziher and Schacht by claiming the parallels between the principles, methodologies, and fundamental concepts in Islamic jurisprudence and Roman Provincial law. The orientalists claimed that Islamic jurisprudence was derived from the corpus of Jewish Mishnah and Ha-kol. These judgments are used by the orientalists to prove the inferiority of Islamic jurisprudence. Nevertheless, many evidences have proven that Islamic legislation is capable of developing independently without any foreign transplant.

Keywords: Foreign transplant, ijtihad, orientalist, usul fiqh.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 402
48 Assessment of the Validity of Sentiment Analysis as a Tool to Analyze the Emotional Content of Text

Authors: Trisha Malhotra

Abstract:

Sentiment analysis is a recent field of study that computationally assesses the emotional nature of a body of text. To assess its test-validity, sentiment analysis was carried out on the emotional corpus of text from a personal 15-day mood diary. Self-reported mood scores varied more or less accurately with daily mood evaluation score given by the software. On further assessment, it was found that while sentiment analysis was good at assessing ‘global’ mood, it was not able to ‘locally’ identify and differentially score synonyms of various emotional words. It is further critiqued for treating the intensity of an emotion as universal across cultures. Finally, the software is shown not to account for emotional complexity in sentences by treating emotions as strictly positive or negative. Hence, it is posited that a better output could be two (positive and negative) affect scores for the same body of text.

Keywords: Analysis, data, diary, emotions, mood, sentiment.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1129