Search results for: common words
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 6582

6552 The Grammatical Dictionary Compiler: A System for Kartvelian Languages

Authors: Liana Lortkipanidze, Nino Amirezashvili, Nino Javashvili

Abstract:

The purpose of a grammatical dictionary is to provide information on the morphological and syntactic characteristics of the base word in each dictionary entry. Electronic grammatical dictionaries are used as tools for automated morphological analysis in text processing. The Georgian Grammatical Dictionary should contain grammatical information for each word: part of speech, type of declension/conjugation, grammatical forms of the word (paradigm), and alternative variants of the base word/lemma. In this paper, we present a system for compiling the Georgian Grammatical Dictionary automatically. We propose dictionary-based methods for extending grammatical lexicons. The input lexicon contains only a small number of words with identical grammatical features. The extension is based on similarity measures between features of words; more precisely, we add to the extended lexicons words that are similar to those already in the grammatical dictionary. Our dictionaries are corpus-based, and for compilation we introduce a method for lemmatizing unknown words, i.e., words for which neither the full form nor the lemma is in the grammatical dictionary.

Keywords: acquisition of lexicon, Georgian grammatical dictionary, lemmatization rules, morphological processor

Procedia PDF Downloads 120
6551 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excess of information on the Internet has led to an information overload problem. To solve this problem for effective information retrieval, document clustering has become a popular research topic in text mining. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling of high dimensionality and high volume. We propose the FIHC (Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering; the intuition behind FIHC is that each cluster shares some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to address the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of all frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
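
The difference between frequent and closed frequent itemsets mentioned in this abstract can be illustrated with a small sketch. This is not the authors' FIHC implementation: it is a brute-force toy that treats each document as a set of words, and the names `frequent_itemsets`, `closed_itemsets`, and the four sample documents are assumptions made for illustration. An itemset is closed when no proper superset has the same support.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Map each word set found in at least `min_support` documents to its support."""
    doc_sets = [frozenset(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    result = {}
    # Brute force over candidate sizes; only suitable for toy data.
    for size in range(1, len(vocab) + 1):
        found = False
        for cand in combinations(vocab, size):
            cand = frozenset(cand)
            support = sum(1 for d in doc_sets if cand <= d)
            if support >= min_support:
                result[cand] = support
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one exists
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == tsup for t, tsup in frequent.items())
    }


# Hypothetical toy corpus: three sports documents and one finance document.
docs = [
    ["game", "score", "team"],
    ["game", "score", "coach"],
    ["game", "score", "team", "coach"],
    ["market", "stock"],
]
freq = frequent_itemsets(docs, min_support=2)
closed = closed_itemsets(freq)
```

On this toy corpus the closed sets are far fewer than the frequent sets while still carrying every support value, which is the kind of reduction the abstract credits for efficiency and scalability.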

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 370
6550 Compounding and Blending in English and Hausa Languages

Authors: Maryam Maimota

Abstract:

Words are the basic building blocks of a language. In everyday usage, words are used, and new words are formed and reformed, in order to accommodate all entities, phenomena, qualities, and every aspect of human life. This study examines and compares some word formation processes and how they are used to form new words in English and Hausa. The study focuses on blending and compounding as word formation processes and on how these processes are used in the formation of words in both languages. The research aims to find out how compounding and blending are used as word formation processes in the two languages, and to investigate the processes involved in compounding and blending and the nature of the words that are formed. The research therefore seeks answers to the following questions: What types of compound and blended forms are found in English and Hausa, and how are they formed? How do these compounded and blended forms function in both languages in different contexts, such as in phrase and sentence structures? The findings reveal new kinds of words formed by blending in Hausa and English that previous studies either did not reveal or did not explain in detail. There are also many similarities in the way blends and compounds are formed in the two languages; however, the available data show that blends are more numerous in Hausa than in English. The data for this study were gathered from discourse found in newspapers, articles, novels, and written literature in Hausa and English.

Keywords: blending, compounding, morphology, word formation

Procedia PDF Downloads 346
6549 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi

Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty

Abstract:

The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing were carried out using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were divided equally into five groups based on the above-mentioned regions. No significant difference (p > 0.05) was found in speech recognition performance between groups for each word list or between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve was semi-linear, and the groups’ mean slopes of the linear portions of the curve indicated an average increase in word recognition score of 4.64%, 4.73%, 4.68%, and 4.85% per dB for list 1, list 2, list 3 and list 4, respectively. Although no data are available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with those of research reports on other languages. The four word lists thus developed were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.

Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity

Procedia PDF Downloads 331
6548 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the volume of document data has been increasing with the spread of the Internet. Many methods have been studied for extracting topics from large document collections. We proposed Independent Topic Analysis (ITA) to extract mutually independent topics from large document data such as newspaper data. ITA extracts independent topics from document data by using Independent Component Analysis. A topic extracted by ITA is represented by a set of words. However, such a set of words can be quite different from the topics a user imagines. For example, the top five words with high independence for one topic are as follows: Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. Topic 1 can be considered to represent the topic "SPORTS", but this topic name has to be attached by the user, because ITA cannot name topics. Therefore, in this research, we propose a method that uses a web search engine to obtain easily understandable names for the topics given as sets of words by ITA. Specifically, we search for a set of topical words and take the title of the homepage in the search result as the topic name. We also apply the proposed method to some data and verify its effectiveness.

Keywords: independent topic analysis, topic extraction, topic naming, web search engine

Procedia PDF Downloads 98
6547 Structural Analysis of Username Segment in E-Mail Addresses of Engineering Institutes of Gujarat State of India

Authors: Jatinderkumar R. Saini

Abstract:

E-mail has become a key mechanism of electronic communication. This is especially true for professional organizations that like to communicate with their subjects online and are slowly shifting to a paperless office. The current paper focuses specifically on academic institutions offering Engineering courses in Gujarat state and attempts a textual analysis of the usernames of the institutional e-mail addresses. We found that the institutions tend to design the username segment of their e-mail addresses by choosing words or combinations of words from specific categories. The paper also highlights the use of special characters, digits and random words in designing the usernames. In addition, the paper lists the styles of employing department names and designations in the design process. To the best of our knowledge, this is the first formal attempt to analyze the selection of words employed in designing the username segment of e-mail addresses of Engineering institutions.

Keywords: e-mail address, institute, engineering, username

Procedia PDF Downloads 309
6546 A Method for the Extraction of the Character's Tendency from Korean Novels

Authors: Min-Ha Hong, Kee-Won Kim, Seung-Hoon Kim

Abstract:

Characters are one of the core elements for understanding story-based content such as novels and movies. In particular, a character’s tendency is an important factor in analyzing story-based content, because it has a significant influence on the storyline. If readers know the tendencies of the characters before reading a novel, it helps them understand the structure of conflicts, episodes, and relationships between characters in the novel. It may therefore help readers select a novel they want to read. In this paper, we propose a method for extracting the tendencies of characters from a novel written in Korean. First, we build a dictionary of paired emotion words in Korean and English, since the emotion words in the novel’s sentences express the characters’ feelings. We rate the degree of polarity (positive or negative) of the words in our emotion word dictionary based on SenticNet. Then we extract characters and emotion words from the sentences of a novel. Since the polarity of a word grows stronger or weaker due to sentence features such as quotations and modifiers, our proposed method takes them into account when calculating the polarity of characters. The extracted character polarity information can be used in book search or book recommendation services.

Keywords: character tendency, data mining, emotion word, Korean novel

Procedia PDF Downloads 315
6545 N400 Investigation of Semantic Priming Effect to Symbolic Pictures in Text

Authors: Thomas Ousterhout

Abstract:

The purpose of this study was to investigate whether incorporating meaningful pictures of gestures and facial expressions in short sentences of text could supplement the text with enough semantic information to produce an N400 effect when probe words incongruent to the picture were subsequently presented. Event-related potentials (ERPs) were recorded from a 14-channel commercial-grade EEG headset while subjects performed congruent/incongruent reaction time discrimination tasks. Since pictures of meaningful gestures have been shown to be semantically processed in the brain in a manner similar to words, it is believed that pictures will add supplementary information to text just as the inclusion of their equivalent synonymous word would. The hypothesis is that when subjects read the mixed text/picture sentences, they will process the images and words just as in face-to-face communication, and therefore probe words incongruent to the image will produce an N400.

Keywords: EEG, ERP, N400, semantics, congruency, facilitation, Emotiv

Procedia PDF Downloads 238
6544 Therapeutic Power of Words through Reading Writing and Storytelling

Authors: Sakshi Kaul, Sundeep Verma

Abstract:

The focus of the current paper is to evaluate the therapeutic power of words. This is done by critically evaluating the impact that reading, writing and storytelling have on individuals. When we read, tell or listen to a story, we exercise our imagination. Imagination becomes the source of activation of thoughts and actions. This enables the reader, writer or listener to express suppressed emotions or desires. Stories, told or untold, may bring forth various human emotions and attributes such as hope, optimism, fear, and happiness. Each story narrated evokes different emotions; at times, stories help us unravel ourselves in the world of the teller, thereby bringing solace. Stories heard or told add to an individual’s life by creating a community around them and giving wings to thoughts, enabling the individual to be more imaginative and creative and thereby fostering positivity and happiness. Reading, from the reader’s point of view, can broaden the horizon of information and ideas about facts and the laws of life, giving more meaning to life. From ‘once upon a time’ to ‘happily ever after’, all that stories talk about is life’s learning. The power of words may sometimes be negated; this paper reiterates the power of words by critically evaluating how words can become powerful and therapeutic in various structures and forms in society. There is a story behind every situation, action and reaction. Hence, it is of prime importance to understand each story in order to enable a person to deal with whatever he or she may be going through. For example, if a client is going through some trauma in his or her life, the counsellor needs to know exactly what turmoil is being faced so that the client can be assisted accordingly. Counselling is considered a process of healing through words, or talk therapy, where we try to heal the client merely through words.
In a counselling session, the counsellor focuses on working with the client to bring about a positive change. The counsellor allows clients to express themselves, which is referred to as catharsis. Words spoken, written or heard have the capacity to heal and can be therapeutic. The therapeutic power of words has been seen in various cultural practices and belief systems. The underlying belief that words have the power to heal, save and bring change has existed for ages. Many religious and spiritual practices also acclaim the power of words. Through this empirical paper, we have tried to bring to light how reading, writing, and storytelling have been used as mediums of healing and have been therapeutic in nature.

Keywords: reading, storytelling, therapeutic, words

Procedia PDF Downloads 243
6543 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text

Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni

Abstract:

The problem of entity relation discovery in structured data, a well-covered topic in the literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary or a specific collection of named items. In many cases, machine learning and/or text mining techniques are used for this goal. These approaches might be infeasible for computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed, each cooccurrence highlights some degree of semantic correlation between the words, because related words are more commonly found close to each other than on opposite sides of the text. Some authors have used sliding windows for this problem: they count all the occurrences within a sliding window running over the whole text. In this paper, we generalise this technique to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is counted with a weight depending on the distance between the items: a closer distance implies stronger evidence of a relationship. We develop an experiment to support this intuition by applying the technique to a data set consisting of the text of the Bible, split into verses.
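
A weighted-distance sliding window of the kind described here can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `weighted_cooccurrence`, the default weight `1/d`, and the sample sentence are all assumptions. The abstract only states that the weight depends on the distance and that closer items count as stronger evidence.

```python
from collections import defaultdict


def weighted_cooccurrence(tokens, entities, window=5, weight=lambda d: 1.0 / d):
    """Sum a distance-dependent weight for each pair of distinct entities
    that occur at most `window` tokens apart."""
    graph = defaultdict(float)
    # Positions of entity mentions, in text order.
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    for a in range(len(positions)):
        i, u = positions[a]
        for b in range(a + 1, len(positions)):
            j, v = positions[b]
            d = j - i
            if d > window:
                break  # positions are sorted, so later mentions are farther away
            if u != v:
                # Undirected edge: store the pair in a canonical order.
                graph[tuple(sorted((u, v)))] += weight(d)
    return dict(graph)


# Hypothetical example text and entity list.
tokens = "alice met bob and later bob thanked alice".split()
narrow = weighted_cooccurrence(tokens, {"alice", "bob"}, window=3)
wide = weighted_cooccurrence(tokens, {"alice", "bob"}, window=7)
```

Passing `weight=lambda d: 1.0` recovers the plain sliding-window count that the abstract generalises, so the two schemes can be compared on the same text.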

Keywords: cooccurrence graph, entity relation graph, unstructured text, weighted distance

Procedia PDF Downloads 123
6542 Dynamics of Hybrid Language in Urban and Rural Uttar Pradesh India

Authors: Divya Pande

Abstract:

The dynamics of culture express themselves in language. Even after India gained independence in 1947, English subtly crept into the language of the masses, with a silent and powerful flow towards the vernacular. This culture contact resulted in the learning and emergence of a new language across the Hindi-speaking belt of Northern and Central India. The hybrid words thus formed displaced the original words and became contextualized and absorbed in the language of the common masses. The research paper explores the interesting new vocabulary used extensively in the urban and rural districts of the state of Uttar Pradesh, which is the most populous state of India. The paper adopts a two-way classification, formal and contextual, for the analysis of the hybrid vocabulary, i.e., linguistic items in which one element is necessarily from English and the other from Hindi. The new vocabulary represents languages of the wider world, cutting across geographical and cultural barriers. The paper also broadly points to the Hinglish commonly used in the state.

Keywords: assimilation, culture contact, Hinglish, hybrid words

Procedia PDF Downloads 378
6541 Aspect-Level Sentiment Analysis with Multi-Channel and Graph Convolutional Networks

Authors: Jiajun Wang, Xiaoge Li

Abstract:

The purpose of the aspect-level sentiment analysis task is to identify the sentiment polarity of aspects in a sentence. Currently, most methods focus on using neural networks and attention mechanisms to model the relationship between aspects and context, but they ignore the dependence of words in different ranges of the sentence, resulting in deviation when assigning relationship weights to words other than aspect words. To solve these problems, we propose a new aspect-level sentiment analysis model that combines a multi-channel convolutional network and a graph convolutional network (GCN). First, the context and the degree of association between words are characterized by Long Short-Term Memory (LSTM) and a self-attention mechanism. In addition, a multi-channel convolutional network is used to extract the features of words in different ranges. Finally, a graph convolutional network is used to associate the node information of the dependency tree structure. We conduct experiments on four benchmark datasets. The experimental results are compared with those of other models and show that our model is better and more effective.

Keywords: aspect-level sentiment analysis, attention, multi-channel convolution network, graph convolution network, dependency tree

Procedia PDF Downloads 178
6540 Optical Multicast over OBS Networks: An Approach Based on Code-Words and Tunable Decoders

Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga

Abstract:

In this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word with a multicast group address. In the core node, a set of tunable decoders is used to send traffic data to multiple destinations based on the received code-word. The use of code-words, each of which corresponds to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. When a burst is received, it is delayed in an optical memory, and the received optical code-word is split across a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.

Keywords: optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory

Procedia PDF Downloads 576
6539 English Loanwords in the Egyptian Variety of Arabic: Morphological and Phonological Changes

Authors: Mohamed Yacoub

Abstract:

This paper investigates English loanwords in the Egyptian variety of Arabic and reaches three findings. For the first finding, data were collected from Egyptian movies and soap operas; over two hundred words were found to have been borrowed from English (code-switching was not included). These words were then put into eleven categories according to their use and part of speech. The second finding addresses the morphological and phonological changes that these words underwent. Regarding phonological change, eight categories of consonant and vowel variation were found, five for consonants and three for vowels, with examples given for each. Regarding morphological change, five categories were found, including masculine, feminine, dual, broken, and non-pluralizable nouns. The last finding comes from a four-question survey of forty-eight native speakers of Egyptian Arabic, which found that most participants did not recognize English borrowed words, thought they were originally Arabic, and could not give Arabic equivalents for the loanwords that they could recognize.

Keywords: sociolinguistics, loanwords, borrowing, morphology, phonology, variation, Egyptian dialect

Procedia PDF Downloads 359
6538 Understanding Relationships between Listening to Music and Pronunciation Learning: An Investigation Based upon Japanese EFL Learners' Self-Evaluation

Authors: Hirokatsu Kawashima

Abstract:

In an attempt to elucidate relationships between listening to music and pronunciation learning, a classroom-based investigation was conducted with Japanese EFL learners (n=45). The subjects were instructed to listen to English songs they liked on YouTube, especially paying attention to phonologically similar vowel and consonant minimal pair words (e.g., live and leave). This kind of activity, which included taking notes, was regularly carried out in the classroom, and the same kind of task was given to the subjects as homework in order to reinforce the in-class activity. The duration of these activities was eight weeks, after which the program was evaluated on a 9-point scale (1: the lowest and 9: the highest) by learners’ self-evaluation. The main questions for this evaluation included 1) how good the learners had been at pronouncing vowel and consonant minimal pair words originally, 2) how often they had listened to songs good for pronouncing vowel and consonant minimal pair words, 3) how frequently they had moved their mouths to vowel and consonant minimal pair words of English songs, and 4) how much they thought the program would support and enhance their pronunciation learning of phonologically similar vowel and consonant minimal pair words. It has been found, for example, A) that the evaluation of this program is by no means low (Mean: 6.51 and SD: 1.23), suggesting that listening to music may support and enhance pronunciation learning, and B) that listening to consonant minimal pair words in English songs and moving the mouth to them are more related to the program’s evaluation (r =.69, p=.00 and r =.55, p=.00, respectively) than listening to vowel minimal pair words in English songs and moving the mouth to them (r =.45, p=.00 and r =.39, p=.01, respectively).

Keywords: minimal pair, music, pronunciation, song

Procedia PDF Downloads 291
6537 An Emphasis on Creativity-Speak Words Increases Crowdfunding Success

Authors: Trayan Kushev, E. Shaunn Mattingly, Andrew S. Manikas

Abstract:

This study utilizes computer-aided text analysis (CATA) on the descriptions of 248,614 Kickstarter crowdfunding campaigns to reveal that backers are more likely to provide funding to projects that contain a higher percentage of creativity-speak words. Further, this relationship is observed to be stronger for product-based campaigns (e.g., games, technology, design) and weaker for content-based campaigns (e.g., film, music, publishing). In addition, both positive linguistic tone and the use of words expressing gratitude in the text of the campaign strengthen the positive effect of creativity-speak on campaign success.

Keywords: creativity-speak, crowdfunding, entrepreneurship, gratitude, tone

Procedia PDF Downloads 47
6536 Fuzzy Set Approach to Study Appositives and Its Impact Due to Positional Alterations

Authors: E. Mike Dison, T. Pathinathan

Abstract:

Computing with Words (CWW) and Possibilistic Relational Universal Fuzzy (PRUF) are the two concepts which widely represent and measure the vaguely defined natural phenomenon. In this paper, we study the positional alteration of the phrases by which the impact of a natural language proposition gets affected and/or modified. We observe the gradations due to sensitivity/feeling of a statement towards the positional alterations. We derive the classification and modification of the meaning of words due to the positional alteration. We present the results with reference to set theoretic interpretations.

Keywords: appositive, computing with words, possibilistic relational universal fuzzy (PRUF), semantic sentiment analysis, set-theoretic interpretations

Procedia PDF Downloads 130
6535 Misconception on Multilingualism in Glorious Quran

Authors: Muhammed Unais

Abstract:

The holy Quran is a purely Arabic book in which the absence of non-Arabic terms is fully ensured. Had it been revealed in a multilingual way, including various foreign languages besides Arabic, it could easily be misunderstood that the Arabs were unable to compile such a work in response to the challenge of Allah because of their lack of knowledge of the other languages in which the Quran was composed. Based on the presence of some non-Arabic terms in the Quran, such as Istabrq, Saradiq, and Rabbaniyyoon, some oriental scholars have argued that the holy Quran is not a book revealed in Arabic. Some Muslim scholars support and others deny the presence of foreign terms in the Quran, but all of them agree that the roots of the words suspected of being non-Arabic come from foreign languages and were assimilated into Arabic, where they are used as in the source language. After this linguistic assimilation had occurred and the assimilated non-Arabic words had become familiar among the Arabs, the Quran was revealed using these words, so that all the words it contains are Arabic, either pure or assimilated. Hence both opinions on the etymology of these words are right: those who argue for the presence of foreign words are right in that the roots of those words are foreign, and those who argue for their absence are right in that the words were assimilated and became pure Arabic. The possibility of multilingualism in a monolingual book is logically excluded, but its significance changes with time and place. The problem of multilingualism in the Quran is the misconception raised by some oriental scholars that the Arabs were unable to compile a book equal to the Quran not because of any weakness in Arabic but because the Quran was revealed in languages of which they were ignorant. In fact, the Quran was revealed in pure Arabic, the most literate language of the Arabs, and all of its words and their meanings were familiar to them.
If one becomes aware of the linguistic and cultural assimilation found in all civilizations and cultures, no question remains in this respect. In this paper, the researcher intends to shed light on the possibility of multilingualism in a monolingual book, the debates among scholars on this issue, and foreign terms in the Quran and their logical justifications, along with the exclusive features of the Quran.

Keywords: Quran, foreign Terms, multilingualism, language

Procedia PDF Downloads 361
6534 A Corpus-Based Study of Subtitling Religious Words into Arabic

Authors: Yousef Sahari, Eisa Asiri

Abstract:

Hollywood films are produced in an open and liberal context, and when subtitling for a more conservative and closed society such as an Arabic society, religious words can pose a thorny challenge for subtitlers. Using a corpus of 90 Hollywood films released between 2000 and 2018 and applying insights from Descriptive Translation Studies (Toury, 1995, 2012) and the dichotomy of domestication and foreignization, this paper investigates three main research questions: (1) What are the dominant religious terms and functions in the English subtitles? (2) What are the dominant translation strategies used in the translation of religious words? (3) Do these strategies tend to be SL-oriented or TL-oriented (domesticating or foreignising)? To answer the research questions above, a quantitative and qualitative analysis of the corpus is conducted, in which the researcher adopts a self-designed, parallel, aligned corpus of ninety films and their Arabic subtitles. A quantitative analysis is performed to compare the frequencies and distribution of religious words, their functions, and the translation strategies employed by the subtitlers of ninety films, with the aim of identifying similarities or differences in addition to identifying the impact of functions of religious terms on the use of subtitling strategies. Based on the quantitative analysis, a qualitative analysis is performed to identify any translational patterns in Arabic translations of religious words and the possible reasons for subtitlers’ choices. The results show that the function of religious words has a strong influence on the choice of subtitling strategies. Also, it is found that foreignization strategies are applied in about two-thirds of the total occurrences of religious words.

Keywords: religious terms, subtitling, audiovisual translation, modern standard arabic, subtitling strategies, english-arabic subtitling

Procedia PDF Downloads 128
6533 A Novel Machine Learning Approach to Aid Agrammatism in Non-fluent Aphasia

Authors: Rohan Bhasin

Abstract:

Agrammatism in non-fluent aphasia can be defined as a language disorder in which a patient can use only content words (nouns, verbs, and adjectives) for communication, and their speech is devoid of function words such as conjunctions and articles, resulting in speech with extremely rudimentary grammar. Past approaches involve speech therapy of some kind, with conversation analysis used to analyse pre-therapy speech patterns and qualitative changes in conversational behaviour after therapy. We describe a novel method to generate function words (such as prepositions and articles) around content words (nouns, verbs, and adjectives) using a combination of Natural Language Processing and Deep Learning algorithms. This approach can be applied to assist communication. The approach the paper investigates uses LSTMs or Seq2Seq: a sequence-to-sequence (seq2seq) model or LSTM takes in a sequence of inputs and outputs a sequence. This approach needs a significant amount of training data, with each training example being a pair such as (content words, complete sentence). We generate such data by starting with complete sentences from a text source and removing the function words to leave just the content words. However, this approach would require a lot of training data to produce a coherent output. The approach assumes that the content words received as input by both text models are preserved, i.e., they are not altered once the functional grammar is slotted in. This is a potential limitation in cases of severe agrammatism, where the word order might not be inherently correct. The approach can be applied to assist communication in cases of mild agrammatism in non-fluent aphasia. By generating function words around the content words, we can offer the patient meaningful sentence options for articulate conversation.
Thus, our project turns the task of generating sentences from content words into an assistive technology for patients with non-fluent aphasia.
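
The data-generation step described in this abstract (start from complete sentences, strip the function words, keep the pair) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `FUNCTION_WORDS` set, the regex tokenizer, and the function names are all assumptions introduced here.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption);
# a real system would use a fuller inventory or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at",
    "to", "with", "for", "from", "by", "is", "are", "was", "were",
}


def content_words(sentence):
    """Return the sentence's tokens with function words removed, order kept."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [t for t in tokens if t not in FUNCTION_WORDS]


def make_training_pairs(sentences):
    """Build (content words, complete sentence) pairs for a seq2seq model."""
    return [(content_words(s), s) for s in sentences]


pairs = make_training_pairs(["The boy is walking to the school with a dog."])
```

Each pair holds the model input (the content words, in their original order) and the target (the full sentence), which matches the order-preservation assumption stated in the abstract.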

Keywords: aphasia, expressive aphasia, assistive algorithms, neurology, machine learning, natural language processing, language disorder, behaviour disorder, sequence to sequence, LSTM

Procedia PDF Downloads 142
6532 The Algorithm of Semi-Automatic Thai Spoonerism Words for Bi-Syllable

Authors: Nutthapat Kaewrattanapat, Wannarat Bunchongkien

Abstract:

The purpose of this research is to study and develop an algorithm for Thai spoonerism words using a semi-automatic computer program: in the data input stage, syllables are already separated, and in the spoonerism stage, the developed algorithm is applied. The algorithm establishes rules and mechanisms for bi-syllable Thai spoonerism words by analysing the elements of the syllables, namely cluster consonant, vowel, intonation mark, and final consonant. The study finds that bi-syllable Thai spoonerism has one spoonerism mechanism, namely transposition of the values of the vowel, intonation mark, and consonant of the two syllables while keeping the consonant value and cluster word (if any). The rules and mechanisms were then used to develop Thai spoonerism word software written in PHP. A performance test of the software found that the program performs bi-syllable Thai spoonerism correctly for 99% of all words used in the test; faults were found for 1%, because the words obtained from spoonerism may not be spelled in conformity with Thai grammar and a Thai spoonerism may have more than one answer.
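
One possible reading of the transposition rule in this abstract is sketched below: the vowel, tone (intonation) mark, and final consonant are exchanged between the two syllables, while each syllable keeps its initial consonant or cluster. The `Syllable` structure, the field names, and the romanized placeholder values are assumptions made for illustration; the authors' PHP implementation is not shown in the source.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Syllable:
    initial: str  # initial consonant or consonant cluster (stays in place)
    vowel: str
    tone: str     # tone (intonation) mark, "" if none
    final: str    # final consonant, "" if none


def spoonerize(first, second):
    """Swap vowel, tone mark, and final consonant between two syllables."""
    new_first = replace(first, vowel=second.vowel, tone=second.tone, final=second.final)
    new_second = replace(second, vowel=first.vowel, tone=first.tone, final=first.final)
    return new_first, new_second


# Placeholder syllables, already separated as the abstract assumes.
a, b = spoonerize(Syllable("k", "a", "", "n"), Syllable("pl", "i", "2", "t"))
```

Taking pre-separated syllables as input mirrors the semi-automatic design: segmentation is done by hand, and only the exchange itself is automated.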

Keywords: algorithm, spoonerism, computational linguistics, Thai spoonerism

Procedia PDF Downloads 202
6531 Optimized Brain Computer Interface System for Unspoken Speech Recognition: Role of Wernicke Area

Authors: Nassib Abdallah, Pierre Chauvet, Abd El Salam Hajjar, Bassam Daya

Abstract:

In this paper, we propose an optimized brain-computer interface (BCI) system for unspoken speech recognition, based on the fact that the construction of unspoken words relies strongly on the Wernicke area, situated in the temporal lobe. Our BCI system has four modules: (i) the EEG Acquisition module, based on a non-invasive headset with 14 electrodes; (ii) the Preprocessing module, which removes noise and artifacts using the Common Average Reference method; (iii) the Feature Extraction module, using the Wavelet Packet Transform (WPT); (iv) the Classification module, based on an artificial neural network with one hidden layer. The present study compares the recognition accuracy for 5 Arabic words when using all the headset electrodes or only the 4 electrodes situated near the Wernicke area, and examines the effect of selecting among the subbands produced by the WPT module. After applying the artificial neural network to the resulting database, we obtain, on the test dataset, an accuracy of 83.4% with all the electrodes and all the subbands of 8 levels of the WPT decomposition. However, by using only the 4 electrodes near the Wernicke area and the 6 middle subbands of the WPT, we obtain a large reduction of the dataset size, equal to approximately 19% of the total dataset, with an accuracy rate of 67.5%. This reduction appears particularly important for improving the design of a low-cost, simple-to-use BCI trained for several words.
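
The Common Average Reference step named in the preprocessing module can be illustrated with a short sketch: at each time sample, the mean over all electrodes is subtracted from every electrode. The data layout (a list of samples, each holding one value per electrode) and the function name are assumptions chosen for this example.

```python
def common_average_reference(samples):
    """Re-reference EEG data by subtracting the mean across electrodes.

    `samples` is a list of time samples; each sample is a list holding
    one value per electrode.
    """
    referenced = []
    for sample in samples:
        mean = sum(sample) / len(sample)
        referenced.append([value - mean for value in sample])
    return referenced


# Two time samples from a toy 3-electrode recording.
out = common_average_reference([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
```

After re-referencing, the values within each time sample sum to zero, which removes any component shared by all electrodes.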

Keywords: brain-computer interface, speech recognition, artificial neural network, electroencephalography, EEG, wernicke area

Procedia PDF Downloads 248
6530 The Greek Root Word ‘Kos’ and the Trade of Ancient Greek with Tamil Nadu, India

Authors: D. Pugazhendhi

Abstract:

The ancient Greeks were ahead of other societies in many fields. They were well connected, through trade routes, with all the countries that were well developed at that time. In this connection, the trade in goods between ancient Greece and Tamil Nadu, which is now part of India, played an important role even though the two regions are geographically far apart. In that way, the words and the goods related to kos and kare were exchanged between these two societies. It is therefore necessary to compare the phonology and the morphological occurrences of these words, which are found in both the ancient Greek and the Tamil literature of the same period. The results show that many words were derived from the root kos, with the basic meaning of ‘arrange’, in the ancient Greek language, but this is not the case for the word kare. In ancient Tamil literature, the word ‘kos’ does not have any root and occurs only rarely, while just the opposite is true of the word ‘kare’. One of the meanings of the words derived from the root ‘kos’ in ancient Greek literature is related to costly ornaments. This meaning closely resembles the usage of the word ‘kos’ in ancient Tamil literature. Also, the meaning of the word ‘kare’ in ancient Tamil literature is related to spices, whereas in ancient Greek literature its meaning is related to the cooking of meat using spices. Hence, the similarity in the meanings of the words ‘kos’ and ‘kare’ in the two languages provides a lead for further study. Moreover, the ancient literary resources available in both languages confirm the export and import of gold and spices from the ancient Greek land to the Tamil land.

Keywords: arrange, kare, Kos, ornament, Tamil

Procedia PDF Downloads 114
6529 SAMRA: Dataset in Al-Soudani Arabic Maghrebi Script for Recognition of Arabic Ancient Words Handwritten

Authors: Sidi Ahmed Maouloud, Cheikh Ba

Abstract:

Much of West Africa’s cultural heritage is written in the Al-Soudani Arabic script, which was widely used in West Africa before the time of European colonization. This Al-Soudani Arabic script is an African version of the Maghrebi script, in particular, the Al-Mebssout script. However, the local African qualities were incorporated into the Al-Soudani script in a way that gave it a unique African diversity and character. Despite the existence of several Arabic datasets in Oriental script, allowing for the analysis, layout, and recognition of texts written in these calligraphies, many Arabic scripts and written traditions remain understudied. In this paper, we present a dataset of words from Al-Soudani calligraphy scripts. This dataset consists of 100 images selected from three different manuscripts written in Al-Soudani Arabic script by different copyists. The primary source for this database was the libraries of Boston University and Cambridge University. This dataset highlights the unique characteristics of the Al-Soudani Arabic script as well as the new challenges it presents in terms of automatic word recognition of Arabic manuscripts. An HTR system based on a hybrid ANN (CRNN-CTC) is also proposed to test this dataset. SAMRA is a dataset of annotated Arabic manuscript words in the Al-Soudani script that can help researchers automatically recognize and analyze manuscript words written in this script.
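
For readers unfamiliar with the CTC part of a CRNN-CTC recognizer, the following sketch shows the standard best-path (greedy) decoding rule: collapse consecutive repeated labels, then drop the blank label. The abstract does not state which decoding method the authors use, so this choice, the blank index 0, and the function name are assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (CTC best-path rule).

    `frame_labels` holds the most probable label index for each time frame.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# A blank between the two runs of label 1 keeps them as separate characters.
result = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0])
```

The blank symbol is what allows a word with a doubled character to be recognized, since repeats without an intervening blank are merged.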

Keywords: dataset, CRNN-CTC, handwritten words recognition, Al-Soudani Arabic script, HTR, manuscripts

Procedia PDF Downloads 82
6528 Infringement of Patent Rights with Doctrine of Equivalent for Turkey

Authors: Duru Helin Ozaner

Abstract:

Under the doctrine of equivalents, the wording of the claims alone does not define the scope of protection provided by patent registration. While this widens the scope of protection, it also obscures the boundaries of the area protected by a patent and creates uncertainty for third parties. The doctrine of equivalents therefore aims to establish a balance between the rights of patent owners and the legal security of third parties. The current Turkish legal system was designed to parallel widely applied regulations, so its rules on the protection provided by patents are similar to those of many countries. However, infringement through equivalents by third parties is common. This study aims to explain that the protection provided by a patent is not limited to the wording of the claims but extends, under the doctrine of equivalents, to the wider protection the claims provide. The study is important for determining the limits of the protection granted to the patent right holder and for indicating the importance of equivalent elements within that protection.

Keywords: patent, infringement, intellectual property, the doctrine of equivalent

Procedia PDF Downloads 188
6527 Text Data Preprocessing Library: Bilingual Approach

Authors: Kabil Boukhari

Abstract:

In the context of information retrieval, the selection of the most relevant words is a very important step. Text cleaning keeps only the most representative words for better use. In this paper, we propose a library for text preprocessing, together with an application that implements it, to facilitate this task. This study has two purposes. The first is to review related work on the various steps involved in text preprocessing, presenting the segmentation, stemming, and lemmatization algorithms that could be efficient in the rest of the study. The second is to implement a tool for text preprocessing in French and English. This library accepts unstructured text as input and provides the preprocessed text as output, based on a set of rules and on a base of stop words for both languages. The proposed library has been tested on different corpora and gave interesting results.
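
A bilingual stop-word filter of the kind this abstract describes might look like the sketch below. The abbreviated `STOP_WORDS` lists, the regex tokenizer, and the `preprocess` function are assumptions for illustration; the library's actual rules and stop-word base are not given in the abstract.

```python
import re

# Hypothetical, abbreviated stop-word lists (assumptions); the library's
# actual lists are not given in the abstract.
STOP_WORDS = {
    "en": {"the", "a", "an", "of", "and", "is", "in", "to"},
    "fr": {"le", "la", "les", "un", "une", "de", "des", "et", "est"},
}


def preprocess(text, lang):
    """Lowercase, tokenize on word characters, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS[lang]]


english = preprocess("The selection of relevant words is important.", "en")
french = preprocess("Le chat est sur la table.", "fr")
```

Keying the stop-word sets by language code keeps one code path for both languages, so adding a third language would only require a new entry in the table.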

Keywords: text preprocessing, segmentation, knowledge extraction, normalization, text generation, information retrieval

Procedia PDF Downloads 63
6526 The Use of Punctuation by Primary School Students Writing Texts Collaboratively: A Franco-Brazilian Comparative Study

Authors: Cristina Felipeto, Catherine Bore, Eduardo Calil

Abstract:

This work aims to analyze and compare the punctuation marks (PM) in school texts written by Brazilian and French students and the comments on these PMs made spontaneously by the students while the text was being written. Taking textual genetics as an investigative field within a dialogical and enunciative approach, we defined a common methodological design in two first-year primary school classrooms (7-year-olds), one in Brazil (Maceio) and the other in France (Paris). Through a multimodal system for capturing writing processes in real time and space (Ramos System), we recorded collaborative writing tasks carried out in dyads in each of the classrooms. This system preserves the classroom’s ecological characteristics and provides a video recording synchronized with the students' dialogues, gestures, and facial expressions, the stroke of the pen’s ink on the sheet of paper, and the movement of the teacher and students in the classroom. The multimodal record of the writing process gave access to the text in progress and to the comments made by the students on what was being written. For each proposed text production, teachers organized their students in dyads and asked them to talk, agree on, and write a fictional narrative. We selected one dyad of Brazilian students (BD) and one dyad of French students (FD) and filmed 6 proposals for each dyad. The proposals were collected during the 2nd term of 2013 (Brazil) and 2014 (France). In the 6 texts written by the BD, 39 PMs and 825 written words were identified (on average, a PM every 23 words); of these 39 PMs, 27 were highlighted orally and commented on by one of the students. In the texts written by the FD, 48 PMs and 258 written words were identified (on average, 1 PM every 5 words); of these 48 PMs, 39 were commented on by the French students. Contrary to what studies on punctuation acquisition report, the most frequent PMs were hyphens (BD) and commas (FD).
Despite the significant difference between the types and quantities of PMs in the written texts, the recognition of the need to write PMs in the text in progress and the comments share some characteristics: i) the writing of the PMs was not anticipated in relation to the text in progress; they were added after the end of a sentence or after the text itself was finished; ii) the need to add punctuation marks arose after one of the students had ‘remembered’ that a particular sign was needed; iii) most of the PMs written were related not to their linguistic functions but to the graphic-visual features of the text; iv) the comments justify or explain the PMs, indicating metalinguistic reflections made by the students. Our results indicate how the comments of the BD and FD express the dialogic and subjective nature of knowledge acquisition. Our study suggests that the initial learning of PMs depends more on their graphic features and on interactional conditions than on their linguistic functions.

Keywords: collaborative writing, erasure, graphic marks, learning, metalinguistic awareness, textual genesis

Procedia PDF Downloads 141
6525 Effect of Noise Reducing Headphones on the Short-Term Memory Recall of College Students

Authors: Gregory W. Smith, Paul J. Riccomini

Abstract:

The goal of this empirical inquiry is to explore the effect of noise reducing headphones on the short-term memory recall of college students. Immediately following the presentation (via PowerPoint) of 12 unrelated and randomly selected one- and two-syllable words, students were asked to recall as many words as possible. Using a linear model with conditions marked with binary indicators, we examined the frequency and accuracy of words that were recalled. The findings indicate that for some students, a reduction of noise has a significant positive impact on their ability to recall information. As classrooms become more aurally distracting due to the implementation of cooperative learning activities, these findings highlight the need for a quiet learning environment for some learners.

Keywords: auditory distraction, education, instruction, noise, working memory

Procedia PDF Downloads 300
6524 Moving on or Deciding to Let Go: The Effects of Emotional and Decisional Forgiveness on Intentional Forgetting

Authors: Saima Noreen, Malcolm D. MacLeod

Abstract:

Different types of forgiveness (emotional and decisional) have been shown to have differential effects on incidental forgetting of information related to a prior transgression. The present study explored the extent to which emotional and decisional forgiveness also influence intentional forgetting; that is, the extent to which forgetting occurs following an explicit instruction to forget. Using the List-Method Directed Forgetting (LMDF) paradigm, 236 participants were presented with a hypothetical transgression and then assigned to an emotional forgiveness, a decisional forgiveness, or a no-forgiveness manipulation. Participants were then presented with two word lists, each comprising transgression-relevant and transgression-irrelevant words. Following the presentation of the first list, participants were told to either remember or forget the previously learned list of words. Participants in the emotional forgiveness condition were found to remember fewer relevant and more irrelevant transgression-related words, while the opposite was true for both the decisional forgiveness and no-forgiveness conditions. Furthermore, when directed to forget words in List 1, participants in the decisional and no-forgiveness conditions were less able to forget relevant transgression-related words than participants in the emotional forgiveness condition. This study suggests that emotional forgiveness plays a pivotal role in the intentional forgetting of transgression-related information. The potential implications of these findings for coping with unpleasant incidents will be considered.

Keywords: decisional forgiveness, directed forgetting, emotional forgiveness, executive control, forgiveness

Procedia PDF Downloads 204
6523 The Usage of Negative Emotive Words in Twitter

Authors: Martina Katalin Szabó, István Üveges

Abstract:

In this paper, the usage of negative emotive words is examined on the basis of a large Hungarian Twitter database using NLP methods. The data is analysed from a gender point of view, as well as for changes in language usage over time. The term negative emotive word refers to words that, on their own, without context, have semantic content that can be associated with negative emotion, but in particular cases may function as intensifiers (e.g. rohadt jó ’damn good’) or as sentiment expressions with positive polarity despite their negative prior polarity (e.g. brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’). Based on the findings of several authors, the same phenomenon can be found in other languages, so it is probably a language-independent feature. For the present analysis, 67783 tweets were collected: 37818 tweets (19580 tweets written by females and 18238 tweets written by males) in 2016 and 48344 (18379 tweets written by females and 29965 tweets written by males) in 2021. The goal was to build two datasets that are comparable with respect to semantic change as well as gender-specific features. An exhaustive lexicon of Hungarian negative emotive intensifiers was also compiled (containing 214 words). After basic preprocessing steps, the tweets were processed with ‘magyarlanc’, a toolkit written in Java for the linguistic processing of Hungarian texts. Then, the frequency and collocation features of all these words in our corpus were automatically analyzed (via the analysis of the parts of speech and sentiment values of the co-occurring words). Finally, the results of all four subcorpora were compared. Some of the main outcomes of our analyses are as follows. Cases in which the negative emotive intensifier modifies a negative polarity word in the tweet (e.g., damn bad) are almost four times less frequent in the male corpus than in the female corpus.
At the same time, male authors used these intensifiers more frequently to modify a positive polarity or a neutral word (e.g., damn good and damn big). Results also showed that, in contrast to female authors, male authors used these words much more frequently as positive polarity words themselves (e.g., brutális, ahogy ez a férfi rajzol ’it’s awesome (lit. brutal) how this guy draws’). We also observed that male authors use significantly fewer types of emotive intensifiers than female authors, and that the frequency distribution of the words is more balanced in the female corpus. As for changes in language usage over time, some notable differences in the frequency and collocation features of the words examined were identified: some of the words collocate with more positive words in the second subcorpus than in the first, which points to a semantic change of these words over time.
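
The collocation analysis described in this abstract (checking the polarity of the word an intensifier modifies) can be sketched roughly as below. The English mini-lexicons, the whitespace tokenizer, and the function name are assumptions for illustration only; the study itself uses a 214-word Hungarian lexicon and the magyarlanc toolkit, neither of which is reproduced here.

```python
from collections import Counter

# Hypothetical mini-lexicons for illustration only (assumptions).
INTENSIFIERS = {"damn"}
POLARITY = {"good": "positive", "bad": "negative", "big": "neutral"}


def collocation_polarity_counts(tweets):
    """Count the polarity of the word that directly follows each intensifier."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        # Pair every token with its right-hand neighbour.
        for current, following in zip(tokens, tokens[1:]):
            if current in INTENSIFIERS:
                counts[POLARITY.get(following, "unknown")] += 1
    return counts


counts = collocation_polarity_counts(
    ["damn good", "that was damn bad", "damn big and damn good"]
)
```

Running the same count on each subcorpus (by gender and by year) would give the kind of per-group polarity distribution that the abstract compares.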

Keywords: gender differences, negative emotive words, semantic changes over time, twitter

Procedia PDF Downloads 179