Search results for: Adopted words
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1049

1049 Functioning of Turkic Elements in Modern Hindi

Authors: B. S. Bokuleva, R. A. Avakova, A. A. Sultangubieva, U. Schamiloglu

Abstract:

This article discusses the modern usage of adopted words and their vocabularies, the fields in which Turkisms are used, and the phonetic, grammatical, and lexical-semantic assimilation of the typological-morphological structures that have entered the different Hindi languages, from a comparative-typological perspective. The vocabulary is rich and its area of prevalence is wide, and the article examines how Turkic elements entered the vocabulary of languages with large numbers of speakers. The research is based on Hindi vocabulary.

Keywords: Adopted words, language communications, Turkism, Turkic languages.

Downloads: 2118
1048 Component-based Segmentation of Words from Handwritten Arabic Text

Authors: Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren, Stan S Ipson

Abstract:

Efficient preprocessing is essential for automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. First, connected components (ccs) are extracted, and the distances among different components are analyzed. The statistical distribution of these distances is then obtained to determine an optimal threshold for word segmentation. Meanwhile, an improved projection-based method is employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, consisting of 26459 Arabic words handwritten by 411 different writers. The results were promising, showing more accurate detection of the baseline and segmentation of words for further recognition.
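The thresholding step described in this abstract can be illustrated with a minimal sketch. It assumes each connected component is already reduced to a horizontal extent `(x_start, x_end)` and that the threshold has been chosen beforehand; the function name `segment_words` and the toy coordinates are hypothetical and not taken from the paper.

```python
def segment_words(components, threshold):
    """Group components (x_start, x_end) into words by horizontal gap.

    Assumption: a gap wider than `threshold` separates two words.
    The paper derives the threshold from the gap distribution; here
    it is simply passed in.
    """
    ordered = sorted(components)
    if not ordered:
        return []
    words = [[ordered[0]]]
    right_edge = ordered[0][1]
    for comp in ordered[1:]:
        gap = comp[0] - right_edge
        if gap > threshold:
            words.append([comp])      # wide gap: start a new word
        else:
            words[-1].append(comp)    # narrow gap: same word
        right_edge = max(right_edge, comp[1])
    return words


# Toy example: four components, two wide-spaced groups.
groups = segment_words([(0, 10), (12, 20), (35, 50), (52, 60)], threshold=5)
```

Because only gap widths are compared, the sketch is indifferent to reading direction, which matters for right-to-left scripts such as Arabic.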

Keywords: Arabic OCR, off-line recognition, Baseline estimation, Word segmentation.

Downloads: 2164
1047 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR

Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER

Abstract:

The state-of-the-art Bag of Words model in Content-Based Image Retrieval has been used for years, but relevance feedback strategies for this model have not been fully investigated. Inspired by text retrieval, the Bag of Words model can draw on the wealth of knowledge and practices available in that field. We study and experiment with the relevance feedback model from text retrieval in order to adapt it to image retrieval. The experiments show that techniques from text retrieval give good results for image retrieval and that further improvements are possible.
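As an illustration of what relevance feedback in a vector space model can look like, the sketch below uses the classic Rocchio update. This is an assumption: the abstract does not name the technique the authors used, and the function name and weights `alpha`, `beta`, `gamma` are illustrative defaults only.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a bag-of-words query vector toward relevant results.

    Assumption: Rocchio's formula stands in for the (unspecified)
    feedback model; all vectors share the same visual-word vocabulary.
    """
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel = mean(relevant)
    non = mean(non_relevant)
    # Negative weights are clipped to zero, a common convention.
    return [max(0.0, alpha * q + beta * r - gamma * n)
            for q, r, n in zip(query, rel, non)]


updated = rocchio([1, 0], [[0, 2], [0, 4]], [[2, 0]], alpha=1.0, beta=0.5, gamma=0.5)
```

In an image retrieval setting, each vector component would count occurrences of one visual word, and the user's marked results would supply the `relevant` and `non_relevant` lists.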

Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval

Downloads: 2061
1046 Intensifier as Changed from the Impolite Word in Thai

Authors: Methawee Yuttapongtada

Abstract:

An intensifier is a linguistic device, found across languages, that adds quantity, quality, or emotion to a word. Languages have both similar and dissimilar intensifying devices. Thai in particular uses a wide variety of them, one of which is the use of an impolite word, or a word that once meant something negative, as an intensifier. The data in this study were collected from spoken language, selecting intensifiers regarded as impolite words because, in other contexts, these words are treated as rude words, swear words, or words with negative meaning. The words were then traced back in time to examine their historical change. The original meanings and the contexts of use from the past to the present were explained using textual documents and dictionaries from different periods. With regard to semantic and pragmatic aspects, subjectification was found to be a significant motivation for the change from impolite words to intensifiers, which makes it possible to explain the pathway of semantic change of these words. Moreover, the use of impolite words, or words that once meant something negative, as intensifiers is likely to increase. This phenomenon is common in many languages, and the results may support the belief that human language is universal and reflect that humans share the same fundamental ways of thinking.

Keywords: Impolite word, intensifier, Thai, semantic change.

Downloads: 1252
1045 Information Filtering using Index Word Selection based on the Topics

Authors: Takeru YOKOI, Hidekazu YANAGIMOTO, Sigeru OMATU

Abstract:

We have proposed an information filtering system that selects index words from a document set based on the topics the set contains. The method narrows down the particularly characteristic words in a document set, and the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented by a vector whose elements correspond to the weights of the index words, and the dimension of the vector grows as the number of documents increases. As a result, words that are useless as index words for information filtering may be included. To address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in a document set. We applied Sparse Non-negative Matrix Factorization to the document set to obtain these topics. The filtering is carried out based on a centroid of the learning document set, which is regarded as the user's interest. The centroid is represented by a document vector whose elements consist of the weights of the selected index words. Using the English test collection MEDLINE, we confirm the effectiveness of our proposal: the proposed selection improves recommendation accuracy over previous methods when an appropriate number of index words is selected. In addition, we examined the index words selected by our proposal and found that it was able to select index words covering some minor topics included in the document set.
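The centroid-based filtering step can be sketched as follows. The sketch assumes documents are already reduced to weight vectors over the selected index words and that cosine similarity is the matching score; the abstract states neither the similarity measure nor a decision threshold, so both are assumptions, as are the function names.

```python
import math


def centroid(vectors):
    """Component-wise mean of the learning documents (the user profile)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_recommended(doc, profile, threshold=0.5):
    """Hypothetical decision rule: recommend if similar enough to the profile."""
    return cosine(doc, profile) >= threshold


profile = centroid([[1, 0], [3, 0]])
```

A new document would then be passed to `is_recommended` together with `profile`; the threshold value would need tuning on held-out data.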

Keywords: Information Filtering, Sparse NMF, Index Word Selection, User Profile, Chi-squared Measure

Downloads: 1407
1044 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional, and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to the weights calculated by our new algorithm. The candidates are then fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words that represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Downloads: 2406
1043 Determining Senses for Word Sense Disambiguation in Turkish

Authors: Zeynep Orhan, Zeynep Altan

Abstract:

Word sense disambiguation is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of usages for that specific word. This paper deals with methodologies for determining the senses of a given word when they cannot be obtained from an already available resource such as WordNet. We offer a method that helps determine the sense boundaries gradually. In this method, we first decide on some features that are thought to affect the senses and divide the instances into two; then, according to the evaluation results, we continue dividing the instances gradually. In a second method, we use pseudo words: we devise artificial words according to some criteria and evaluate classification algorithms on these previously classified words.

Keywords: Word sense disambiguation, sense determination, pseudo words, sense granularity.

Downloads: 1363
1042 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is the broadcasting script, which is a series of texts including directions and dialogues. The other is blog posts, which contain relatively abstracted contents, stories, and diverse information about broadcasting contents. Although the two kinds of text cover similar broadcasting contents, the words in blog posts and broadcasting scripts are different. When unseen words appear, a method is needed to reflect them in the existing topics. In this paper, we introduce a semantic vocabulary expansion method to reflect unseen words. We expand the topics of the broadcasting script by incorporating the words in blog posts. Each word in the blog posts is added to the most semantically correlated topics. We use word2vec to obtain the semantic correlation between words in blog posts and topics of scripts. The vocabularies of the topics are updated, and posterior inference is then performed to rearrange the topics. In experiments, we verified that the proposed method can discover more salient topics for broadcasting contents.

Keywords: Broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec.

Downloads: 1707
1041 A Novel Approach to Persian Online Hand Writing Recognition

Authors: Ramin Halavati, Mansour Jamzad, Mahdieh Soleymani

Abstract:

Persian (Farsi) script is entirely cursive, and each character is written in several different forms depending on the characters that precede and follow it in the word. These complexities make automatic handwriting recognition of Persian a very hard problem, and few contributions have tried to solve it. This paper presents a novel practical approach to online recognition of Persian handwriting, based on representing inputs and patterns with very simple visual features and comparing these simple terms. The recognition approach was tested on a set of Persian words. The results were quite acceptable when the possible words were unknown, and almost all correct when the words were chosen from a prespecified list.

Keywords: Image Processing, Pattern Recognition.

Downloads: 1269
1040 Fuzzy Set Approach to Study Appositives and Its Impact Due to Positional Alterations

Authors: E. Mike Dison, T. Pathinathan

Abstract:

Computing with Words (CWW) and Possibilistic Relational Universal Fuzzy (PRUF) are two concepts widely used to represent and measure vaguely defined natural phenomena. In this paper, we study the positional alteration of phrases, by which the impact of a natural language proposition is affected and/or modified. We observe the gradations in the sensitivity/feeling of a statement caused by positional alterations. We derive the classification and modification of the meaning of words due to positional alteration. We present the results with reference to set-theoretic interpretations.

Keywords: Appositive, computing with words, PRUF, semantic sentiment analysis, set theoretic interpretations.

Downloads: 780
1039 Words Reordering based on Statistical Language Model

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou

Abstract:

There are multiple reasons to expect that detecting word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of grammar checkers in word order diagnosis, the repair task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method lies in the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that it works with a large set of words and avoids the laborious and costly process of collecting word order errors to create error patterns.

Keywords: Permutations filtering, Statistical language model N-grams, Word order errors

Downloads: 1546
1038 Web Search Engine Based Naming Procedure for Independent Topic

Authors: Takahiro Nishigaki, Takashi Onoda

Abstract:

In recent years, the volume of document data has been increasing with the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract mutually independent topics from large document data such as newspaper data. ITA extracts the independent topics from the document data by using Independent Component Analysis. A topic extracted by ITA is represented by a set of words. However, this set of words is quite different from the topics a user imagines. For example, the top five words with high independence for one topic are as follows: Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. Topic 1 is considered to represent the topic "SPORTS", but this topic name has to be attached by the user, because ITA cannot name topics. Therefore, in this research, we propose a method that uses a web search engine to obtain topic names that are easy for people to understand from the sets of words produced by independent topic analysis. Specifically, we search for a set of topical words and take the title of the homepage in the search result as the topic name. We also apply the proposed method to some data and verify its effectiveness.

Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.

Downloads: 436
1037 Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words

Authors: Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok

Abstract:

Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words in the form of BaseNP have shown significant improvement in terms of precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve performance, although more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.

Keywords: Information Retrieval, Malay Language, Semantic Relationship, Retrieval Effectiveness, Conceptual Indexing.

Downloads: 1378
1036 Optical Multicast over OBS Networks: An Approach Based On Code-Words and Tunable Decoders

Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga

Abstract:

In this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word with a multicast group address. In the core node, a set of tunable decoders is used to send traffic data to multiple destinations based on the received code-word. The use of code-words, which correspond to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. When a burst is received, it is delayed in an optical memory, and the received optical code-word is split across a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.

Keywords: Optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory.

Downloads: 1630
1035 Optical Multicast over OBS Networks: An Approach Based On Code-Words and Tunable Decoders

Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga

Abstract:

In this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word with a multicast group address. In the core node, a set of tunable decoders is used to send traffic data to multiple destinations based on the received code-word. The use of code-words, which correspond to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. When a burst is received, it is delayed in an optical memory, and the received optical code-word is split across a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.

Keywords: Optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory.

Downloads: 1714
1034 Aspect-Level Sentiment Analysis with Multi-Channel and Graph Convolutional Networks

Authors: Jiajun Wang, Xiaoge Li

Abstract:

The purpose of the aspect-level sentiment analysis task is to identify the sentiment polarity of aspects in a sentence. Currently, most methods focus on using neural networks and attention mechanisms to model the relationship between aspects and context, but they ignore the dependence of words across different ranges in the sentence, resulting in deviation when assigning relationship weights to words other than aspect words. To solve these problems, we propose an aspect-level sentiment analysis model that combines a multi-channel convolutional network and a graph convolutional network (GCN). First, the context and the degree of association between words are characterized by Long Short-Term Memory (LSTM) and a self-attention mechanism. In addition, a multi-channel convolutional network is used to extract the features of words in different ranges. Finally, a graph convolutional network is used to associate the node information of the dependency tree structure. We conduct experiments on four benchmark datasets. Compared with the results of other models, the experimental results show that our model is more effective.

Keywords: Aspect-level sentiment analysis, attention, multi-channel convolution network, graph convolution network, dependency tree.

Downloads: 404
1033 Structural Analysis of Username Segment in E-Mail Addresses of Engineering Institutes of Gujarat State of India

Authors: Jatinderkumar R. Saini

Abstract:

E-mail has become a key mechanism of electronic communication. This is true for professional organizations that like to communicate with their subjects online and are slowly shifting to a paperless office. The current paper focuses specifically on academic institutions offering Engineering courses in Gujarat state and attempts a textual analysis of the usernames of institutional e-mail addresses. We found that institutions tend to design the username segment of their e-mail addresses by choosing words or combinations of words from specific categories. The paper also highlights the use of special characters, digits, and random words in designing the usernames. In addition, the paper lists the styles of employing department names and designations in the design process. To the best of our knowledge, this is the first formal attempt to analyze the selection of words employed in designing the username segment of e-mail addresses of engineering institutions.

Keywords: E-mail address, Institute, Engineering, Username.

Downloads: 1641
1032 The Algorithm of Semi-Automatic Thai Spoonerism Words for Bi-Syllable

Authors: Nutthapat Kaewrattanapat, Wannarat Bunchongkien

Abstract:

The purposes of this research are to study and develop an algorithm for Thai spoonerism words using a semi-automatic computer program. That is, on the input side, syllables are already separated, and on the spoonerism side, the developed algorithm is applied. The algorithm establishes rules and mechanisms for bi-syllable Thai spoonerism words by analyzing the elements of the syllables, namely cluster consonant, vowel, intonation mark, and final consonant. The study finds that bi-syllable Thai spoonerism has one mechanism, namely transposition of the values of the vowel, intonation mark, and consonant of the two syllables while keeping the consonant value and cluster word (if any). The rules and mechanisms were then applied to develop Thai spoonerism word software in PHP. A performance test of the software found that the program performs bi-syllable Thai spoonerism correctly for 99% of all words used in the test. Faults were found for 1%, because the words obtained from spoonerism may not be spelled in conformity with Thai grammar and a Thai spoonerism can have more than one answer.

Keywords: Algorithm, Spoonerism, Computational Linguistics.

Downloads: 2309
1031 Computable Difference Matrix for Synonyms in the Holy Quran

Authors: Mohamed Ali AlShaari, Khalid M. ElFitori

Abstract:

In the field of Quran Studies known as GHAREEB AL QURAN (the study of the meanings of strange words and structures in the Holy Quran), it is difficult to distinguish some pragmatic meanings from conceptual meanings. Anyone who wants to study this subject may need to look for a common usage between two or more words in order to understand their general meaning, and sometimes may need to look for the differences between them, even if they are synonyms (word sisters).

Some distinguished scholars of Arabic linguistics believe that there are no true synonyms; instead, they believe in varieties of meaning and multi-context usage. Based on this viewpoint, our method was designed to look for the synonyms of a word and then for the differences that distinguish the word from its synonyms.

There are many available books that use such a method e.g. synonyms books, dictionaries, glossaries, and some books on the interpretations of strange vocabulary of the Holy Quran, but it is difficult to look up words in these written works.

For that reason, we proposed a logical entity, which we called Differences Matrix (DM).

DM groups synonymous words to extract the relations between them and to identify the general meaning, which defines the skeleton of all of a word's synonyms; this meaning is expressed by one of its sister words.

In the Differences Matrix, we used the sisters (words) as titles for rows and columns, and in the resulting cells we tried to define the row title (word) by using the column title (its sister), so that the relations between sisters appear. The expected result is a set of well-defined groups of sisters for each word. We represented the obtained results formally and used the defined groups as a basis for building the ontology of the Holy Quran synonyms.

Keywords: Quran, synonyms, Differences Matrix, ontology

Downloads: 2066
1030 N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts

Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Konstantinos Mamouras

Abstract:

This paper presents an approach for repairing word order errors in English text by reordering the words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. One possible way of reordering the words is to use all the permutations. The problem is that for a sentence of N words, the number of permutations is N!. The novelty of this method lies in the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed to reduce the search space among permuted sentences. The search space is limited using the statistical inference of N-grams. The results of this technique are very interesting and show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes, a test set of TOEFL sentences was used, and the results show that more than 95% of them can be repaired using the proposed method.
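The selection criterion, choosing the permutation with the most trigram hits, can be sketched with a brute-force search. This sketch deliberately enumerates all N! permutations and does not reproduce the paper's confusion matrix technique; the set `known_trigrams` is a hypothetical stand-in for a real language model.

```python
from itertools import permutations


def trigram_hits(words, known_trigrams):
    """Count the trigrams of `words` that appear in the language model."""
    return sum(
        1
        for i in range(len(words) - 2)
        if tuple(words[i:i + 3]) in known_trigrams
    )


def best_reordering(words, known_trigrams):
    """Brute force over all N! orderings; only practical for short sentences."""
    return max(permutations(words), key=lambda p: trigram_hits(p, known_trigrams))


# Toy language model (assumption): just two known trigrams.
model = {("the", "cat", "sat"), ("cat", "sat", "down")}
repaired = best_reordering(["cat", "the", "down", "sat"], model)
```

Even at ten words the search already covers 3,628,800 permutations, which is the cost the search-space reduction reported in the abstract is meant to avoid.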

Keywords: Permutations filtering, Statistical language model N-grams, Word order errors, TOEFL

Downloads: 1617
1029 Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identifying unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in a dictionary nor a slang term. The motives of the paper are manifold. It is an attempt to analyze spamming behaviour and the use of word-mutation techniques. In addition, we have attempted to better understand spam, slang, and their interplay. The problem has been addressed by employing a tokenization technique and a unigram BOW model. We found that non-lexicon words constitute nearly 66% of the total number of lexis in the corpus, whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze the usage of non-lexicon non-slang unigrams in any kind of UBE.
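The identification rule, a unigram that is neither in a dictionary nor a slang term, can be sketched as a simple set filter after tokenization. The tokenizer pattern, the function name, and the tiny `lexicon` and `slang` sets are assumptions for illustration; the paper's actual resources are not described in the abstract.

```python
import re


def non_lexicon_non_slang(text, lexicon, slang):
    """Return unique tokens found in neither word list, in first-seen order."""
    # Assumption: tokens are lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    for token in tokens:
        if token not in lexicon and token not in slang and token not in found:
            found.append(token)
    return found


unknown = non_lexicon_non_slang(
    "Buy ch3ap medz now, lol! Buy medz",
    lexicon={"buy", "now"},
    slang={"lol"},
)
```

Keeping digits inside tokens matters here, since mutated spam words such as the toy "ch3ap" would otherwise be split into fragments.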

Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)

Downloads: 1772
1028 Analysis of the Effect of 1980 Transformation on the Foreign Trade of Turkey with Chow Test

Authors: Zeynep Karaçor, Savaş Erdoğan, Perihan Hazel Er

Abstract:

While import-substituting industrialization policy constituted the basis for the industrialization strategies of the 1960s and 1970s in Turkey, this policy was no longer sustainable by the 1980s. For this reason, an export-oriented industrialization policy was adopted with the decisions taken on January 24, 1980. In other words, in the post-1980 period, Turkey's economy adopted an outward-oriented industrialization strategy. This study aims to analyze the effect of the change in economic structure on foreign trade following the transformation of foreign trade and industrialization policies in the post-1980 period. To this end, the Chow test was applied to analyze the relationship between imports, exports, and economic growth using variables from the 1960-2011 period. The Chow test is used to determine whether there is any difference in economic terms between the 1960-1980 period, when import-substituting industrialization policy was applied, and the 1981-2011 period, when export-oriented industrialization policy was applied as a result of the structural transformation.
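For reference, the standard Chow statistic for a single break point can be written as below. The symbols are not taken from the abstract and are assumptions for illustration: S_C is the residual sum of squares of the regression pooled over 1960-2011, S_1 and S_2 are those of the separate 1960-1980 and 1981-2011 regressions, N_1 and N_2 are the sub-period sample sizes, and k is the number of estimated parameters.

```latex
F = \frac{\left( S_C - (S_1 + S_2) \right) / k}{(S_1 + S_2) / (N_1 + N_2 - 2k)}
```

Under the null hypothesis of no structural break (and the usual regression assumptions), F follows an F distribution with k and N_1 + N_2 - 2k degrees of freedom, so a large value suggests the two periods differ.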

Keywords: Chow Test, Export-Oriented Industrialization Policy, Import-Substituting Industrialization Policy, Turkey.

Downloads: 1440
1027 Speech Recognition Using Scaly Neural Networks

Authors: Akram M. Othman, May H. Riadh

Abstract:

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words was established first: "word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". The chosen words are associated with executing computer functions such as opening a file, printing a text document, cutting, copying, pasting, editing, and exiting. The words are introduced to the computer and then subjected to a feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker-dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; these are used for information retrieval. The system consists of three parts: speech processing and feature extraction, training and testing using neural networks, and information retrieval. The retrieval process proved to be 79.5-88% successful, which is quite acceptable considering the variation in surroundings, the state of the person, and the microphone type.

Keywords: Feature extraction, Linear prediction coefficients, neural network, Speech Recognition, Scaly ANN.

Downloads: 1695
1026 A Very Efficient Pseudo-Random Number Generator Based On Chaotic Maps and S-Box Tables

Authors: M. Hamdi, R. Rhouma, S. Belghith

Abstract:

Random number generation is mainly used to create secret keys or random sequences, and it can be carried out by various techniques. In this paper, we present a very simple and efficient pseudo-random number generator (PRNG) based on chaotic maps and S-Box tables. The technique uses two main operations: one generates chaotic values using two logistic maps, and the second transforms them into binary words using random S-Box tables. The simulation analysis indicates that our PRNG possesses excellent statistical and cryptographic properties.
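The two operations can be illustrated with a toy sketch: two logistic maps produce chaotic values, and a byte-wide substitution table turns them into output words. How the paper combines the two maps and builds its S-Boxes is not stated in the abstract, so the mixing rule, the parameter `r = 3.99`, and every name below are assumptions. The sketch is for illustration only and is not suitable for cryptographic use.

```python
import random


def logistic(x, r=3.99):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def make_sbox(seed):
    """Hypothetical S-Box: a seeded random permutation of 0..255."""
    rng = random.Random(seed)
    box = list(range(256))
    rng.shuffle(box)
    return box


def chaotic_bytes(x0, y0, count, sbox):
    """Iterate two logistic maps and substitute each mixed value via the S-Box."""
    x, y = x0, y0
    out = []
    for _ in range(count):
        x, y = logistic(x), logistic(y)
        # Assumed mixing rule: add the two states modulo 1, quantize to a byte.
        index = int(((x + y) % 1.0) * 256) % 256
        out.append(sbox[index])
    return out


sbox = make_sbox(seed=1)
stream = chaotic_bytes(0.3, 0.6, 16, sbox)
```

The initial values `x0` and `y0` play the role of the key in this sketch; the same inputs always reproduce the same byte stream.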

Keywords: Chaotic map, Cryptography, Random Numbers, Statistical tests, S-box.

Downloads: 3823
1025 Greek Compounds: A Challenging Case for the Parsing Techniques of PC-KIMMO v.2

Authors: Angela Ralli, Eleni Galiotou

Abstract:

In this paper we describe the recognition process of Greek compound words using the PC-KIMMO software. We try to show certain limitations of the system with respect to the principles of compound formation in Greek. Moreover, we discuss the computational processing of phenomena such as stress and syllabification which are indispensable for the analysis of such constructions and we try to propose linguistically-acceptable solutions within the particular system.

Keywords: Morpho-phonological parsing, compound words, two-level morphology, natural language processing.

Downloads: 1560
1024 The Role of Ideophones: Phonological and Morphological Characteristics in Literature

Authors: Cristina Bahón Arnaiz

Abstract:

Many Asian languages, such as Korean and Japanese, are well known for their wide use of sound-symbolic words, or ideophones. This distinctive characteristic greatly enriches their lexicons. Ideophones are a class of words that use sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action, or movement. They have particular characteristics in terms of sound symbolism and morphology that distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut (vowel gradation) and consonant mutation. In Korean, there are light vowels and dark vowels, and the meaning changes slightly depending on the type of vowel used. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological feature, reduplication, which carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature, as they enhance the meaning of what is being expressed with great semantic detail, expressiveness, and rhythm. This study analyzes the ideophones used in a single paragraph of a Korean novel, which add remarkable yet subtle detail to the meaning of the words and heighten the expressiveness and rhythm of the text. After presenting the phonological and morphological characteristics of Korean ideophones, the results of analyzing this paragraph demonstrate the important role that ideophones play in literature.

Keywords: Ideophones, mimetic words, phonomimes, phenomimes, psychomimes, sound symbolism.

Downloads: 1050
1023 Abai Kunanbayev's Role in Enrichment of the Kazakh Language

Authors: Y.M. Paltore, B.N. Zhubatova, A.A. Mustafayeva

Abstract:

Abai Kunanbayev is famous as an enlightener, composer, interpreter, public figure, philosopher, and reformer who wanted to enrich Kazakh literature by bringing it into contact with Russian and European culture, and as a founder of the Kazakh written literary language. He was born in 1845 in the East Kazakhstan region and passed away in 1904 in his hometown. His oeuvre absorbed and reflected all the changes in the life of Kazakh society in the second half of the 19th century. The 19th century, especially its second half, was an important transition period for Kazakhstan: the intensification of Russian colonial policy and the establishment of commodity-money relations in the Steppe Land radically changed the traditional way of life of Kazakh society and predetermined its further development. In his Words of Edification, besides common Arabic and Persian words and loanwords from the Quran, Abai Kunanbayev used many words of Arabic, Persian, Latin, Russian, Nogai, Shaghatai, Polish, Greek, and Turkish origin that are used in the Kazakh language.

Keywords: Abai Kunanbayev, Kazakh and Russian languages, literature.

Downloads: 2809
1022 Recognition of Noisy Words Using the Time Delay Neural Networks Approach

Authors: Khenfer-Koummich Fatima, Mesbahi Larbi, Hendel Fatiha

Abstract:

This paper presents a recognition system for isolated words such as robot commands, implemented with Time Delay Neural Networks (TDNN). The aim is to teleoperate a robot for specific tasks such as turn, close, etc. in an industrial environment, taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy; in addition, it acts as a filter that passes certain desirable frequency characteristics of speech. The goal is to determine the parameters of this filter so that the system adapts to the variability of the speech signal, and especially to noise; for this, the backpropagation technique was used in the learning phase. The approach was applied to commands pronounced in two languages separately: French and Arabic. The results for two test sets of 300 spoken words each are 87% and 97.6% in a neutral environment, and 77.67% and 92.67% when white Gaussian noise was added with an SNR of 35 dB.

Keywords: Neural networks, Noise, Speech Recognition.
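The noisy test condition described in the abstract, white Gaussian noise at a given SNR, can be reproduced with a short sketch. It assumes the usual definition SNR(dB) = 10 log10(P_signal / P_noise); the function names `add_white_gaussian_noise` and `measured_snr_db` and the synthetic sine "utterance" are illustrative assumptions, not the authors' setup.

```python
import math
import random


def add_white_gaussian_noise(signal, snr_db, seed=0):
    """Return `signal` plus zero-mean Gaussian noise at the target SNR."""
    rng = random.Random(seed)
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, in dB."""
    p_signal = sum(s * s for s in clean)
    p_noise = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical stand-in for a spoken command: one second of a 440 Hz tone
# sampled at 8 kHz.
clean = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(8000)]
noisy = add_white_gaussian_noise(clean, snr_db=35.0)
achieved = measured_snr_db(clean, noisy)  # close to 35 dB for a long signal
```

Because the noise is random, the achieved SNR only approximates the target; the estimate tightens as the signal gets longer.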

Downloads: 1893
1021 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of a study of phonological processes in three- and four-year-old children. A test developed for the purpose of the study was administered to 120 children. The study covered three areas of research: the level of words (96 words), the level of sentence repetition (10 sentences), and the level of generating own speech from a picture (15 pictures). The test also gives additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and which are atypical for this age. The results show that the most common phonological errors that children make are sound substitution, elision of a sound, metathesis of sounds, elision of a syllable, and elision of consonants clustered in a syllable. Measuring the correlation between the average length of repeated speech and the average length of generated speech, the analysis does not prove that the more words a child can repeat in the part “repeated speech”, the more words they can be expected to generate in the part “generating sentence”. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: Articulation, phonology, speech, language development.

Downloads: 303
1020 Adaptive Naïve Bayesian Anti-Spam Engine

Authors: Wojciech P. Gajewski

Abstract:

The problem of spam has been seriously troubling the Internet community during the last few years and has now reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag of Words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to ensure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision, as happens when only the NBC based on single words is retrained.

Keywords: Text classification, naïve Bayesian classification, spam, email.
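The architecture outlined in the abstract, one NBC over single words plus a second NBC over adjacent word pairs that alone is retrained on reported spam, can be sketched as follows. The abstract does not say how the two classifiers are combined or smoothed, so summing their log-odds, the add-one smoothing, whitespace tokenization, and all names (`NaiveBayes`, `unigrams`, `bigrams`, `is_spam`) are assumptions for illustration only.

```python
import math
from collections import Counter


def unigrams(text):
    """Bag-of-words features: lowercase whitespace tokens."""
    return text.lower().split()


def bigrams(text):
    """Features made of pairs of adjacent words."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


class NaiveBayes:
    """Two-class naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, features, label):
        self.counts[label].update(features)
        self.docs[label] += 1

    def log_odds(self, features):
        """Return log P(spam | features) - log P(ham | features)."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        total = {c: sum(self.counts[c].values()) for c in self.counts}
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for f in features:
            p_spam = (self.counts["spam"][f] + 1) / (total["spam"] + vocab)
            p_ham = (self.counts["ham"][f] + 1) / (total["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


def is_spam(text, word_nbc, pair_nbc):
    """Assumed combination rule: add the log-odds of both classifiers."""
    total = word_nbc.log_odds(unigrams(text)) + pair_nbc.log_odds(bigrams(text))
    return total > 0.0


# Hypothetical toy corpus.
word_nbc, pair_nbc = NaiveBayes(), NaiveBayes()
corpus = [("cheap pills online", "spam"), ("win money now", "spam"),
          ("meeting agenda attached", "ham"), ("lunch at noon today", "ham")]
for text, label in corpus:
    word_nbc.train(unigrams(text), label)
    pair_nbc.train(bigrams(text), label)

# A user reports new spam: only the word-pair classifier is retrained.
pair_nbc.train(bigrams("free lunch today"), "spam")
```

Keeping the single-word classifier frozen means a reported message containing common legitimate words, such as "lunch" here, shifts only the word-pair statistics.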

Downloads: 4365