Search results for: mimetic words
327 A Practical Solution of a Plant Pipes Monitoring System Using Bio-mimetic Robots
Authors: Seung You Na, Daejung Shin, Jin Young Kim, Bae-Ho Lee, Ji-Sung Lee
Abstract:
There has been a growing interest in the field of bio-mimetic robots that resemble the shape of an insect or an aquatic animal, among many others. One bio-mimetic robot serves the purpose of exploring pipelines, spotting any troubled areas or malfunctions and reporting its data. Moreover, the robot is able to prepare for and react to any abnormal routes in the pipeline. In order to move effectively inside a pipeline, the robot's movement will resemble that of a lizard. When situated in massive pipelines with complex routes, the robot places fixed sensors in several important spots in order to complete its monitoring. This monitoring task is to prevent a major system failure by preemptively recognizing any minor or partial malfunctions. Areas not covered by fixed sensors usually cannot be observed and examined in real time, and thus depend on periodic offline monitoring. This paper presents a monitoring system that is able to monitor the entire area of pipelines, with and without fixed sensors, by using the bio-mimetic robot.
Keywords: Bio-mimetic robots, Plant pipes monitoring, Mobile and active monitoring.
Downloads 1588
326 Pipelines Monitoring System Using Bio-mimetic Robots
Authors: Seung You Na, Daejung Shin, Jin Young Kim, Seong-Joon Baek, Bae-Ho Lee
Abstract:
Recently there has been a growing interest in the field of bio-mimetic robots that resemble the behaviors of an insect or an aquatic animal, among many others. One of various bio-mimetic robot applications is to explore pipelines, spotting any troubled areas or malfunctions and reporting its data. Moreover, the robot is able to prepare for and react to any abnormal routes in the pipeline. Special types of mobile robots are necessary for the pipeline monitoring tasks. In order to move effectively along a pipeline, the robot's movement will resemble that of insects or crawling animals. When situated in massive pipelines with complex routes, the robot places fixed sensors in several important spots in order to complete its monitoring. This monitoring task is to prevent a major system failure by preemptively recognizing any minor or partial malfunctions. Areas not covered by fixed sensors usually cannot be observed and examined in real time, and thus depend on periodic offline monitoring. This paper proposes a monitoring system that is able to monitor the entire area of pipelines, with and without fixed sensors, by using the bio-mimetic robot.
Keywords: Bio-mimetic robots, Plant pipes monitoring, Mobile and active monitoring.
Downloads 2268
325 Development of a Pipeline Monitoring System by Bio-mimetic Robots
Authors: Seung You Na, Daejung Shin, Jin Young Kim, Joo Hyun Jung, Yong-Gwan Won
Abstract:
Exploring pipelines is one of various bio-mimetic robot applications. The robot may work in common buildings, such as between ceilings and ducts, in addition to the complicated and massive pipeline systems of large industrial plants. The bio-mimetic robot finds any troubled area or malfunction and then reports its data. Importantly, it can not only prepare for but also react to any abnormal routes in the pipeline. The pipeline monitoring tasks require special types of mobile robots. For effective movement along a pipeline, the movement of the robot will be similar to that of insects or crawling animals. During its movement along the pipelines, a pipeline monitoring robot has the important task of finding the shapes of the approaching path on the pipes. In this paper we propose an effective solution to pipeline pattern recognition, based on fuzzy classification rules for the measured IR distance data.
Keywords: Bio-mimetic robots, Plant pipes monitoring, Pipe pattern recognition.
Downloads 1649
324 The Role of Ideophones: Phonological and Morphological Characteristics in Literature
Authors: Cristina Bahón Arnaiz
Abstract:
Many Asian languages, such as Korean and Japanese, are well-known for their wide use of sound symbolic words or ideophones. This is a very particular characteristic which greatly enriches their lexicons. Ideophones are a class of words that use sound symbolism to express aspects, states, emotions, or conditions that can be experienced through the senses, such as shape, color, smell, action or movement. Ideophones have very particular characteristics in terms of sound symbolism and morphology, which distinguish them from other words. The phonological characteristics of ideophones are vowel ablaut or vowel gradation and consonant mutation. In the case of Korean, there are light vowels and dark vowels. Depending on the type of vowel that is used, the meaning will slightly change. Consonant mutation, also known as consonant ablaut, contributes to the level of intensity, emphasis, and volume of an expression. In addition to these phonological characteristics, there is one main morphological singularity, reduplication, which carries the meaning of continuity, repetition, intensity, emphasis, and plurality. All these characteristics play an important role in both linguistics and literature, as they enhance the meaning of what is being expressed with incredible semantic detail, expressiveness, and rhythm. The following study will analyze the ideophones used in a single paragraph of a Korean novel, which add incredible yet subtle detail to the meaning of the words, and advance the expressiveness and rhythm of the text. The results from analyzing one paragraph from a novel, after presenting the phonological and morphological characteristics of Korean ideophones, will demonstrate the important role that ideophones play in literature.
Keywords: Ideophones, mimetic words, phonomimes, phenomimes, psychomimes, sound symbolism.
Downloads 1106
323 Component-based Segmentation of Words from Handwritten Arabic Text
Authors: Jawad H AlKhateeb, Jianmin Jiang, Jinchang Ren, Stan S Ipson
Abstract:
Efficient preprocessing is essential for automatic recognition of handwritten documents. In this paper, techniques for segmenting words in handwritten Arabic text are presented. Firstly, connected components (CCs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for word segmentation. Meanwhile, an improved projection-based method is also employed for baseline detection. The proposed method has been successfully tested on the IFN/ENIT database, consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging, with more accurate detection of the baseline and segmentation of words for further recognition.
Keywords: Arabic OCR, off-line recognition, Baseline estimation, Word segmentation.
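The gap-thresholding idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: each connected component is reduced to a hypothetical horizontal extent (start, end) in pixels, and the threshold is simply the mean gap, standing in for the "optimal threshold" that the paper derives from the distance distribution.

```python
from statistics import mean

def segment_words(components):
    """Group connected components into words by thresholding horizontal gaps.

    components: list of (start, end) horizontal extents (hypothetical format).
    The mean gap is used as the threshold; this is an assumption made for
    illustration, not the threshold rule of the paper.
    """
    comps = sorted(components)
    if len(comps) < 2:
        return [comps] if comps else []
    # Gap between the end of one component and the start of the next.
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(comps, comps[1:])]
    threshold = mean(gaps)
    words = [[comps[0]]]
    for comp, gap in zip(comps[1:], gaps):
        if gap > threshold:
            words.append([comp])      # large gap: start a new word
        else:
            words[-1].append(comp)    # small gap: same word
    return words

# Toy extents: small gaps inside words, large gaps between words.
toy = [(0, 10), (12, 20), (23, 30), (45, 55), (57, 60), (75, 85), (87, 95)]
words = segment_words(toy)
print(len(words))
```

Sorting by start position makes the grouping independent of writing direction; a real system would work on bounding boxes extracted from the image.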
Downloads 2206
322 Text Retrieval Relevance Feedback Techniques for Bag of Words Model in CBIR
Authors: Nhu Van NGUYEN, Jean-Marc OGIER, Salvatore TABBONE, Alain BOUCHER
Abstract:
The state-of-the-art Bag of Words model in Content-Based Image Retrieval has been used for years, but the relevance feedback strategies for this model have not been fully investigated. Inspired by text retrieval, the Bag of Words model can draw on the wealth of knowledge and practices available in text retrieval. We study and experiment with the relevance feedback model from text retrieval in order to adapt it to image retrieval. The experiments show that the techniques from text retrieval give good results for image retrieval and that further improvement is possible.
Keywords: Relevance feedback, bag of words model, probabilistic model, vector space model, image retrieval
Downloads 2117
321 Intensifier as Changed from the Impolite Word in Thai
Authors: Methawee Yuttapongtada
Abstract:
An intensifier is a linguistic device, found in many languages, that enhances a word by adding quantity, quality or emotion. Languages have both similar and dissimilar intensifying devices. Thai in particular uses a wide variety of them, one of which is the use of an impolite word, or a word that used to mean something negative, as an intensifier. The data for this study were collected from spoken language, selecting intensifiers regarded as impolite words because, in other contexts, these words are considered rude words, swear words or words with negative meaning. A study of earlier periods was then carried out to trace their historical change. The original meanings and the contexts of use from the past to the present were explained using textual documents and dictionaries available from different periods. It was found that, with regard to semantic and pragmatic aspects, subjectification is a significant motivation for the change of impolite words into intensifiers, and it can explain the pathway of semantic change of these words. Moreover, the use of impolite words, or words that used to mean something negative, as intensifiers tends to increase. This phenomenon is commonly found in many languages, and the results of this research may support the belief that human language is universal and reflects fundamental thought shared by all humans.
Keywords: Impolite word, intensifier, Thai, semantic change.
Downloads 1309
320 Information Filtering using Index Word Selection based on the Topics
Authors: Takeru YOKOI, Hidekazu YANAGIMOTO, Sigeru OMATU
Abstract:
We have proposed an information filtering system using index word selection from a document set based on the topics included in that set. This method narrows down the particularly characteristic words in a document set, and the topics are obtained by Sparse Non-negative Matrix Factorization. In information filtering, a document is often represented by a vector whose elements correspond to the weights of the index words, and the dimension of the vector becomes larger as the number of documents increases. Therefore, words that are useless as index words for information filtering may be included. In order to address this problem, the dimension needs to be reduced. Our proposal reduces the dimension by selecting index words based on the topics included in a document set. We have applied Sparse Non-negative Matrix Factorization to the document set to obtain these topics. The filtering is carried out based on a centroid of the learning document set. The centroid is regarded as the user's interest. In addition, the centroid is represented by a document vector whose elements consist of the weights of the selected index words. We confirm the effectiveness of our proposal using the English test collection MEDLINE. With an appropriate number of index words, our proposed selection improves recommendation accuracy over previous methods. In addition, we discuss the index words selected by our proposal and find that it was able to select index words covering some minor topics included in the document set.
Keywords: Information Filtering, Sparse NMF, Index word Selection, User Profile, Chi-squared Measure
Downloads 1456
319 A Text Clustering System based on k-means Type Subspace Clustering and Ontology
Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang
Abstract:
This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words to represent the topics of the clusters.
Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology
Downloads 2462
318 Determining Senses for Word Sense Disambiguation in Turkish
Authors: Zeynep Orhan, Zeynep Altan
Abstract:
Word sense disambiguation is an important intermediate stage for many natural language processing applications. The senses of an ambiguous word are the classification of usages for that specific word. This paper deals with the methodologies of determining the senses for a given word if they cannot be obtained from an already available resource like WordNet. We offer a method that helps us to determine the sense boundaries gradually. In this method, we first decide on some features that are thought to be effective on the senses and divide the instances into two; then, according to the results of evaluations, we continue dividing the instances gradually. In a second method we use pseudo words. We devise artificial words depending on some criteria and evaluate classification algorithms on these previously classified words.
Keywords: Word sense disambiguation, sense determination, pseudo words, sense granularity.
Downloads 1410
317 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information
Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee
Abstract:
This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is a broadcasting script, which is a series of texts including directions and dialogues. The other is blogposts, which contain relatively abstracted contents, stories, and diverse information about broadcasting contents. Although the two kinds of text cover similar broadcasting contents, the words in blogposts and in the broadcasting script are different. When unseen words appear, a method is needed to reflect them in the existing topics. In this paper, we introduce a semantic vocabulary expansion method to reflect unseen words. We expand the topics of the broadcasting script by incorporating the words in blogposts. Each word in blogposts is added to the most semantically correlated topics. We use word2vec to get the semantic correlation between words in blogposts and topics of scripts. The vocabularies of topics are updated and then posterior inference is performed to rearrange the topics. In experiments, we verified that the proposed method can discover more salient topics for broadcasting contents.
Keywords: Broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec.
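The vocabulary-expansion step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional vectors are made-up stand-ins for word2vec embeddings, each topic is represented by the centroid of its word vectors (an assumption), and each unseen word is added to the single topic with the highest cosine similarity. The names `expand_topics`, `topics`, and `embeddings` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def expand_topics(topics, embeddings, unseen_words):
    """Add each unseen word to its most similar topic (by centroid cosine)."""
    centroids = {
        name: centroid([embeddings[w] for w in words])
        for name, words in topics.items()
    }
    expanded = {name: list(words) for name, words in topics.items()}
    for word in unseen_words:
        best = max(centroids, key=lambda name: cosine(embeddings[word], centroids[name]))
        expanded[best].append(word)
    return expanded

# Toy embeddings (assumed values, not real word2vec output).
embeddings = {
    "goal": [0.9, 0.1], "match": [0.8, 0.2], "striker": [0.85, 0.15],
    "recipe": [0.1, 0.9], "oven": [0.2, 0.8], "dessert": [0.15, 0.85],
}
topics = {"sports": ["goal", "match"], "cooking": ["recipe", "oven"]}
expanded = expand_topics(topics, embeddings, ["striker", "dessert"])
print(expanded)
```

The posterior inference step that rearranges the topics after the vocabulary update is not covered by this sketch.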
Downloads 1760
316 A Novel Approach to Persian Online Hand Writing Recognition
Authors: Ramin Halavati, Mansour Jamzad, Mahdieh Soleymani
Abstract:
Persian (Farsi) script is totally cursive and each character is written in several different forms depending on the preceding and following characters in the word. These complexities make automatic handwriting recognition of Persian a very hard problem, and there are few contributions trying to work it out. This paper presents a novel practical approach to online recognition of Persian handwriting, based on representing inputs and patterns with very simple visual features and comparing these simple terms. This recognition approach is tested over a set of Persian words. The results were quite acceptable when the possible words were unknown, and they were almost all correct in cases where the words were chosen from a prespecified list.
Keywords: Image Processing, Pattern Recognition.
Downloads 1330
315 Fuzzy Set Approach to Study Appositives and Its Impact Due to Positional Alterations
Authors: E. Mike Dison, T. Pathinathan
Abstract:
Computing with Words (CWW) and Possibilistic Relational Universal Fuzzy (PRUF) are the two concepts which widely represent and measure the vaguely defined natural phenomenon. In this paper, we study the positional alteration of the phrases by which the impact of a natural language proposition gets affected and/or modified. We observe the gradations due to sensitivity/feeling of a statement towards the positional alterations. We derive the classification and modification of the meaning of words due to the positional alteration. We present the results with reference to set theoretic interpretations.
Keywords: Appositive, computing with words, PRUF, semantic sentiment analysis, set theoretic interpretations.
Downloads 840
314 Words Reordering based on Statistical Language Model
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou
Abstract:
There are multiple reasons to expect that detecting word order errors in a text will be a difficult problem, and detection rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of grammar checkers in word order diagnosis, the repairing task is still very difficult. This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The comparative advantage of this method is that it works with a large set of words, and avoids the laborious and costly process of collecting word order errors for creating error patterns.
Keywords: Permutations filtering, Statistical language model N-grams, Word order errors
Downloads 1586
313 Web Search Engine Based Naming Procedure for Independent Topic
Authors: Takahiro Nishigaki, Takashi Onoda
Abstract:
In recent years, the amount of document data has been increasing with the spread of the Internet. Many methods have been studied for extracting topics from large document data. We proposed Independent Topic Analysis (ITA) to extract mutually independent topics from large document data such as newspaper data. ITA extracts the independent topics from the document data by using Independent Component Analysis. A topic extracted by ITA is represented by a set of words. However, the set of words is quite different from the topics the user imagines. For example, the top five words with high independence for one topic are as follows: Topic1 = {"scor", "game", "lead", "quarter", "rebound"}. This Topic 1 is considered to represent the topic of "SPORTS". The topic name "SPORTS" has to be attached by the user, because ITA cannot name topics. Therefore, in this research, we propose a method that uses a web search engine to obtain easily understood names for the topics that ITA gives as sets of words. In particular, we search for a set of topical words, and the title of the homepage in the search result is taken as the topic name. We also apply the proposed method to some data and verify its effectiveness.
Keywords: Independent topic analysis, topic extraction, topic naming, web search engine.
Downloads 500
312 Enhancing Retrieval Effectiveness of Malay Documents by Exploiting Implicit Semantic Relationship between Words
Authors: Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok
Abstract:
Phrases have a long history in information retrieval, particularly in commercial systems. Implicit semantic relationships between words in the form of BaseNP have shown significant improvement in terms of precision in many IR studies. Our research focuses on linguistic phrases, which are language dependent. Our results show that using BaseNP can improve performance, although more than 62% of word formation in the Malay language is based on derivational affixes and suffixes.
Keywords: Information Retrieval, Malay Language, Semantic Relationship, Retrieval Effectiveness, Conceptual Indexing.
Downloads 1428
311 Optical Multicast over OBS Networks: An Approach Based On Code-Words and Tunable Decoders
Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga
Abstract:
In the frame of this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word to a group multicast address. In the core node, a set of tunable decoders are used to send a traffic data to multiple destinations based on the received code-word. The use of code-words, which correspond to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. At the reception of a burst, it will be delayed in an optical memory. And, the received optical code-word is split to a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.
Keywords: Optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory.
Downloads 1686
310 Optical Multicast over OBS Networks: An Approach Based On Code-Words and Tunable Decoders
Authors: Maha Sliti, Walid Abdallah, Noureddine Boudriga
Abstract:
In the frame of this work, we present an optical multicasting approach based on optical code-words. Our approach associates, in the edge node, an optical code-word to a group multicast address. In the core node, a set of tunable decoders are used to send a traffic data to multiple destinations based on the received code-word. The use of code-words, which correspond to the combination of an input port and a set of output ports, allows the implementation of an optical switching matrix. At the reception of a burst, it will be delayed in an optical memory. And, the received optical code-word is split to a set of tunable optical decoders. When it matches a configured code-word, the delayed burst is switched to a set of output ports.
Keywords: Optical multicast, optical burst switching networks, optical code-words, tunable decoder, virtual optical memory.
Downloads 1760
309 Aspect-Level Sentiment Analysis with Multi-Channel and Graph Convolutional Networks
Authors: Jiajun Wang, Xiaoge Li
Abstract:
The purpose of the aspect-level sentiment analysis task is to identify the sentiment polarity of aspects in a sentence. Currently, most methods mainly focus on using neural networks and attention mechanisms to model the relationship between aspects and context, but they ignore the dependence of words in different ranges in the sentence, resulting in deviation when assigning relationship weights to words other than aspect words. To solve these problems, we propose an aspect-level sentiment analysis model that combines a multi-channel convolutional network and a graph convolutional network (GCN). Firstly, the context and the degree of association between words are characterized by Long Short-Term Memory (LSTM) and a self-attention mechanism. Besides, a multi-channel convolutional network is used to extract the features of words in different ranges. Finally, a graph convolutional network is used to associate the node information of the dependency tree structure. We conduct experiments on four benchmark datasets. The experimental results are compared with those of other models, which shows that our model is better and more effective.
Keywords: Aspect-level sentiment analysis, attention, multi-channel convolution network, graph convolution network, dependency tree.
Downloads 506
308 Structural Analysis of Username Segment in E-Mail Addresses of Engineering Institutes of Gujarat State of India
Authors: Jatinderkumar R. Saini
Abstract:
E-mail has become a key mechanism of electronic communication. This is true for professional organizations that like to communicate with their subjects online and are slowly shifting to paper-less office. The current paper focuses specifically on academic institutions offering Engineering course in Gujarat state and attempts for textual analysis of the usernames of the institutional e-mail addresses. We found that the institutions tend to design the username segment of their e-mail addresses by choosing words or combination of words from specific categories. The paper also highlights the use of special characters, digits and random words in designing the usernames. On the sidelines, the paper lists the style of employing department names and designations for the design process. To the best of our knowledge, this is the first formal attempt to analyze the selection of words employed for designing username segment of e-mail addresses of engineering institutions.
Keywords: E-mail address, Institute, Engineering, Username.
Downloads 1683
307 The Algorithm of Semi-Automatic Thai Spoonerism Words for Bi-Syllable
Authors: Nutthapat Kaewrattanapat, Wannarat Bunchongkien
Abstract:
The purposes of this research are to study and develop an algorithm for Thai spoonerism words using a semi-automatic computer program. That is, the input syllables are already separated, and the developed algorithm performs the spoonerism. The algorithm establishes rules and mechanisms for bi-syllable Thai spoonerism words by analyzing the elements of the syllables, namely cluster consonant, vowel, intonation mark and final consonant. The study finds that bi-syllable Thai spoonerism has one mechanism: transposition of the vowel, intonation mark and consonant values of the two syllables while keeping the consonant value and cluster word (if any). The rules and mechanisms were applied to develop Thai spoonerism word software in PHP. A performance test of the software found that the program performs bi-syllable Thai spoonerism correctly for 99% of the words used in the test. Faults were found for 1%, because the words obtained from spoonerism may not be spelled in conformity with Thai grammar and a Thai spoonerism can have more than one answer.
Keywords: Algorithm, Spoonerism, Computational Linguistics.
Downloads 2358
306 Computable Difference Matrix for Synonyms in the Holy Quran
Authors: Mohamed Ali AlShaari, Khalid M. ElFitori
Abstract:
In the field of Quran Studies known as GHAREEB AL QURAN (the study of the meanings of strange words and structures in the Holy Quran), it is difficult to distinguish some pragmatic meanings from conceptual meanings. One who wants to study this subject may need to look for a common usage between two or more words to understand the general meaning, and sometimes may need to look for the differences between them, even if they are synonyms (word sisters).
Some distinguished scholars of Arabic linguistics believe that there are no synonym words; they believe in varieties of meaning and multi-context usage. Based on this viewpoint, our method was designed to look for the synonyms of a word, and then for the differences that distinguish the word from its synonyms.
There are many available books that use such a method, e.g. synonym books, dictionaries, glossaries, and some books on the interpretation of the strange vocabulary of the Holy Quran, but it is difficult to look up words in these written works.
For that reason, we proposed a logical entity, which we called the Differences Matrix (DM).
DM groups the synonym words to extract the relations between them and to find the general meaning, which defines the skeleton of all the synonyms of a word; this meaning is expressed by one of its sister words.
In the Differences Matrix, we used the sisters (words) as titles for rows and columns, and in each cell we tried to define the row title (word) by using the column title (its sister), so that the relations between sisters appear. The expected result is a well-defined group of sisters for each word. We represented the obtained results formally, and used the defined groups as a base for building the ontology of the Holy Quran synonyms.
Keywords: Quran, synonyms, Differences Matrix, ontology
Downloads 2113
305 N-Grams: A Tool for Repairing Word Order Errors in Ill-formed Texts
Authors: Theologos Athanaselis, Stelios Bakamidis, Ioannis Dologlou, Konstantinos Mamouras
Abstract:
This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. A possible way of reordering the words is to use all the permutations. The problem is that for a sentence of N words the number of permutations is N!. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. The confusion matrix technique has been designed to reduce the search space among permuted sentences. The search space is limited using the statistical inference of N-grams. The results of this technique are very interesting and show that the number of permuted sentences can be reduced by 98.16%. For experimental purposes a test set of TOEFL sentences was used, and the results show that more than 95% can be repaired using the proposed method.
Keywords: Permutations filtering, Statistical language model N-grams, Word order errors, TOEFL
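The selection criterion in this abstract (choose the reordering with the most trigram hits) can be illustrated with a brute-force sketch. It enumerates all N! permutations, which is exactly the cost the paper's confusion matrix technique is designed to avoid; that technique is not reproduced here. The toy corpus and the function names are assumptions made for illustration.

```python
from itertools import permutations

def trigrams(words):
    """Return the list of consecutive word triples in a sequence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def build_model(corpus):
    """Collect the set of trigrams seen in a list of sentences (toy model)."""
    model = set()
    for sentence in corpus:
        model.update(trigrams(sentence.split()))
    return model

def repair_word_order(sentence, model):
    """Brute force: try every permutation and keep the one with most hits."""
    words = sentence.split()
    best = max(
        permutations(words),
        key=lambda p: sum(t in model for t in trigrams(p)),
    )
    return " ".join(best)

# Toy corpus standing in for a real language model (assumption).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = build_model(corpus)
repaired = repair_word_order("cat the on sat the mat", model)
print(repaired)  # the cat sat on the mat
```

A six-word sentence already needs 720 candidates, and the count grows factorially, which is why pruning the search space matters for realistic sentence lengths.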
Downloads 1668
304 Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE
Authors: Jatinderkumar R. Saini, Apurva A. Desai
Abstract:
Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in dictionary, nor is a slang term. The motives of the paper are many fold. This is an attempt to analyze spamming behaviour and employment of wordmutation technique. On the side-lines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing Tokenization technique and Unigram BOW model. We found that the non-lexicon words constitute nearly 66% of total number of lexis of corpus whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words, form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)
Downloads 1820
303 Speech Recognition Using Scaly Neural Networks
Authors: Akram M. Othman, May H. Riadh
Abstract:
This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words was established first: “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2”. These words are associated with executing computer functions such as opening a file, printing a text document, cutting, copying, pasting, editing and exiting. Each word is introduced to the computer and then subjected to a feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker-dependent mode. Half of the words are used for training the artificial neural network and the other half for testing the system; these are used for information retrieval. The system consists of three parts: speech processing and feature extraction, training and testing using neural networks, and information retrieval. The retrieval process proved to be 79.5-88% successful, which is quite acceptable considering the variation in surroundings, the state of the person, and the microphone type.
Keywords: Feature extraction, Linear prediction coefficients, neural network, Speech Recognition, Scaly ANN.
Downloads 1737
302 Greek Compounds: A Challenging Case for the Parsing Techniques of PC-KIMMO v.2
Authors: Angela Ralli, Eleni Galiotou
Abstract:
In this paper we describe the recognition process of Greek compound words using the PC-KIMMO software. We try to show certain limitations of the system with respect to the principles of compound formation in Greek. Moreover, we discuss the computational processing of phenomena such as stress and syllabification which are indispensable for the analysis of such constructions and we try to propose linguistically-acceptable solutions within the particular system.
Keywords: Morpho-phonological parsing, compound words, two-level morphology, natural language processing.
Downloads 1609
301 Abai Kunanbayev's Role in Enrichment of the Kazakh Language
Authors: Y.M. Paltore, B.N. Zhubatova, A.A. Mustafayeva
Abstract:
Abai Kunanbayev is famous as an enlightener, composer, interpreter, social agent, philosopher and reformer who wanted to enrich Kazakh literature by merging it with Russian and European culture, and also as a founder of the Kazakh written literary language. He was born in 1845 in the East Kazakhstan area and passed away in 1904 in his hometown. His oeuvre absorbed and reflected all the changes in the life of Kazakh society in the second half of the XIX century. The XIX century, especially its second half, was an important transition period for Kazakhstan: the activation of Russian colonial policy and the establishment of commodity-money relations in the Steppe Land radically changed the traditional way of life of Kazakh society and predetermined its further development. Besides common Arabic and Persian words and loanwords from the Quran, Abai Kunanbayev used in his words of edification many words of Arabic, Persian, Latin, Russian, Nogai, Shaghatai, Polish, Greek and Turkish origin that are used in the Kazakh language.
Keywords: Abai Kunanbayev, the Kazakh, Russian languages, literature
Downloads 2864
300 Recognition of Noisy Words Using the Time Delay Neural Networks Approach
Authors: Khenfer-Koummich Fatima, Mesbahi Larbi, Hendel Fatiha
Abstract:
This paper presents a recognition system for isolated words such as robot commands, built on Time Delay Neural Networks (TDNN). The aim is to teleoperate a robot for specific tasks such as turn, close, etc. in an industrial environment, taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy; in addition, it acts as a filter that passes certain desirable frequency characteristics of speech. The goal is to determine the parameters of this filter so that the system adapts to the variability of the speech signal, and especially to noise; for this, the back propagation technique was used in the learning phase. The approach was applied to commands pronounced in two languages separately: French and Arabic. The results for two test bases of 300 spoken words each are 87% and 97.6% in a neutral environment, and 77.67% and 92.67% when white Gaussian noise was added with an SNR of 35 dB.
Keywords: Neural networks, Noise, Speech Recognition.
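The noisy test condition in this abstract, white Gaussian noise at an SNR of 35 dB, follows from the definition SNR(dB) = 10 log10(P_signal / P_noise). A minimal Python sketch of how such a condition might be produced is given below; the 440 Hz test tone, the 16 kHz sampling rate, the fixed seed and the function names are assumptions, and the paper's own procedure is not described in the abstract.

```python
import math
import random

def mean_power(samples):
    """Average squared amplitude of a sample sequence."""
    return sum(v * v for v in samples) / len(samples)

def add_white_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise scaled to the requested SNR in dB."""
    noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """Recover the SNR actually obtained from the clean and noisy signals."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(noise))

# Assumed stand-in for a spoken command: one second of a 440 Hz tone at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_gaussian_noise(clean, 35.0, rng)
snr = measured_snr_db(clean, noisy)
```

The measured value only approximates 35 dB, because the noise is a finite random sample.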
Downloads 1936
299 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children
Authors: Tsvetomira Braynova, Miglena Simonska
Abstract:
The article presents the results of a study of phonological processes in three- and four-year-old children. A test, created for the purpose of the study, was developed and conducted among 120 children. The study covered three areas: the level of words (96 words), the level of sentence repetition (10 sentences), and the level of generating own speech from a picture (15 pictures). The test also gives additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and which are atypical for this age. The results show that the most common phonological errors that children make are: sound substitution, elision of sound, metathesis of sound, elision of syllable, and elision of consonants clustered in a syllable. Measuring the correlation between the average length of repeated speech and the average length of generated speech, the analysis does not prove that the more words a child can repeat in the “repeated speech” part, the more words they can be expected to generate in the “generating sentence” part. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.
Keywords: Articulation, phonology, speech, language development.
Downloads 384
298 Adaptive Naïve Bayesian Anti-Spam Engine
Authors: Wojciech P. Gajewski
Abstract:
The problem of spam has been seriously troubling the Internet community during the last few years and has now reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research, located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag of Words representation of an email is widely used to stop this unwanted flood, as it combines good performance with simplicity of the training and classification processes. However, given the constantly changing patterns of spam, it is necessary to ensure the online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision, as happens when only the NBC based on single words is retrained.
Keywords: Text classification, naïve Bayesian classification, spam, email.
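The two-classifier arrangement in this abstract (one NBC on single words, a second on pairs of adjacent words, with only the second retrained on user reports) might look roughly like the Python sketch below. The abstract does not say how the two scores are combined; summing their log-odds is an assumption here, as are the add-one smoothing, the class and function names, and the toy training mails.

```python
import math
from collections import Counter

def unigrams(text):
    """Single-word features (bag of words)."""
    return text.lower().split()

def bigrams(text):
    """Features made of pairs of adjacent words."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

class NaiveBayes:
    """Two-class multinomial naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text, label):
        self.counts[label].update(self.featurize(text))
        self.docs[label] += 1

    def log_odds(self, text):
        """Log posterior odds of spam versus ham; positive leans spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values()) + len(vocab) + 1
            for feature in self.featurize(text):
                score += sign * math.log((self.counts[label][feature] + 1) / total)
        return score

def is_spam(text, word_nbc, pair_nbc):
    """Assumed combination rule: add the log-odds of both classifiers."""
    return word_nbc.log_odds(text) + pair_nbc.log_odds(text) > 0

# Hypothetical initial training set shared by both classifiers.
word_nbc, pair_nbc = NaiveBayes(unigrams), NaiveBayes(bigrams)
for text, label in [("cheap pills online", "spam"), ("buy cheap pills", "spam"),
                    ("meeting at noon tomorrow", "ham"), ("lunch at noon", "ham")]:
    word_nbc.train(text, label)
    pair_nbc.train(text, label)

# A user-reported spam updates only the adjacent-word classifier.
word_before = word_nbc.log_odds("limited offer")
pair_nbc.train("limited offer today", "spam")
```

Keeping the single-word classifier frozen means user reports cannot shift its word statistics, which matches the abstract's concern about precision when that classifier is retrained.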
Downloads 4415