Search results for: Impolite word
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 211

Search results for: Impolite word

61 Online Topic Model for Broadcasting Contents Using Semantic Correlation Information

Authors: Chang-Uk Kwak, Sun-Joong Kim, Seong-Bae Park, Sang-Jo Lee

Abstract:

This paper proposes a method of learning topics for broadcasting contents. There are two kinds of texts related to broadcasting contents. One is a broadcasting script, which is a series of texts including directions and dialogues. The other is blogposts, which possesses relatively abstracted contents, stories, and diverse information of broadcasting contents. Although two texts range over similar broadcasting contents, words in blogposts and broadcasting script are different. When unseen words appear, it needs a method to reflect to existing topic. In this paper, we introduce a semantic vocabulary expansion method to reflect unseen words. We expand topics of the broadcasting script by incorporating the words in blogposts. Each word in blogposts is added to the most semantically correlated topics. We use word2vec to get the semantic correlation between words in blogposts and topics of scripts. The vocabularies of topics are updated and then posterior inference is performed to rearrange the topics. In experiments, we verified that the proposed method can discover more salient topics for broadcasting contents.

Keywords: Broadcasting script analysis, topic expansion, semantic correlation analysis, word2vec.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1711
60 SySRA: A System of a Continuous Speech Recognition in Arab Language

Authors: Samir Abdelhamid, Noureddine Bouguechal

Abstract:

We report in this paper the model adopted by our system of continuous speech recognition in Arab language SySRA and the results obtained until now. This system uses the database Arabdic-10 which is a corpus of word for the Arab language and which was manually segmented. Phonetic decoding is represented by an expert system where the knowledge base is translated in the form of production rules. This expert system transforms a vocal signal into a phonetic lattice. The higher level of the system takes care of the recognition of the lattice thus obtained by deferring it in the form of written sentences (orthographical Form). This level contains initially the lexical analyzer which is not other than the module of recognition. We subjected this analyzer to a set of spectrograms obtained by dictating a score of sentences in Arab language. The rate of recognition of these sentences is about 70% which is, to our knowledge, the best result for the recognition of the Arab language. The test set consists of twenty sentences from four speakers not having taken part in the training.

Keywords: Continuous speech recognition, lexical analyzer, phonetic decoding, phonetic lattice, vocal signal.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1346
59 Efficient System for Speech Recognition using General Regression Neural Network

Authors: Abderrahmane Amrouche, Jean Michel Rouvaen

Abstract:

In this paper we present an efficient system for independent speaker speech recognition based on neural network approach. The proposed architecture comprises two phases: a preprocessing phase which consists in segmental normalization and features extraction and a classification phase which uses neural networks based on nonparametric density estimation namely the general regression neural network (GRNN). The relative performances of the proposed model are compared to the similar recognition systems based on the Multilayer Perceptron (MLP), the Recurrent Neural Network (RNN) and the well known Discrete Hidden Markov Model (HMM-VQ) that we have achieved also. Experimental results obtained with Arabic digits have shown that the use of nonparametric density estimation with an appropriate smoothing factor (spread) improves the generalization power of the neural network. The word error rate (WER) is reduced significantly over the baseline HMM method. GRNN computation is a successful alternative to the other neural network and DHMM.

Keywords: Speech Recognition, General Regression NeuralNetwork, Hidden Markov Model, Recurrent Neural Network, ArabicDigits.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2137
58 Providing Medical Information in Braille: Research and Development of Automatic Braille Translation Program for Japanese “eBraille“

Authors: Aki Sugano, Mika Ohta, Mineko Ikegami, Kenji Miura, Sayo Tsukamoto, Akihiro Ichinose, Toshiko Ohshima, Eiichi Maeda, Masako Matsuura, Yutaka Takao

Abstract:

Along with the advances in medicine, providing medical information to individual patient is becoming more important. In Japan such information via Braille is hardly provided to blind and partially sighted people. Thus we are researching and developing a Web-based automatic translation program “eBraille" to translate Japanese text into Japanese Braille. First we analyzed the Japanese transcription rules to implement them on our program. We then added medical words to the dictionary of the program to improve its translation accuracy for medical text. Finally we examined the efficacy of statistical learning models (SLMs) for further increase of word segmentation accuracy in braille translation. As a result, eBraille had the highest translation accuracy in the comparison with other translation programs, improved the accuracy for medical text and is utilized to make hospital brochures in braille for outpatients and inpatients.

Keywords: Automatic Braille translation, Medical text, Partially sighted people.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1558
57 Post-Modernist Tragi-Comedy: A Study of Tom Stoppard’s Rosencrantz and Guildenstern Are Dead

Authors: Azza Taha Zaki

Abstract:

The death of tragedy is probably one of the most distinctive literary controversies of the twentieth century. There is common critical consent that tragedy in the classical sense of the word is no longer possible. Thinkers, philosophers and critics such as Nietzsche, Durrenmatt and George Steiner have all agreed that the decline of the genre in the modern age is due to the total lack of a unified world image and the absence of a shared vision in a fragmented and ideologically diversified world. The production of Rosencrantz and Guildenstern Are Dead in 1967 marked the rise of the genre of tragi-comedy as a more appropriate reflection of the spirit of the age. At the hands of such great dramatists as Tom Stoppard (1937- ), the revived genre was not used as an extra comic element to give some comic relief to an otherwise tragic text, but it was given a postmodernist touch to serve the interpretation of the dilemma of man in the postmodernist world. This paper will study features of postmodernist tragi-comedy in Rosencrantz and Guildenstern Are Dead as one of the most important plays in the modern British theatre and investigate Stoppard’s vision of man and life as influenced by postmodernist thought and philosophy.

Keywords: British, drama, postmodernist, Stoppard, tragi-comedy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 345
56 Brand Position Communication Channel for Rajabhat University

Authors: Narong Anurak

Abstract:

The objective of this research was to study Brand Position Communication Channel in Brand Building in Rajabhat University Affecting Decision Making of Higher Education from of qualitative research and in-depth interview with executive members Rajabhat University and also quantitative by questionnaires which are personal data of students, study of the acceptance and the finding of the information of Rajabhat University, study of pattern or Brand Position Communication Channel affecting the decision making of studying in Rajabhat University and the result of the communication in Brand Position Communication Channel. It is found that online channel and word of mount are highly important and necessary for education business since media channel is a tool and the management of marketing communication to create brand awareness, brand credibility and to achieve the high acclaim in terms of bringing out qualified graduates. Also, off-line channel can enable the institution to survive from the high competition especially in education business regarding management of the Rajabhat University. Therefore, Rajabhat University has to communicate by the various communication channel strategies for brand building for attractive student to make decision making of higher education.

Keywords: Brand Position, Communication Channel, Rajabhat University.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1352
55 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system developed based on a k-means type subspace clustering algorithm to cluster large, high dimensional and sparse text data. In this algorithm, a new step is added in the k-means clustering process to automatically calculate the weights of keywords in each cluster so that the important words of a cluster can be identified by the weight values. For understanding and interpretation of clustering results, a few keywords that can best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights calculated by our new algorithm. Then, the candidates are fed to the WordNet to identify the set of noun words and consolidate the synonymy and hyponymy words. Experimental results have shown that the clustering algorithm is superior to the other subspace clustering algorithms, such as PROCLUS and HARP and kmeans type algorithm, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selection of the words to represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2412
54 ABURAS Index: A Statistically Developed Index for Dengue-Transmitting Vector Population Prediction

Authors: Hani M. Aburas

Abstract:

“Dengue" is an African word meaning “bone breaking" because it causes severe joint and muscle pain that feels like bones are breaking. It is an infectious disease mainly transmitted by female mosquito, Aedes aegypti, and causes four serotypes of dengue viruses. In recent years, a dramatic increase in the dengue fever confirmed cases around the equator-s belt has been reported. Several conventional indices have been designed so far to monitor the transmitting vector populations known as House Index (HI), Container Index (CI), Breteau Index (BI). However, none of them describes the adult mosquito population size which is important to direct and guide comprehensive control strategy operations since number of infected people has a direct relationship with the vector density. Therefore, it is crucial to know the population size of the transmitting vector in order to design a suitable and effective control program. In this context, a study is carried out to report a new statistical index, ABURAS Index, using Poisson distribution based on the collection of vector population in Jeddah Governorate, Saudi Arabia.

Keywords: Poisson distribution, statistical index, prediction, Aedes aegypti.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1875
53 The Effect of Ambient Occlusion Shading on Perception of Sign Language Animations

Authors: Nicoletta Adamo-Villani, Joe Kasenga, Tiffany Jen, Bryan Colbourn

Abstract:

The goal of the study reported in the paper was to determine whether Ambient Occlusion Shading (AOS) has a significant effect on users' perception of American Sign Language (ASL) finger spelling animations. Seventy-one (71) subjects participated in the study; all subjects were fluent in ASL. The participants were asked to watch forty (40) sign language animation clips representing twenty (20) finger spelled words. Twenty (20) clips did not show ambient occlusion, whereas the other twenty (20) were rendered using ambient occlusion shading. After viewing each animation, subjects were asked to type the word being finger-spelled and rate its legibility. Findings show that the presence of AOS had a significant effect on the subjects perception of the signed words. Subjects were able to recognize the animated words rendered with AOS with higher level of accuracy, and the legibility ratings of the animations showing AOS were consistently higher across subjects.

Keywords: Sign Language, Animation, Ambient Occlusion Shading, Deaf Education

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1631
52 Developmental Differences in the Construction of Concepts by Children from 3 to 14-Year-Olds: Perception, Language and Instruction

Authors: Mehmet Ozcan

Abstract:

This study was designed to investigate the relationship between language and children’s construction of the concept of objects, actions, and states. Participants of this study are 120 children whose ages range from 3 to 14 years. Ten children participated from each age group and 10 adults participated as normative group. Data were collected using 28 words which were identified and grouped according to the purpose of this study. Participants were asked the question “What is x?’ for each word in a reserved room. The audio recorded data were transcribed and coded. The data were analyzed primarily qualitatively but quantitatively as well to support qualitative findings. The findings reveal that younger children rely more on their perceptual experience and linguistic input while 7-year-olds and older ones rely more on instructional language in the construction of the concepts related to objects, actions and states. Adults differ from all age groups with their usage of metaphors to refer to objects. It has been noted that linguistic, perceptual and instructional experiences work in an interwoven way but each one seems to be dominant at certain ages.

Keywords: Cognition, concept construction, first language acquisition, language, thought.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1004
51 The Role of Female Population as a Consumer in Modern Marketing Strategy and Management

Authors: Jana Aleksić, Marijana Petković

Abstract:

Female population has an increasing role when it comes to purchase. Consequently, the female population has a greater role in modern marketing. Although it is thought that women buy more than men, marketing strategy was not directed specifically towards women. The thing that has changed regarding women’s role in modern marketing is the fact that the female population has a leading position when it comes to decision making in various fields and various sectors, which was not the case in the past. Marketing should be directed towards women but it should be done in the right way. Compared to men, women buy in a different way, and they look for more various advantages in the product itself, than men do. This paper aims to show the importance of the female role in the modern marketing and management and to redirect marketing in some way towards female population through new marketing strategies and management systems. Hypothesis is that women have an important role in marketing, and marketing strategy of modern society could and should be based on and directed towards female population and their tastes when it comes to purchasing. It is necessary and desirable to apply marketing strategy with a special strategy that has an emphasis on women and their purchase or in a word to apply WS- woman strategy. This research was carried out as a random sample research, where were obtained 212 valid surveys whose results serve as a basis for drawing conclusions about the research as well as to verify the formulated hypotheses. The research was carried out during 2011 and 2012. The study has shown a significant role of the female population in the marketing process.

Keywords: Marketing, management, female, purchase, strategy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1339
50 Speech Recognition Using Scaly Neural Networks

Authors: Akram M. Othman, May H. Riadh

Abstract:

This research work is aimed at speech recognition using scaly neural networks. A small vocabulary of 11 words were established first, these words are “word, file, open, print, exit, edit, cut, copy, paste, doc1, doc2". These chosen words involved with executing some computer functions such as opening a file, print certain text document, cutting, copying, pasting, editing and exit. It introduced to the computer then subjected to feature extraction process using LPC (linear prediction coefficients). These features are used as input to an artificial neural network in speaker dependent mode. Half of the words are used for training the artificial neural network and the other half are used for testing the system; those are used for information retrieval. The system components are consist of three parts, speech processing and feature extraction, training and testing by using neural networks and information retrieval. The retrieve process proved to be 79.5-88% successful, which is quite acceptable, considering the variation to surrounding, state of the person, and the microphone type.

Keywords: Feature extraction, Liner prediction coefficients, neural network, Speech Recognition, Scaly ANN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1696
49 Efficient DTW-Based Speech Recognition System for Isolated Words of Arabic Language

Authors: Khalid A. Darabkh, Ala F. Khalifeh, Baraa A. Bathech, Saed W. Sabah

Abstract:

Despite the fact that Arabic language is currently one of the most common languages worldwide, there has been only a little research on Arabic speech recognition relative to other languages such as English and Japanese. Generally, digital speech processing and voice recognition algorithms are of special importance for designing efficient, accurate, as well as fast automatic speech recognition systems. However, the speech recognition process carried out in this paper is divided into three stages as follows: firstly, the signal is preprocessed to reduce noise effects. After that, the signal is digitized and hearingized. Consequently, the voice activity regions are segmented using voice activity detection (VAD) algorithm. Secondly, features are extracted from the speech signal using Mel-frequency cepstral coefficients (MFCC) algorithm. Moreover, delta and acceleration (delta-delta) coefficients have been added for the reason of improving the recognition accuracy. Finally, each test word-s features are compared to the training database using dynamic time warping (DTW) algorithm. Utilizing the best set up made for all affected parameters to the aforementioned techniques, the proposed system achieved a recognition rate of about 98.5% which outperformed other HMM and ANN-based approaches available in the literature.

Keywords: Arabic speech recognition, MFCC, DTW, VAD.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4032
48 Digital Control Algorithm Based on Delta-Operator for High-Frequency DC-DC Switching Converters

Authors: Renkai Wang, Tingcun Wei

Abstract:

In this paper, a digital control algorithm based on delta-operator is presented for high-frequency digitally-controlled DC-DC switching converters. The stability and the controlling accuracy of the DC-DC switching converters are improved by using the digital control algorithm based on delta-operator without increasing the hardware circuit scale. The design method of voltage compensator in delta-domain using PID (Proportion-Integration- Differentiation) control is given in this paper, and the simulation results based on Simulink platform are provided, which have verified the theoretical analysis results very well. It can be concluded that, the presented control algorithm based on delta-operator has better stability and controlling accuracy, and easier hardware implementation than the existed control algorithms based on z-operator, therefore it can be used for the voltage compensator design in high-frequency digitally- controlled DC-DC switching converters.

Keywords: Digitally-controlled DC-DC switching converter, finite word length, control algorithm based on delta-operator, high-frequency, stability.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1215
47 Using Teager Energy Cepstrum and HMM distancesin Automatic Speech Recognition and Analysis of Unvoiced Speech

Authors: Panikos Heracleous

Abstract:

In this study, the use of silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented. NAM microphones are special acoustic sensors, which are attached behind the talker-s ear and can capture not only normal (audible) speech, but also very quietly uttered speech (non-audible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech conversion etc.) for sound-impaired people. Using a small amount of training data and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. Non-audible murmur recognition in noisy environments is also investigated. In this study, further analysis of the NAM speech has been made using distance measures between hidden Markov model (HMM) pairs. It has been shown the reduced spectral space of NAM speech using a metric distance, however the location of the different phonemes of NAM are similar to the location of the phonemes of normal speech, and the NAM sounds are well discriminated. Promising results in using nonlinear features are also introduced, especially under noisy conditions.

Keywords: Speech recognition, unvoiced speech, nonlinear features, HMM distance measures

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1607
46 Iranian Bazaars: The Illustration of Stable Thoughts

Authors: Aida Amirazodi

Abstract:

"Bazaar" is a Persian word from the language of Iranians of 2500 years ago which has entered the languages of other countries. “Bazaar", the trading or marketing place with the architectural principles and concerns, was formed in Iran because of the long experience of marketing. This has become a valuable inheritance of Islamic ideological civilization and Iranian advanced architecture and a model of Islamic-marketing places with spectacular elements and parts, and the place for economical, social and cultural exchanges. “Bazaars" are found in cities of Iran and many Islamic countries in west of Asia and north of Africa. With the stable structure and function as a symbol of social values, this place has become the economic center and the illustration of stable architecture and advanced principles. “Bazaars" as the heart of Iranian cities economy with several major and minor rows of shops, in closed and open areas, along a fixed line or branches with beautiful arcs, patios, and frameworks are among the main national inheritance of Iran and one of the important Iranian architectural treasures because of its Iranian nobility.

Keywords: Traditional Bazaar, Form of Bazaar, Iranian Architecture

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1692
45 Understanding the Influence on Drivers’ Recommendation and Review-Writing Behavior in the P2P Taxi Service

Authors: Liwen Hou

Abstract:

The booming mobile business has been penetrating the taxi industry worldwide with P2P (peer to peer) taxi services, as an emerging business model, transforming the industry. Parallel with other mobile businesses, member recommendations and online reviews are believed to be very effective with regard to acquiring new users for P2P taxi services. Based on an empirical dataset of the taxi industry in China, this study aims to reveal which factors influence users’ recommendations and review-writing behaviors. Differing from the existing literature, this paper takes the taxi driver’s perspective into consideration and hence selects a group of variables related to the drivers. We built two models to reflect the factors that influence the number of recommendations and reviews posted on the platform (i.e., the app). Our models show that all factors, except the driver’s score, significantly influence the recommendation behavior. Likewise, only one factor, passengers’ bad reviews, is insignificant in generating more drivers’ reviews. In the conclusion, we summarize the findings and limitations of the research.

Keywords: Online recommendation, P2P taxi service, review-writing, word of mouth.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1333
44 Encryption Efficiency Analysis and Security Evaluation of RC6 Block Cipher for Digital Images

Authors: Hossam El-din H. Ahmed, Hamdy M. Kalash, Osama S. Farag Allah

Abstract:

This paper investigates the encryption efficiency of RC6 block cipher application to digital images, providing a new mathematical measure for encryption efficiency, which we will call the encryption quality instead of visual inspection, The encryption quality of RC6 block cipher is investigated among its several design parameters such as word size, number of rounds, and secret key length and the optimal choices for the best values of such design parameters are given. Also, the security analysis of RC6 block cipher for digital images is investigated from strict cryptographic viewpoint. The security estimations of RC6 block cipher for digital images against brute-force, statistical, and differential attacks are explored. Experiments are made to test the security of RC6 block cipher for digital images against all aforementioned types of attacks. Experiments and results verify and prove that RC6 block cipher is highly secure for real-time image encryption from cryptographic viewpoint. Thorough experimental tests are carried out with detailed analysis, demonstrating the high security of RC6 block cipher algorithm. So, RC6 block cipher can be considered to be a real-time secure symmetric encryption for digital images.

Keywords: Block cipher, Image encryption, Encryption quality, and Security analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2356
43 Simulation and Design of Single Fed Circularly Polarized Triangular Microstrip Antenna with Wide Band Tuning Stub

Authors: R. Irani, A. Ghavidel, F. Hodjat Kashani

Abstract:

Recently, several designs of single fed circularly polarized microstrip antennas have been studied. Relatively, a few designs for achieving circular polarization using triangular microstrip antenna are available. Typically existing design of single fed circularly polarized triangular microstrip antennas include the use of equilateral triangular patch with a slit or a horizontal slot on the patch or addition a narrow band stub on the edge or a vertex of triangular patch. In other word, with using a narrow band tune stub on middle of an edge of triangle causes of facility to compensate the possible fabrication error and substrate materials with easier adjusting the tuner stub length. Even though disadvantages of this method is very long of stub (approximate 1/3 length of triangle edge). In this paper, instead of narrow band stub, a wide band stub has been applied, therefore the length of stub by this method has been decreased around 1/10 edge of triangle in addition changing the aperture angle of stub, provides more facility for designing and producing circular polarization wave.

Keywords: Circular polarization, Microstrip antenna, single feed, wide band stub.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1962
42 Meditation Based Brain Painting Promoting Foreign Language Memory through Establishing a Brain-Computer Interface

Authors: Zhepeng Rui, Zhenyu Gu, Caitilin de Bérigny

Abstract:

In the current study, we designed an interactive meditation and brain painting application to cultivate users’ creativity, promote meditation, reduce stress, and improve cognition while attempting to learn a foreign language. User tests and data analyses were conducted on 42 male and 42 female participants to better understand sex-associated psychological and aesthetic differences. Our method utilized brain-computer interfaces to import meditation and attention data to create artwork in meditation-based applications. Female participants showed statistically significantly different language learning outcomes following three meditation paradigms. The art style of brain painting helped females with language memory. Our results suggest that the most ideal methods for promoting memory attention were meditation methods and brain painting exercises contributing to language learning, memory concentration promotion, and foreign word memorization. We conclude that a short period of meditation practice can help in learning a foreign language. These findings provide insights into meditation, creative language education, brain-computer interface, and human-computer interactions.

Keywords: Brain-computer interface, creative thinking, meditation, mental health.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 498
41 Using Suffix Tree Document Representation in Hierarchical Agglomerative Clustering

Authors: Daniel I. Morariu, Radu G. Cretulescu, Lucian N. Vintan

Abstract:

In text categorization problem the most used method for documents representation is based on words frequency vectors called VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any “word context" information found in the document. In this article we make a comparison between the classical method of document representation and a method called Suffix Tree Document Model (STDM) that is based on representing documents in the Suffix Tree format. For the STDM model we proposed a new approach for documents representation and a new formula for computing the similarity between two documents. Thus we propose to build the suffix tree only for any two documents at a time. This approach is faster, it has lower memory consumption and use entire document representation without using methods for disposing nodes. Also for this method is proposed a formula for computing the similarity between documents, which improves substantially the clustering quality. This representation method was validated using HAC - Hierarchical Agglomerative Clustering. In this context we experiment also the stemming influence in the document preprocessing step and highlight the difference between similarity or dissimilarity measures to find “closer" documents.

Keywords: Text Clustering, Suffix tree documentrepresentation, Hierarchical Agglomerative Clustering

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1866
40 Identification of Most Frequently Occurring Lexis in Winnings-announcing Unsolicited Bulke-mails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

e-mail has become an important means of electronic communication but the viability of its usage is marred by Unsolicited Bulk e-mail (UBE) messages. UBE consists of many types like pornographic, virus infected and 'cry-for-help' messages as well as fake and fraudulent offers for jobs, winnings and medicines. UBE poses technical and socio-economic challenges to usage of e-mails. To meet this challenge and combat this menace, we need to understand UBE. Towards this end, the current paper presents a content-based textual analysis of nearly 3000 winnings-announcing UBE. Technically, this is an application of Text Parsing and Tokenization for an un-structured textual document and we approach it using Bag Of Words (BOW) and Vector Space Document Model techniques. We have attempted to identify the most frequently occurring lexis in the winnings-announcing UBE documents. The analysis of such top 100 lexis is also presented. We exhibit the relationship between occurrence of a word from the identified lexisset in the given UBE and the probability that the given UBE will be the one announcing fake winnings. To the best of our knowledge and survey of related literature, this is the first formal attempt for identification of most frequently occurring lexis in winningsannouncing UBE by its textual analysis. Finally, this is a sincere attempt to bring about alertness against and mitigate the threat of such luring but fake UBE.

Keywords: Lexis, Unsolicited Bulk e-mail (UBE), Vector SpaceDocument Model, Winnings, Lottery

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1490
39 Reduced Dynamic Time Warping for Handwriting Recognition Based on Multidimensional Time Series of a Novel Pen Device

Authors: Muzaffar Bashir, Jürgen Kempf

Abstract:

The purpose of this paper is to present a Dynamic Time Warping technique which reduces significantly the data processing time and memory size of multi-dimensional time series sampled by the biometric smart pen device BiSP. The acquisition device is a novel ballpoint pen equipped with a diversity of sensors for monitoring the kinematics and dynamics of handwriting movement. The DTW algorithm has been applied for time series analysis of five different sensor channels providing pressure, acceleration and tilt data of the pen generated during handwriting on a paper pad. But the standard DTW has processing time and memory space problems which limit its practical use for online handwriting recognition. To face with this problem the DTW has been applied to the sum of the five sensor signals after an adequate down-sampling of the data. Preliminary results have shown that processing time and memory size could significantly be reduced without deterioration of performance in single character and word recognition. Further excellent accuracy in recognition was achieved which is mainly due to the reduced dynamic time warping RDTW technique and a novel pen device BiSP.

Keywords: Biometric character recognition, biometric person authentication, biometric smart pen BiSP, dynamic time warping DTW, online-handwriting recognition, multidimensional time series.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2361
38 Testing Loaded Programs Using Fault Injection Technique

Authors: S. Manaseer, F. A. Masooud, A. A. Sharieh

Abstract:

Fault tolerance is critical in many of today's large computer systems. This paper focuses on improving fault tolerance through testing. Moreover, it concentrates on the memory faults: how to access the editable part of a process memory space and how this part is affected. A special Software Fault Injection Technique (SFIT) is proposed for this purpose. This is done by sequentially scanning the memory of the target process, and trying to edit maximum number of bytes inside that memory. The technique was implemented and tested on a group of programs in software packages such as jet-audio, Notepad, Microsoft Word, Microsoft Excel, and Microsoft Outlook. The results from the test sample process indicate that the size of the scanned area depends on several factors. These factors are: process size, process type, and virtual memory size of the machine under test. The results show that increasing the process size will increase the scanned memory space. They also show that input-output processes have more scanned area size than other processes. Increasing the virtual memory size will also affect the size of the scanned area but to a certain limit.

Keywords: Complex software systems, Error detection, Fault tolerance, Injection and testing methodology, Memory faults, Process and virtual memory.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1833
37 Learning to Order Terms: Supervised Interestingness Measures in Terminology Extraction

Authors: Jérôme Azé, Mathieu Roche, Yves Kodratoff, Michèle Sebag

Abstract:

Term Extraction, a key data preparation step in Text Mining, extracts the terms, i.e. relevant collocation of words, attached to specific concepts (e.g. genetic-algorithms and decisiontrees are terms associated to the concept “Machine Learning" ). In this paper, the task of extracting interesting collocations is achieved through a supervised learning algorithm, exploiting a few collocations manually labelled as interesting/not interesting. From these examples, the ROGER algorithm learns a numerical function, inducing some ranking on the collocations. This ranking is optimized using genetic algorithms, maximizing the trade-off between the false positive and true positive rates (Area Under the ROC curve). This approach uses a particular representation for the word collocations, namely the vector of values corresponding to the standard statistical interestingness measures attached to this collocation. As this representation is general (over corpora and natural languages), generality tests were performed by experimenting the ranking function learned from an English corpus in Biology, onto a French corpus of Curriculum Vitae, and vice versa, showing a good robustness of the approaches compared to the state-of-the-art Support Vector Machine (SVM).

Keywords: Text-mining, Terminology Extraction, Evolutionary algorithm, ROC Curve.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1618
36 Inferring Hierarchical Pronunciation Rules from a Phonetic Dictionary

Authors: Erika Pigliapoco, Valerio Freschi, Alessandro Bogliolo

Abstract:

This work presents a new phonetic transcription system based on a tree of hierarchical pronunciation rules expressed as context-specific grapheme-phoneme correspondences. The tree is automatically inferred from a phonetic dictionary by incrementally analyzing deeper context levels, eventually representing a minimum set of exhaustive rules that pronounce without errors all the words in the training dictionary and that can be applied to out-of-vocabulary words. The proposed approach improves upon existing rule-tree-based techniques in that it makes use of graphemes, rather than letters, as elementary orthographic units. A new linear algorithm for the segmentation of a word in graphemes is introduced to enable outof- vocabulary grapheme-based phonetic transcription. Exhaustive rule trees provide a canonical representation of the pronunciation rules of a language that can be used not only to pronounce out-of-vocabulary words, but also to analyze and compare the pronunciation rules inferred from different dictionaries. The proposed approach has been implemented in C and tested on Oxford British English and Basic English. Experimental results show that grapheme-based rule trees represent phonetically sound rules and provide better performance than letter-based rule trees.

Keywords: Automatic phonetic transcription, pronunciation rules, hierarchical tree inference.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1883
35 Preliminary Study of the Phonological Development in Three- and Four-Year-Old Bulgarian Children

Authors: Tsvetomira Braynova, Miglena Simonska

Abstract:

The article presents the results of a research of phonological processes in three- and four-year-old children. A test, created for the purpose of the study, was developed and conducted among 120 children. The study included three areas of research - at the level of words (96 words), at the level of sentence repetition (10 sentences) and at the level of generating own speech from a picture (15 pictures). The test also gives us additional information about the articulation errors of the assessed children. The main purpose of the research is to analyze all phonological processes that occur at this age in Bulgarian children and to identify which are typical and atypical for this age. The results show that the most common phonology errors that children make are: sound substitution, elision of sound, metathesis of sound, elision of syllable, elision of consonants clustered in a syllable. Measuring the correlation between average length of repeated speech and average length of generated speech, the analysis does not prove that the more words a child can repeat in part “repeated speech”, the more words they can be expected to generate in part “generating sentence”. The results of this study show that the task of naming a word provides sufficient and representative information to assess the child's phonology.

Keywords: Articulation, phonology, speech, language development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 305
34 Continuous Feature Adaptation for Non-Native Speech Recognition

Authors: Y. Deng, X. Li, C. Kwan, B. Raj, R. Stern

Abstract:

The current speech interfaces in many military applications may be adequate for native speakers. However, the recognition rate drops quite a lot for non-native speakers (people with foreign accents). This is mainly because the nonnative speakers have large temporal and intra-phoneme variations when they pronounce the same words. This problem is also complicated by the presence of large environmental noise such as tank noise, helicopter noise, etc. In this paper, we proposed a novel continuous acoustic feature adaptation algorithm for on-line accent and environmental adaptation. Implemented by incremental singular value decomposition (SVD), the algorithm captures local acoustic variation and runs in real-time. This feature-based adaptation method is then integrated with conventional model-based maximum likelihood linear regression (MLLR) algorithm. Extensive experiments have been performed on the NATO non-native speech corpus with baseline acoustic model trained on native American English. The proposed feature-based adaptation algorithm improved the average recognition accuracy by 15%, while the MLLR model based adaptation achieved 11% improvement. The corresponding word error rate (WER) reduction was 25.8% and 2.73%, as compared to that without adaptation. The combined adaptation achieved overall recognition accuracy improvement of 29.5%, and WER reduction of 31.8%, as compared to that without adaptation.

Keywords: speaker adaptation; environment adaptation; robust speech recognition; SVD; non-native speech recognition

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3177
33 Principle Components Updates via Matrix Perturbations

Authors: Aiman Elragig, Hanan Dreiwi, Dung Ly, Idriss Elmabrook

Abstract:

This paper highlights a new approach to look at online principle components analysis (OPCA). Given a data matrix X R,^m x n we characterise the online updates of its covariance as a matrix perturbation problem. Up to the principle components, it turns out that online updates of the batch PCA can be captured by symmetric matrix perturbation of the batch covariance matrix. We have shown that as n→ n0 >> 1, the batch covariance and its update become almost similar. Finally, utilize our new setup of online updates to find a bound on the angle distance of the principle components of X and its update.

Keywords: Online data updates, covariance matrix, online principle component analysis (OPCA), matrix perturbation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 990
32 Investigating Iraqi EFL University Students' Productive Knowledge of Grammatical Collocations in English

Authors: Adnan Z. Mkhelif

Abstract:

Grammatical collocations (GCs) are word combinations containing a preposition or a grammatical structure, such as an infinitive (e.g. smile at, interested in, easy to learn, etc.). Such collocations tend to be difficult for Iraqi EFL university students (IUS) to master. To help address this problem, it is important to identify the factors causing it. This study aims at investigating the effects of L2 proficiency, frequency of GCs and their transparency on IUSs’ productive knowledge of GCs. The study involves 112 undergraduate participants with different proficiency levels, learning English in formal contexts in Iraq. The data collection instruments include (but not limited to) a productive knowledge test (designed by the researcher using the British National Corpus (BNC)), as well as the grammar part of the Oxford Placement Test (OPT). The study findings have shown that all the above-mentioned factors have significant effects on IUSs’ productive knowledge of GCs. In addition to establishing evidence of which factors of L2 learning might be relevant to learning GCs, it is hoped that the findings of the present study will contribute to more effective methods of teaching that can better address and help overcome the problems IUSs encounter in learning GCs. The study is thus hoped to have significant theoretical and pedagogical implications for researchers, syllabus designers as well as teachers of English as a foreign/second language.

Keywords: Corpus linguistics, frequency, grammatical collocations, L2 vocabulary learning, productive knowledge, proficiency, transparency.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 797