Search results for: isolated word recognition
3975 Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System
Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas
Abstract:
This paper presents a comparative evaluation of features extraction algorithm for a real-time isolated word recognition system based on FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in hardware/software design. The proposed system was investigated in the speaker-dependent mode for 100 different Lithuanian words. The robustness of features extraction algorithms was tested recognizing the speech records at different signals to noise rates. The experiments on clean records show highest accuracy for Mel-frequency cepstral and linear frequency cepstral coefficients. For records with 15 dB signal to noise rate the linear predictive cepstral coefficients give best result. The hard and soft part of the system is clocked on 50 MHz and 100 MHz accordingly. For the classification purpose, the pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.Keywords: isolated word recognition, features extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW
Procedia PDF Downloads 4943974 Hybrid SVM/DBN Model for Arabic Isolated Words Recognition
Authors: Elyes Zarrouk, Yassine Benayed, Faiez Gargouri
Abstract:
This paper presents a new hybrid model for isolated Arabic words recognition. To do this, we apply Support Vectors Machine (SVM) as an estimator of posterior probabilities within the Dynamic Bayesian networks (DBN). This paper deals a comparative study between DBN and SVM/DBN systems for multi-dialect isolated Arabic words. Performance using SVM/DBN is found to exceed that of DBNs trained on an identical task, giving higher recognition accuracy for four different Arabic dialects. In fact, the average of recognition rates for the four dialects with SVM/DBN was 87.67% while 83.01% with DBN.Keywords: dynamic Bayesian networks, hybrid models, supports vectors machine, Arabic isolated words
Procedia PDF Downloads 5583973 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi
Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty
Abstract:
The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing was carried using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were further equally divided into five groups based on above mentioned regions. It was found that there was no significant difference (p > 0.05) in the speech recognition performance between groups for each word list and between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average linear slope of 4.64%, 4.73%, 4.68%, and 4.85% increase in word recognition score per dB for list 1, list 2, list 3 and list 4 respectively. Although, there is no data available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity
Procedia PDF Downloads 3553972 A Neural Approach for the Offline Recognition of the Arabic Handwritten Words of the Algerian Departments
Authors: Salim Ouchtati, Jean Sequeira, Mouldi Bedda
Abstract:
In this work we present an off line system for the recognition of the Arabic handwritten words of the Algerian departments. The study is based mainly on the evaluation of neural network performances, trained with the gradient back propagation algorithm. The used parameters to form the input vector of the neural network are extracted on the binary images of the handwritten word by several methods: the parameters of distribution, the moments centered of the different projections and the Barr features. It should be noted that these methods are applied on segments gotten after the division of the binary image of the word in six segments. The classification is achieved by a multi layers perceptron. Detailed experiments are carried and satisfactory recognition results are reported.Keywords: handwritten word recognition, neural networks, image processing, pattern recognition, features extraction
Procedia PDF Downloads 5133971 Automatic Speech Recognition Systems Performance Evaluation Using Word Error Rate Method
Authors: João Rato, Nuno Costa
Abstract:
The human verbal communication is a two-way process which requires a mutual understanding that will result in some considerations. This kind of communication, also called dialogue, besides the supposed human agents it can also be performed between human agents and machines. The interaction between Men and Machines, by means of a natural language, has an important role concerning the improvement of the communication between each other. Aiming at knowing the performance of some speech recognition systems, this document shows the results of the accomplished tests according to the Word Error Rate evaluation method. Besides that, it is also given a set of information linked to the systems of Man-Machine communication. After this work has been made, conclusions were drawn regarding the Speech Recognition Systems, among which it can be mentioned their poor performance concerning the voice interpretation in noisy environments.Keywords: automatic speech recognition, man-machine conversation, speech recognition, spoken dialogue systems, word error rate
Procedia PDF Downloads 3203970 Labyrinthine Venous Vasculature Ablation for the Treatment of Sudden Sensorineural Hearing Loss: Two Case Reports
Authors: Kritin K. Verma, Bailey Duhon, Patrick W. Slater
Abstract:
Objective: To introduce the possible etiological role that the Labyrinthine Venous Vasculature (LVV) has in venous congestion of the cochlear system in Sudden Sensorineural Hearing Loss (SSNHL) patients. Patients: Two patients (62-year-old female, 50-year-old male) presented within twenty-four hours of onset of SSNHL. Intervention: Following failed conservative and salvage techniques, the patients underwent ablation of the labyrinthine venous vasculature ipsilateral to the side of the loss. Main Outcome Measures: Improvement of sudden SSNHL based on an improvement of pure-tone audiometric (PTA) low-tone scoring averages at 250, 500, and 1000 Hz. Word recognition scoring using the NU-6 word list was used to assess quality of life. Results: Case 1 experienced a 51.7 dB increase in low-tone PTA and an increased word recognition scoring of 90%. Case 2 experienced a 33.4 dB increase in low-tone PTA and 60% increase in word recognition score. No major complications noted. Conclusion: Two patients experienced significant improvement in their low-tone PTA and word recognition scoring following the labyrinthine venous vasculature ablation.Keywords: case report, sudden sensorineural hearing loss, venous congestion, vascular ablation
Procedia PDF Downloads 1353969 Features Vector Selection for the Recognition of the Fragmented Handwritten Numeric Chains
Authors: Salim Ouchtati, Aissa Belmeguenai, Mouldi Bedda
Abstract:
In this study, we propose an offline system for the recognition of the fragmented handwritten numeric chains. Firstly, we realized a recognition system of the isolated handwritten digits, in this part; the study is based mainly on the evaluation of neural network performances, trained with the gradient backpropagation algorithm. The used parameters to form the input vector of the neural network are extracted from the binary images of the isolated handwritten digit by several methods: the distribution sequence, sondes application, the Barr features, and the centered moments of the different projections and profiles. Secondly, the study is extended for the reading of the fragmented handwritten numeric chains constituted of a variable number of digits. The vertical projection was used to segment the numeric chain at isolated digits and every digit (or segment) was presented separately to the entry of the system achieved in the first part (recognition system of the isolated handwritten digits).Keywords: features extraction, handwritten numeric chains, image processing, neural networks
Procedia PDF Downloads 2653968 Perceiving Casual Speech: A Gating Experiment with French Listeners of L2 English
Authors: Naouel Zoghlami
Abstract:
Spoken-word recognition involves the simultaneous activation of potential word candidates which compete with each other for final correct recognition. In continuous speech, the activation-competition process gets more complicated due to speech reductions existing at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge in the target language. In this study, we investigate the on-line lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and better understand L2 listening difficulties through a comparison of skilled and unskilled reactions at the point where their working hypothesis is rejected. We use a variant of the gating experiment in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it included common reductions and phonetic features in English, such as elision and assimilation. Our preliminary results show that there is an important difference in the manner in which proficient and less-proficient L2 listeners handle connected speech. Less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical results are currently being undertaken.Keywords: gating paradigm, spoken word recognition, online lexical segmentation, L2 listening
Procedia PDF Downloads 4623967 Words Spotting in the Images Handwritten Historical Documents
Authors: Issam Ben Jami
Abstract:
Information retrieval in digital libraries is very important because most famous historical documents occupy a significant value. The word spotting in historical documents is a very difficult notion, because automatic recognition of such documents is naturally cursive, it represents a wide variability in the level scale and translation words in the same documents. We first present a system for the automatic recognition, based on the extraction of interest points words from the image model. The extraction phase of the key points is chosen from the representation of the image as a synthetic description of the shape recognition in a multidimensional space. As a result, we use advanced methods that can find and describe interesting points invariant to scale, rotation and lighting which are linked to local configurations of pixels. We test this approach on documents of the 15th century. Our experiments give important results.Keywords: feature matching, historical documents, pattern recognition, word spotting
Procedia PDF Downloads 2743966 Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses
Authors: G. Tamulevičius, A. Serackis, T. Sledevič, D. Navakauskas
Abstract:
We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. In this paper we propose to cope with the problem directly in Dynamic Time Warping domain. Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses simple transient noise pulse detector, employs bidirectional computation of dynamic time warping and directly manipulates with warping results. Experimental investigation with several alternative solutions confirms effectiveness of the proposed algorithm in the reduction of impact of noise on recognition process – 3.9% increase of the noisy speech recognition is achieved.Keywords: transient noise pulses, noise reduction, dynamic time warping, speech recognition
Procedia PDF Downloads 5573965 Sarcasm Recognition System Using Hybrid Tone-Word Spotting Audio Mining Technique
Authors: Sandhya Baskaran, Hari Kumar Nagabushanam
Abstract:
Sarcasm sentiment recognition is an area of natural language processing that is being probed into in the recent times. Even with the advancements in NLP, typical translations of words, sentences in its context fail to provide the exact information on a sentiment or emotion of a user. For example, if something bad happens, the statement ‘That's just what I need, great! Terrific!’ is expressed in a sarcastic tone which could be misread as a positive sign by any text-based analyzer. In this paper, we are presenting a unique real time ‘word with its tone’ spotting technique which would provide the sentiment analysis for a tone or pitch of a voice in combination with the words being expressed. This hybrid approach increases the probability for identification of special sentiment like sarcasm much closer to the real world than by mining text or speech individually. The system uses a tone analyzer such as YIN-FFT which extracts pitch segment-wise that would be used in parallel with a speech recognition system. The clustered data is classified for sentiments and sarcasm score for each of it determined. Our Simulations demonstrates the improvement in f-measure of around 12% compared to existing detection techniques with increased precision and recall.Keywords: sarcasm recognition, tone-word spotting, natural language processing, pitch analyzer
Procedia PDF Downloads 2933964 Understanding the Interactive Nature in Auditory Recognition of Phonological/Grammatical/Semantic Errors at the Sentence Level: An Investigation Based upon Japanese EFL Learners’ Self-Evaluation and Actual Language Performance
Authors: Hirokatsu Kawashima
Abstract:
One important element of teaching/learning listening is intensive listening such as listening for precise sounds, words, grammatical, and semantic units. Several classroom-based investigations have been conducted to explore the usefulness of auditory recognition of phonological, grammatical and semantic errors in such a context. The current study reports the results of one such investigation, which targeted auditory recognition of phonological, grammatical, and semantic errors at the sentence level. 56 Japanese EFL learners participated in this investigation, in which their recognition performance of phonological, grammatical and semantic errors was measured on a 9-point scale by learners’ self-evaluation from the perspective of 1) two types of similar English sound (vowel and consonant minimal pair words), 2) two types of sentence word order (verb phrase-based and noun phrase-based word orders), and 3) two types of semantic consistency (verb-purpose and verb-place agreements), respectively, and their general listening proficiency was examined using standardized tests. A number of findings have been made about the interactive relationships between the three types of auditory error recognition and general listening proficiency. Analyses based on the OPLS (Orthogonal Projections to Latent Structure) regression model have disclosed, for example, that the three types of auditory error recognition are linked in a non-linear way: the highest explanatory power for general listening proficiency may be attained when quadratic interactions between auditory recognition of errors related to vowel minimal pair words and that of errors related to noun phrase-based word order are embraced (R2=.33, p=.01).Keywords: auditory error recognition, intensive listening, interaction, investigation
Procedia PDF Downloads 5113963 Chemical Constituents of Silene Arenarioides Desf
Authors: Haba Hamada, Lavaud Cathrine, Benkhaled Mohammed
Abstract:
The Silene genus is the most representative of the caryophyllaceae family for their rich content in secondary metabolites; saponins, flavonoids and flavonoids glycosides, phytoecdysones, oligosaccharides have been isolated and identified. The Silene genus represented by about 700 species in the temrerate region of the word, the main concentration of spcies is Europe, Asia and North Africa. Three known compounds 1-3 were isolated from the aerial parts of Silene arenarioides Desf. by using different chromatographic methods. The structures of the isolated compounds were determined as stigmasterolglycoside, Soyacerebroside, maltol glycoside. The structures of the isolated compounds were determined by using the NMR (1H-NMR, 13C-NMR, COSY, HSQC, and HMBC) techniques and mass spectroscopy. The antimicrobial and antioxydant activities of the different extracts and compound have been reported.Keywords: caryophyllaceae, flavonoids, saponosids, flavonoids glycosides
Procedia PDF Downloads 4023962 The Implementation of Word Study Wall in an Online English Word Memorization Class
Authors: Yidan Shao
Abstract:
With the advancement of the economy, technology promotes online teaching, and learning has become one of the common features in the educational field. Meanwhile, the dramatic expansion of the online environment provides opportunities for more learners, including second language learners. A greater command of vocabulary improves students’ learning capacity, and word acquisition and development play a critical role in learning. Furthermore, the Word Wall is an effective tool to improve students’ knowledge of words, which works for a wide range of age groups. Therefore, this study is going to use the Word Wall as an intervention to examine whether it can bring some memorization changes in an online English language class for a second language learner based on the word morphology method. The participant will take ten courses in the experiment as it plans. The findings show that the Word Wall activity plays a slight role in improving word memorizing, but it does affect instant memorization. If longer periods and more comprehensive designs of research can be applied, it is expected to have more value.Keywords: second language acquisition, word morphology, word memorization, the Word Wall
Procedia PDF Downloads 1183961 Combined Automatic Speech Recognition and Machine Translation in Business Correspondence Domain for English-Croatian
Authors: Sanja Seljan, Ivan Dunđer
Abstract:
The paper presents combined automatic speech recognition (ASR) for English and machine translation (MT) for English and Croatian in the domain of business correspondence. The first part presents results of training the ASR commercial system on two English data sets, enriched by error analysis. The second part presents results of machine translation performed by online tool Google Translate for English and Croatian and Croatian-English language pairs. Human evaluation in terms of usability is conducted and internal consistency calculated by Cronbach's alpha coefficient, enriched by error analysis. Automatic evaluation is performed by WER (Word Error Rate) and PER (Position-independent word Error Rate) metrics, followed by investigation of Pearson’s correlation with human evaluation.Keywords: automatic machine translation, integrated language technologies, quality evaluation, speech recognition
Procedia PDF Downloads 4833960 A Chinese Nested Named Entity Recognition Model Based on Lexical Features
Abstract:
In the field of named entity recognition, most of the research has been conducted around simple entities. However, for nested named entities, which still contain entities within entities, it has been difficult to identify them accurately due to their boundary ambiguity. In this paper, a hierarchical recognition model is constructed based on the grammatical structure and semantic features of Chinese text for boundary calculation based on lexical features. The analysis is carried out at different levels in terms of granularity, semantics, and lexicality, respectively, avoiding repetitive work to reduce computational effort and using the semantic features of words to calculate the boundaries of entities to improve the accuracy of the recognition work. The results of the experiments carried out on web-based microblogging data show that the model achieves an accuracy of 86.33% and an F1 value of 89.27% in recognizing nested named entities, making up for the shortcomings of some previous recognition models and improving the efficiency of recognition of nested named entities.Keywords: coarse-grained, nested named entity, Chinese natural language processing, word embedding, T-SNE dimensionality reduction algorithm
Procedia PDF Downloads 1273959 Using Maximization Entropy in Developing a Filipino Phonetically Balanced Wordlist for a Phoneme-Level Speech Recognition System
Authors: John Lorenzo Bautista, Yoon-Joong Kim
Abstract:
In this paper, a set of Filipino Phonetically Balanced Word list consisting of 250 words (PBW250) were constructed for a phoneme-level ASR system for the Filipino language. The Entropy Maximization is used to obtain phonological balance in the list. Entropy of phonemes in a word is maximized, providing an optimal balance in each word’s phonological distribution using the Add-Delete Method (PBW algorithm) and is compared to the modified PBW algorithm implemented in a dynamic algorithm approach to obtain optimization. The gained entropy score of 4.2791 and 4.2902 for the PBW and modified algorithm respectively. The PBW250 was recorded by 40 respondents, each with 2 sets data. Recordings from 30 respondents were trained to produce an acoustic model that were tested using recordings from 10 respondents using the HMM Toolkit (HTK). The results of test gave the maximum accuracy rate of 97.77% for a speaker dependent test and 89.36% for a speaker independent test.Keywords: entropy maximization, Filipino language, Hidden Markov Model, phonetically balanced words, speech recognition
Procedia PDF Downloads 4563958 Investigating the Influences of Long-Term, as Compared to Short-Term, Phonological Memory on the Word Recognition Abilities of Arabic Readers vs. Arabic Native Speakers: A Word-Recognition Study
Authors: Insiya Bhalloo
Abstract:
It is quite common in the Muslim faith for non-Arabic speakers to be able to convert written Arabic, especially Quranic Arabic, into a phonological code without significant semantic or syntactic knowledge. This is due to prior experience learning to read the Quran (a religious text written in Classical Arabic), from a very young age such as via enrolment in Quranic Arabic classes. As compared to native speakers of Arabic, these Arabic readers do not have a comprehensive morpho-syntactic knowledge of the Arabic language, nor can understand, or engage in Arabic conversation. The study seeks to investigate whether mere phonological experience (as indicated by the Arabic readers’ experience with Arabic phonology and the sound-system) is sufficient to cause phonological-interference during word recognition of previously-heard words, despite the participants’ non-native status. Both native speakers of Arabic and non-native speakers of Arabic, i.e., those individuals that learned to read the Quran from a young age, will be recruited. Each experimental session will include two phases: An exposure phase and a test phase. During the exposure phase, participants will be presented with Arabic words (n=40) on a computer screen. Half of these words will be common words found in the Quran while the other half will be words commonly found in Modern Standard Arabic (MSA) but either non-existent or prevalent at a significantly lower frequency within the Quran. During the test phase, participants will then be presented with both familiar (n = 20; i.e., those words presented during the exposure phase) and novel Arabic words (n = 20; i.e., words not presented during the exposure phase. ½ of these presented words will be common Quranic Arabic words and the other ½ will be common MSA words but not Quranic words. Moreover, ½ the Quranic Arabic and MSA words presented will be comprised of nouns, while ½ the Quranic Arabic and MSA will be comprised of verbs, thereby eliminating word-processing issues affected by lexical category. Participants will then determine if they had seen that word during the exposure phase. This study seeks to investigate whether long-term phonological memory, such as via childhood exposure to Quranic Arabic orthography, has a differential effect on the word-recognition capacities of native Arabic speakers and Arabic readers; we seek to compare the effects of long-term phonological memory in comparison to short-term phonological exposure (as indicated by the presentation of familiar words from the exposure phase). The researcher’s hypothesis is that, despite the lack of lexical knowledge, early experience with converting written Quranic Arabic text into a phonological code will help participants recall the familiar Quranic words that appeared during the exposure phase more accurately than those that were not presented during the exposure phase. Moreover, it is anticipated that the non-native Arabic readers will also report more false alarms to the unfamiliar Quranic words, due to early childhood phonological exposure to Quranic Arabic script - thereby causing false phonological facilitatory effects.Keywords: modern standard arabic, phonological facilitation, phonological memory, Quranic arabic, word recognition
Procedia PDF Downloads 3553957 Voice Commands Recognition of Mentor Robot in Noisy Environment Using HTK
Authors: Khenfer-Koummich Fatma, Hendel Fatiha, Mesbahi Larbi
Abstract:
this paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a man-machine interface with a voice recognition system that allows the operator to tele-operate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands spoken in two languages: French and Arabic. The recognition rate obtained is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 db, the Arabic speech recognition rate is 69% and 80% for French speech recognition rate. This can be explained by the ability of phonetic context of each speech when the noise is added.Keywords: voice command, HMM, TIMIT, noise, HTK, Arabic, speech recognition
Procedia PDF Downloads 3813956 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language
Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim
Abstract:
The performance of Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system utilizes Kaldi toolkit as the platform to the entire library and algorithm used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) method to model the acoustic signal and the standard (n-gram) model for language modelling. With 80 hours of training data from the call centre recordings, the ASR system can achieve 72% of accuracy that corresponds to 28% of word error rate (WER). The testing was done using 20 hours of audio data. Despite the implementation of DNN, the system shows a low accuracy owing to the varieties of noises, accent and dialect that typically occurs in Malaysian call centre environment. This significant variation of speakers is reflected by the large standard deviation of the average word error rate (WERav) (i.e., ~ 10%). It is observed that the lowest WER (13.8%) was obtained from recording sample with a standard Malay dialect (central Malaysia) of native speaker as compared to 49% of the sample with the highest WER that contains conversation of the speaker that uses non-standard Malay dialect.Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition
Procedia PDF Downloads 3213955 Word of Mouth and Its Impact on Marketing
Authors: Fatima Naz, Ayesha Tariq
Abstract:
In view of growing of the internet users for e-commerce and taking into account, the emergent impact of word of mouth phenomenon this research has different aims. The aims of this study were built following dissimilar discussion with teachers and colleagues enlightening that word of mouth information for online purchasing do not have the same effect for everybody. Then they were born following dissimilar researchers together with what was already done in previous researches and what was completed. As a result different aims were drawn; the initial aim of this research is to study the attention of the customers in the word of mouth to power their online purchasing activities. The next aim is to analyze the people influenced by the interest of word of mouth. The following aim is to examine the marketing behavior bearing in mind the internet progress and word of mouth, their consideration for word of mouth marketing. In the form of research questions the aims of the study are: 1) How community utilizes and multiplies word of mouth information about online purchasing experience? 2) How communities perceive the word of mouth marketing? 3) How marketers take the word of mouth phenomenon and how they handle it?Keywords: belief, power, inspiration, self-expression, positive attitude to online marketing, forwarding of contents, purchasing decision, standard marketing
Procedia PDF Downloads 4223954 Recognition of Grocery Products in Images Captured by Cellular Phones
Authors: Farshideh Einsele, Hassan Foroosh
Abstract:
In this paper, we present a robust algorithm to recognize extracted text from grocery product images captured by mobile phone cameras. Recognition of such text is challenging since text in grocery product images varies in its size, orientation, style, illumination, and can suffer from perspective distortion. Pre-processing is performed to make the characters scale and rotation invariant. Since text degradations can not be appropriately defined using wellknown geometric transformations such as translation, rotation, affine transformation and shearing, we use the whole character black pixels as our feature vector. Classification is performed with minimum distance classifier using the maximum likelihood criterion, which delivers very promising Character Recognition Rate (CRR) of 89%. We achieve considerably higher Word Recognition Rate (WRR) of 99% when using lower level linguistic knowledge about product words during the recognition process.Keywords: camera-based OCR, feature extraction, document, image processing, grocery products
Procedia PDF Downloads 4053953 Improved Dynamic Bayesian Networks Applied to Arabic On Line Characters Recognition
Authors: Redouane Tlemsani, Abdelkader Benyettou
Abstract:
Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology. This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data. Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables. In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization. The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition, computer vision
Procedia PDF Downloads 4273952 Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models (HMMs)
Authors: Rabi Mouhcine, Amrouch Mustapha, Mahani Zouhir, Mammass Driss
Abstract:
In this paper, we present a system for offline recognition cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The system is analytical without explicit segmentation used embedded training to perform and enhance the character models. Extraction features preceded by baseline estimation are statistical and geometric to integrate both the peculiarities of the text and the pixel distribution characteristics in the word image. These features are modelled using hidden Markov models and trained by embedded training. The experiments on images of the benchmark IFN/ENIT database show that the proposed system improves recognition.Keywords: recognition, handwriting, Arabic text, HMMs, embedded training
Procedia PDF Downloads 3533951 The Facilitatory Effect of Phonological Priming on Visual Word Recognition in Arabic as a Function of Lexicality and Overlap Positions
Authors: Ali Al Moussaoui
Abstract:
An experiment was designed to assess the performance of 24 Lebanese adults (mean age 29:5 years) in a lexical decision making (LDM) task to find out how the facilitatory effect of phonological priming (PP) affects the speed of visual word recognition in Arabic as lexicality (wordhood) and phonological overlap positions (POP) vary. The experiment falls in line with previous research on phonological priming in the light of the cohort theory and in relation to visual word recognition. The experiment also departs from the research on the Arabic language in which the importance of the consonantal root as a distinct morphological unit is confirmed. Based on previous research, it is hypothesized that (1) PP has a facilitating effect in LDM with words but not with nonwords and (2) final phonological overlap between the prime and the target is more facilitatory than initial overlap. An LDM task was programmed on PsychoPy application. Participants had to decide if a target (e.g., bayn ‘between’) preceded by a prime (e.g., bayt ‘house’) is a word or not. There were 4 conditions: no PP (NP), nonwords priming nonwords (NN), nonwords priming words (NW), and words priming words (WW). The conditions were simultaneously controlled for word length, wordhood, and POP. The interstimulus interval was 700 ms. Within the PP conditions, POP was controlled for in which there were 3 overlap positions between the primes and the targets: initial (e.g., asad ‘lion’ and asaf ‘sorrow’), final (e.g., kattab ‘cause to write’ 2sg-mas and rattab ‘organize’ 2sg-mas), or two-segmented (e.g., namle ‘ant’ and naħle ‘bee’). There were 96 trials, 24 in each condition, using a within-subject design. The results show that concerning (1), the highest average reaction time (RT) is that in NN, followed firstly by NW and finally by WW. There is statistical significance only between the pairs NN-NW and NN-WW. Regarding (2), the shortest RT is that in the two-segmented overlap condition, followed by the final POP in the first place and the initial POP in the last place. The difference between the two-segmented and the initial overlap is significant, while other pairwise comparisons are not. Based on these results, PP emerges as a facilitatory phenomenon that is highly sensitive to lexicality and POP. While PP can have a facilitating effect under lexicality, it shows no facilitation in its absence, which intersects with several previous findings. Participants are found to be more sensitive to the final phonological overlap than the initial overlap, which also coincides with a body of earlier literature. The results contradict the cohort theory’s stress on the onset overlap position and, instead, give more weight to final overlap, and even heavier weight to the two-segmented one. In conclusion, this study confirms the facilitating effect of PP with words but not when stimuli (at least the primes and at most both the primes and targets) are nonwords. It also shows that the two-segmented priming is the most influential in LDM in Arabic.Keywords: lexicality, phonological overlap positions, phonological priming, visual word recognition
Procedia PDF Downloads 1843950 Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model
Authors: Khenfer Koummich Fatma, Hendel Fatiha, Mesbahi Larbi
Abstract:
This paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands pronounced in two languages: French and Arabic. The obtained recognition rate is the same in both speeches, Arabic and French in the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equals 30 dB, in this case; the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of phonetic context of each speech when the noise is added.Keywords: Arabic speech recognition, Hidden Markov Model (HMM), HTK, noise, TIMIT, voice command
Procedia PDF Downloads 3843949 An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System
Authors: John Lorenzo Bautista, Yoon-Joong Kim
Abstract:
This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained from a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was both used in training and testing the system by estimating the parameters for phonetic HMM-based (Hidden-Markov Model) acoustic models. Experiments on different mixture-weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13 for a single-Gaussian mixture model, 81.13 after implementing a phoneme-alignment, and 87.19 for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures.Keywords: Filipino language, Hidden Markov Model, HTK system, speech recognition
Procedia PDF Downloads 4793948 A Word-to-Vector Formulation for Word Representation
Authors: Sandra Rizkallah, Amir F. Atiya
Abstract:
This work presents a novel word to vector representation that is based on embedding the words into a sphere, whereby the dot product of the corresponding vectors represents the similarity between any two words. Embedding the vectors into a sphere enabled us to take into consideration the antonymity between words, not only the synonymity, because of the suitability to handle the polarity nature of words. For example, a word and its antonym can be represented as a vector and its negative. Moreover, we have managed to extract an adequate vocabulary. The obtained results show that the proposed approach can capture the essence of the language, and can be generalized to estimate a correct similarity of any new pair of words.Keywords: natural language processing, word to vector, text similarity, text mining
Procedia PDF Downloads 2733947 The Investigation of Women Civil Engineers’ Identity Development through the Lens of Recognition Theory
Authors: Hasan Sungur, Evrim Baran, Benjamin Ahn, Aliye Karabulut Ilgu, Chris Rehmann, Cassandra Rutherford
Abstract:
Engineering identity contributes to the professional and educational persistence of women engineers. A crucial factor contributing to the development of the engineering identity is recognition. Those without adequate recognition often do not succeed in positively building their identities. This research draws on Honneth’s recognition theory to identify factors impacting women civil engineers’ feelings of recognition as civil engineers. A survey was composed and distributed to 330 female alumni who graduated from the Department of Civil, Construction, and Environmental Engineering at Iowa State University in the last ten years. The survey items include demographics, perceptions of the identity of civil engineering, and factors that influence the recognition of civil engineering identities, such as views of society and family. Descriptive analysis of the survey responses revealed that the perceptions of civil engineering varied widely. Participants’ definitions of civil engineering included the terms: construction, design, and infrastructure. Almost half of the participants reported that the major reason to study civil engineering was their interest in the subject matter, and most reported that they were proud to be civil engineers. Many study participants reported that their parents see them as civil engineers. Treatment of institutions and the workplace were also considered as having a significant impact on the recognition of women civil engineers. Almost half of the participants reported that they felt isolated or ignored at work because of their gender. This research emphasizes the importance of recognition for the development of the civil engineering identity of womenKeywords: civil engineering, gender, identity, recognition
Procedia PDF Downloads 2543946 Text Emotion Recognition by Multi-Head Attention based Bidirectional LSTM Utilizing Multi-Level Classification
Authors: Vishwanath Pethri Kamath, Jayantha Gowda Sarapanahalli, Vishal Mishra, Siddhesh Balwant Bandgar
Abstract:
Recognition of emotional information is essential in any form of communication. Growing HCI (Human-Computer Interaction) in recent times indicates the importance of understanding of emotions expressed and becomes crucial for improving the system or the interaction itself. In this research work, textual data for emotion recognition is used. The text being the least expressive amongst the multimodal resources poses various challenges such as contextual information and also sequential nature of the language construction. In this research work, the proposal is made for a neural architecture to resolve not less than 8 emotions from textual data sources derived from multiple datasets using google pre-trained word2vec word embeddings and a Multi-head attention-based bidirectional LSTM model with a one-vs-all Multi-Level Classification. The emotions targeted in this research are Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame, and Surprise. Textual data from multiple datasets were used for this research work such as ISEAR, Go Emotions, Affect datasets for creating the emotions’ dataset. Data samples overlap or conflicts were considered with careful preprocessing. Our results show a significant improvement with the modeling architecture and as good as 10 points improvement in recognizing some emotions.Keywords: text emotion recognition, bidirectional LSTM, multi-head attention, multi-level classification, google word2vec word embeddings
Procedia PDF Downloads 173