Search results for: speech recognition systems.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 5125

Search results for: speech recognition systems.

4915 Topology-Based Character Recognition Method for Coin Date Detection

Authors: Xingyu Pan, Laure Tougne

Abstract:

For recognizing coins, the graved release date is important information to identify precisely its monetary type. However, reading characters in coins meets much more obstacles than traditional character recognition tasks in the other fields, such as reading scanned documents or license plates. To address this challenging issue in a numismatic context, we propose a training-free approach dedicated to detection and recognition of the release date of the coin. In the first step, the date zone is detected by comparing histogram features; in the second step, a topology-based algorithm is introduced to recognize coin numbers with various font types represented by binary gradient map. Our method obtained a recognition rate of 92% on synthetic data and of 44% on real noised data.

Keywords: Coin, detection, character recognition, topology.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1418
4914 Syntactic Recognition of Distorted Patterns

Authors: Marek Skomorowski

Abstract:

In syntactic pattern recognition a pattern can be represented by a graph. Given an unknown pattern represented by a graph g, the problem of recognition is to determine if the graph g belongs to a language L(G) generated by a graph grammar G. The so-called IE graphs have been defined in [1] for a description of patterns. The IE graphs are generated by so-called ETPL(k) graph grammars defined in [1]. An efficient, parsing algorithm for ETPL(k) graph grammars for syntactic recognition of patterns represented by IE graphs has been presented in [1]. In practice, structural descriptions may contain pattern distortions, so that the assignment of a graph g, representing an unknown pattern, to a graph language L(G) generated by an ETPL(k) graph grammar G is rejected by the ETPL(k) type parsing. Therefore, there is a need for constructing effective parsing algorithms for recognition of distorted patterns. The purpose of this paper is to present a new approach to syntactic recognition of distorted patterns. To take into account all variations of a distorted pattern under study, a probabilistic description of the pattern is needed. A random IE graph approach is proposed here for such a description ([2]).

Keywords: Syntactic pattern recognition, Distorted patterns, Random graphs, Graph grammars.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1352
4913 Hand Gesture Recognition Based on Combined Features Extraction

Authors: Mahmoud Elmezain, Ayoub Al-Hamadi, Bernd Michaelis

Abstract:

Hand gesture is an active area of research in the vision community, mainly for the purpose of sign language recognition and Human Computer Interaction. In this paper, we propose a system to recognize alphabet characters (A-Z) and numbers (0-9) in real-time from stereo color image sequences using Hidden Markov Models (HMMs). Our system is based on three main stages; automatic segmentation and preprocessing of the hand regions, feature extraction and classification. In automatic segmentation and preprocessing stage, color and 3D depth map are used to detect hands where the hand trajectory will take place in further step using Mean-shift algorithm and Kalman filter. In the feature extraction stage, 3D combined features of location, orientation and velocity with respected to Cartesian systems are used. And then, k-means clustering is employed for HMMs codeword. The final stage so-called classification, Baum- Welch algorithm is used to do a full train for HMMs parameters. The gesture of alphabets and numbers is recognized using Left-Right Banded model in conjunction with Viterbi algorithm. Experimental results demonstrate that, our system can successfully recognize hand gestures with 98.33% recognition rate.

Keywords: Gesture Recognition, Computer Vision & Image Processing, Pattern Recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3987
4912 Pakistan Sign Language Recognition Using Statistical Template Matching

Authors: Aleem Khalid Alvi, M. Yousuf Bin Azhar, Mehmood Usman, Suleman Mumtaz, Sameer Rafiq, RaziUr Rehman, Israr Ahmed

Abstract:

Sign language recognition has been a topic of research since the first data glove was developed. Many researchers have attempted to recognize sign language through various techniques. However none of them have ventured into the area of Pakistan Sign Language (PSL). The Boltay Haath project aims at recognizing PSL gestures using Statistical Template Matching. The primary input device is the DataGlove5 developed by 5DT. Alternative approaches use camera-based recognition which, being sensitive to environmental changes are not always a good choice.This paper explains the use of Statistical Template Matching for gesture recognition in Boltay Haath. The system recognizes one handed alphabet signs from PSL.

Keywords: Gesture Recognition, Pakistan Sign Language, DataGlove, Human Computer Interaction, Template Matching, BoltayHaath

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2962
4911 SMaTTS: Standard Malay Text to Speech System

Authors: Othman O. Khalifa, Zakiah Hanim Ahmad, Teddy Surya Gunawan

Abstract:

This paper presents a rule-based text- to- speech (TTS) Synthesis System for Standard Malay, namely SMaTTS. The proposed system using sinusoidal method and some pre- recorded wave files in generating speech for the system. The use of phone database significantly decreases the amount of computer memory space used, thus making the system very light and embeddable. The overall system was comprised of two phases the Natural Language Processing (NLP) that consisted of the high-level processing of text analysis, phonetic analysis, text normalization and morphophonemic module. The module was designed specially for SM to overcome few problems in defining the rules for SM orthography system before it can be passed to the DSP module. The second phase is the Digital Signal Processing (DSP) which operated on the low-level process of the speech waveform generation. A developed an intelligible and adequately natural sounding formant-based speech synthesis system with a light and user-friendly Graphical User Interface (GUI) is introduced. A Standard Malay Language (SM) phoneme set and an inclusive set of phone database have been constructed carefully for this phone-based speech synthesizer. By applying the generative phonology, a comprehensive letter-to-sound (LTS) rules and a pronunciation lexicon have been invented for SMaTTS. As for the evaluation tests, a set of Diagnostic Rhyme Test (DRT) word list was compiled and several experiments have been performed to evaluate the quality of the synthesized speech by analyzing the Mean Opinion Score (MOS) obtained. The overall performance of the system as well as the room for improvements was thoroughly discussed.

Keywords: Natural Language Processing, Text-To-Speech (TTS), Diphone, source filter, low-/ high- level synthesis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1931
4910 Comparative Study of Filter Characteristics as Statistical Vocal Correlates of Clinical Psychiatric State in Human

Authors: Thaweesak Yingthawornsuk, Chusak Thanawattano

Abstract:

Acoustical properties of speech have been shown to be related to mental states of speaker with symptoms: depression and remission. This paper describes way to address the issue of distinguishing depressed patients from remitted subjects based on measureable acoustics change of their spoken sound. The vocal-tract related frequency characteristics of speech samples from female remitted and depressed patients were analyzed via speech processing techniques and consequently, evaluated statistically by cross-validation with Support Vector Machine. Our results comparatively show the classifier's performance with effectively correct separation of 93% determined from testing with the subjectbased feature model and 88% from the frame-based model based on the same speech samples collected from hospital visiting interview sessions between patients and psychiatrists.

Keywords: Depression, SVM, Vocal Extract, Vocal Tract

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1508
4909 Object Recognition Approach Based on Generalized Hough Transform and Color Distribution Serving in Generating Arabic Sentences

Authors: Nada Farhani, Naim Terbeh, Mounir Zrigui

Abstract:

The recognition of the objects contained in images has always presented a challenge in the field of research because of several difficulties that the researcher can envisage because of the variability of shape, position, contrast of objects, etc. In this paper, we will be interested in the recognition of objects. The classical Hough Transform (HT) presented a tool for detecting straight line segments in images. The technique of HT has been generalized (GHT) for the detection of arbitrary forms. With GHT, the forms sought are not necessarily defined analytically but rather by a particular silhouette. For more precision, we proposed to combine the results from the GHT with the results from a calculation of similarity between the histograms and the spatiograms of the images. The main purpose of our work is to use the concepts from recognition to generate sentences in Arabic that summarize the content of the image.

Keywords: Recognition of shape, generalized hough transformation, histogram, Spatiogram, learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 567
4908 Quantitative Analysis of PCA, ICA, LDA and SVM in Face Recognition

Authors: Liton Jude Rozario, Mohammad Reduanul Haque, Md. Ziarul Islam, Mohammad Shorif Uddin

Abstract:

Face recognition is a technique to automatically identify or verify individuals. It receives great attention in identification, authentication, security and many more applications. Diverse methods had been proposed for this purpose and also a lot of comparative studies were performed. However, researchers could not reach unified conclusion. In this paper, we are reporting an extensive quantitative accuracy analysis of four most widely used face recognition algorithms: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) using AT&T, Sheffield and Bangladeshi people face databases under diverse situations such as illumination, alignment and pose variations.

Keywords: PCA, ICA, LDA, SVM, face recognition, noise.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2383
4907 Local Linear Model Tree (LOLIMOT) Reconfigurable Parallel Hardware

Authors: A. Pedram, M. R. Jamali, T. Pedram, S. M. Fakhraie, C. Lucas

Abstract:

Local Linear Neuro-Fuzzy Models (LLNFM) like other neuro- fuzzy systems are adaptive networks and provide robust learning capabilities and are widely utilized in various applications such as pattern recognition, system identification, image processing and prediction. Local linear model tree (LOLIMOT) is a type of Takagi-Sugeno-Kang neuro fuzzy algorithm which has proven its efficiency compared with other neuro fuzzy networks in learning the nonlinear systems and pattern recognition. In this paper, a dedicated reconfigurable and parallel processing hardware for LOLIMOT algorithm and its applications are presented. This hardware realizes on-chip learning which gives it the capability to work as a standalone device in a system. The synthesis results on FPGA platforms show its potential to improve the speed at least 250 of times faster than software implemented algorithms.

Keywords: LOLIMOT, hardware, neurofuzzy systems, reconfigurable, parallel.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3833
4906 Inverse Sets-based Recognition of Video Clips

Authors: Alexei M. Mikhailov

Abstract:

The paper discusses the mathematics of pattern indexing and its applications to recognition of visual patterns that are found in video clips. It is shown that (a) pattern indexes can be represented by collections of inverted patterns, (b) solutions to pattern classification problems can be found as intersections and histograms of inverted patterns and, thus, matching of original patterns avoided.

Keywords: Artificial neural cortex, computational biology, data mining, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2069
4905 Virtual Speaking Head for Hearing Impaired Students

Authors: Eva Pajorová, Ladislav Hluchý

Abstract:

Developed tool is one of system tools for easier access to various scientific areas and real time interactive learning between lecturer and for hearing impaired students. There is no demand for the lecturer to know Sign Language (SL). Instead, the new software tools will perform the translation of the regular speech into SL, after which it will be transferred to the student. On the other side, the questions of the student (in SL) will be translated and transferred to the lecturer in text or speech. One of those tools is presented tool. It-s too for developing the correct Speech Visemes as a root of total communication method for hearing impared students.

Keywords: Impared people, sing language, communication methods.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1805
4904 Improved Dynamic Bayesian Networks Applied to Arabic on Line Characters Recognition

Authors: Redouane Tlemsani, Abdelkader Benyettou

Abstract:

Work is in on line Arabic character recognition and the principal motivation is to study the Arab manuscript with on line technology.

This system is a Markovian system, which one can see as like a Dynamic Bayesian Network (DBN). One of the major interests of these systems resides in the complete models training (topology and parameters) starting from training data.

Our approach is based on the dynamic Bayesian Networks formalism. The DBNs theory is a Bayesians networks generalization to the dynamic processes. Among our objective, amounts finding better parameters, which represent the links (dependences) between dynamic network variables.

In applications in pattern recognition, one will carry out the fixing of the structure, which obliges us to admit some strong assumptions (for example independence between some variables). Our application will relate to the Arabic isolated characters on line recognition using our laboratory database: NOUN. A neural tester proposed for DBN external optimization.

The DBN scores and DBN mixed are respectively 70.24% and 62.50%, which lets predict their further development; other approaches taking account time were considered and implemented until obtaining a significant recognition rate 94.79%.

Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1690
4903 Noise Estimation for Speech Enhancement in Non-Stationary Environments-A New Method

Authors: Ch.V.Rama Rao, Gowthami., Harsha., Rajkumar., M.B.Rama Murthy, K.Srinivasa Rao, K.AnithaSheela

Abstract:

This paper presents a new method for estimating the nonstationary noise power spectral density given a noisy signal. The method is based on averaging the noisy speech power spectrum using time and frequency dependent smoothing factors. These factors are adjusted based on signal-presence probability in individual frequency bins. Signal presence is determined by computing the ratio of the noisy speech power spectrum to its local minimum, which is updated continuously by averaging past values of the noisy speech power spectra with a look-ahead factor. This method adapts very quickly to highly non-stationary noise environments. The proposed method achieves significant improvements over a system that uses voice activity detector (VAD) in noise estimation.

Keywords: Noise estimation, Non-stationary noise, Speechenhancement.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2298
4902 Initialization Method of Reference Vectors for Improvement of Recognition Accuracy in LVQ

Authors: Yuji Mizuno, Hiroshi Mabuchi

Abstract:

Initial values of reference vectors have significant influence on recognition accuracy in LVQ. There are several existing techniques, such as SOM and k-means, for setting initial values of reference vectors, each of which has provided some positive results. However, those results are not sufficient for the improvement of recognition accuracy. This study proposes an ACO-used method for initializing reference vectors with an aim to achieve recognition accuracy higher than those obtained through conventional methods. Moreover, we will demonstrate the effectiveness of the proposed method by applying it to the wine data and English vowel data and comparing its results with those of conventional methods.

Keywords: Clustering, LVQ, ACO, SOM, k-means.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1211
4901 Posture Recognition using Combined Statistical and Geometrical Feature Vectors based on SVM

Authors: Omer Rashid, Ayoub Al-Hamadi, Axel Panning, Bernd Michaelis

Abstract:

It is hard to percept the interaction process with machines when visual information is not available. In this paper, we have addressed this issue to provide interaction through visual techniques. Posture recognition is done for American Sign Language to recognize static alphabets and numbers. 3D information is exploited to obtain segmentation of hands and face using normal Gaussian distribution and depth information. Features for posture recognition are computed using statistical and geometrical properties which are translation, rotation and scale invariant. Hu-Moment as statistical features and; circularity and rectangularity as geometrical features are incorporated to build the feature vectors. These feature vectors are used to train SVM for classification that recognizes static alphabets and numbers. For the alphabets, curvature analysis is carried out to reduce the misclassifications. The experimental results show that proposed system recognizes posture symbols by achieving recognition rate of 98.65% and 98.6% for ASL alphabets and numbers respectively.

Keywords: Feature Extraction, Posture Recognition, Pattern Recognition, Application.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1477
4900 Finger Vein Recognition using PCA-based Methods

Authors: Sepehr Damavandinejadmonfared, Ali Khalili Mobarakeh, Mohsen Pashna, , Jiangping Gou Sayedmehran Mirsafaie Rizi, Saba Nazari, Shadi Mahmoodi Khaniabadi, Mohamad Ali Bagheri

Abstract:

In this paper a novel algorithm is proposed to merit the accuracy of finger vein recognition. The performances of Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Kernel Entropy Component Analysis (KECA) in this algorithm are validated and compared with each other in order to determine which one is the most appropriate one in terms of finger vein recognition.

Keywords: Biometrics, finger vein recognition, PrincipalComponent Analysis (PCA), Kernel Principal Component Analysis(KPCA), Kernel Entropy Component Analysis (KPCA).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2630
4899 Absence of Developmental Change in Epenthetic Vowel Duration in Japanese Speakers’ English

Authors: Takayuki Konishi, Kakeru Yazawa, Mariko Kondo

Abstract:

This study examines developmental change in the production of epenthetic vowels by Japanese learners of English in relation to acquisition of L2 English speech rhythm. Seventy-two Japanese learners of English in the J-AESOP corpus were divided into lower- and higher-level learners according to their proficiency score and the frequency of vowel epenthesis. Three learners were excluded because no vowel epenthesis was observed in their utterances. The analysis of their read English speech data showed no statistical difference between lower- and higher-level learners, implying the absence of any developmental change in durations of epenthetic vowels. This result, together with the findings of previous studies, will be discussed in relation to the transfer of L1 phonology and manifestation of L2 English rhythm.

Keywords: Vowel epenthesis, Japanese learners of English, L2 speech corpus, speech rhythm.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1084
4898 Automatic Landmark Selection Based on Feature Clustering for Visual Autonomous Unmanned Aerial Vehicle Navigation

Authors: Paulo Fernando Silva Filho, Elcio Hideiti Shiguemori

Abstract:

The selection of specific landmarks for an Unmanned Aerial Vehicles’ Visual Navigation systems based on Automatic Landmark Recognition has significant influence on the precision of the system’s estimated position. At the same time, manual selection of the landmarks does not guarantee a high recognition rate, which would also result on a poor precision. This work aims to develop an automatic landmark selection that will take the image of the flight area and identify the best landmarks to be recognized by the Visual Navigation Landmark Recognition System. The criterion to select a landmark is based on features detected by ORB or AKAZE and edges information on each possible landmark. Results have shown that disposition of possible landmarks is quite different from the human perception.

Keywords: Clustering, edges, feature points, landmark selection, X-Means.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 753
4897 Delineating Students’ Speaking Anxieties and Assessment Gaps in Online Speech Performances

Authors: Mary Jane B. Suarez

Abstract:

Speech anxiety is innumerable in any traditional communication classes especially for ESL students. The speech anxiety intensifies when communication skills assessments have taken its toll in an online mode of learning due to the perils of the COVID-19 virus. Teachers and students have experienced vast ambiguity on how to realize a still effective way to teach and learn various speaking skills amidst the pandemic. This mixed method study determined the factors that affected the public speaking skills of students in online performances, delineated the assessment gaps in assessing speaking skills in an online setup, and recommended ways to address students’ speech anxieties. Using convergent parallel design, quantitative data were gathered by examining the desired learning competencies of the English course including a review of the teacher’s class record to analyze how students’ performances reflected a significantly high level of anxiety in online speech delivery. Focus group discussion was also conducted for qualitative data describing students’ public speaking anxiety and assessment gaps. Results showed a significantly high level of students’ speech anxiety affected by time constraints, use of technology, lack of audience response, being conscious of making mistakes, and the use of English as a second language. The study presented recommendations to redesign curricular assessments of English teachers and to have a robust diagnosis of students’ speaking anxiety to better cater to the needs of learners in attempt to bridge any gaps in cultivating public speaking skills of students as educational institutions segue from the pandemic to the post-pandemic milieu.

Keywords: Blended learning, communication skills assessment, online speech delivery, public speaking anxiety, speech anxiety.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 114
4896 Sentence Modality Recognition in French based on Prosody

Authors: Pavel Král, Jana Klečková, Christophe Cerisara

Abstract:

This paper deals with automatic sentence modality recognition in French. In this work, only prosodic features are considered. The sentences are recognized according to the three following modalities: declarative, interrogative and exclamatory sentences. This information will be used to animate a talking head for deaf and hearing-impaired children. We first statistically study a real radio corpus in order to assess the feasibility of the automatic modeling of sentence types. Then, we test two sets of prosodic features as well as two different classifiers and their combination. We further focus our attention on questions recognition, as this modality is certainly the most important one for the target application.

Keywords: Automatic sentences modality recognition (ASMR), fundamental frequency (F0), energy, modal corpus, prosody.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1621
4895 The Performance Improvement of Automatic Modulation Recognition Using Simple Feature Manipulation, Analysis of the HOS, and Voted Decision

Authors: Heroe Wijanto, Sugihartono, Suhartono Tjondronegoro, Kuspriyanto

Abstract:

The use of High Order Statistics (HOS) analysis is expected to provide so many candidates of features that can be selected for pattern recognition. More candidates of the feature can be extracted using simple manipulation through a specific mathematical function prior to the HOS analysis. Feature extraction method using HOS analysis combined with Difference to the Nth-Power manipulation has been examined in application for Automatic Modulation Recognition (AMR) to perform scheme recognition of three digital modulation signal, i.e. QPSK-16QAM-64QAM in the AWGN transmission channel. The simulation results is reported when the analysis of HOS up to order-12 and the manipulation of Difference to the Nth-Power up to N = 4. The obtained accuracy rate of AMR using the method of Simple Decision obtained 90% in SNR > 10 dB in its classifier, while using the method of Voted Decision is 96% in SNR > 2 dB.

Keywords: modulation, automatic modulation recognition, feature analysis, feature manipulation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2072
4894 Power System Security Assessment using Binary SVM Based Pattern Recognition

Authors: S Kalyani, K Shanti Swarup

Abstract:

Power System Security is a major concern in real time operation. Conventional method of security evaluation consists of performing continuous load flow and transient stability studies by simulation program. This is highly time consuming and infeasible for on-line application. Pattern Recognition (PR) is a promising tool for on-line security evaluation. This paper proposes a Support Vector Machine (SVM) based binary classification for static and transient security evaluation. The proposed SVM based PR approach is implemented on New England 39 Bus and IEEE 57 Bus systems. The simulation results of SVM classifier is compared with the other classifier algorithms like Method of Least Squares (MLS), Multi- Layer Perceptron (MLP) and Linear Discriminant Analysis (LDA) classifiers.

Keywords: Static Security, Transient Security, Pattern Recognition, Classifier, Support Vector Machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1824
4893 Word Recognition and Learning based on Associative Memories and Hidden Markov Models

Authors: Zöhre Kara Kayikci, Günther Palm

Abstract:

A word recognition architecture based on a network of neural associative memories and hidden Markov models has been developed. The input stream, composed of subword-units like wordinternal triphones consisting of diphones and triphones, is provided to the network of neural associative memories by hidden Markov models. The word recognition network derives words from this input stream. The architecture has the ability to handle ambiguities on subword-unit level and is also able to add new words to the vocabulary during performance. The architecture is implemented to perform the word recognition task in a language processing system for understanding simple command sentences like “bot show apple".

Keywords: Hebbian learning, hidden Markov models, neuralassociative memories, word recognition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1470
4892 An Improved Illumination Normalization based on Anisotropic Smoothing for Face Recognition

Authors: Sanghoon Kim, Sun-Tae Chung, Souhwan Jung, Seongwon Cho

Abstract:

Robust face recognition under various illumination environments is very difficult and needs to be accomplished for successful commercialization. In this paper, we propose an improved illumination normalization method for face recognition. Illumination normalization algorithm based on anisotropic smoothing is well known to be effective among illumination normalization methods but deteriorates the intensity contrast of the original image, and incurs less sharp edges. The proposed method in this paper improves the previous anisotropic smoothing-based illumination normalization method so that it increases the intensity contrast and enhances the edges while diminishing the effect of illumination variations. Due to the result of these improvements, face images preprocessed by the proposed illumination normalization method becomes to have more distinctive feature vectors (Gabor feature vectors) for face recognition. Through experiments of face recognition based on Gabor feature vector similarity, the effectiveness of the proposed illumination normalization method is verified.

Keywords: Illumination Normalization, Face Recognition, Anisotropic smoothing, Gabor feature vector.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1504
4891 Foot Recognition Using Deep Learning for Knee Rehabilitation

Authors: Rakkrit Duangsoithong, Jermphiphut Jaruenpunyasak, Alba Garcia

Abstract:

The use of foot recognition can be applied in many medical fields such as the gait pattern analysis and the knee exercises of patients in rehabilitation. Generally, a camera-based foot recognition system is intended to capture a patient image in a controlled room and background to recognize the foot in the limited views. However, this system can be inconvenient to monitor the knee exercises at home. In order to overcome these problems, this paper proposes to use the deep learning method using Convolutional Neural Networks (CNNs) for foot recognition. The results are compared with the traditional classification method using LBP and HOG features with kNN and SVM classifiers. According to the results, deep learning method provides better accuracy but with higher complexity to recognize the foot images from online databases than the traditional classification method.

Keywords: Convolutional neural networks, deep learning, foot recognition, knee rehabilitation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1379
4890 Robust Face Recognition using AAM and Gabor Features

Authors: Sanghoon Kim, Sun-Tae Chung, Souhwan Jung, Seoungseon Jeon, Jaemin Kim, Seongwon Cho

Abstract:

In this paper, we propose a face recognition algorithm using AAM and Gabor features. Gabor feature vectors which are well known to be robust with respect to small variations of shape, scaling, rotation, distortion, illumination and poses in images are popularly employed for feature vectors for many object detection and recognition algorithms. EBGM, which is prominent among face recognition algorithms employing Gabor feature vectors, requires localization of facial feature points where Gabor feature vectors are extracted. However, localization method employed in EBGM is based on Gabor jet similarity and is sensitive to initial values. Wrong localization of facial feature points affects face recognition rate. AAM is known to be successfully applied to localization of facial feature points. In this paper, we devise a facial feature point localization method which first roughly estimate facial feature points using AAM and refine facial feature points using Gabor jet similarity-based facial feature localization method with initial points set by the rough facial feature points obtained from AAM, and propose a face recognition algorithm using the devised localization method for facial feature localization and Gabor feature vectors. It is observed through experiments that such a cascaded localization method based on both AAM and Gabor jet similarity is more robust than the localization method based on only Gabor jet similarity. Also, it is shown that the proposed face recognition algorithm using this devised localization method and Gabor feature vectors performs better than the conventional face recognition algorithm using Gabor jet similarity-based localization method and Gabor feature vectors like EBGM.

Keywords: Face Recognition, AAM, Gabor features, EBGM.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2162
4889 Orchestra/Percussion Classification Algorithm for United Speech Audio Coding System

Authors: Yueming Wang, Rendong Ying, Sumxin Jiang, Peilin Liu

Abstract:

Unified Speech Audio Coding (USAC), the latest MPEG standardization for unified speech and audio coding, uses a speech/audio classification algorithm to distinguish speech and audio segments of the input signal. The quality of the recovered audio can be increased by well-designed orchestra/percussion classification and subsequent processing. However, owing to the shortcoming of the system, introducing an orchestra/percussion classification and modifying subsequent processing can enormously increase the quality of the recovered audio. This paper proposes an orchestra/percussion classification algorithm for the USAC system which only extracts 3 scales of Mel-Frequency Cepstral Coefficients (MFCCs) rather than traditional 13 scales of MFCCs and use Iterative Dichotomiser 3 (ID3) Decision Tree rather than other complex learning method, thus the proposed algorithm has lower computing complexity than most existing algorithms. Considering that frequent changing of attributes may lead to quality loss of the recovered audio signal, this paper also design a modified subsequent process to help the whole classification system reach an accurate rate as high as 97% which is comparable to classical 99%.

Keywords: ID3 Decision Tree, MFCC, Orchestra/Percussion Classification, USAC

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1635
4888 Javanese Character Recognition Using Hidden Markov Model

Authors: Anastasia Rita Widiarti, Phalita Nari Wastu

Abstract:

Hidden Markov Model (HMM) is a stochastic method which has been used in various signal processing and character recognition. This study proposes to use HMM to recognize Javanese characters from a number of different handwritings, whereby HMM is used to optimize the number of state and feature extraction. An 85.7 % accuracy is obtained as the best result in 16-stated vertical model using pure HMM. This initial result is satisfactory for prompting further research.

Keywords: Character recognition, off-line handwritingrecognition, Hidden Markov Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1944
4887 A New Recognition Scheme for Machine- Printed Arabic Texts based on Neural Networks

Authors: Z. Shaaban

Abstract:

This paper presents a new approach to tackle the problem of recognizing machine-printed Arabic texts. Because of the difficulty of recognizing cursive Arabic words, the text has to be normalized and segmented to be ready for the recognition stage. The new scheme for recognizing Arabic characters depends on multiple parallel neural networks classifier. The classifier has two phases. The first phase categories the input character into one of eight groups. The second phase classifies the character into one of the Arabic character classes in the group. The system achieved high recognition rate.

Keywords: Neural Networks, character recognition, feature extraction, multiple networks, Arabic text.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1438
4886 Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line

Authors: Fidel Lòpez Saca, Carlos Avilés-Cruz, Miguel Magos-Rivera, José Antonio Lara-Chávez

Abstract:

Automated object recognition and identification systems are widely used throughout the world, particularly in assembly lines, where they perform quality control and automatic part selection tasks. This article presents the design and implementation of an object recognition system in an assembly line. The proposed shapes-color recognition system is based on deep learning theory in a specially designed convolutional network architecture. The used methodology involve stages such as: image capturing, color filtering, location of object mass centers, horizontal and vertical object boundaries, and object clipping. Once the objects are cut out, they are sent to a convolutional neural network, which automatically identifies the type of figure. The identification system works in real-time. The implementation was done on a Raspberry Pi 3 system and on a Jetson-Nano device. The proposal is used in an assembly course of bachelor’s degree in industrial engineering. The results presented include studying the efficiency of the recognition and processing time.

Keywords: Deep-learning, image classification, image identification, industrial engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 667