Search results for: Face synthesis and recognition
1658 Comparing Arabic and Latin Handwritten Digits Recognition Problems
Authors: Sherif Abdelazeem
Abstract:
A comparison between the performance of Latin and Arabic handwritten digits recognition problems is presented. The performance of ten different classifiers is tested on two similar Arabic and Latin handwritten digits databases. The analysis shows that the Arabic handwritten digits recognition problem is easier than that of Latin digits. This is because the interclass difference in the case of Latin digits is smaller than in Arabic digits, and variances in writing Latin digits are larger. Consequently, weaker yet fast classifiers are expected to play a more prominent role in Arabic handwritten digits recognition.
Keywords: Handwritten recognition, Arabic recognition, Digits recognition, Document recognition
Downloads 1986
1657 Implementation of a Multimodal Biometrics Recognition System with Combined Palm Print and Iris Features
Authors: Rabab M. Ramadan, Elaraby A. Elgallad
Abstract:
With extensive application, the performance of unimodal biometrics systems has to face a diversity of problems such as signal and background noise, distortion, and environment differences. Therefore, multimodal biometric systems are proposed to solve the above stated problems. This paper introduces a bimodal biometric recognition system based on the extracted features of the human palm print and iris. Palm print biometrics is a fairly new, evolving technology that is used to identify people by their palm features. The iris is a strong competitor together with face and fingerprints for presence in multimodal recognition systems. In this research, we introduce an algorithm for combining the palm- and iris-extracted features using a texture-based descriptor, the Scale Invariant Feature Transform (SIFT). Since the feature sets are non-homogeneous, as features of different biometric modalities are used, these features will be concatenated to form a single feature vector. Particle swarm optimization (PSO) is used as a feature selection technique to reduce the dimensionality of the feature vector. The proposed algorithm will be applied to the Indian Institute of Technology Delhi (IITD) database and its performance will be compared with various iris recognition algorithms found in the literature.
Keywords: Iris recognition, particle swarm optimization, feature extraction, feature selection, palm print, scale invariant feature transform.
Downloads 885
1656 Bangla Vowel Characterization Based on Analysis by Synthesis
Authors: Syed Akhter Hossain, M. Lutfar Rahman, Farruk Ahmed
Abstract:
Bangla vowel characterization determines the spectral properties of Bangla vowels for efficient synthesis as well as recognition of Bangla vowels. In this paper, Bangla vowels in isolated words have been analyzed based on a speech production model within the framework of Analysis-by-Synthesis. This has led to the extraction of spectral parameters for the production model in order to produce different Bangla vowel sounds. The real and synthetic spectra are compared, and a weighted square error has been computed along with the error in the formant bandwidths for efficient representation of Bangla vowels. The extracted features produced a good representation of the targeted Bangla vowels. Such a representation also plays an essential role in low bit rate speech coding and vocoders.
Keywords: Speech, vowel, formant, synthesis, spectrum, LPC.
Downloads 2371
1655 Make Up Flash: Web Application for the Improvement of Physical Appearance in Images Based on Recognition Methods
Authors: Stefania Arguelles Reyes, Octavio José Salcedo Parra, Alberto Acosta López
Abstract:
This paper presents a web application for the improvement of images through recognition. The web application is based on the analysis of picture-based recognition methods that allow an improvement in the physical appearance of people posting on social networks. The basis relies on the study of tools that can correct or improve some features of the face, with the help of a wide collection of user images taken as reference to build a facial profile. Automatic facial profiling can be achieved with a deeper study of the Object Detection Library. It was possible to improve the initial images with the help of MATLAB and its filtering functions. The user can interact directly with the program and manually adjust their preferences.
Keywords: Application, MATLAB, make up, model, recognition.
Downloads 571
1654 OCR/ICR Text Recognition Using ABBYY FineReader as an Example Text
Authors: A. R. Bagirzade, A. Sh. Najafova, S. M. Yessirkepova, E. S. Albert
Abstract:
This article describes a text recognition method based on Optical Character Recognition (OCR). The features of the OCR method were examined using the ABBYY FineReader program. The article describes automatic text recognition in images. OCR is necessary because optical input devices can only deliver raster graphics as a result. Text recognition is the task of recognizing the depicted letters as such, that is, identifying them and assigning each the numerical value it has in the usual text encoding (ASCII, Unicode). The study, conducted by the authors using the example of ABBYY FineReader, confirmed and showed in practice the improvement of digital text recognition platforms developed by Electronic Publication.
Keywords: ABBYY FineReader system, algorithm symbol recognition, OCR/ICR techniques, recognition technologies.
Downloads 782
1653 Reliable Face Alignment Using Two-Stage AAM
Authors: Sunho Ki, Daehwan Kim, Seongwon Cho, Sun-Tae Chung, Jaemin Kim, Yun-Kwang Hong, Chang Joon Park, Dongmin Kwon, Minhee Kang, Yusung Kim, Younghan Yoon
Abstract:
AAM (active appearance model) has been successfully applied to face and facial feature localization. However, its performance is sensitive to initial parameter values. In this paper, we propose a two-stage AAM for robust face alignment, which first fits an inner face-AAM model to the inner facial feature points of the face and then localizes the whole face and facial features by optimizing the whole face-AAM model parameters. Experiments show that the proposed face alignment method using two-stage AAM is more robust to the background and the head pose than the standard AAM-based face alignment method.
Keywords: AAM, Face Alignment, Feature Extraction, PCA
Downloads 1477
1652 Analysis of Feature Space for a 2d/3d Vision based Emotion Recognition Method
Authors: Robert Niese, Ayoub Al-Hamadi, Bernd Michaelis
Abstract:
In modern human-computer interaction (HCI) systems, emotion recognition is becoming an imperative characteristic. The quest for effective and reliable emotion recognition in HCI has resulted in a need for better face detection, feature extraction and classification. In this paper we present results of feature space analysis after briefly explaining our fully automatic vision based emotion recognition method. We demonstrate the compactness of the feature space and show how the 2d/3d based method achieves superior features for the purpose of emotion classification. It is also shown that through feature normalization a largely person-independent feature space is created. As a consequence, the classifier architecture has only a minor influence on the classification result. This is particularly elucidated with the help of confusion matrices. For this purpose advanced classification algorithms, such as Support Vector Machines and Artificial Neural Networks, are employed, as well as the simple k-Nearest Neighbor classifier.
Keywords: Facial expression analysis, Feature extraction, Image processing, Pattern Recognition, Application.
Downloads 1923
1651 Accent Identification by Clustering and Scoring Formants
Authors: Dejan Stantic, Jun Jo
Abstract:
There have been significant improvements in automatic voice recognition technology. However, existing systems still face difficulties, particularly when used by non-native speakers with accents. In this paper we address the problem of identifying the English accented speech of speakers from different backgrounds. Once an accent is identified, the speech recognition software can utilise a training set for the appropriate accent and therefore improve the efficiency and accuracy of the speech recognition system. We introduced the Q factor, which is defined by the sum of relationships between frequencies of the formants. Four different accents were considered in the experiments for this research. A scoring method was introduced in order to effectively analyse accents. The proposed concept indicates that an accent could be identified by analysing its formants.
Keywords: Accent Identification, Formants, Q Factor.
Downloads 2091
1650 Curvelet Features with Mouth and Face Edge Ratios for Facial Expression Identification
Authors: S. Kherchaoui, A. Houacine
Abstract:
This paper presents a facial expression recognition system. It performs identification and classification of the seven basic expressions: happy, surprise, fear, disgust, sadness, anger, and neutral states. It consists of three main parts. The first one is the detection of a face and the corresponding facial features to extract the most expressive portion of the face, followed by a normalization of the region of interest. Then curvelet coefficients are computed, with dimensionality reduction through principal component analysis. The resulting coefficients are combined with two ratios, the mouth ratio and the face edge ratio, to constitute the whole feature vector. The third step is the classification of the emotional state using the SVM method in the feature space.
Keywords: Facial expression identification, curvelet coefficients, support vector machine (SVM).
Downloads 1842
1649 Automatic Lip Contour Tracking and Visual Character Recognition for Computerized Lip Reading
Authors: Harshit Mehrotra, Gaurav Agrawal, M.C. Srivastava
Abstract:
Computerized lip reading has been one of the most actively researched areas of computer vision in the recent past because of its crime fighting potential and invariance to acoustic environment. However, several factors like fast speech, bad pronunciation, poor illumination, movement of the face, moustaches and beards make lip reading difficult. In the present work, we propose a solution for automatic lip contour tracking and for recognizing letters of the English language spoken by speakers using the information available from lip movements. A level set method is used for tracking the lip contour using a contour velocity model, and a feature vector of lip movements is then obtained. Character recognition is performed using a modified k-nearest neighbor algorithm which assigns more weight to nearer neighbors. The proposed system has been found to have an accuracy of 73.3% for character recognition with speaker lip movements as the only input and without using any speech recognition system in parallel. The approach used in this work is found to serve the purpose of lip reading well when the size of the database is small.
Keywords: Contour Velocity Model, Lip Contour Tracking, Lip Reading, Visual Character Recognition.
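The abstract above mentions a modified k-nearest neighbor algorithm that gives nearer neighbors more weight. The following is a minimal sketch of one common way to do that, inverse-distance voting. The weighting formula, the function name `weighted_knn_predict` and the toy data are assumptions for illustration; the abstract does not specify the authors' actual modification.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Classify `query` by a distance-weighted vote among its k nearest neighbors.

    Assumption: each neighbor votes with weight 1 / (distance + eps), so closer
    neighbors count more. This is one standard weighting, not necessarily the
    one used in the paper.
    """
    # Euclidean distance from the query to every training point, nearest first.
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = defaultdict(float)
    for distance, label in neighbors[:k]:
        votes[label] += 1.0 / (distance + eps)
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)


# Hypothetical 2-D feature vectors: one close "a" and two farther "b" samples.
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
labels = ["a", "b", "b"]
prediction = weighted_knn_predict(points, labels, (0.1, 0.1), k=3)
```

With plain majority voting and k=3 the two "b" samples would win; with inverse-distance weights the single very close "a" sample dominates the vote.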
Downloads 2401
1648 Rapid Study on Feature Extraction and Classification Models in Healthcare Applications
Authors: S. Sowmyayani
Abstract:
The advancement of computer-aided design helps the medical force and security force. Some applications include biometric recognition, elderly fall detection, face recognition, cancer recognition, tumor recognition, etc. This paper deals with different machine learning algorithms that are generally used in health care systems. The problems receiving the most attention are classification and regression. With the rise of big data, machine learning has become particularly important for solving problems. Machine learning uses two types of techniques: supervised learning and unsupervised learning. The former trains a model on known input and output data and predicts future outputs. Classification and regression are supervised learning techniques. Unsupervised learning finds hidden patterns in input data. Clustering is one such unsupervised learning technique. The above-mentioned models are discussed briefly in this paper.
Keywords: Supervised learning, unsupervised learning, regression, neural network.
Downloads 346
1647 The Main Principles of Text-to-Speech Synthesis System
Authors: K.R. Aida–Zade, C. Ardil, A.M. Sharifova
Abstract:
In this paper, the main principles of a text-to-speech synthesis system are presented. Associated problems which arise when developing a speech synthesis system are described. The approaches used and their application in speech synthesis systems for the Azerbaijani language are shown.
Keywords: synthesis of Azerbaijani language, morphemes, phonemes, sounds, sentence, speech synthesizer, intonation, accent, pronunciation.
Downloads 5652
1646 Persian Printed Numeral Characters Recognition Using Geometrical Central Moments and Fuzzy Min-Max Neural Network
Authors: Hamid Reza Boveiri
Abstract:
In this paper, a new system for Persian printed numeral characters recognition, with emphasis on the representation and recognition stages, is introduced. For the first time in Persian optical character recognition, geometrical central moments as the character image descriptor and a fuzzy min-max neural network for Persian numeral character recognition have been used. A set of different experiments on binary images of regular, translated, rotated and scaled Persian numeral characters has been carried out, and a variety of results is presented. The best result was 99.16% correct recognition, demonstrating that geometrical central moments and the fuzzy min-max neural network are adequate for Persian printed numeral character recognition.
Keywords: Fuzzy min-max neural network, geometrical central moments, optical character recognition, Persian digits recognition, Persian printed numeral characters recognition.
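The geometrical central moments named in the abstract above can be illustrated with the textbook definition mu_pq = sum over pixels of (x - x_bar)^p * (y - y_bar)^q * f(x, y), where (x_bar, y_bar) is the image centroid. The sketch below is only that standard definition in plain Python; the function name `central_moment` and the tiny binary images are assumptions, and the paper's exact feature set and normalization are not given in the abstract.

```python
def central_moment(image, p, q):
    """Return the central moment mu_pq of a 2-D image given as a list of rows.

    Pixel values act as weights f(x, y); x is the column index, y the row index.
    """
    m00 = sum(sum(row) for row in image)  # total mass (pixel count if binary)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    # Centroid coordinates.
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00
    # Moments are taken about the centroid, which makes them translation invariant.
    return sum(
        ((x - x_bar) ** p) * ((y - y_bar) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


# Hypothetical binary glyph and the same glyph shifted one pixel right and down.
shape = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
mu20_original = central_moment(shape, 2, 0)
mu20_shifted = central_moment(shifted, 2, 0)
```

Because the moments are computed about the centroid, the two values agree up to floating-point error, which is why such descriptors suit the translated characters mentioned in the abstract. Invariance to rotation and scale needs further normalization that this sketch does not attempt.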
Downloads 1725
1645 Using Different Aspects of the Signings for Appearance-based Sign Language Recognition
Authors: Morteza Zahedi, Philippe Dreuw, Thomas Deselaers, Hermann Ney
Abstract:
Sign language is used by deaf and hard-of-hearing people for communication. Automatic sign language recognition is a challenging research area, since sign language is often the only way of communication for deaf people. Sign language includes different components of visual actions made by the signer using the hands, the face, and the torso to convey his/her meaning. To use different aspects of signs, we combine the different groups of features which have been extracted from the image frames recorded directly by a stationary camera. We combine the features at two levels by employing three techniques. At the feature level, an early feature combination can be performed by concatenating and weighting different feature groups, or by concatenating feature groups over time and using LDA to choose the most discriminant elements. At the model level, a late fusion of differently trained models can be carried out by a log-linear model combination. In this paper, we investigate these three combination techniques in an automatic sign language recognition system and show that the recognition rate can be significantly improved.
Keywords: American sign language, appearance-based features, Feature combination, Sign language recognition
Downloads 1399
1644 Mouse Pointer Tracking with Eyes
Authors: H. Mhamdi, N. Hamrouni, A. Temimi, M. Bouhlel
Abstract:
In this article, we present our research work in human-machine interaction. The research consists of manipulating the workspace with the eyes. We present some of our results, in particular the detection of eyes and the recognition of mouse actions. Indeed, the handicapped user becomes able to interact with the machine in a more intuitive way in diverse applications and contexts. To test our application, we have chosen to work in real time on videos captured by a camera placed in front of the user.
Keywords: Computer vision, Face and Eyes Detection, Mouse pointer recognition.
Downloads 2129
1643 Reduced Dynamic Time Warping for Handwriting Recognition Based on Multidimensional Time Series of a Novel Pen Device
Authors: Muzaffar Bashir, Jürgen Kempf
Abstract:
The purpose of this paper is to present a Dynamic Time Warping technique which significantly reduces the data processing time and memory size of multi-dimensional time series sampled by the biometric smart pen device BiSP. The acquisition device is a novel ballpoint pen equipped with a diversity of sensors for monitoring the kinematics and dynamics of handwriting movement. The DTW algorithm has been applied for time series analysis of five different sensor channels providing pressure, acceleration and tilt data of the pen generated during handwriting on a paper pad. However, the standard DTW has processing time and memory space problems which limit its practical use for online handwriting recognition. To address this problem, the DTW has been applied to the sum of the five sensor signals after an adequate down-sampling of the data. Preliminary results have shown that processing time and memory size could be significantly reduced without deterioration of performance in single character and word recognition. Furthermore, excellent recognition accuracy was achieved, which is mainly due to the reduced dynamic time warping (RDTW) technique and the novel pen device BiSP.
Keywords: Biometric character recognition, biometric person authentication, biometric smart pen BiSP, dynamic time warping DTW, online-handwriting recognition, multidimensional time series.
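The reduction described in the abstract above (down-sample, sum the sensor channels into one signal, then run DTW) can be sketched as follows. This is a minimal illustration under stated assumptions: plain decimation stands in for the "adequate down-sampling", the absolute difference is used as local cost, and the helper names and sample values are hypothetical. The paper's actual RDTW details are not given in the abstract.

```python
def downsample(signal, factor):
    """Keep every `factor`-th sample (plain decimation, no filtering; an assumption)."""
    return signal[::factor]


def sum_channels(channels):
    """Collapse equally long sensor channels into one signal by summing per sample."""
    return [sum(samples) for samples in zip(*channels)]


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b stays on the same sample
                cost[i][j - 1],      # b advances while a stays on the same sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical two-channel recordings of the same stroke at different speeds.
reference = sum_channels([[1, 2, 3], [0, 0, 0]])
sample = sum_channels([[1, 2, 2, 3], [0, 0, 0, 0]])
distance = dtw_distance(reference, sample)
```

Both steps shrink the DTW cost table: summing C channels replaces C separate alignments with one, and decimating each sequence by a factor d shrinks the n-by-m table by roughly d squared.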
Downloads 2406
1642 Integrating Low and High Level Object Recognition Steps
Authors: András Barta, István Vajk
Abstract:
In pattern recognition applications the low level segmentation and the high level object recognition are generally considered as two separate steps. The paper presents a method that bridges the gap between the low and the high level object recognition. It is based on a Bayesian network representation and network propagation algorithm. At the low level it uses a hierarchical structure of quadratic spline wavelet image bases. The method is demonstrated for a simple circuit diagram component identification problem.
Keywords: Object recognition, Bayesian network, Wavelets, Document processing.
Downloads 1486
1641 Integrating Low and High Level Object Recognition Steps by Probabilistic Networks
Authors: András Barta, István Vajk
Abstract:
In pattern recognition applications the low level segmentation and the high level object recognition are generally considered as two separate steps. The paper presents a method that bridges the gap between the low and the high level object recognition. It is based on a Bayesian network representation and network propagation algorithm. At the low level it uses a hierarchical structure of quadratic spline wavelet image bases. The method is demonstrated for a simple circuit diagram component identification problem.
Keywords: Object recognition, Bayesian network, Wavelets, Document processing.
Downloads 1671
1640 Recognition-based Segmentation in Persian Character Recognition
Authors: Mohsen Zand, Ahmadreza Naghsh Nilchi, S. Amirhassan Monadjemi
Abstract:
Optical character recognition of cursive scripts presents a number of challenging problems in both segmentation and recognition processes in different languages, including Persian. In order to overcome these problems, we use a newly developed Persian word segmentation method and a recognition-based segmentation technique to overcome its segmentation problems. This method is robust as well as flexible. It also increases the system's tolerance to font variations. The implementation results of this method on a comprehensive database show a high degree of accuracy which meets the requirements for commercial use. Extended with suitable pre- and post-processing, the method offers a simple and fast framework to develop a full OCR system.
Keywords: OCR, Persian, Recognition, Segmentation.
Downloads 1840
1639 Evaluation of Haar Cascade Classifiers Designed for Face Detection
Authors: R. Padilla, C. F. F. Costa Filho, M. G. F. Costa
Abstract:
In the past years a lot of effort has been made in the field of face detection. The human face contains important features that can be used by vision-based automated systems in order to identify and recognize individuals. Face location, the primary step of the vision-based automated systems, finds the face area in the input image. An accurate location of the face is still a challenging task. The Viola-Jones framework has been widely used by researchers in order to detect the location of faces and objects in a given image. Face detection classifiers are shared by public communities, such as OpenCV. An evaluation of these classifiers will help researchers to choose the best classifier for their particular need. This work focuses on the evaluation of face detection classifiers, taking facial landmarks into account.
Keywords: Face datasets, face detection, facial landmarking, Haar wavelets, Viola-Jones detectors.
Downloads 5410
1638 Face Detection using Variance based Haar-Like feature and SVM
Authors: Cuong Nguyen Khac, Ju H. Park, Ho-Youl Jung
Abstract:
This paper proposes a new approach to the problem of real-time face detection. The proposed method combines the primitive Haar-Like feature and the variance value to construct a new feature, the so-called Variance based Haar-Like feature. A face in an image can be represented with a small quantity of features using this new feature. We used SVM instead of AdaBoost for training and classification. We made a database containing 5,000 face samples and 10,000 non-face samples extracted from real images for learning purposes. The 5,000 face samples contain many images with widely varying lighting conditions. Experiments showed that a face detection system using the Variance based Haar-Like feature and SVM can be much more efficient than a face detection system using the primitive Haar-Like feature and AdaBoost. We tested our method on two Face databases and one Non-Face database. We have obtained a 96.17% correct detection rate on the YaleB face database, which is 4.21% higher than that obtained using the primitive Haar-Like feature and AdaBoost.
Keywords: AdaBoost, Haar-Like feature, SVM, variance, Variance based Haar-Like feature.
Downloads 3736
1637 Offline Handwritten Signature Recognition
Authors: Gulzar A. Khuwaja, Mohammad S. Laghari
Abstract:
Biometrics, which refers to identifying an individual based on his or her physiological or behavioral characteristics, has the capability to reliably distinguish between an authorized person and an imposter. Signature verification systems can be categorized as offline (static) and online (dynamic). This paper presents a neural network based system for the recognition of offline handwritten signatures that is trained with low-resolution scanned signature images.
Keywords: Pattern Recognition, Computer Vision, Adaptive Classification, Handwritten Signature Recognition.
Downloads 2903
1636 Comparing Emotion Recognition from Voice and Facial Data Using Time Invariant Features
Authors: Vesna Kirandziska, Nevena Ackovska, Ana Madevska Bogdanova
Abstract:
The problem of emotion recognition is a challenging problem. It is still an open problem from the aspect of both intelligent systems and psychology. In this paper, both voice features and facial features are used for building an emotion recognition system. Support Vector Machine classifiers are built by using raw data from video recordings. In this paper, the results obtained for the emotion recognition are given, and a discussion about the validity and the expressiveness of different emotions is presented. A comparison between the classifiers built from facial data only, voice data only and from the combination of both data is made here. The need for a better combination of the information from facial expression and voice data is argued.
Keywords: Emotion recognition, facial recognition, signal processing, machine learning.
Downloads 2019
1635 A Novel Approach to Persian Online Hand Writing Recognition
Authors: Ramin Halavati, Mansour Jamzad, Mahdieh Soleymani
Abstract:
Persian (Farsi) script is totally cursive and each character is written in several different forms depending on its former and later characters in the word. These complexities make automatic handwriting recognition of Persian a very hard problem and there are few contributions trying to work it out. This paper presents a novel practical approach to online recognition of Persian handwriting which is based on representation of inputs and patterns with very simple visual features and comparison of these simple terms. This recognition approach is tested over a set of Persian words and the results have been quite acceptable when the possible words were unknown, and they were almost all correct in cases where the words were chosen from a prespecified list.
Keywords: Image Processing, Pattern Recognition.
Downloads 1330
1634 Off-Line Signature Recognition Based On Angle Features and GRNN Neural Networks
Authors: Laila Y. Fannas, Ahmed Y. Ben Sasi
Abstract:
This research presents a handwritten signature recognition method based on an angle feature vector using an Artificial Neural Network (ANN). Each signature image will be represented by an angle vector. The feature vector will constitute the input to the ANN. The collection of signature images will be divided into two sets. One set will be used for training the ANN in a supervised fashion. The other set, which is never seen by the ANN, will be used for testing. After training, the ANN will be tested for recognition of the signature. When the signature is classified correctly, it is considered a correct recognition; otherwise it is a failure.
Keywords: Signature Recognition, Artificial Neural Network, Angle Features.
Downloads 2496
1633 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control
Authors: Van Nhan Nguyen, Harald Holone
Abstract:
Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of safety improvements, air traffic controller workload measurement and conducting analysis on large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.
Keywords: Automatic Speech Recognition, ASR, Air Traffic Control, ATC.
Downloads 4043
1632 Analysis of Combined Use of NN and MFCC for Speech Recognition
Authors: Safdar Tanweer, Abdul Mobin, Afshar Alam
Abstract:
The performance and analysis of a speech recognition system are illustrated in this paper. An approach is presented to recognize the English words corresponding to the digits (0-9), spoken by 2 different speakers and captured in a noise-free environment. For feature extraction, Mel frequency cepstral coefficients (MFCC) have been used, which give a set of feature vectors from recorded speech samples. A neural network model is used to enhance the recognition performance. A feed-forward neural network with the back-propagation algorithm is used, although other speech recognition techniques such as HMM and DTW exist. All experiments are carried out in MATLAB.
Keywords: Speech Recognition, MFCC, Neural Network, classifier.
Downloads 3268
1631 Investigation of Combined use of MFCC and LPC Features in Speech Recognition Systems
Authors: K. R. Aida–Zade, C. Ardil, S. S. Rustamov
Abstract:
The statement of the automatic speech recognition problem, the assignment of speech recognition and the application fields are shown in the paper. At the same time, for Azerbaijani speech, the principles for establishing a speech recognition system and the problems arising in the system are investigated. The computing algorithms of speech features, being the main part of a speech recognition system, are analyzed. From this point of view, the determination algorithms of Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients expressing the basic speech features are developed. Combined use of the cepstrals of MFCC and LPC in a speech recognition system is suggested to improve the reliability of the speech recognition system. To this end, the recognition system is divided into MFCC-based and LPC-based recognition subsystems. The training and recognition processes are realized in both subsystems separately, and the recognition system accepts a decision when both subsystems return the same result. This results in a decrease of the error rate during recognition. The training and recognition processes are realized by artificial neural networks in the automatic speech recognition system. The neural networks are trained by the conjugate gradient method. In the paper, the problems related to the number of speech features when training the neural networks of the MFCC-based and LPC-based speech recognition subsystems are investigated. The variety of results of neural networks trained from different initial points in the training process is analyzed. A methodology of combined use of neural networks trained from different initial points in a speech recognition system is suggested to improve the reliability of the recognition system and increase the recognition quality, and the practical results obtained are shown.
Keywords: Speech recognition, cepstral analysis, Voice activation detection algorithm, Mel Frequency Cepstral Coefficients, features of speech, Cepstral Mean Subtraction, neural networks, Linear Predictive Coding.
Downloads 914
1630 Real Time Detection, Tracking and Recognition of Medication Intake
Authors: H. H. Huynh, J. Meunier, J. Sequeira, M. Daniel
Abstract:
In this paper, the detection and tracking of face, mouth, hands and medication bottles in the context of medication intake monitoring with a camera is presented. This is aimed at recognizing medication intake for the elderly in their home setting to avoid inappropriate use. Background subtraction is used to isolate moving objects, and then skin and bottle segmentations are done in the normalized RGB color space. We use a minimum displacement distance criterion to track skin color regions and the R/G ratio to detect the mouth. The color-labeled medication bottles are simply tracked based on the color space distance to their mean color vector. For the recognition of medication intake, we propose a three-level hierarchical approach, which uses activity patterns to recognize the normal medication intake activity. The proposed method was tested with three persons, with different medication intake scenarios, and gave an overall precision of over 98%.
Keywords: Activity recognition, background subtraction, tracking, medication intake, video surveillance
Downloads 1986
1629 Recognizing an Individual, Their Topic of Conversation, and Cultural Background from 3D Body Movement
Authors: Gheida J. Shahrour, Martin J. Russell
Abstract:
The 3D body movement signals captured during human-human conversation include clues not only to the content of people’s communication but also to their culture and personality. This paper is concerned with automatic extraction of this information from body movement signals. For the purpose of this research, we collected a novel corpus from 27 subjects and arranged them into groups according to their culture. We arranged each group into pairs, and each pair communicated with each other about different topics. A state-of-the-art recognition system is applied to the problems of person, culture, and topic recognition. We borrowed modeling, classification, and normalization techniques from speech recognition. We used Gaussian Mixture Modeling (GMM) as the main technique for building our three systems, obtaining 77.78%, 55.47%, and 39.06% accuracy from the person, culture, and topic recognition systems respectively. In addition, we combined the above GMM systems with Support Vector Machines (SVM) to obtain 85.42%, 62.50%, and 40.63% accuracy for person, culture, and topic recognition respectively. Although direct comparison among these three recognition systems is difficult, it seems that our person recognition system performs best for both GMM and GMM-SVM, suggesting that intersubject differences (i.e. subject’s personality traits) are a major source of variation. When removing these traits from the culture and topic recognition systems using the Nuisance Attribute Projection (NAP) and the Intersession Variability Compensation (ISVC) techniques, we obtained 73.44% and 46.09% accuracy from the culture and topic recognition systems respectively.
Keywords: Person Recognition, Topic Recognition, Culture Recognition, 3D Body Movement Signals, Variability Compensation.
Downloads 2174