Search results for: speech recognition performance
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 14737

Search results for: speech recognition performance

14617 On the Implementation of The Pulse Coupled Neural Network (PCNN) in the Vision of Cognitive Systems

Authors: Hala Zaghloul, Taymoor Nazmy

Abstract:

One of the great challenges of the 21st century is to build a robot that can perceive and act within its environment and communicate with people, while also exhibiting the cognitive capabilities that lead to performance like that of people. The Pulse Coupled Neural Network, PCNN, is a relative new ANN model that derived from a neural mammal model with a great potential in the area of image processing as well as target recognition, feature extraction, speech recognition, combinatorial optimization, compressed encoding. PCNN has unique feature among other types of neural network, which make it a candid to be an important approach for perceiving in cognitive systems. This work show and emphasis on the potentials of PCNN to perform different tasks related to image processing. The main drawback or the obstacle that prevent the direct implementation of such technique, is the need to find away to control the PCNN parameters toward perform a specific task. This paper will evaluate the performance of PCNN standard model for processing images with different properties, and select the important parameters that give a significant result, also, the approaches towards find a way for the adaptation of the PCNN parameters to perform a specific task.

Keywords: cognitive system, image processing, segmentation, PCNN kernels

Procedia PDF Downloads 280
14616 Using Maximization Entropy in Developing a Filipino Phonetically Balanced Wordlist for a Phoneme-Level Speech Recognition System

Authors: John Lorenzo Bautista, Yoon-Joong Kim

Abstract:

In this paper, a set of Filipino Phonetically Balanced Word list consisting of 250 words (PBW250) were constructed for a phoneme-level ASR system for the Filipino language. The Entropy Maximization is used to obtain phonological balance in the list. Entropy of phonemes in a word is maximized, providing an optimal balance in each word’s phonological distribution using the Add-Delete Method (PBW algorithm) and is compared to the modified PBW algorithm implemented in a dynamic algorithm approach to obtain optimization. The gained entropy score of 4.2791 and 4.2902 for the PBW and modified algorithm respectively. The PBW250 was recorded by 40 respondents, each with 2 sets data. Recordings from 30 respondents were trained to produce an acoustic model that were tested using recordings from 10 respondents using the HMM Toolkit (HTK). The results of test gave the maximum accuracy rate of 97.77% for a speaker dependent test and 89.36% for a speaker independent test.

Keywords: entropy maximization, Filipino language, Hidden Markov Model, phonetically balanced words, speech recognition

Procedia PDF Downloads 457
14615 Performance Analysis of VoIP Coders for Different Modulations Under Pervasive Environment

Authors: Jasbinder Singh, Harjit Pal Singh, S. A. Khan

Abstract:

The work, in this paper, presents the comparison of encoded speech signals by different VoIP narrow-band and wide-band codecs for different modulation schemes. The simulation results indicate that codec has an impact on the speech quality and also effected by modulation schemes.

Keywords: VoIP, coders, modulations, BER, MOS

Procedia PDF Downloads 516
14614 Subband Coding and Glottal Closure Instant (GCI) Using SEDREAMS Algorithm

Authors: Harisudha Kuresan, Dhanalakshmi Samiappan, T. Rama Rao

Abstract:

In modern telecommunication applications, Glottal Closure Instants location finding is important and is directly evaluated from the speech waveform. Here, we study the GCI using Speech Event Detection using Residual Excitation and the Mean Based Signal (SEDREAMS) algorithm. Speech coding uses parameter estimation using audio signal processing techniques to model the speech signal combined with generic data compression algorithms to represent the resulting modeled in a compact bit stream. This paper proposes a sub-band coder SBC, which is a type of transform coding and its performance for GCI detection using SEDREAMS are evaluated. In SBCs code in the speech signal is divided into two or more frequency bands and each of these sub-band signal is coded individually. The sub-bands after being processed are recombined to form the output signal, whose bandwidth covers the whole frequency spectrum. Then the signal is decomposed into low and high-frequency components and decimation and interpolation in frequency domain are performed. The proposed structure significantly reduces error, and precise locations of Glottal Closure Instants (GCIs) are found using SEDREAMS algorithm.

Keywords: SEDREAMS, GCI, SBC, GOI

Procedia PDF Downloads 356
14613 Detecting Characters as Objects Towards Character Recognition on Licence Plates

Authors: Alden Boby, Dane Brown, James Connan

Abstract:

Character recognition is a well-researched topic across disciplines. Regardless, creating a solution that can cater to multiple situations is still challenging. Vehicle licence plates lack an international standard, meaning that different countries and regions have their own licence plate format. A problem that arises from this is that the typefaces and designs from different regions make it difficult to create a solution that can cater to a wide range of licence plates. The main issue concerning detection is the character recognition stage. This paper aims to create an object detection-based character recognition model trained on a custom dataset that consists of typefaces of licence plates from various regions. Given that characters have featured consistently maintained across an array of fonts, YOLO can be trained to recognise characters based on these features, which may provide better performance than OCR methods such as Tesseract OCR.

Keywords: computer vision, character recognition, licence plate recognition, object detection

Procedia PDF Downloads 121
14612 Prosody Generation in Neutral Speech Storytelling Application Using Tilt Model

Authors: Manjare Chandraprabha A., S. D. Shirbahadurkar, Manjare Anil S., Paithne Ajay N.

Abstract:

This paper proposes Intonation Modeling for Prosody generation in Neutral speech for Marathi (language spoken in Maharashtra, India) story telling applications. Nowadays audio story telling devices are very eminent for children. In this paper, we proposed tilt model for stressed words in Marathi for speech modification. Tilt model predicts modification in tone of neutral speech. GMM is used to identify stressed words for modification.

Keywords: tilt model, fundamental frequency, statistical parametric speech synthesis, GMM

Procedia PDF Downloads 392
14611 The Importance of Right Speech in Buddhism and Its Relevance Today

Authors: Gautam Sharda

Abstract:

The concept of right speech is the third stage of the noble eightfold path as prescribed by the Buddha and followed by millions of practicing Buddhists. The Buddha lays a lot of importance on the notion of right speech (Samma Vacca). In the Angutara Nikaya, the Buddha mentioned what constitutes right speech, which is basically four kinds of abstentions; namely abstaining from false speech, abstaining from slanderous speech, abstaining from harsh or hateful speech and abstaining from idle chatter. The Buddha gives reasons in support of his view as to why abstaining from these four kinds of speeches is favourable not only for maintaining the peace and equanimity within an individual but also within a society. It is a known fact that when we say something harsh or slanderous to others, it eventually affects our individual peace of mind too. We also know about the many examples of hate speeches which have led to senseless cases of violence and which are well documented within our country and the world. Also, indulging in false speech is not a healthy sign for individuals within a group as this kind of a social group which is based on falsities and lies cannot really survive for long and will eventually lead to chaos. Buddha also told us to refrain from idle chatter or gossip as generally we have seen that idle chatter or gossip does more harm than any good to the individual and the society. Hence, if most of us actually inculcate this third stage (namely, right speech) of the noble eightfold path of the Buddha in our daily life, it would be highly beneficial both for the individual and for the harmony of the society.

Keywords: Buddhism, speech, individual, society

Procedia PDF Downloads 264
14610 Video Based Automatic License Plate Recognition System

Authors: Ali Ganoun, Wesam Algablawi, Wasim BenAnaif

Abstract:

Video based traffic surveillance based on License Plate Recognition (LPR) system is an essential part for any intelligent traffic management system. The LPR system utilizes computer vision and pattern recognition technologies to obtain traffic and road information by detecting and recognizing vehicles based on their license plates. Generally, the video based LPR system is a challenging area of research due to the variety of environmental conditions. The LPR systems used in a wide range of commercial applications such as collision warning systems, finding stolen cars, controlling access to car parks and automatic congestion charge systems. This paper presents an automatic LPR system of Libyan license plate. The performance of the proposed system is evaluated with three video sequences.

Keywords: license plate recognition, localization, segmentation, recognition

Procedia PDF Downloads 464
14609 Facial Recognition on the Basis of Facial Fragments

Authors: Tetyana Baydyk, Ernst Kussul, Sandra Bonilla Meza

Abstract:

There are many articles that attempt to establish the role of different facial fragments in face recognition. Various approaches are used to estimate this role. Frequently, authors calculate the entropy corresponding to the fragment. This approach can only give approximate estimation. In this paper, we propose to use a more direct measure of the importance of different fragments for face recognition. We propose to select a recognition method and a face database and experimentally investigate the recognition rate using different fragments of faces. We present two such experiments in the paper. We selected the PCNC neural classifier as a method for face recognition and parts of the LFW (Labeled Faces in the Wild) face database as training and testing sets. The recognition rate of the best experiment is comparable with the recognition rate obtained using the whole face.

Keywords: face recognition, labeled faces in the wild (LFW) database, random local descriptor (RLD), random features

Procedia PDF Downloads 360
14608 ICanny: CNN Modulation Recognition Algorithm

Authors: Jingpeng Gao, Xinrui Mao, Zhibin Deng

Abstract:

Aiming at the low recognition rate on the composite signal modulation in low signal to noise ratio (SNR), this paper proposes a modulation recognition algorithm based on ICanny-CNN. Firstly, the radar signal is transformed into the time-frequency image by Choi-Williams Distribution (CWD). Secondly, we propose an image processing algorithm using the Guided Filter and the threshold selection method, which is combined with the hole filling and the mask operation. Finally, the shallow convolutional neural network (CNN) is combined with the idea of the depth-wise convolution (Dw Conv) and the point-wise convolution (Pw Conv). The proposed CNN is designed to complete image classification and realize modulation recognition of radar signal. The simulation results show that the proposed algorithm can reach 90.83% at 0dB and 71.52% at -8dB. Therefore, the proposed algorithm has a good classification and anti-noise performance in radar signal modulation recognition and other fields.

Keywords: modulation recognition, image processing, composite signal, improved Canny algorithm

Procedia PDF Downloads 191
14607 Application of the Bionic Wavelet Transform and Psycho-Acoustic Model for Speech Compression

Authors: Chafik Barnoussi, Mourad Talbi, Adnane Cherif

Abstract:

In this paper we propose a new speech compression system based on the application of the Bionic Wavelet Transform (BWT) combined with the psychoacoustic model. This compression system is a modified version of the compression system using a MDCT (Modified Discrete Cosine Transform) filter banks of 32 filters each and the psychoacoustic model. This modification consists in replacing the banks of the MDCT filter banks by the bionic wavelet coefficients which are obtained from the application of the BWT to the speech signal to be compressed. These two methods are evaluated and compared with each other by computing bits before and bits after compression. They are tested on different speech signals and the obtained simulation results show that the proposed technique outperforms the second technique and this in term of compressed file size. In term of SNR, PSNR and NRMSE, the outputs speech signals of the proposed compression system are with acceptable quality. In term of PESQ and speech signal intelligibility, the proposed speech compression technique permits to obtain reconstructed speech signals with good quality.

Keywords: speech compression, bionic wavelet transform, filterbanks, psychoacoustic model

Procedia PDF Downloads 384
14606 Hybrid SVM/DBN Model for Arabic Isolated Words Recognition

Authors: Elyes Zarrouk, Yassine Benayed, Faiez Gargouri

Abstract:

This paper presents a new hybrid model for isolated Arabic words recognition. To do this, we apply Support Vectors Machine (SVM) as an estimator of posterior probabilities within the Dynamic Bayesian networks (DBN). This paper deals a comparative study between DBN and SVM/DBN systems for multi-dialect isolated Arabic words. Performance using SVM/DBN is found to exceed that of DBNs trained on an identical task, giving higher recognition accuracy for four different Arabic dialects. In fact, the average of recognition rates for the four dialects with SVM/DBN was 87.67% while 83.01% with DBN.

Keywords: dynamic Bayesian networks, hybrid models, supports vectors machine, Arabic isolated words

Procedia PDF Downloads 560
14605 Two Concurrent Convolution Neural Networks TC*CNN Model for Face Recognition Using Edge

Authors: T. Alghamdi, G. Alaghband

Abstract:

In this paper we develop a model that couples Two Concurrent Convolution Neural Network with different filters (TC*CNN) for face recognition and compare its performance to an existing sequential CNN (base model). We also test and compare the quality and performance of the models on three datasets with various levels of complexity (easy, moderate, and difficult) and show that for the most complex datasets, edges will produce the most accurate and efficient results. We further show that in such cases while Support Vector Machine (SVM) models are fast, they do not produce accurate results.

Keywords: Convolution Neural Network, Edges, Face Recognition , Support Vector Machine.

Procedia PDF Downloads 153
14604 A Machine Learning Based Method to Detect System Failure in Resource Constrained Environment

Authors: Payel Datta, Abhishek Das, Abhishek Roychoudhury, Dhiman Chattopadhyay, Tanushyam Chattopadhyay

Abstract:

Machine learning (ML) and deep learning (DL) is most predominantly used in image/video processing, natural language processing (NLP), audio and speech recognition but not that much used in system performance evaluation. In this paper, authors are going to describe the architecture of an abstraction layer constructed using ML/DL to detect the system failure. This proposed system is used to detect the system failure by evaluating the performance metrics of an IoT service deployment under constrained infrastructure environment. This system has been tested on the manually annotated data set containing different metrics of the system, like number of threads, throughput, average response time, CPU usage, memory usage, network input/output captured in different hardware environments like edge (atom based gateway) and cloud (AWS EC2). The main challenge of developing such system is that the accuracy of classification should be 100% as the error in the system has an impact on the degradation of the service performance and thus consequently affect the reliability and high availability which is mandatory for an IoT system. Proposed ML/DL classifiers work with 100% accuracy for the data set of nearly 4,000 samples captured within the organization.

Keywords: machine learning, system performance, performance metrics, IoT, edge

Procedia PDF Downloads 195
14603 The Language Use of Middle Eastern Freedom Activists' Speeches: A Gender Perspective

Authors: Sulistyaningtyas

Abstract:

Examining the role of Middle Eastern freedom activists’ speech based on gender perspective is considered noteworthy because the society in the Middle East is patriarchal. This research aims to examine the language use of the Middle Eastern freedom activists’ speeches through gender perspective. The data sources are from male and female Middle Eastern freedom activists’ speech videos. In analyzing the data, the theories employed are about Language Style from Gender Perspective and The Language for Speech. The result reveals that there are sets of spoken language differences between male and female speakers. In using the language for speech, both male and female speakers produce metaphor, euphemism, the ‘rule of three’, parallelism, and pronouns in random frequency of production, which cannot be separated by genders. Moreover, it cannot be concluded that one gender is more potential than the other to influence the audience in delivering speech. There are other factors, particularly non-verbal factors, existing to give impacts on how a speech can influence the audience.

Keywords: gender perspective, language use, Middle Eastern freedom activists, speech

Procedia PDF Downloads 421
14602 Considering Cultural and Linguistic Variables When Working as a Speech-Language Pathologist with Multicultural Students

Authors: Gabriela Smeckova

Abstract:

The entire world is becoming more and more diverse. The reasons why people migrate are different and unique for each family /individual. Professionals delivering services (including speech-language pathologists) must be prepared to work with clients coming from different cultural and/or linguistic backgrounds. Well-educated speech-language pathologists will consider many factors when delivering services. Some of them will be discussed during the presentation (language spoken, beliefs about health care and disabilities, reasons for immigration, etc.). The communication styles of the client can be different than the styles of the speech-language pathologist. The goal is to become culturally responsive in service delivery.

Keywords: culture, cultural competence, culturallly responsive practices, speech-language pathologist, cultural and linguistical variables, communication styles

Procedia PDF Downloads 76
14601 Effect of Noise Reduction Algorithms on Temporal Splitting of Speech Signal to Improve Speech Perception for Binaural Hearing Aids

Authors: Rajani S. Pujar, Pandurangarao N. Kulkarni

Abstract:

Increased temporal masking affects the speech perception in persons with sensorineural hearing impairment especially under adverse listening conditions. This paper presents a cascaded scheme, which employs a noise reduction algorithm as well as temporal splitting of the speech signal. Earlier investigations have shown that by splitting the speech temporally and presenting alternate segments to the two ears help in reducing the effect of temporal masking. In this technique, the speech signal is processed by two fading functions, complementary to each other, and presented to left and right ears for binaural dichotic presentation. In the present study, half cosine signal is used as a fading function with crossover gain of 6 dB for the perceptual balance of loudness. Temporal splitting is combined with noise reduction algorithm to improve speech perception in the background noise. Two noise reduction schemes, namely spectral subtraction and Wiener filter are used. Listening tests were conducted on six normal-hearing subjects, with sensorineural loss simulated by adding broadband noise to the speech signal at different signal-to-noise ratios (∞, 3, 0, and -3 dB). Objective evaluation using PESQ was also carried out. The MOS score for VCV syllable /asha/ for SNR values of ∞, 3, 0, and -3 dB were 5, 4.46, 4.4 and 4.05 respectively, while the corresponding MOS scores for unprocessed speech were 5, 1.2, 0.9 and 0.65, indicating significant improvement in the perceived speech quality for the proposed scheme compared to the unprocessed speech.

Keywords: MOS, PESQ, spectral subtraction, temporal splitting, wiener filter

Procedia PDF Downloads 327
14600 An Integrated Cognitive Performance Evaluation Framework for Urban Search and Rescue Applications

Authors: Antonio D. Lee, Steven X. Jiang

Abstract:

A variety of techniques and methods are available to evaluate cognitive performance in Urban Search and Rescue (USAR) applications. However, traditional cognitive performance evaluation techniques typically incorporate either the conscious or systematic aspect, failing to take into consideration the subconscious or intuitive aspect. This leads to incomplete measures and produces ineffective designs. In order to fill the gaps in past research, this study developed a theoretical framework to facilitate the integration of situation awareness (SA) and intuitive pattern recognition (IPR) to enhance the cognitive performance representation in USAR applications. This framework provides guidance to integrate both SA and IPR in order to evaluate the cognitive performance of the USAR responders. The application of this framework will help improve the system design.

Keywords: cognitive performance, intuitive pattern recognition, situation awareness, urban search and rescue

Procedia PDF Downloads 328
14599 Named Entity Recognition System for Tigrinya Language

Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager

Abstract:

The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.

Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF

Procedia PDF Downloads 130
14598 The Complaint Speech Act Set Produced by Arab Students in the UAE

Authors: Tanju Deveci

Abstract:

It appears that the speech act of complaint has not received as much attention as other speech acts. However, the face-threatening nature of this speech act requires a special attention in multicultural contexts in particular. The teaching context in the UAE universities, where a big majority of teaching staff comes from other cultures, requires investigations into this speech act in order to improve communication between students and faculty. This session will outline the results of a study conducted with this purpose. The realization of complaints by Freshman English students in Communication courses at Petroleum Institute was investigated to identify communication patterns that seem to cause a strain. Data were collected using a role-play between a teacher and students, and a judgment scale completed by two of the instructors in the Communications Department. The initial findings reveal that the students had difficulty putting their case, produced the speech act of criticism along with a complaint and that they produced both requests and demands as candidate solutions. The judgement scales revealed that the students’ attitude was not appropriate most of the time and that the judges would behave differently from students. It is concluded that speech acts, in general, and complaint, in particular, need to be taught to learners explicitly to improve interpersonal communication in multicultural societies. Some teaching ideas are provided to help increase foreign language learners’ sociolinguistic competence.

Keywords: speech act, complaint, pragmatics, sociolinguistics, language teaching

Procedia PDF Downloads 507
14597 Vision-Based Hand Segmentation Techniques for Human-Computer Interaction

Authors: M. Jebali, M. Jemni

Abstract:

This work is the part of vision based hand gesture recognition system for Natural Human Computer Interface. Hand tracking and segmentation are the primary steps for any hand gesture recognition system. The aim of this paper is to develop robust and efficient hand segmentation algorithm such as an input to another system which attempt to bring the HCI performance nearby the human-human interaction, by modeling an intelligent sign language recognition system based on prediction in the context of dialogue between the system (avatar) and the interlocutor. For the purpose of hand segmentation, an overcoming occlusion approach has been proposed for superior results for detection of hand from an image.

Keywords: HCI, sign language recognition, object tracking, hand segmentation

Procedia PDF Downloads 412
14596 Evaluation of Features Extraction Algorithms for a Real-Time Isolated Word Recognition System

Authors: Tomyslav Sledevič, Artūras Serackis, Gintautas Tamulevičius, Dalius Navakauskas

Abstract:

This paper presents a comparative evaluation of features extraction algorithm for a real-time isolated word recognition system based on FPGA. The Mel-frequency cepstral, linear frequency cepstral, linear predictive and their cepstral coefficients were implemented in hardware/software design. The proposed system was investigated in the speaker-dependent mode for 100 different Lithuanian words. The robustness of features extraction algorithms was tested recognizing the speech records at different signals to noise rates. The experiments on clean records show highest accuracy for Mel-frequency cepstral and linear frequency cepstral coefficients. For records with 15 dB signal to noise rate the linear predictive cepstral coefficients give best result. The hard and soft part of the system is clocked on 50 MHz and 100 MHz accordingly. For the classification purpose, the pipelined dynamic time warping core was implemented. The proposed word recognition system satisfies the real-time requirements and is suitable for applications in embedded systems.

Keywords: isolated word recognition, features extraction, MFCC, LFCC, LPCC, LPC, FPGA, DTW

Procedia PDF Downloads 495
14595 Investigation of New Gait Representations for Improving Gait Recognition

Authors: Chirawat Wattanapanich, Hong Wei

Abstract:

This study presents new gait representations for improving gait recognition accuracy on cross gait appearances, such as normal walking, wearing a coat and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations. One is to append lower knee regions to the original GEI, and the other is to apply convolutional operations to the GEI and its variants. A set of new gait representations are created and used for training multi-class Support Vector Machines (SVMs). Tests are conducted on the CASIA dataset B. Various combinations of the gait representations with different convolutional kernel size and different numbers of kernels used in the convolutional processes are examined. Both the entire images as features and reduced dimensional features by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques, appending the lower knee regions to the original GEI and convolutional GEI, can significantly contribute to the performance improvement in the gait recognition. The experimental results have shown that the average recognition rate can be improved from 75.65% to 87.50%.

Keywords: convolutional image, lower knee, gait

Procedia PDF Downloads 202
14594 On Overcoming Common Oral Speech Problems through Authentic Films

Authors: Tamara Matevosyan

Abstract:

The present paper discusses the main problems that students face while developing oral skills through authentic films. It states that special attention should be paid not only to the study of verbal speech but also to non-verbal communication. Authentic films serve as an important tool to understand both native speaker’s gestures and their culture of pausing while speaking. Various phonetic difficulties causing phonetic interference in actual speech are covered in the paper emphasizing the role of authentic films in overcoming them.

Keywords: compressive speech, filled pauses, unfilled pauses, pausing culture

Procedia PDF Downloads 353
14593 Gesture-Controlled Interface Using Computer Vision and Python

Authors: Vedant Vardhan Rathour, Anant Agrawal

Abstract:

The project aims to provide a touchless, intuitive interface for human-computer interaction, enabling users to control their computer using hand gestures and voice commands. The system leverages advanced computer vision techniques using the MediaPipe framework and OpenCV to detect and interpret real time hand gestures, transforming them into mouse actions such as clicking, dragging, and scrolling. Additionally, the integration of a voice assistant powered by the Speech Recognition library allows for seamless execution of tasks like web searches, location navigation and gesture control on the system through voice commands.

Keywords: gesture recognition, hand tracking, machine learning, convolutional neural networks

Procedia PDF Downloads 12
14592 Gender Recognition with Deep Belief Networks

Authors: Xiaoqi Jia, Qing Zhu, Hao Zhang, Su Yang

Abstract:

A gender recognition system is able to tell the gender of the given person through a few of frontal facial images. An effective gender recognition approach enables to improve the performance of many other applications, including security monitoring, human-computer interaction, image or video retrieval and so on. In this paper, we present an effective method for gender classification task in frontal facial images based on deep belief networks (DBNs), which can pre-train model and improve accuracy a little bit. Our experiments have shown that the pre-training method with DBNs for gender classification task is feasible and achieves a little improvement of accuracy on FERET and CAS-PEAL-R1 facial datasets.

Keywords: gender recognition, beep belief net-works, semi-supervised learning, greedy-layer wise RBMs

Procedia PDF Downloads 453
14591 ViraPart: A Text Refinement Framework for Automatic Speech Recognition and Natural Language Processing Tasks in Persian

Authors: Narges Farokhshad, Milad Molazadeh, Saman Jamalabbasi, Hamed Babaei Giglou, Saeed Bibak

Abstract:

The Persian language is an inflectional subject-object-verb language. This fact makes Persian a more uncertain language. However, using techniques such as Zero-Width Non-Joiner (ZWNJ) recognition, punctuation restoration, and Persian Ezafe construction will lead us to a more understandable and precise language. In most of the works in Persian, these techniques are addressed individually. Despite that, we believe that for text refinement in Persian, all of these tasks are necessary. In this work, we proposed a ViraPart framework that uses embedded ParsBERT in its core for text clarifications. First, used the BERT variant for Persian followed by a classifier layer for classification procedures. Next, we combined models outputs to output cleartext. In the end, the proposed model for ZWNJ recognition, punctuation restoration, and Persian Ezafe construction performs the averaged F1 macro scores of 96.90%, 92.13%, and 98.50%, respectively. Experimental results show that our proposed approach is very effective in text refinement for the Persian language.

Keywords: Persian Ezafe, punctuation, ZWNJ, NLP, ParsBERT, transformers

Procedia PDF Downloads 215
14590 A Weighted Approach to Unconstrained Iris Recognition

Authors: Yao-Hong Tsai

Abstract:

This paper presents a weighted approach to unconstrained iris recognition. Nowadays, commercial systems are usually characterized by strong acquisition constraints based on the subject’s cooperation. However, it is not always achievable for real scenarios in our daily life. Researchers have been focused on reducing these constraints and maintaining the performance of the system by new techniques at the same time. With large variation in the environment, there are two main improvements to develop the proposed iris recognition system. For solving extremely uneven lighting condition, statistic based illumination normalization is first used on eye region to increase the accuracy of iris feature. The detection of the iris image is based on Adaboost algorithm. Secondly, the weighted approach is designed by Gaussian functions according to the distance to the center of the iris. Furthermore, local binary pattern (LBP) histogram is then applied to texture classification with the weight. Experiment showed that the proposed system provided users a more flexible and feasible way to interact with the verification system through iris recognition.

Keywords: authentication, iris recognition, adaboost, local binary pattern

Procedia PDF Downloads 224
14589 Efficient Feature Fusion for Noise Iris in Unconstrained Environment

Authors: Yao-Hong Tsai

Abstract:

This paper presents an efficient fusion algorithm for iris images to generate stable feature for recognition in unconstrained environment. Recently, iris recognition systems are focused on real scenarios in our daily life without the subject’s cooperation. Under large variation in the environment, the objective of this paper is to combine information from multiple images of the same iris. The result of image fusion is a new image which is more stable for further iris recognition than each original noise iris image. A wavelet-based approach for multi-resolution image fusion is applied in the fusion process. The detection of the iris image is based on Adaboost algorithm and then local binary pattern (LBP) histogram is then applied to texture classification with the weighting scheme. Experiment showed that the generated features from the proposed fusion algorithm can improve the performance for verification system through iris recognition.

Keywords: image fusion, iris recognition, local binary pattern, wavelet

Procedia PDF Downloads 367
14588 2.5D Face Recognition Using Gabor Discrete Cosine Transform

Authors: Ali Cheraghian, Farshid Hajati, Soheila Gheisari, Yongsheng Gao

Abstract:

In this paper, we present a novel 2.5D face recognition method based on Gabor Discrete Cosine Transform (GDCT). In the proposed method, the Gabor filter is applied to extract feature vectors from the texture and the depth information. Then, Discrete Cosine Transform (DCT) is used for dimensionality and redundancy reduction to improve computational efficiency. The system is combined texture and depth information in the decision level, which presents higher performance compared to methods, which use texture and depth information, separately. The proposed algorithm is examined on publically available Bosphorus database including models with pose variation. The experimental results show that the proposed method has a higher performance compared to the benchmark.

Keywords: Gabor filter, discrete cosine transform, 2.5d face recognition, pose

Procedia PDF Downloads 328