Search results for: face recognition system
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 20366

20276 Pyramid Binary Pattern for Age Invariant Face Verification

Authors: Saroj Bijarnia, Preety Singh

Abstract:

We propose a simple and effective biometric system for face verification across aging, based on a new texture feature variant, the Pyramid Binary Pattern, which combines the Local Binary Pattern with its hierarchical information. The dimensionality of the resulting texture feature vector is reduced using Principal Component Analysis, and a Support Vector Machine is used for classification. The proposed method achieves an accuracy of 92.24% and can be used in an automated age-invariant face verification system.
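
As a rough illustration of the feature described in this abstract, the sketch below computes Local Binary Pattern histograms at several resolutions and concatenates them. The abstract does not say how the pyramid is built, so the 2x2 averaging, the three levels, and the function names are assumptions; the PCA and SVM stages are omitted.

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour Local Binary Pattern codes for the interior pixels."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    # Neighbour offsets, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes += (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def downsample(gray):
    """Halve the resolution by averaging non-overlapping 2x2 blocks (assumed pyramid step)."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = (g.shape[0] // 2) * 2, (g.shape[1] // 2) * 2
    return g[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_lbp_features(gray, levels=3):
    """Concatenate normalised 256-bin LBP histograms from each pyramid level."""
    feats = []
    level = np.asarray(gray, dtype=np.float64)
    for _ in range(levels):
        hist = np.bincount(lbp_image(level).ravel(), minlength=256).astype(np.float64)
        feats.append(hist / hist.sum())
        level = downsample(level)
    return np.concatenate(feats)
```

With three levels this yields a 768-dimensional vector per image, which is the kind of input a dimensionality reduction step such as PCA would then compress before classification.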

Keywords: biometrics, age invariant, verification, support vector machine

Procedia PDF Downloads 323
20275 Automatic Product Identification Based on Deep-Learning Theory in an Assembly Line

Authors: Fidel Lòpez Saca, Carlos Avilés-Cruz, Miguel Magos-Rivera, José Antonio Lara-Chávez

Abstract:

Automated object recognition and identification systems are widely used throughout the world, particularly in assembly lines, where they perform quality control and automatic part selection tasks. This article presents the design and implementation of an object recognition system in an assembly line. The proposed shape-color recognition system is based on deep learning theory and uses a specially designed convolutional network architecture. The methodology involves the following stages: image capture, color filtering, location of object mass centers, detection of horizontal and vertical object boundaries, and object clipping. Once the objects are cut out, they are sent to a convolutional neural network, which automatically identifies the type of figure. The identification system works in real time. The implementation was done on a Raspberry Pi 3 system and on a Jetson Nano device. The proposal is used in an assembly course of a bachelor’s degree in industrial engineering. The results presented include a study of recognition efficiency and processing time.

Keywords: deep-learning, image classification, image identification, industrial engineering

Procedia PDF Downloads 138
20274 The Combination of the Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, Jitter and Shimmer Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech

Authors: Brahim Fares Zaidi

Abstract:

Our work aims to improve our Automatic Recognition System for Dysarthric Speech, which is based on Hidden Markov Models and the Hidden Markov Model Toolkit, to help people with pronunciation problems. We applied two speech parameterization techniques, Mel Frequency Cepstral Coefficients and Perceptual Linear Prediction, and concatenated them with jitter and shimmer coefficients in order to increase the recognition rate for dysarthric speech. For our tests, we used the NEMOURS database, which contains speakers with dysarthria and normal speakers.
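
To make the concatenation step concrete, here is a minimal sketch of one common "local" definition of jitter and shimmer, appended to a cepstral feature vector. The abstract does not state which jitter and shimmer variants were used, so this definition, the function names, and the placeholder cepstral values are assumptions.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to pitch periods this gives local jitter; applied to
    peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def augment_features(cepstral, periods, amplitudes):
    """Append jitter and shimmer to a cepstral (e.g. MFCC or PLP) vector."""
    jitter = relative_perturbation(periods)
    shimmer = relative_perturbation(amplitudes)
    return list(cepstral) + [jitter, shimmer]

# Hypothetical values: 3 cepstral coefficients, 3 pitch periods (ms), 3 peak amplitudes.
features = augment_features([1.5, -0.3, 0.8], [10, 12, 10], [0.5, 0.5, 0.5])
```

A perfectly regular voice gives zero for both measures; irregular cycle lengths or amplitudes raise them, which is why they are plausible extra cues for dysarthric speech.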

Keywords: ARSDS, HTK, HMM, MFCC, PLP

Procedia PDF Downloads 82
20273 Wolof Voice Response Recognition System: A Deep Learning Model for Wolof Audio Classification

Authors: Krishna Mohan Bathula, Fatou Bintou Loucoubar, FNU Kaleemunnisa, Christelle Scharff, Mark Anthony De Castro

Abstract:

Voice recognition algorithms such as automatic speech recognition and text-to-speech systems with African languages can play an important role in bridging the digital divide of Artificial Intelligence in Africa, contributing to the establishment of a fully inclusive information society. This paper proposes a Deep Learning model that can classify the user responses as inputs for an interactive voice response system. A dataset with the Wolof language words ‘yes’ and ‘no’ is collected as audio recordings. A two-stage data augmentation approach is adopted to enlarge the dataset to the size required by the deep neural network. Data preprocessing and feature engineering with Mel-Frequency Cepstral Coefficients are implemented. Convolutional Neural Networks (CNNs) have proven to be very powerful in image classification and are promising for audio processing when sounds are transformed into spectra. To perform voice response classification, the recordings are transformed into sound frequency feature spectra, which are then classified with an image classification methodology using a deep CNN model. The inference model of this trained and reusable Wolof voice response recognition system can be integrated with many applications on both web and mobile platforms.

Keywords: automatic speech recognition, interactive voice response, voice response recognition, Wolof word classification

Procedia PDF Downloads 91
20272 The Application of a Hybrid Neural Network for Recognition of a Handwritten Kazakh Text

Authors: Almagul Assainova, Dariya Abykenova, Liudmila Goncharenko, Sergey Sybachin, Saule Rakhimova, Abay Aman

Abstract:

The recognition of handwritten Kazakh text is a relevant objective today for the digitization of materials. The study presents a model of a hybrid neural network for handwriting recognition, which includes a convolutional neural network and a multi-layer perceptron. Each network includes 1024 input neurons and 42 output neurons. The model is implemented in a program written in the Python programming language using the EMNIST database and the NumPy, Keras, and TensorFlow modules. The neural network was trained on the letters specific to the Kazakh alphabet: ә, ғ, қ, ң, ө, ұ, ү, һ, і. The neural network model and the program created on its basis can be used in electronic document management systems to digitize Kazakh text.

Keywords: handwriting recognition system, image recognition, Kazakh font, machine learning, neural networks

Procedia PDF Downloads 234
20271 Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Authors: Hamed Alqahtani, Manolya Kavakli-Thorne

Abstract:

Large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Because pose attributes are entangled with identity information, conventional approaches to pose-independent representation fall short in recognizing faces with large pose variations. In this paper, we propose a practical approach that disentangles the pose attribute from the identity information and then synthesizes a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space, which carries out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with the state of the art in the field show that the representation learned by the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

Keywords: disentanglement, face detection, generative adversarial networks, video surveillance

Procedia PDF Downloads 97
20270 A Novel Method for Face Detection

Authors: H. Abas Nejad, A. R. Teymoori

Abstract:

Facial expression recognition is one of the open problems in computer vision. Robust neutral face recognition in real time is a major challenge for various supervised-learning-based facial expression recognition methods. This is because supervised methods cannot accommodate all appearance variability across faces with respect to race, pose, lighting, facial biases, etc. in a limited amount of training data. Moreover, processing each and every frame to classify emotions is not required, as the user stays neutral for the majority of the time in usual applications like video chat or photo album/web browsing. Detecting the neutral state at an early stage, and thereby excluding those frames from emotion classification, would save computational power. In this work, we propose a lightweight neutral vs. emotion classification engine, which acts as a preprocessor to traditional supervised emotion classification approaches. It dynamically learns neutral appearance at Key Emotion (KE) points using a textural statistical model, constructed from a set of reference neutral frames for each user. The proposed method is made robust to various types of user head motion by accounting for affine distortions based on a textural statistical model. Robustness to dynamic shift of KE points is achieved by evaluating the similarities on a subset of neighborhood patches around each KE point, using prior information regarding the directionality of the specific facial action units acting on the respective KE point. As a result, the proposed method improves emotion recognition (ER) accuracy and simultaneously reduces the computational complexity of the ER system, as validated on multiple databases.

Keywords: neutral vs. emotion classification, Constrained Local Model, procrustes analysis, Local Binary Pattern Histogram, statistical model

Procedia PDF Downloads 323
20269 An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language

Authors: M. Draman, S. Z. Muhamad Yassin, M. S. Alias, Z. Lambak, M. I. Zulkifli, S. N. Padhi, K. N. Baharim, F. Maskuriy, A. I. A. Rahim

Abstract:

The performance of a Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system uses the Kaldi toolkit as the platform for all the libraries and algorithms used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) to model the acoustic signal and a standard n-gram model for language modelling. With 80 hours of training data from call centre recordings, the ASR system achieves 72% accuracy, which corresponds to a word error rate (WER) of 28%. The testing was done using 20 hours of audio data. Despite the use of a DNN, the system shows low accuracy owing to the variety of noises, accents and dialects that typically occur in the Malaysian call centre environment. This significant variation among speakers is reflected in the large standard deviation of the average word error rate (WERav) (i.e., ~10%). The lowest WER (13.8%) was obtained from a recording sample of a native speaker using the standard Malay dialect (central Malaysia), whereas the sample with the highest WER (49%) contains the conversation of a speaker who uses a non-standard Malay dialect.
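
The relation between accuracy and WER quoted above (72% and 28%) follows the usual convention of word accuracy = 1 - WER. The sketch below shows the standard edit-distance definition of WER; the function names and the sample strings are illustrative and not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

def word_accuracy(reference, hypothesis):
    """Word accuracy under the convention accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)

# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

Because insertions count as errors, WER can exceed 100% on poor transcripts, in which case accuracy defined this way becomes negative.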

Keywords: conversational speech recognition, deep neural network, Malay language, speech recognition

Procedia PDF Downloads 300
20268 Classifying Facial Expressions Based on a Motion Local Appearance Approach

Authors: Fabiola M. Villalobos-Castaldi, Nicolás C. Kemper, Esther Rojas-Krugger, Laura G. Ramírez-Sánchez

Abstract:

This paper presents classification results obtained by combining a motion-based approach with a local appearance method to describe the facial motion caused by the muscle contractions and expansions that occur in facial expressions. The proposed feature extraction method takes advantage of knowledge about which parts of the face show the highest deformations, so we selected 4 specific facial regions to which the appearance descriptor was applied. The most commonly used approaches for feature extraction are the holistic and the local strategies. In this work we present the results of using a local appearance approach that estimates the correlation coefficient between the 4 corresponding landmark-localized facial templates of the expression face and those of the neutral face. The results let us examine how the proposed motion estimation scheme, based on the local appearance correlation computation, can simply and intuitively measure the motion parameters for some of the most relevant facial regions and how these parameters can be used to recognize facial expressions automatically.
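
The correlation step described here can be sketched as follows: for each facial region, compute the Pearson correlation coefficient between the neutral-face template and the expression-face template. The region boxes, image sizes, and function names are hypothetical; the abstract does not specify how the four regions are located or sized.

```python
import numpy as np

def region_correlation(neutral, expressive, box):
    """Pearson correlation between the same rectangular region of two images.

    box is (top, left, height, width) in pixels.
    """
    top, left, height, width = box
    a = neutral[top:top + height, left:left + width].astype(np.float64).ravel()
    b = expressive[top:top + height, left:left + width].astype(np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    # A constant region has no variance; report zero correlation in that case.
    return float(a @ b / denom) if denom > 0 else 0.0

def motion_features(neutral, expressive, boxes):
    """One correlation value per facial region; lower values suggest more deformation."""
    return np.array([region_correlation(neutral, expressive, b) for b in boxes])

# Hypothetical 40x40 face images and four region boxes.
rng = np.random.default_rng(0)
neutral_face = rng.integers(0, 256, size=(40, 40))
boxes = [(2, 2, 10, 10), (2, 28, 10, 10), (20, 10, 8, 20), (30, 12, 8, 16)]
same = motion_features(neutral_face, neutral_face, boxes)
```

The resulting four-element vector could then be passed to any classifier; that stage is not shown because the abstract does not name one.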

Keywords: facial expression recognition system, feature extraction, local-appearance method, motion-based approach

Procedia PDF Downloads 390
20267 Specified Human Motion Recognition and Unknown Hand-Held Object Tracking

Authors: Jinsiang Shaw, Pik-Hoe Chen

Abstract:

This paper aims to integrate human recognition, motion recognition, and object tracking technologies without requiring a pre-trained database model for motion recognition or for the unknown object itself. Furthermore, the approach can simultaneously track multiple users and multiple objects. Unlike other existing human motion recognition methods, our approach employs a rule-based condition method to determine whether a user's hand is approaching or departing from an object. It uses a background subtraction method to separate the human and the object from the background, and employs behavior features to effectively interpret human object-grabbing actions. With an object’s histogram characteristics, we are able to isolate and track it using back projection. Hence, a moving object's trajectory can be recorded and the object itself can be located. This technique can be used in a camera surveillance system in a shopping area to perform real-time intelligent surveillance, thus preventing theft. Experimental results verify the validity of the developed surveillance algorithm, with an accuracy of 83% for shoplifting detection.
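
Histogram back projection, mentioned above as the tracking mechanism, can be sketched in a few lines: build a histogram from a patch of the object, then replace every pixel of a new frame with the histogram value of its bin. The single-channel input, the 16 bins, and the function names are assumptions made for illustration.

```python
import numpy as np

def back_project(image, model_patch, bins=16):
    """Map each pixel to how typical its value is of the model patch.

    image and model_patch are 2-D integer arrays with values in [0, 256),
    for example a hue channel.
    """
    img_bins = (np.asarray(image, dtype=np.int64) * bins) // 256
    patch_bins = (np.asarray(model_patch, dtype=np.int64) * bins) // 256
    hist = np.bincount(patch_bins.ravel(), minlength=bins).astype(np.float64)
    hist /= hist.max()  # scale so the most common object value maps to 1.0
    return hist[img_bins]

def centroid(prob):
    """Weighted centre (row, col) of the back-projection map, or None if empty."""
    total = prob.sum()
    if total == 0:
        return None
    rows, cols = np.indices(prob.shape)
    return float((rows * prob).sum() / total), float((cols * prob).sum() / total)

# Synthetic frame: dark background with a 4x4 object of value 200.
frame = np.zeros((20, 20), dtype=np.int64)
frame[5:9, 10:14] = 200
prob_map = back_project(frame, np.full((4, 4), 200))
position = centroid(prob_map)
```

Recording the centroid frame by frame gives a trajectory for the object, which is the quantity a rule-based surveillance check would then inspect.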

Keywords: Automatic Tracking, Back Projection, Motion Recognition, Shoplifting

Procedia PDF Downloads 308
20266 Defect Localization and Interaction on Surfaces with Projection Mapping and Gesture Recognition

Authors: Qiang Wang, Hongyang Yu, MingRong Lai, Miao Luo

Abstract:

This paper presents a method for accurately localizing and interacting with known surface defects by overlaying patterns onto real-world surfaces using a projection system. Given the world coordinates of the defects, we project corresponding patterns onto the surfaces, providing an intuitive visualization of the specific defect locations. To enable users to interact with and retrieve more information about individual defects, we implement a gesture recognition system based on a pruned and optimized version of YOLOv6. This lightweight model achieves an accuracy of 82.8% and is suitable for deployment on low-performance devices. Our approach demonstrates the potential for enhancing defect identification, inspection processes, and user interaction in various applications.

Keywords: defect localization, projection mapping, gesture recognition, YOLOv6

Procedia PDF Downloads 60
20265 Design and Implementation of an Image Based System to Enhance the Security of ATM

Authors: Seyed Nima Tayarani Bathaie

Abstract:

In this paper, an image-receiving system was designed and implemented by optimizing object detection algorithms that use Haar features. The optimized algorithm was used for face detection and eye detection separately, and cascading the two produced a clear image of the user. This feature provides higher security by preventing fraud, because services are given to the user only on condition that a clear image of his or her face has already been captured, which excludes inappropriate persons. To expedite processing and eliminate unnecessary computation, the input image was compressed, a motion detection function was included in the program, and the detection window size was restricted.

Keywords: face detection algorithm, Haar features, security of ATM

Procedia PDF Downloads 392
20264 The Effect of Pixelation on Face Detection: Evidence from Eye Movements

Authors: Kaewmart Pongakkasira

Abstract:

This study investigated how different levels of pixelation affect face detection in natural scenes. Eye movements and reaction times were recorded while observers searched for faces in natural scenes rendered at different pixel sizes. Detection performance for coarse visual detail at the lower pixel size (3 x 3) was better than for the very blurred detail carried by the higher pixel size (9 x 9). The result is consistent with the notion that face detection relies on gross detail information from a face-shape template, containing crude shape structure and features. In contrast, detection was impaired when face shape and features were obscured. However, the degradation of scenic information might also contribute to the effect. In the next experiment, which provides a more direct measurement of the effect of pixelation on face detection, only the embedded face photographs, and not the scene background, will be filtered.
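
The two pixelation levels mentioned above (3 x 3 and 9 x 9) can be reproduced with simple block averaging, sketched below. The abstract does not describe how the stimuli were rendered, so block averaging and the function name are assumptions.

```python
import numpy as np

def pixelate(gray, block):
    """Replace each block x block tile with its mean intensity.

    Rows and columns that do not fill a whole tile are cropped.
    """
    g = np.asarray(gray, dtype=np.float64)
    h = (g.shape[0] // block) * block
    w = (g.shape[1] // block) * block
    tiles = g[:h, :w].reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    # Expand each mean back to a full tile so the output keeps the cropped size.
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

# Hypothetical 18x18 scene rendered at the two pixel sizes from the abstract.
scene = np.arange(18 * 18, dtype=np.float64).reshape(18, 18)
coarse = pixelate(scene, 3)
very_coarse = pixelate(scene, 9)
```

Larger tiles discard more high-frequency detail, which matches the abstract's description of the 9 x 9 condition as very blurred.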

Keywords: eye movements, face detection, face-shape information, pixelation

Procedia PDF Downloads 295
20263 Pre-Analysis of Printed Circuit Boards Based on Multispectral Imaging for Vision Based Recognition of Electronics Waste

Authors: Florian Kleber, Martin Kampel

Abstract:

The increasing demand for gallium, indium and rare-earth elements for the production of electronics, e.g. solid-state lighting, photovoltaics, integrated circuits, and liquid crystal displays, will exceed the worldwide supply according to current forecasts. Recycling systems to reclaim these materials are not yet in place, which challenges the sustainability of these technologies. This paper proposes a multispectral imaging system as a basis for a vision-based recognition system for valuable components of electronics waste. Multispectral images are intended to enhance the contrast of images of printed circuit boards (single components, as well as labels) for further analysis, such as optical character recognition and recognition of entire printed circuit boards. The results show that a higher contrast is achieved in the near infrared than in ultraviolet and visible light.

Keywords: electronics waste, multispectral imaging, printed circuit boards, rare-earth elements

Procedia PDF Downloads 398
20262 Gender Recognition with Deep Belief Networks

Authors: Xiaoqi Jia, Qing Zhu, Hao Zhang, Su Yang

Abstract:

A gender recognition system is able to tell the gender of a given person from a few frontal facial images. An effective gender recognition approach can improve the performance of many other applications, including security monitoring, human-computer interaction, and image or video retrieval. In this paper, we present an effective method for the gender classification task on frontal facial images based on deep belief networks (DBNs), which pre-train the model and slightly improve accuracy. Our experiments show that pre-training with DBNs for the gender classification task is feasible and achieves a small improvement in accuracy on the FERET and CAS-PEAL-R1 facial datasets.

Keywords: gender recognition, deep belief networks, semi-supervised learning, greedy layer-wise RBMs

Procedia PDF Downloads 426
20261 Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control

Authors: Van Nhan Nguyen, Harald Holone

Abstract:

Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, measuring air traffic controller workload, and analyzing large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed.

Keywords: automatic speech recognition, asr, air traffic control, atc

Procedia PDF Downloads 373
20260 Recognition of Spelling Problems during the Text in Progress: A Case Study on the Comments Made by Portuguese Students Newly Literate

Authors: E. Calil, L. A. Pereira

Abstract:

The acquisition of orthography is a complex process, involving both lexical and grammatical questions. This learning occurs simultaneously with the mastery of multiple textual aspects (e.g.: graphs, punctuation, etc.). However, most of the research on orthographic acquisition focuses on this acquisition from an autonomous point of view, separated from the process of textual production. This means that the object of analysis is the production of words selected by the researcher or of sentences requested in an experimental and controlled setting. In addition, the Spelling Problems (SPs) analyzed are those identified by the researcher on the sheet of paper. Considering the perspective of Textual Genetics, from an enunciative approach, this study discusses the SPs recognized by dyads of newly literate students while they are writing a text collaboratively. Six proposals of textual production, requested by a 2nd year teacher of a Portuguese Primary School, were registered between January and March 2015. In our case study we discuss the SPs recognized by the dyad B and L (7 years old). We adopted the Ramos System audiovisual record as a methodological tool. This system allows real-time capture of the text in process and of the face-to-face dialogue between both students and their teacher, and also captures the body movements and facial expressions of the participants during textual production proposals in the classroom. In these ecological conditions of multimodal registration of collaborative writing, we could identify the emergence of SPs in two dimensions: i. In the product (finished text): identification of SPs without recursive graphic marks (without erasures) and identification of SPs with erasures, indicating the recognition of the SP by the student; ii. In the process (text in progress): identification of comments made by students about recognized SPs. Given this, we analyzed the comments on SPs identified during the text in progress.
These comments characterize a type of reformulation referred to as Commented Oral Erasure (COE). The COE has two enunciative forms: Simple Comment (SC), such as ' 'X' is written with 'Y' '; or Unfolded Comment (UC), such as ' 'X' is written with 'Y' because...'. The spelling COE may occur before or during the SP (Early Spelling Recognition - ESR) or after the SP has been entered (Later Spelling Recognition - LSR). There were 631 words entered in the 6 stories written by the B-L dyad, 145 of them containing some type of SP. During the text in progress, the students orally recognized 174 SPs, 46 of which were identified in advance (ESRs) and 128 of which were identified later (LSRs). If we consider that the 88 erasure SPs in the product indicate some form of SP recognition, we can observe that about twice as many SPs were recognized orally. The ESR was characterized by SC, when students asked their colleague or teacher how to spell a given word. The LSR presented predominantly UC, verbalizing meta-orthographic arguments, mostly made by L. These results indicate that writing in dyads is an important didactic strategy for promoting metalinguistic reflection, favoring the learning of spelling.

Keywords: collaborative writing, erasure, learning, metalinguistic awareness, spelling, text production

Procedia PDF Downloads 147
20259 Switching to the Latin Alphabet in Kazakhstan: A Brief Overview of Character Recognition Methods

Authors: Ainagul Yermekova, Liudmila Goncharenko, Ali Baghirzade, Sergey Sybachin

Abstract:

In this article, we address the problem of Kazakhstan's transition to the Latin alphabet. The transition process started in 2017 and is scheduled to be completed in 2025. In connection with these events, the problem of recognizing the characters of the new alphabet arises. Well-known character recognition programs such as ABBYY FineReader, FormReader, and MyScript Stylus did not recognize the specific Kazakh letters that were used in Cyrillic. We assess the well-known character recognition methods that could be in demand as part of the country's transition to the Latin alphabet. Three methods of character recognition (template, structured, and feature-based) are considered through their algorithms of operation. At the end of the article, a general conclusion is drawn about the suitability of each method for a particular recognition process: for example, a population census, recognition of typographic text in Latin, or recognition of photos of car number plates, store signs, etc.

Keywords: text detection, template method, recognition algorithm, structured method, feature method

Procedia PDF Downloads 162
20258 Offline Signature Verification in Punjabi Based On SURF Features and Critical Point Matching Using HMM

Authors: Rajpal Kaur, Pooja Choudhary

Abstract:

Biometrics, which refers to identifying an individual based on his or her physiological or behavioral characteristics, is able to reliably distinguish between an authorized person and an imposter. Signature recognition systems can be categorized as offline (static) and online (dynamic). This paper presents a SURF-feature-based system for recognizing offline signatures that is trained with low-resolution scanned signature images. The signature of a person is an important biometric attribute that can be used to authenticate human identity. A signature can be handled as an image and recognized using computer vision and HMM techniques. With modern computers, there is a need to develop fast algorithms for signature recognition. Multiple techniques have been defined for signature recognition, and there is considerable scope for research. In this paper, offline (static) signature recognition and verification using SURF features with an HMM is proposed, where the signature is captured and presented to the user in an image format. Signatures are verified based on parameters extracted from the signature using various image processing techniques. The offline signature verification and recognition system is implemented on the MATLAB platform. The work has been tested and found suitable for its purpose. The proposed method performs better than other recently proposed methods.

Keywords: offline signature verification, offline signature recognition, signatures, SURF features, HMM

Procedia PDF Downloads 362
20257 Efficient Feature Fusion for Noise Iris in Unconstrained Environment

Authors: Yao-Hong Tsai

Abstract:

This paper presents an efficient fusion algorithm for iris images that generates stable features for recognition in unconstrained environments. Recent iris recognition systems focus on real scenarios in daily life, without the subject’s cooperation. Given the large variation in such environments, the objective of this paper is to combine information from multiple images of the same iris. The result of image fusion is a new image that is more stable for further iris recognition than each original noisy iris image. A wavelet-based approach for multi-resolution image fusion is applied in the fusion process. Detection of the iris image is based on the AdaBoost algorithm, and a local binary pattern (LBP) histogram is then applied to texture classification with a weighting scheme. Experiments show that the features generated by the proposed fusion algorithm can improve the performance of a verification system based on iris recognition.
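
A minimal sketch of wavelet-based image fusion is given below, using a single-level 2-D Haar transform. The abstract does not specify the wavelet, the number of levels, or the fusion rule; averaging the approximation band and keeping the largest-magnitude detail coefficient is a common choice and is an assumption here, as are the function names. Images must be the same size with even dimensions.

```python
import numpy as np

def haar2d(x):
    """Single-level orthonormal 2-D Haar transform: returns (LL, LH, HL, HH)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out

def fuse(images):
    """Fuse same-sized images: mean of LL bands, max-magnitude detail coefficients."""
    coeffs = [haar2d(np.asarray(im, dtype=np.float64)) for im in images]
    fused = [np.mean([c[0] for c in coeffs], axis=0)]
    for k in (1, 2, 3):
        stack = np.stack([c[k] for c in coeffs])
        pick = np.abs(stack).argmax(axis=0)
        # Select, per position, the coefficient from the image where it is strongest.
        fused.append(np.take_along_axis(stack, pick[None], axis=0)[0])
    return ihaar2d(*fused)
```

Fusing several captures of the same iris this way keeps the sharpest local detail from any capture while smoothing the coarse intensity, which is one plausible reading of how a more stable image could be obtained.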

Keywords: image fusion, iris recognition, local binary pattern, wavelet

Procedia PDF Downloads 350
20256 Recognition of Tifinagh Characters with Missing Parts Using Neural Network

Authors: El Mahdi Barrah, Said Safi, Abdessamad Malaoui

Abstract:

In this paper, we present an algorithm for reconstructing Tifinagh characters from incomplete 2D scans. The algorithm is based on the correlation between the lost block and its neighbors. The proposed system contains three main parts: pre-processing, feature extraction and recognition. In the first step, we construct a database of Tifinagh characters. In the second step, we apply a “shape analysis algorithm”. In the classification part, we use a neural network. The simulation results demonstrate that the proposed method gives good results.

Keywords: Tifinagh character recognition, neural networks, local cost computation, ANN

Procedia PDF Downloads 311
20255 Classification System for Soft Tissue Injuries of Face: Bringing Objectiveness to Injury Severity

Authors: Garg Ramneesh, Uppal Sanjeev, Mittal Rajinder, Shah Sheerin, Jain Vikas, Singla Bhupinder

Abstract:

Introduction: Despite advances in trauma care, a classification system for soft tissue injuries of the face still needs to be objectively defined. Aim: To develop a classification system for soft tissue injuries of the face that is objective, easy to remember, reproducible, and universally applicable, that aids in surgical management, and that helps to build structured data for future use. Material and Methods: This classification system includes those patients who need surgical management of facial injuries. Associated underlying bony fractures have been intentionally excluded. Depending upon the severity of the soft tissue injury, injuries are graded from 0 to IV (0-Abrasions, I-Lacerations, II-Avulsion injuries with no skin loss, III-Avulsion injuries with skin loss that would need graft or flap cover, and IV-Complex injuries). Anatomically, the face has been divided into three zones (Zone 1/2/3), as per aesthetic subunits. Zone 1e stands for injury of eyebrows; Zones 2 a/b/c stand for nose, upper eyelid and lower eyelid respectively; Zones 3 a/b/c stand for upper lip, lower lip and cheek respectively. Suffixes R and L stand for the right or left involved side, B for the presence of a foreign body such as glass or pellets, C for extensive contamination, and D for depth, which is graded as D 1/2/3 if the injury extends to fat, muscle or bone respectively. I is for damage to the facial nerve or parotid duct. Results and conclusions: This classification system is easy to remember and clinically applicable, and would help in standardizing the surgical management of soft tissue injuries of the face. Certain inherent limitations of this classification system are its inability to classify sutured wounds, hematomas, and injuries along or against Langer’s lines.

Keywords: soft tissue injuries, face, avulsion, classification

Procedia PDF Downloads 363
20254 Sarcasm Recognition System Using Hybrid Tone-Word Spotting Audio Mining Technique

Authors: Sandhya Baskaran, Hari Kumar Nagabushanam

Abstract:

Sarcasm sentiment recognition is an area of natural language processing that has been explored in recent times. Even with the advancements in NLP, typical translations of words and sentences in their context fail to provide exact information on the sentiment or emotion of a user. For example, if something bad happens, the statement ‘That's just what I need, great! Terrific!’ is expressed in a sarcastic tone, which could be misread as a positive sign by any text-based analyzer. In this paper, we present a unique real-time ‘word with its tone’ spotting technique, which provides sentiment analysis for the tone or pitch of a voice in combination with the words being expressed. This hybrid approach increases the probability of identifying special sentiments like sarcasm, and comes much closer to the real world than mining text or speech individually. The system uses a tone analyzer such as YIN-FFT, which extracts pitch segment-wise and is used in parallel with a speech recognition system. The clustered data is classified for sentiments, and a sarcasm score is determined for each cluster. Our simulations demonstrate an improvement in F-measure of around 12% compared to existing detection techniques, with increased precision and recall.

Keywords: sarcasm recognition, tone-word spotting, natural language processing, pitch analyzer

Procedia PDF Downloads 272
20253 A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children

Authors: Mumtaz Begum Mustafa, Siti Salwah Salim, Feizal Dani Rahman

Abstract:

Recently, Automatic Speech Recognition (ASR) systems have been used to assist children in language acquisition, as they can detect human speech signals. Despite the benefits offered by ASR systems, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors is the lack of a continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced languages, it is not viable for children, as there are very limited speech databases to serve as a source model. In this research, we propose a two-stage adaptation for the development of an ASR system for Malay-speaking children using a very limited database. The two-stage adaptation comprises cross-lingual adaptation (first stage) and cross-age adaptation (second stage). In the first stage, a well-known speech database that is phonetically rich and balanced is adapted to a medium-sized database of Malay adult speech using supervised MLLR. The second stage uses the acoustic model generated by the first adaptation, and the target database is a small database of the target users. We measured the performance of the proposed technique using word error rate and compared it with the conventional benchmark adaptation. The two-stage adaptation proposed in this research achieves better recognition accuracy than the benchmark adaptation in recognizing children’s speech.
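The evaluation metric named in this abstract, word error rate (WER), is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the sample sentences are made up for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("saya suka makan nasi", "saya suka nasi"))  # 0.25
```

A lower WER means better recognition, so an adaptation scheme is judged by how far it reduces this value on the children's test set.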

Keywords: Automatic Speech Recognition System, children speech, adaptation, Malay

Procedia PDF Downloads 370
20252 Conversational Assistive Technology of Visually Impaired Person for Social Interaction

Authors: Komal Ghafoor, Tauqir Ahmad, Murtaza Hanif, Hira Zaheer

Abstract:

Assistive technology has been developed to support visually impaired people in their social interactions. Conversation-assistive technology is designed to enhance communication skills, facilitate social interaction, and improve the quality of life of visually impaired individuals. This technology includes speech recognition, text-to-speech features, and other communication devices that enable users to communicate with others in real time. The technology uses natural language processing and machine learning algorithms to analyze spoken language and provide appropriate responses. It also includes features such as voice commands and audio feedback to provide users with a more immersive experience. These technologies have been shown to increase the confidence and independence of visually impaired individuals in social situations and have the potential to improve their social skills and relationships with others. Overall, conversation-assistive technology is a promising tool for empowering visually impaired people and improving their social interactions. One of the key benefits of conversation-assistive technology is that it allows visually impaired individuals to overcome communication barriers that they may face in social situations. It can help them to communicate more effectively with friends, family, and colleagues, as well as strangers in public spaces. By providing a more seamless and natural way to communicate, this technology can help to reduce feelings of isolation and improve overall quality of life. The main objective of this research is to give blind users the capability to move around in unfamiliar environments through a user-friendly device with a face, object, and activity recognition system. This model evaluates the accuracy of activity recognition. The device captures the view in front of the blind user, detects objects, recognizes activities, and answers the user's queries. It is implemented using the front view of the camera.
A local dataset is collected that includes different first-person human activities. The results obtained are the identification of the activities on which the VGG-16 model was trained, such as hugging, shaking hands, talking, walking, and waving.

Keywords: dataset, visually impaired person, natural language processing, human activity recognition

Procedia PDF Downloads 36
20251 The Effect of Computer-Mediated vs. Face-to-Face Instruction on L2 Pragmatics: A Meta-Analysis

Authors: Marziyeh Yousefi, Hossein Nassaji

Abstract:

This paper reports the results of a meta-analysis of studies on the effects of instruction mode on learning second language pragmatics during the last decade (from 2006 to 2016). After establishing the relevant inclusion/exclusion criteria, 39 published studies were retrieved and included in the meta-analysis. The studies were then coded for face-to-face or computer-assisted mode of instruction. Statistical procedures were applied to obtain effect sizes. It was found that computer-assisted language learning studies generated larger effects than face-to-face instruction.

Keywords: meta-analysis, effect size, L2 pragmatics, comprehensive meta-analysis, face-to-face, computer-assisted language learning

Procedia PDF Downloads 198
20250 An MrPPG Method for Face Anti-Spoofing

Authors: Lan Zhang, Cailing Zhang

Abstract:

In recent years, many face anti-spoofing algorithms have achieved high detection accuracy on 2D attacks or 3D mask attacks alone, but their detection performance is greatly reduced in multidimensional and cross-dataset tests. The rPPG (remote photoplethysmography) method uses the vital-sign information unique to a real face to distinguish real faces from spoofed ones, so it is more stable than other methods, but its detection rate for 2D spoofing attacks needs to be improved. Therefore, in this paper, we propose an improved rPPG method (MrPPG) for face anti-spoofing that applies color space fusion, uses the correlation of pulse signals between real face regions and background regions, and introduces a recurrent neural network (LSTM) to improve accuracy on 2D attacks. The MrPPG method also has high accuracy and good stability in multidimensional and cross-dataset face anti-spoofing tests. The improved method was validated on the Replay-Attack, CASIA-FASD, SiW, and HKBU_MARs_V2 datasets. The experimental results show that the performance and stability of the proposed algorithm are superior to those of many advanced algorithms.
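The abstract mentions using the correlation between pulse signals from face regions and background regions but does not say how that correlation is computed or thresholded. As one plausible reading, the sketch below computes a plain Pearson correlation between two equal-length signal traces; the function name and the sample values are assumptions for illustration only, not the authors' procedure.

```python
from math import sqrt


def pearson_correlation(face_signal, background_signal):
    """Pearson correlation between two equal-length 1-D signal traces."""
    n = len(face_signal)
    if n != len(background_signal) or n < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")

    mean_f = sum(face_signal) / n
    mean_b = sum(background_signal) / n
    cov = sum((f - mean_f) * (b - mean_b)
              for f, b in zip(face_signal, background_signal))
    var_f = sum((f - mean_f) ** 2 for f in face_signal)
    var_b = sum((b - mean_b) ** 2 for b in background_signal)
    if var_f == 0 or var_b == 0:
        return 0.0  # a constant trace carries no pulse information
    return cov / sqrt(var_f * var_b)


# Made-up traces: the background follows the face trace almost exactly.
face = [0.10, 0.40, 0.20, 0.50, 0.30]
background = [0.11, 0.41, 0.19, 0.52, 0.29]
print(pearson_correlation(face, background))
```

The intuition behind such a feature is that a genuine pulse appears only in skin regions, so a face trace that moves together with the background is more likely to reflect display or lighting variation than blood flow.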

Keywords: face anti-spoofing, face presentation attack detection, remote photoplethysmography, MrPPG

Procedia PDF Downloads 153
20249 Optimized Dynamic Bayesian Networks and Neural Verifier Test Applied to On-Line Isolated Characters Recognition

Authors: Redouane Tlemsani, Belkacem Kouninef, Abdelkader Benyettou

Abstract:

In this paper, our system is a Markovian system that can be viewed as a Dynamic Bayesian Network. One of the major advantages of these systems is that the models (topology and parameters) can be trained completely from training data. Bayesian networks are models that represent uncertain knowledge about complex phenomena. They combine probability theory and graph theory to provide effective tools for representing a joint probability distribution over a set of random variables. The representation of knowledge is based on describing, by graphs, the causal relations that exist between the variables defining the field of study. The theory of Dynamic Bayesian Networks is a generalization of Bayesian networks to dynamic processes. Our objective is to find the best structure to represent the relationships (dependencies) between the variables of a dynamic Bayesian network. In pattern recognition applications, the structure is usually fixed in advance, which obliges us to accept some strong assumptions (for example, independence between some variables).
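For reference, the joint distribution that a Bayesian network represents, and its usual extension to a dynamic network, can be written as follows. The notation (Pa for the parent set, n variables per time slice, T time slices) and the first-order Markov assumption are standard textbook choices assumed here; the abstract itself does not state them.

```latex
% Static Bayesian network: the joint factorizes over each variable given its parents.
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% Dynamic Bayesian network under a first-order Markov assumption:
% the parents of X_i^{t} may lie in time slice t or in slice t-1.
P\bigl(X^{1:T}\bigr) = P\bigl(X^{1}\bigr) \prod_{t=2}^{T} P\bigl(X^{t} \mid X^{t-1}\bigr),
\qquad
P\bigl(X^{t} \mid X^{t-1}\bigr) = \prod_{i=1}^{n} P\bigl(X_i^{t} \mid \mathrm{Pa}(X_i^{t})\bigr)
```

Structure learning, in these terms, means choosing the parent sets; fixing them in advance amounts to asserting the conditional independences mentioned at the end of the abstract.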

Keywords: Arabic on line character recognition, dynamic Bayesian network, pattern recognition, networks

Procedia PDF Downloads 591
20248 Content Based Face Sketch Images Retrieval in WHT, DCT, and DWT Transform Domain

Authors: W. S. Besbas, M. A. Artemi, R. M. Salman

Abstract:

Content-based face sketch retrieval can be used to find images of criminals from their sketches for crime prevention. This paper investigates the problem of content-based image retrieval (CBIR) of face sketch images in the transform domain. Face sketch images that are similar to the query image are retrieved from the face sketch database. Features of the face sketch image are extracted in the spectral domain of the selected transforms. These transforms are the Discrete Cosine Transform (DCT), the Discrete Wavelet Transform (DWT), and the Walsh Hadamard Transform (WHT). For the performance analysis of the feature selection methods, three face image databases are used: the 'Sheffield face database', the 'Olivetti Research Laboratory (ORL) face database', and the 'Indian face database'. The city block distance measure is used to evaluate the performance of the retrieval process. The investigation concludes that the retrieval rate is database dependent, but in general the DCT performs best. On the other hand, the WHT is the best with respect to the speed of retrieving images.
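The retrieval step described here, ranking database images by city block (L1) distance to the query, can be sketched as follows. The feature vectors are assumed to have been extracted already (for example, a handful of transform coefficients per image); the function names, image identifiers, and numbers below are invented for illustration and do not come from the paper.

```python
def city_block_distance(a, b):
    """City block (L1, Manhattan) distance between two feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query_features, database, top_k=3):
    """Return the ids of the top_k database entries closest to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: city_block_distance(query_features, item[1]),
    )
    return [image_id for image_id, _ in ranked[:top_k]]


# Hypothetical database: image id -> precomputed feature vector.
database = {
    "sketch_01": [1.0, 2.0, 3.0],
    "sketch_02": [1.0, 2.0, 4.0],
    "sketch_03": [5.0, 5.0, 5.0],
}
print(retrieve([1.0, 2.0, 3.0], database, top_k=2))  # ['sketch_01', 'sketch_02']
```

City block distance needs only subtractions and absolute values, which keeps the comparison cheap when many database images must be scored per query.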

Keywords: Content Based Image Retrieval (CBIR), face sketch image retrieval, features selection for CBIR, image retrieval in transform domain

Procedia PDF Downloads 465
20247 Facial Behavior Modifications Following the Diffusion of the Use of Protective Masks Due to COVID-19

Authors: Andreas Aceranti, Simonetta Vernocchi, Marco Colorato, Daniel Zaccariello

Abstract:

Our study explores the usefulness of implementing facial expression recognition capabilities and using the Facial Action Coding System (FACS) in contexts where the other person is wearing a mask. In the communication process, subjects use a plurality of distinct and autonomous signaling systems. Among them, the system of facial movements is worthy of attention. Basic emotion theorists have identified specific and universal patterns of facial expressions related to seven basic emotions (anger, disgust, contempt, fear, sadness, surprise, and happiness) that would distinguish one emotion from another. However, due to the COVID-19 pandemic, we have come up against the problem of the lower half of the face being covered by masks and, therefore, not open to investigation. Facial-emotional behavior is a good starting point for understanding: (1) the affective state (such as emotions), (2) cognitive activity (perplexity, concentration, boredom), (3) temperament and personality traits (hostility, sociability, shyness), (4) psychopathology (such as diagnostic information relevant to depression, mania, schizophrenia, and less severe disorders), (5) psychopathological processes that occur during social interactions between patient and analyst. There are numerous methods to measure facial movements resulting from the action of muscles, for example, the measurement of visible facial actions using coding systems (non-intrusive systems that require the presence of an observer who encodes and categorizes behaviors) and the measurement of electrical "discharges" of contracting muscles (facial electromyography; EMG). However, the measuring system invented by Ekman and Friesen (2002), the "Facial Action Coding System" (FACS), is the most comprehensive, complete, and versatile.
Our study, carried out on about 1,500 subjects over three years of work, allowed us to highlight how the movements of the hands and upper part of the face change depending on whether the subject wears a mask or not. We have been able to identify specific alterations in the subjects’ hand movement patterns and upper face expressions while wearing masks compared to when not wearing them. We believe that understanding how body language changes when facial expressions are impaired can provide a better understanding of the link between facial and bodily non-verbal language.

Keywords: facial action coding system, COVID-19, masks, facial analysis

Procedia PDF Downloads 53