Search results for: human recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 9339

Search results for: human recognition

9159 The Impact of Artificial Intelligence on Human Rights Development

Authors: Romany Wagih Farag Zaky

Abstract:

The relationship between development and human rights has long been the subject of academic debate. To understand the dynamics between these two concepts, various principles are adopted, from the right to development to development-based human rights. Despite the initiatives taken, the relationship between development and human rights remains unclear. However, the overlap between these two views and the idea that efforts should be made in the field of human rights have increased in recent years. It is then evaluated whether the right to sustainable development is acceptable or not. The article concludes that the principles of sustainable development are directly or indirectly recognized in various human rights instruments, which is a good answer to the question posed above. This book therefore cites regional and international human rights agreements such as , as well as the jurisprudence and interpretative guidelines of human rights institutions, to prove this hypothesis.

Keywords: sustainable development, human rights, the right to development, the human rights-based approach to development, environmental rights, economic development, social sustainability human rights protection, human rights violations, workers’ rights, justice, security

Procedia PDF Downloads 17
9158 A Fast, Reliable Technique for Face Recognition Based on Hidden Markov Model

Authors: Sameh Abaza, Mohamed Ibrahim, Tarek Mahmoud

Abstract:

Due to the development in the digital image processing, its wide use in many applications such as medical, security, and others, the need for more accurate techniques that are reliable, fast and robust is vehemently demanded. In the field of security, in particular, speed is of the essence. In this paper, a pattern recognition technique that is based on the use of Hidden Markov Model (HMM), K-means and the Sobel operator method is developed. The proposed technique is proved to be fast with respect to some other techniques that are investigated for comparison. Moreover, it shows its capability of recognizing the normal face (center part) as well as face boundary.

Keywords: HMM, K-Means, Sobel, accuracy, face recognition

Procedia PDF Downloads 300
9157 Iris Recognition Based on the Low Order Norms of Gradient Components

Authors: Iman A. Saad, Loay E. George

Abstract:

Iris pattern is an important biological feature of human body; it becomes very hot topic in both research and practical applications. In this paper, an algorithm is proposed for iris recognition and a simple, efficient and fast method is introduced to extract a set of discriminatory features using first order gradient operator applied on grayscale images. The gradient based features are robust, up to certain extents, against the variations may occur in contrast or brightness of iris image samples; the variations are mostly occur due lightening differences and camera changes. At first, the iris region is located, after that it is remapped to a rectangular area of size 360x60 pixels. Also, a new method is proposed for detecting eyelash and eyelid points; it depends on making image statistical analysis, to mark the eyelash and eyelid as a noise points. In order to cover the features localization (variation), the rectangular iris image is partitioned into N overlapped sub-images (blocks); then from each block a set of different average directional gradient densities values is calculated to be used as texture features vector. The applied gradient operators are taken along the horizontal, vertical and diagonal directions. The low order norms of gradient components were used to establish the feature vector. Euclidean distance based classifier was used as a matching metric for determining the degree of similarity between the features vector extracted from the tested iris image and template features vectors stored in the database. Experimental tests were performed using 2639 iris images from CASIA V4-Interival database, the attained recognition accuracy has reached up to 99.92%.

Keywords: iris recognition, contrast stretching, gradient features, texture features, Euclidean metric

Procedia PDF Downloads 302
9156 Mood Recognition Using Indian Music

Authors: Vishwa Joshi

Abstract:

The study of mood recognition in the field of music has gained a lot of momentum in the recent years with machine learning and data mining techniques and many audio features contributing considerably to analyze and identify the relation of mood plus music. In this paper we consider the same idea forward and come up with making an effort to build a system for automatic recognition of mood underlying the audio song’s clips by mining their audio features and have evaluated several data classification algorithms in order to learn, train and test the model describing the moods of these audio songs and developed an open source framework. Before classification, Preprocessing and Feature Extraction phase is necessary for removing noise and gathering features respectively.

Keywords: music, mood, features, classification

Procedia PDF Downloads 474
9155 Conversational Assistive Technology of Visually Impaired Person for Social Interaction

Authors: Komal Ghafoor, Tauqir Ahmad, Murtaza Hanif, Hira Zaheer

Abstract:

Assistive technology has been developed to support visually impaired people in their social interactions. Conversation assistive technology is designed to enhance communication skills, facilitate social interaction, and improve the quality of life of visually impaired individuals. This technology includes speech recognition, text-to-speech features, and other communication devices that enable users to communicate with others in real time. The technology uses natural language processing and machine learning algorithms to analyze spoken language and provide appropriate responses. It also includes features such as voice commands and audio feedback to provide users with a more immersive experience. These technologies have been shown to increase the confidence and independence of visually impaired individuals in social situations and have the potential to improve their social skills and relationships with others. Overall, conversation-assistive technology is a promising tool for empowering visually impaired people and improving their social interactions. One of the key benefits of conversation-assistive technology is that it allows visually impaired individuals to overcome communication barriers that they may face in social situations. It can help them to communicate more effectively with friends, family, and colleagues, as well as strangers in public spaces. By providing a more seamless and natural way to communicate, this technology can help to reduce feelings of isolation and improve overall quality of life. The main objective of this research is to give blind users the capability to move around in unfamiliar environments through a user-friendly device by face, object, and activity recognition system. This model evaluates the accuracy of activity recognition. This device captures the front view of the blind, detects the objects, recognizes the activities, and answers the blind query. It is implemented using the front view of the camera. The local dataset is collected that includes different 1st-person human activities. The results obtained are the identification of the activities that the VGG-16 model was trained on, where Hugging, Shaking Hands, Talking, Walking, Waving video, etc.

Keywords: dataset, visually impaired person, natural language process, human activity recognition

Procedia PDF Downloads 33
9154 Spatiotemporal Neural Network for Video-Based Pose Estimation

Authors: Bin Ji, Kai Xu, Shunyu Yao, Jingjing Liu, Ye Pan

Abstract:

Human pose estimation is a popular research area in computer vision for its important application in human-machine interface. In recent years, 2D human pose estimation based on convolution neural network has got great progress and development. However, in more and more practical applications, people often need to deal with tasks based on video. It’s not far-fetched for us to consider how to combine the spatial and temporal information together to achieve a balance between computing cost and accuracy. To address this issue, this study proposes a new spatiotemporal model, namely Spatiotemporal Net (STNet) to combine both temporal and spatial information more rationally. As a result, the predicted keypoints heatmap is potentially more accurate and spatially more precise. Under the condition of ensuring the recognition accuracy, the algorithm deal with spatiotemporal series in a decoupled way, which greatly reduces the computation of the model, thus reducing the resource consumption. This study demonstrate the effectiveness of our network over the Penn Action Dataset, and the results indicate superior performance of our network over the existing methods.

Keywords: convolutional long short-term memory, deep learning, human pose estimation, spatiotemporal series

Procedia PDF Downloads 117
9153 The Study on How Social Cues in a Scene Modulate Basic Object Recognition Proces

Authors: Shih-Yu Lo

Abstract:

Stereotypes exist in almost every society, affecting how people interact with each other. However, to our knowledge, the influence of stereotypes was rarely explored in the context of basic perceptual processes. This study aims to explore how the gender stereotype affects object recognition. Participants were presented with a series of scene pictures, followed by a target display with a man or a woman, holding a weapon or a non-weapon object. The task was to identify whether the object in the target display was a weapon or not. Although the gender of the object holder could not predict whether he or she held a weapon, and was irrelevant to the task goal, the participant nevertheless tended to identify the object as a weapon when the object holder was a man than a woman. The analysis based on the signal detection theory showed that the stereotype effect on object recognition mainly resulted from the participant’s bias to make a 'weapon' response when a man was in the scene instead of a woman in the scene. In addition, there was a trend that the participant’s sensitivity to differentiate a weapon from a non-threating object was higher when a woman was in the scene than a man was in the scene. The results of this study suggest that the irrelevant social cues implied in the visual scene can be very powerful that they can modulate the basic object recognition process.

Keywords: gender stereotype, object recognition, signal detection theory, weapon

Procedia PDF Downloads 178
9152 The Effect of Artificial Intelligence on Civil Engineering Outputs and Designs

Authors: Mina Youssef Makram Ibrahim

Abstract:

Engineering identity contributes to the professional and academic sustainability of female engineers. Recognizability is an important factor that shapes an engineer's identity. People who are deprived of real recognition often fail to create a positive identity. This study draws on Hornet’s recognition theory to identify factors that influence female civil engineers' sense of recognition. Over the past decade, a survey was created and distributed to 330 graduate students in the Department of Civil, Civil and Environmental Engineering at Iowa State University. Survey items include demographics, perceptions of a civil engineer's identity, and factors that influence recognition of a civil engineer's identity, such as B. Opinions about society and family. Descriptive analysis of survey responses revealed that perceptions of civil engineering varied significantly. The definitions of civil engineering provided by participants included the terms structure, design and infrastructure. Almost half of the participants said the main reason for studying Civil Engineering was their interest in the subject, and the majority said they were proud to be a civil engineer. Many study participants reported that their parents viewed them as civil engineers. Institutional and operational treatment was also found to have a significant impact on the recognition of women civil engineers. Almost half of the participants reported feeling isolated or ignored at work because of their gender. This research highlights the importance of recognition in developing the identity of women engineers.

Keywords: civil service, hiring, merit, policing civil engineering, construction, surveying, mapping, pile civil service, Kazakhstan, modernization, a national model of civil service, civil service reforms, bureaucracy civil engineering, gender, identity, recognition

Procedia PDF Downloads 24
9151 Evaluate the Changes in Stress Level Using Facial Thermal Imaging

Authors: Amin Derakhshan, Mohammad Mikaili, Mohammad Ali Khalilzadeh, Amin Mohammadian

Abstract:

This paper proposes a stress recognition system from multi-modal bio-potential signals. For stress recognition, Support Vector Machines (SVM) and LDA are applied to design the stress classifiers and its characteristics are investigated. Using gathered data under psychological polygraph experiments, the classifiers are trained and tested. The pattern recognition method classifies stressful from non-stressful subjects based on labels which come from polygraph data. The successful classification rate is 96% for 12 subjects. It means that facial thermal imaging due to its non-contact advantage could be a remarkable alternative for psycho-physiological methods.

Keywords: stress, thermal imaging, face, SVM, polygraph

Procedia PDF Downloads 455
9150 Its about Cortana, Microsoft’s Virtual Assistant

Authors: Aya Idriss, Esraa Othman, Lujain Malak

Abstract:

Artificial intelligence is the emulation of human intelligence processes by machines, particularly computer systems that act logically. Some of the specific applications of AI include natural language processing, speech recognition, and machine vision. Cortana is a virtual assistant and she’s an example of an AI Application. Microsoft made it possible for this app to be accessed not only on laptops and PCs but can be downloaded on mobile phones and used as a virtual assistant which was a huge success. Cortana can offer a lot apart from the basic orders such as setting alarms and marking the calendar. Its capabilities spread past that, for example, it provides us with listening to music and podcasts on the go, managing my to-do list and emails, connecting with my contacts hands-free by simply just telling the virtual assistant to call somebody, gives me instant answers and so on. A questionnaire was sent online to numerous friends and family members to perform the study, which is critical in evaluating Cortana's recognition capacity and the majority of the answers were in favor of Cortana’s capabilities. The results of the questionnaire assisted us in determining the level of Cortana's skills.

Keywords: artificial intelligence, Cortana, AI, abstract

Procedia PDF Downloads 156
9149 Alphabet Recognition Using Pixel Probability Distribution

Authors: Vaidehi Murarka, Sneha Mehta, Dishant Upadhyay

Abstract:

Our project topic is “Alphabet Recognition using pixel probability distribution”. The project uses techniques of Image Processing and Machine Learning in Computer Vision. Alphabet recognition is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files etc. Alphabet Recognition based OCR application is sometimes used in signature recognition which is used in bank and other high security buildings. One of the popular mobile applications includes reading a visiting card and directly storing it to the contacts. OCR's are known to be used in radar systems for reading speeders license plates and lots of other things. The implementation of our project has been done using Visual Studio and Open CV (Open Source Computer Vision). Our algorithm is based on Neural Networks (machine learning). The project was implemented in three modules: (1) Training: This module aims “Database Generation”. Database was generated using two methods: (a) Run-time generation included database generation at compilation time using inbuilt fonts of OpenCV library. Human intervention is not necessary for generating this database. (b) Contour–detection: ‘jpeg’ template containing different fonts of an alphabet is converted to the weighted matrix using specialized functions (contour detection and blob detection) of OpenCV. The main advantage of this type of database generation is that the algorithm becomes self-learning and the final database requires little memory to be stored (119kb precisely). (2) Preprocessing: Input image is pre-processed using image processing concepts such as adaptive thresholding, binarizing, dilating etc. and is made ready for segmentation. “Segmentation” includes extraction of lines, words, and letters from the processed text image. (3) Testing and prediction: The extracted letters are classified and predicted using the neural networks algorithm. The algorithm recognizes an alphabet based on certain mathematical parameters calculated using the database and weight matrix of the segmented image.

Keywords: contour-detection, neural networks, pre-processing, recognition coefficient, runtime-template generation, segmentation, weight matrix

Procedia PDF Downloads 357
9148 Comparison Study of Machine Learning Classifiers for Speech Emotion Recognition

Authors: Aishwarya Ravindra Fursule, Shruti Kshirsagar

Abstract:

In the intersection of artificial intelligence and human-centered computing, this paper delves into speech emotion recognition (SER). It presents a comparative analysis of machine learning models such as K-Nearest Neighbors (KNN),logistic regression, support vector machines (SVM), decision trees, ensemble classifiers, and random forests, applied to SER. The research employs four datasets: Crema D, SAVEE, TESS, and RAVDESS. It focuses on extracting salient audio signal features like Zero Crossing Rate (ZCR), Chroma_stft, Mel Frequency Cepstral Coefficients (MFCC), root mean square (RMS) value, and MelSpectogram. These features are used to train and evaluate the models’ ability to recognize eight types of emotions from speech: happy, sad, neutral, angry, calm, disgust, fear, and surprise. Among the models, the Random Forest algorithm demonstrated superior performance, achieving approximately 79% accuracy. This suggests its suitability for SER within the parameters of this study. The research contributes to SER by showcasing the effectiveness of various machine learning algorithms and feature extraction techniques. The findings hold promise for the development of more precise emotion recognition systems in the future. This abstract provides a succinct overview of the paper’s content, methods, and results.

Keywords: comparison, ML classifiers, KNN, decision tree, SVM, random forest, logistic regression, ensemble classifiers

Procedia PDF Downloads 15
9147 Understanding the Interactive Nature in Auditory Recognition of Phonological/Grammatical/Semantic Errors at the Sentence Level: An Investigation Based upon Japanese EFL Learners’ Self-Evaluation and Actual Language Performance

Authors: Hirokatsu Kawashima

Abstract:

One important element of teaching/learning listening is intensive listening such as listening for precise sounds, words, grammatical, and semantic units. Several classroom-based investigations have been conducted to explore the usefulness of auditory recognition of phonological, grammatical and semantic errors in such a context. The current study reports the results of one such investigation, which targeted auditory recognition of phonological, grammatical, and semantic errors at the sentence level. 56 Japanese EFL learners participated in this investigation, in which their recognition performance of phonological, grammatical and semantic errors was measured on a 9-point scale by learners’ self-evaluation from the perspective of 1) two types of similar English sound (vowel and consonant minimal pair words), 2) two types of sentence word order (verb phrase-based and noun phrase-based word orders), and 3) two types of semantic consistency (verb-purpose and verb-place agreements), respectively, and their general listening proficiency was examined using standardized tests. A number of findings have been made about the interactive relationships between the three types of auditory error recognition and general listening proficiency. Analyses based on the OPLS (Orthogonal Projections to Latent Structure) regression model have disclosed, for example, that the three types of auditory error recognition are linked in a non-linear way: the highest explanatory power for general listening proficiency may be attained when quadratic interactions between auditory recognition of errors related to vowel minimal pair words and that of errors related to noun phrase-based word order are embraced (R2=.33, p=.01).

Keywords: auditory error recognition, intensive listening, interaction, investigation

Procedia PDF Downloads 488
9146 The Impact of Artificial Intelligence on Human Rights Development and Obligations

Authors: Bola George Asaad Bekledas

Abstract:

Relationship between development and human rights has long been the subject of academic debate. To understand the dynamics between these two concepts, various principles are adopted, from the right to development to development-based human rights. Despite the initiatives taken, the relationship between development and human rights remains unclear. However, the compatibility between these two concepts and the idea that these efforts should be made to respect human rights guarantees have gained momentum in recent years. It is then evaluated whether the right to sustainable development is acceptable or not. The article concludes that the principles of sustainable development are directly or indirectly recognized in various human rights instruments, which is a good answer to the question posed above. This study, therefore, cites regional and international human rights agreements such as, as well as the jurisprudence and interpretative guidelines of human rights institutions, to prove this hypothesis.

Keywords: sustainable development, human rights, the right to development, the human rights-based approach to development, environmental rights, economic development, social sustainability human rights protection, human rights violations, workers’ rights, justice, security.

Procedia PDF Downloads 18
9145 Wolof Voice Response Recognition System: A Deep Learning Model for Wolof Audio Classification

Authors: Krishna Mohan Bathula, Fatou Bintou Loucoubar, FNU Kaleemunnisa, Christelle Scharff, Mark Anthony De Castro

Abstract:

Voice recognition algorithms such as automatic speech recognition and text-to-speech systems with African languages can play an important role in bridging the digital divide of Artificial Intelligence in Africa, contributing to the establishment of a fully inclusive information society. This paper proposes a Deep Learning model that can classify the user responses as inputs for an interactive voice response system. A dataset with Wolof language words ‘yes’ and ‘no’ is collected as audio recordings. A two stage Data Augmentation approach is adopted for enhancing the dataset size required by the deep neural network. Data preprocessing and feature engineering with Mel-Frequency Cepstral Coefficients are implemented. Convolutional Neural Networks (CNNs) have proven to be very powerful in image classification and are promising for audio processing when sounds are transformed into spectra. For performing voice response classification, the recordings are transformed into sound frequency feature spectra and then applied image classification methodology using a deep CNN model. The inference model of this trained and reusable Wolof voice response recognition system can be integrated with many applications associated with both web and mobile platforms.

Keywords: automatic speech recognition, interactive voice response, voice response recognition, wolof word classification

Procedia PDF Downloads 87
9144 Makhraj Recognition Using Convolutional Neural Network

Authors: Zan Azma Nasruddin, Irwan Mazlin, Nor Aziah Daud, Fauziah Redzuan, Fariza Hanis Abdul Razak

Abstract:

This paper focuses on a machine learning that learn the correct pronunciation of Makhraj Huroofs. Usually, people need to find an expert to pronounce the Huroof accurately. In this study, the researchers have developed a system that is able to learn the selected Huroofs which are ha, tsa, zho, and dza using the Convolutional Neural Network. The researchers present the chosen type of the CNN architecture to make the system that is able to learn the data (Huroofs) as quick as possible and produces high accuracy during the prediction. The researchers have experimented the system to measure the accuracy and the cross entropy in the training process.

Keywords: convolutional neural network, Makhraj recognition, speech recognition, signal processing, tensorflow

Procedia PDF Downloads 305
9143 The Artificial Intelligence Technologies Used in PhotoMath Application

Authors: Tala Toonsi, Marah Alagha, Lina Alnowaiser, Hala Rajab

Abstract:

This report is about the Photomath app, which is an AI application that uses image recognition technology, specifically optical character recognition (OCR) algorithms. The (OCR) algorithm translates the images into a mathematical equation, and the app automatically provides a step-by-step solution. The application supports decimals, basic arithmetic, fractions, linear equations, and multiple functions such as logarithms. Testing was conducted to examine the usage of this app, and results were collected by surveying ten participants. Later, the results were analyzed. This paper seeks to answer the question: To what level the artificial intelligence features are accurate and the speed of process in this app. It is hoped this study will inform about the efficiency of AI in Photomath to the users.

Keywords: photomath, image recognition, app, OCR, artificial intelligence, mathematical equations.

Procedia PDF Downloads 138
9142 Interventions for Children with Autism Using Interactive Technologies

Authors: Maria Hopkins, Sarah Koch, Fred Biasini

Abstract:

Autism is lifelong disorder that affects one out of every 110 Americans. The deficits that accompany Autism Spectrum Disorders (ASD), such as abnormal behaviors and social incompetence, often make it extremely difficult for these individuals to gain functional independence from caregivers. These long-term implications necessitate an immediate effort to improve social skills among children with an ASD. Any technology that could teach individuals with ASD necessary social skills would not only be invaluable for the individuals affected, but could also effect a massive saving to society in treatment programs. The overall purpose of the first study was to develop, implement, and evaluate an avatar tutor for social skills training in children with ASD. “Face Say” was developed as a colorful computer program that contains several different activities designed to teach children specific social skills, such as eye gaze, joint attention, and facial recognition. The children with ASD were asked to attend to FaceSay or a control painting computer game for six weeks. Children with ASD who received the training had an increase in emotion recognition, F(1, 48) = 23.04, p < 0.001 (adjusted Ms 8.70 and 6.79, respectively) compared to the control group. In addition, children who received the FaceSay training had higher post-test scored in facial recognition, F(1, 48) = 5.09, p < 0.05 (adjusted Ms: 38.11 and 33.37, respectively) compared to controls. The findings provide information about the benefits of computer-based training for children with ASD. Recent research suggests the value of also using socially assistive robots with children who have an ASD. Researchers investigating robots as tools for therapy in ASD have reported increased engagement, increased levels of attention, and novel social behaviors when robots are part of the social interaction. The overall goal of the second study was to develop a social robot designed to teach children specific social skills such as emotion recognition. The robot is approachable, with both an animal-like appearance and features of a human face (i.e., eyes, eyebrows, mouth). The feasibility of the robot is being investigated in children ages 7-12 to explore whether the social robot is capable of forming different facial expressions to accurately display emotions similar to those observed in the human face. The findings of this study will be used to create a potentially effective and cost efficient therapy for improving the cognitive-emotional skills of children with autism. Implications and study findings using the robot as an intervention tool will be discussed.

Keywords: autism, intervention, technology, emotions

Procedia PDF Downloads 351
9141 Features Vector Selection for the Recognition of the Fragmented Handwritten Numeric Chains

Authors: Salim Ouchtati, Aissa Belmeguenai, Mouldi Bedda

Abstract:

In this study, we propose an offline system for the recognition of the fragmented handwritten numeric chains. Firstly, we realized a recognition system of the isolated handwritten digits, in this part; the study is based mainly on the evaluation of neural network performances, trained with the gradient backpropagation algorithm. The used parameters to form the input vector of the neural network are extracted from the binary images of the isolated handwritten digit by several methods: the distribution sequence, sondes application, the Barr features, and the centered moments of the different projections and profiles. Secondly, the study is extended for the reading of the fragmented handwritten numeric chains constituted of a variable number of digits. The vertical projection was used to segment the numeric chain at isolated digits and every digit (or segment) was presented separately to the entry of the system achieved in the first part (recognition system of the isolated handwritten digits).

Keywords: features extraction, handwritten numeric chains, image processing, neural networks

Procedia PDF Downloads 242
9140 Semantic Data Schema Recognition

Authors: Aïcha Ben Salem, Faouzi Boufares, Sebastiao Correia

Abstract:

The subject covered in this paper aims at assisting the user in its quality approach. The goal is to better extract, mix, interpret and reuse data. It deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, inculding the data and the metadata. Firstly, it consists of categorizing the data by assigning it to a category and possibly a sub-category, and secondly, of establishing relations between columns and possibly discovering the semantics of the manipulated data source. These links detected between columns offer a better understanding of the source and the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies.

Keywords: schema recognition, semantic data profiling, meta-categorisation, semantic dependencies inter columns

Procedia PDF Downloads 393
9139 Speech Recognition Performance by Adults: A Proposal for a Battery for Marathi

Authors: S. B. Rathna Kumar, Pranjali A Ujwane, Panchanan Mohanty

Abstract:

The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing was carried using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were further equally divided into five groups based on above mentioned regions. It was found that there was no significant difference (p > 0.05) in the speech recognition performance between groups for each word list and between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average linear slope of 4.64%, 4.73%, 4.68%, and 4.85% increase in word recognition score per dB for list 1, list 2, list 3 and list 4 respectively. Although, there is no data available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi.

Keywords: speech recognition performance, phonemic balance, equivalence analysis, performance-intensity function testing, reliability, validity

Procedia PDF Downloads 329
9138 Vision-Based Daily Routine Recognition for Healthcare with Transfer Learning

Authors: Bruce X. B. Yu, Yan Liu, Keith C. C. Chan

Abstract:

We propose to record Activities of Daily Living (ADLs) of elderly people using a vision-based system so as to provide better assistive and personalization technologies. Current ADL-related research is based on data collected with help from non-elderly subjects in laboratory environments and the activities performed are predetermined for the sole purpose of data collection. To obtain more realistic datasets for the application, we recorded ADLs for the elderly with data collected from real-world environment involving real elderly subjects. Motivated by the need to collect data for more effective research related to elderly care, we chose to collect data in the room of an elderly person. Specifically, we installed Kinect, a vision-based sensor on the ceiling, to capture the activities that the elderly subject performs in the morning every day. Based on the data, we identified 12 morning activities that the elderly person performs daily. To recognize these activities, we created a HARELCARE framework to investigate into the effectiveness of existing Human Activity Recognition (HAR) algorithms and propose the use of a transfer learning algorithm for HAR. We compared the performance, in terms of accuracy, and training progress. Although the collected dataset is relatively small, the proposed algorithm has a good potential to be applied to all daily routine activities for healthcare purposes such as evidence-based diagnosis and treatment.

Keywords: daily activity recognition, healthcare, IoT sensors, transfer learning

Procedia PDF Downloads 107
9137 Face Recognition Using Body-Worn Camera: Dataset and Baseline Algorithms

Authors: Ali Almadan, Anoop Krishnan, Ajita Rattani

Abstract:

Facial recognition is a widely adopted technology in surveillance, border control, healthcare, banking services, and lately, in mobile user authentication with Apple introducing “Face ID” moniker with iPhone X. A lot of research has been conducted in the area of face recognition on datasets captured by surveillance cameras, DSLR, and mobile devices. Recently, face recognition technology has also been deployed on body-worn cameras to keep officers safe, enabling situational awareness and providing evidence for trial. However, limited academic research has been conducted on this topic so far, without the availability of any publicly available datasets with a sufficient sample size. This paper aims to advance research in the area of face recognition using body-worn cameras. To this aim, the contribution of this work is two-fold: (1) collection of a dataset consisting of a total of 136,939 facial images of 102 subjects captured using body-worn cameras in in-door and daylight conditions and (2) evaluation of various deep-learning architectures for face identification on the collected dataset. Experimental results suggest a maximum True Positive Rate(TPR) of 99.86% at False Positive Rate(FPR) of 0.000 obtained by SphereFace based deep learning architecture in daylight condition. The collected dataset and the baseline algorithms will promote further research and development. A downloadable link of the dataset and the algorithms is available by contacting the authors.

Keywords: face recognition, body-worn cameras, deep learning, person identification

Procedia PDF Downloads 138
9136 Pre-Analysis of Printed Circuit Boards Based on Multispectral Imaging for Vision Based Recognition of Electronics Waste

Authors: Florian Kleber, Martin Kampel

Abstract:

The increasing demand of gallium, indium and rare-earth elements for the production of electronics, e.g. solid state-lighting, photovoltaics, integrated circuits, and liquid crystal displays, will exceed the world-wide supply according to current forecasts. Recycling systems to reclaim these materials are not yet in place, which challenges the sustainability of these technologies. This paper proposes a multispectral imaging system as a basis for a vision based recognition system for valuable components of electronics waste. Multispectral images intend to enhance the contrast of images of printed circuit boards (single components, as well as labels) for further analysis, such as optical character recognition and entire printed circuit board recognition. The results show that a higher contrast is achieved in the near infrared compared to ultraviolet and visible light.

Keywords: electronics waste, multispectral imaging, printed circuit boards, rare-earth elements

Procedia PDF Downloads 393
9135 The Combination of the Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, Jitter and Shimmer Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech

Authors: Brahim Fares Zaidi

Abstract:

Our work aims to improve our Automatic Recognition System for Dysarthria Speech based on the Hidden Models of Markov and the Hidden Markov Model Toolkit to help people who are sick. With pronunciation problems, we applied two techniques of speech parameterization based on Mel Frequency Cepstral Coefficients and Perceptual Linear Prediction and concatenated them with JITTER and SHIMMER coefficients in order to increase the recognition rate of a dysarthria speech. For our tests, we used the NEMOURS database that represents speakers with dysarthria and normal speakers.

Keywords: ARSDS, HTK, HMM, MFCC, PLP

Procedia PDF Downloads 75
9134 Exploring Pre-Trained Automatic Speech Recognition Model HuBERT for Early Alzheimer’s Disease and Mild Cognitive Impairment Detection in Speech

Authors: Monica Gonzalez Machorro

Abstract:

Dementia is hard to diagnose because of the lack of early physical symptoms. Early dementia recognition is key to improving the living condition of patients. Speech technology is considered a valuable biomarker for this challenge. Recent works have utilized conventional acoustic features and machine learning methods to detect dementia in speech. BERT-like classifiers have reported the most promising performance. One constraint, nonetheless, is that these studies are either based on human transcripts or on transcripts produced by automatic speech recognition (ASR) systems. This research contribution is to explore a method that does not require transcriptions to detect early Alzheimer’s disease (AD) and mild cognitive impairment (MCI). This is achieved by fine-tuning a pre-trained ASR model for the downstream early AD and MCI tasks. To do so, a subset of the thoroughly studied Pitt Corpus is customized. The subset is balanced for class, age, and gender. Data processing also involves cropping the samples into 10-second segments. For comparison purposes, a baseline model is defined by training and testing a Random Forest with 20 extracted acoustic features using the librosa library implemented in Python. These are: zero-crossing rate, MFCCs, spectral bandwidth, spectral centroid, root mean square, and short-time Fourier transform. The baseline model achieved a 58% accuracy. To fine-tune HuBERT as a classifier, an average pooling strategy is employed to merge the 3D representations from audio into 2D representations, and a linear layer is added. The pre-trained model used is ‘hubert-large-ls960-ft’. Empirically, the number of epochs selected is 5, and the batch size defined is 1. Experiments show that our proposed method reaches a 69% balanced accuracy. This suggests that the linguistic and speech information encoded in the self-supervised ASR-based model is able to learn acoustic cues of AD and MCI.

Keywords: automatic speech recognition, early Alzheimer’s recognition, mild cognitive impairment, speech impairment

Procedia PDF Downloads 98
9133 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition

Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie

Abstract:

In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly.

Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks

Procedia PDF Downloads 80
9132 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition relies on cooperative subjects at a short distance to a microphone. Standoff speech recognition using microphone arrays can extend the subject to sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognitions are limited to indoor use at short range. Moreover, these applications require air passway between the subject and the sensor to achieve reasonable signal to noise ratio. This study reports long range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through the windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in a daily life. These data were compared with the microphone counterpart to manifest the impact of various materials on the spectrum of the LDV speech signal. State of the art deep neural network modeling approaches is used to conduct continuous speaker independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural network, bi-directional long short term memory, and model fusion shows great promise of using LDV for long range speech recognition. To author’s best knowledge, this is the first time an LDV is reported for long distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 151
9131 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.

Keywords: retail stores, faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition

Procedia PDF Downloads 117
9130 Evolution of the Environmental Justice Concept

Authors: Zahra Bakhtiari

Abstract:

This article explores the development and evolution of the concept of environmental justice, which has shifted from being dominated by white and middle-class individuals to a civil struggle by marginalized communities against environmental injustices. Environmental justice aims to achieve equity in decision-making and policy-making related to the environment. The concept of justice in this context includes four fundamental aspects: distribution, procedure, recognition, and capabilities. Recent scholars have attempted to broaden the concept of justice to include dimensions of participation, recognition, and capabilities. Focusing on all four dimensions of environmental justice is crucial for effective planning and policy-making to address environmental issues. Ignoring any of these aspects can lead to the failure of efforts and the waste of resources.

Keywords: environmental justice, distribution, procedure, recognition, capabilities

Procedia PDF Downloads 58