Search results for: visual recognition
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 3359

3089 Creative Applications for Socially Assistive Robots to Support Mental Health: A Patient-Centered Feasibility Study

Authors: Andreas Kornmaaler Hansen, Carlos Gomez Cubero, Elizabeth Jochum

Abstract:

The use of the arts in therapy and rehabilitation is well established, and there is growing recognition of the value of the arts for improving health and well-being across diverse populations. Combining arts with socially assistive robots is a relatively under-explored research area. This paper presents the results of a feasibility study conducted within an existing arts and health program to scope the possibility of combining visual arts with socially assistive robots to promote mental health and well-being. Using a participatory research design with participant-led perspectives, we present the results of our feasibility study with a collaborative drawing robot among an adult population with mild to severe mental illness. We identify key methodological challenges and advantages of working with participatory and human-centered approaches. Based on the results of three pilot workshops with participants and lay health workers, we outline suggestions for authentic engagement with real stakeholders toward the development of socially assistive robots in community health contexts. Working closely with a patient population at all levels of the research process is key for developing tools and interventions that center patient experience and priorities while minimizing the risks of alienating patients and communities.

Keywords: arts and health, visual art, health promotion, mental health, collaborative robots, creativity, socially assistive robots

Procedia PDF Downloads 45
3088 Magnitude of Visual Impairment and Associated Factors among Adult Glaucoma Patients Attending University of Gondar, Comprehensive Specialized Hospital, Tertiary Eye Care and Training Center, Northwest Ethiopia, 2022

Authors: Getenet Shumet Birhan, Biruk Lelisa Eticha, Gizachew Tilahun Belete, Fisseha Admassu Ayele

Abstract:

Context: Glaucoma is a significant public health concern globally, being the second leading cause of blindness. This study focuses on adult glaucoma patients in Ethiopia, specifically at the University of Gondar. Research Aim: The main objective is to assess the prevalence of visual impairment and identify associated factors among adult glaucoma patients at the University of Gondar. Methodology: The study used an institution-based cross-sectional design, collecting data from 423 glaucoma patients through interviews and medical chart reviews. Descriptive statistics and logistic regression were employed for analysis. Findings: The study found a high prevalence of visual impairment (77.6%) among adult glaucoma patients, with factors such as female sex, rural residence, glaucoma type, disease stage, and duration of diagnosis significantly associated with visual impairment. Theoretical Importance: This research adds valuable insights into the prevalence and determinants of visual impairment among glaucoma patients in Ethiopia, contributing to the existing literature on eye health in low-resource settings. Data Collection: Data were collected through face-to-face interviews and medical chart reviews at the University of Gondar, utilizing a structured questionnaire. Analysis Procedures: Descriptive statistics, frequency analysis, and binary logistic regression were employed to analyze the data and identify factors associated with visual impairment in adult glaucoma patients. Question Addressed: The study sought to answer the question of the prevalence of visual impairment and its associated factors among adult glaucoma patients at the University of Gondar in Northwest Ethiopia. Conclusion: The research concludes that visual impairment is significantly high among adult glaucoma patients in this setting, with several factors playing a role in its occurrence.

Keywords: visual impairment, glaucoma, Ethiopia, Gondar

Procedia PDF Downloads 43
3087 Ezra Pound and James Joyce: Two Different Approaches to the Relation between Literature and Visual Arts

Authors: Espen Gronlie

Abstract:

This paper will suggest that Ezra Pound and James Joyce are paradigmatic for two different approaches to literature and visual arts. Both authors are infamous for being difficult, but this does not mean that their works are similar. Pound famously promoted Joyce’s Ulysses and was instrumental in getting the work published in literary reviews. However, Pound did not appreciate Joyce’s artistic development in his so-called Work in Progress, which was published in 1939 under the title Finnegans Wake. Pound and Joyce will be read as representing two different approaches to literature and other forms of art. Pound can be seen as essentially influenced by cubism and modernist techniques such as collage and montage. While many critics have used these notions to describe The Cantos, this paper will suggest reading Pound’s opus magnum in relation to Finnegans Wake. The latter work shows how Joyce remained tied to an idea of the literary work as sound, as something which may – or perhaps even should – be read aloud. In contrast, Pound’s The Cantos show clear signs of being influenced by experiments in the visual arts. The paper will argue that Pound intended to develop his work in order to bring literature 'up to date' with the development in visual arts, while Joyce stuck to a more classical understanding of the literary work as composed for oral presentation.

Keywords: collage, conceptualism, montage, literature and visual arts

Procedia PDF Downloads 175
3086 Pre-Analysis of Printed Circuit Boards Based on Multispectral Imaging for Vision Based Recognition of Electronics Waste

Authors: Florian Kleber, Martin Kampel

Abstract:

The increasing demand for gallium, indium and rare-earth elements for the production of electronics, e.g. solid-state lighting, photovoltaics, integrated circuits, and liquid crystal displays, will exceed the world-wide supply according to current forecasts. Recycling systems to reclaim these materials are not yet in place, which challenges the sustainability of these technologies. This paper proposes a multispectral imaging system as a basis for a vision-based recognition system for valuable components of electronics waste. Multispectral images are intended to enhance the contrast of images of printed circuit boards (single components, as well as labels) for further analysis, such as optical character recognition and entire printed circuit board recognition. The results show that a higher contrast is achieved in the near infrared compared to ultraviolet and visible light.

Keywords: electronics waste, multispectral imaging, printed circuit boards, rare-earth elements

Procedia PDF Downloads 401
3085 The Combination of the Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, Jitter and Shimmer Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech

Authors: Brahim Fares Zaidi

Abstract:

Our work aims to improve our Automatic Recognition System for Dysarthric Speech, based on Hidden Markov Models and the Hidden Markov Model Toolkit, to help people with pronunciation problems. We applied two speech parameterization techniques, Mel Frequency Cepstral Coefficients and Perceptual Linear Prediction, and concatenated them with jitter and shimmer coefficients in order to increase the recognition rate for dysarthric speech. For our tests, we used the NEMOURS database, which contains speakers with dysarthria and normal speakers.
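As a rough illustration of the feature concatenation described above, the sketch below computes local jitter and shimmer using the common definition (mean absolute difference between consecutive cycles, divided by the mean value) and appends them to a spectral feature vector. The function names, the example values, and the choice of the "local" variants are assumptions for illustration; the abstract does not specify which jitter and shimmer measures the authors used.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value.

    Applied to pitch periods this is the usual "local jitter"; applied to
    peak amplitudes it is the usual "local shimmer".
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def augment_features(spectral_features, periods, amplitudes):
    """Concatenate a spectral feature vector (e.g. MFCC or PLP) with jitter and shimmer."""
    jitter = local_perturbation(periods)
    shimmer = local_perturbation(amplitudes)
    return list(spectral_features) + [jitter, shimmer]


# Hypothetical values: three spectral coefficients, four glottal cycles.
features = augment_features([1.5, -0.3, 0.8], [1.0, 2.0, 1.0, 2.0], [0.5, 0.5, 0.5, 0.5])
```

Here the extended vector has two extra entries; a perfectly steady amplitude sequence yields a shimmer of zero.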

Keywords: ARSDS, HTK, HMM, MFCC, PLP

Procedia PDF Downloads 83
3084 UAV Based Visual Object Tracking

Authors: Vaibhav Dalmia, Manoj Phirke, Renith G

Abstract:

With the wide adoption of UAVs (unmanned aerial vehicles) in various industries, by governments as well as private corporations, for solving computer vision tasks, it is necessary that their potential be analyzed completely. Recent advances in deep learning have also left us with a plethora of algorithms for solving different computer vision tasks. This study provides a comprehensive survey on solving the visual object tracking problem and explains the tradeoffs involved in building a real-time yet reasonably accurate object tracking system for UAVs by looking at existing methods and evaluating them on aerial datasets. Finally, the trackers best suited to UAV-based applications are identified.

Keywords: deep learning, drones, single object tracking, visual object tracking, UAVs

Procedia PDF Downloads 134
3083 Design of Visual Repository, Constraint and Process Modeling Tool Based on Eclipse Plug-Ins

Authors: Rushiraj Heshi, Smriti Bhandari

Abstract:

Master Data Management requires the creation of a central repository, the application of constraints on the repository, and the design of processes to manage data. Designing the repository, its constraints, and the business processes is a tedious and time-consuming task for a large enterprise. Hence, visual repository, constraint, and process (workflow) modeling is the most critical step in Master Data Management. In this paper, we realize a visual modeling tool for implementing repositories, constraints, and processes, based on Eclipse plug-ins using GMF/EMF, which follows the principles of Model Driven Engineering (MDE).

Keywords: EMF, GMF, GEF, repository, constraint, process

Procedia PDF Downloads 467
3082 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition rely on cooperative subjects at a short distance from a microphone. Standoff speech recognition using microphone arrays can extend the subject-to-sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognition are limited to indoor use at short range. Moreover, these applications require an air pathway between the subject and the sensor to achieve a reasonable signal-to-noise ratio. This study reports long-range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays, to hundreds of feet. In addition, LDV enables 'listening' through windows to uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter-terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in daily life. These data were compared with their microphone counterpart to show the impact of the various materials on the spectrum of the LDV speech signal. State-of-the-art deep neural network modeling approaches are used to conduct continuous speaker-independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using a time-delay neural network, bi-directional long short-term memory, and model fusion show great promise for using LDV for long-range speech recognition. To the author’s best knowledge, this is the first time an LDV has been reported for a long-distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 158
3081 Interactive Shadow Play Animation System

Authors: Bo Wan, Xiu Wen, Lingling An, Xiaoling Ding

Abstract:

The paper describes a Chinese shadow play animation system based on Kinect. Users without any professional training can manipulate the shadow characters with their body movements to complete a shadow play performance and, if they wish, obtain a video of it by giving the record command to our system. In our system, Kinect is responsible for capturing human movement and voice command data. A gesture recognition module is used to control changes of the shadow play scenes. After packaging the data from Kinect and the recognition result from the gesture recognition module, VRPN transmits them to the server side. Finally, the server side uses this information to control the motion of the shadow characters and the video recording. This system not only achieves human-computer interaction but also realizes interaction between people. It brings an entertaining experience to users and is easy to operate for all ages. Even more importantly, the application background of Chinese shadow play helps protect the art of shadow play animation.

Keywords: shadow play animation, Kinect, gesture recognition, VRPN, HCI

Procedia PDF Downloads 380
3080 Analyzing the Role of Visual Preferences for Designing of Urban Leftover Spaces

Authors: Jasim Azhar, Morten Gjerde

Abstract:

A city’s space is comprehended as a phenomenon that emerges from the ongoing negotiation between the constructed environment, urban processes, and bodily experience. Many spaces do not represent a static notion but are continually challenged and reconstituted. The ability to recognize leftover spaces in the urban context is an integral part of an urban redevelopment process, where structured and layered approaches become useful in understanding how to transform these spaces into places. Contemporary urban leftover spaces exist as a result of several factors and are present in every major city, where they often disrupt the flow of districts by creating visually unappealing places. These spaces can be designed, transformed and integrated so as to achieve environmental gains and meet social preferences. The paper explores how small changes in the visual quality of urban leftover spaces in Wellington city significantly influence a person’s experience and the spaces’ potential usage. These spaces can be seen as a catalyst for change within an ecological sustainability framework. A creative and flexible design would lead to psychologically healthy places by improving the image of a city from within. The qualitative research is undertaken through visual preference studies, which will inform planning initiatives by revealing what people feel about visual changes in these leftover spaces. Those visual preferences can guide the behavior and emotional responses of different users when the spaces are redesigned with meaningful attributes. The research is driven by the hypothesis that if the attributes are made visible, the likelihood of stimulating the interest of users should increase.

Keywords: leftover spaces, visual preferences, tactical urbanism, ecological sustainability

Procedia PDF Downloads 263
3079 A Meta-Analysis of Handwriting and Visual-Motor Integration (VMI): The Moderating Effect of Handwriting Dimensions

Authors: Hong Lu, Xin Chen, Zhengcheng Fan

Abstract:

Prior research has claimed a close association between handwriting and mathematics attainment with the help of spatial cognition. However, the exact mechanism behind this relationship remains uninvestigated. Focusing on visual-motor integration (VMI), one critical spatial skill, this meta-analysis aims to estimate the size of the handwriting-VMI relationship and examine the moderating effect of handwriting dimensions on the link. With a random-effects model, a medium relation (r=.26, 95% CI [.22, .30]) between handwriting and VMI was summarized across 38 studies with 55 unique samples and 141 effect sizes. Findings suggested that handwriting dimensions significantly moderated the handwriting-VMI relationship: handwriting legibility showed a substantial correlation with VMI, whereas neither handwriting speed nor pressure did. By identifying the essential relationship between handwriting legibility and VMI, this study adds to the literature on the key cognitive processing needs underlying handwriting and thus highlights the cognitive mechanism linking handwriting, spatial cognition, and mathematics performance.

Keywords: handwriting, visual-motor integration, legibility, meta-analysis

Procedia PDF Downloads 89
3078 Design and Emotion: The Value of 1970s French Children’s Books in the Middle East

Authors: Tina Sleiman

Abstract:

In the early 1970s, a graphics revolution - in quantity and quality - marked the youth publications sector in France. The increased interest in youth publications was supported with the emergence of youth libraries and major publishing houses. In parallel, the 'Agence de Cooperation Culturelle et Technique' (currently the International Organization of the Francophonie) was created, and several Arab countries had joined as members. In spite of political turmoil in the Middle East, French schools in Arab countries were still functioning and some even flourishing. This is a testament that French culture was, and still is, a major export to the region. This study focuses on the aesthetic value of the graphic styles that characterize French children’s books from the 1970s, and their personal value to Francophone people who have consumed these artifacts, in the Middle East. The first part of the study looks at the artifact itself: starting from the context of creation and consumption of these books, and continuing to the preservation and remaining collections. The aesthetic value is studied and compared to similar types of visuals of juxtaposed time periods. The second part examines the audience’s response to the visuals in terms of style recognition or identification, along with emotional significance or associations, and the personal value the artifacts might hold to their consumers. The methods of investigation consist of a literature review, a survey of book collections, and a visual questionnaire, supported by personal interviews. As an outcome, visual patterns will be identified: elements from 1970s children’s books reborn in contemporary youth-based publications. Results of the study shall inform us directly on the aesthetic and personal value of illustrated French children’s books in the Middle East, and indirectly on the capacity of youth-targeted design to create a long-term emotional response from its audience.

Keywords: children’s books, French visual culture, graphic style, publication design, revival

Procedia PDF Downloads 145
3077 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction, and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online hard negative mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using the Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.
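To make the triplet loss with online hard negative mining mentioned above more concrete, here is a minimal sketch that operates on already-computed embedding vectors. It is not the authors' implementation: the function name, the margin of 0.2, the Euclidean distance, and the additional choice of the hardest positive per anchor (the "batch-hard" variant) are all assumptions, and no encoder network is involved.

```python
import math


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Average triplet loss over a batch, mining the hardest examples online.

    For each anchor, the hardest negative is the closest embedding with a
    different label; the hardest positive (an assumption of this sketch) is
    the farthest embedding with the same label.
    """
    losses = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [math.dist(anchor, e)
               for j, (e, l) in enumerate(zip(embeddings, labels))
               if l == label and j != i]
        neg = [math.dist(anchor, e)
               for e, l in zip(embeddings, labels) if l != label]
        if not pos or not neg:
            continue  # this anchor cannot form a valid triplet in this batch
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical 2-D embeddings: two well-separated product classes.
separated = batch_hard_triplet_loss([[0, 0], [0, 1], [5, 0], [5, 1]], [0, 0, 1, 1])
# A negative sitting between two positives produces a non-zero loss.
overlapping = batch_hard_triplet_loss([[0, 0], [1, 0], [0.5, 0]], [0, 0, 1])
```

When classes are already separated by more than the margin, the loss is zero and contributes no gradient, which is why mining hard negatives within each batch keeps training informative.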

Keywords: retail stores, faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition

Procedia PDF Downloads 127
3076 Experimental Investigation of Visual Comfort Requirement in Garment Factories and Identify the Cost Saving Opportunities

Authors: M. A. Wijewardane, S. A. N. C. Sudasinghe, H. K. G. Punchihewa, W. K. D. L. Wickramasinghe, S. A. Philip, M. R. S. U. Kumara

Abstract:

Visual comfort is one of the major parameters that can be used to measure human comfort in any environment. If the illuminance level provided in a working environment does not meet the workers’ visual comfort needs, it will lead to eye strain, fatigue, headache, stress, accidents and, finally, poor productivity. However, improvements in lighting do not necessarily mean that the workplace requires more light. Unnecessarily high illuminance levels will also cause poor visual comfort and health risks. In addition, more power consumption for lighting will result in higher energy costs. In this study, therefore, visual comfort and the illuminance required by workers in the textile/apparel industry to perform different tasks (i.e. cutting, sewing and knitting) at their workplace were studied. Experimental studies were designed to identify the optimum illuminance requirement depending on fabric colour and type, and finally, the energy-saving potential of controlling the illuminance level according to the workforce requirement was analysed. Visual performance of workers during the sewing operation was studied using the Landolt ring experiment. It was revealed that around 36.3% of the workers prefer to work when the illuminance level is between 601 lux and 850 lux, and 45.9% of the workers are not happy to work when the illuminance level falls below 600 lux or rises above 850 lux. Moreover, more than 65% of the workers who are not satisfied with the existing illuminance levels of the production floors reported headaches, eye diseases, or both due to poor visual comfort. In addition, the energy analysis revealed that energy-saving potentials of 5%, 10%, 24%, 8% and 16% can be anticipated for the fabric colours red, blue, yellow, black and white respectively, when 800 lux is the prevailing illuminance level for the sewing operation.

Keywords: Landolt Ring experiment, lighting energy consumption, illuminance, textile and apparel industry, visual comfort

Procedia PDF Downloads 185
3075 Secret Sharing in Visual Cryptography Using NVSS and Data Hiding Techniques

Authors: Misha Alexander, S. B. Waykar

Abstract:

Visual Cryptography is a special unbreakable encryption technique that transforms the secret image into shares of random noisy pixels. These shares are transmitted over the network, and because of their noisy texture they attract the attention of hackers. To address this issue, a Natural Visual Secret Sharing (NVSS) scheme was introduced that uses natural shares, in either digital or printed form, to generate the noisy secret share. This scheme greatly reduces the transmission risk but causes distortion in the retrieved secret image through variation in the settings and properties of the digital devices used to capture the natural image during the encryption/decryption phase. This paper proposes a new NVSS scheme that extracts the secret key from multiple randomly selected, unaltered natural images. To further improve the security of the shares, data hiding techniques such as steganography and alpha channel watermarking are proposed.

Keywords: decryption, encryption, natural visual secret sharing, natural images, noisy share, pixel swapping

Procedia PDF Downloads 388
3074 Communication Design in Newspapers: A Comparative Study of Graphic Resources in Portuguese and Spanish Publications

Authors: Fátima Gonçalves, Joaquim Brigas, Jorge Gonçalves

Abstract:

As a way of managing the increasing volume and complexity of information that circulates in the present time, graphical representations are increasingly used, which add meaning to the information presented in communication media, through an efficient communication design. The visual culture itself, driven by technological evolution, has been redefining the forms of communication, so that contemporary visual communication represents a major impact on society. This article presents the results and respective comparative analysis of four publications in the Iberian press, focusing on the formal aspects of newspapers and the space they dedicate to the various communication elements. Two Portuguese newspapers and two Spanish newspapers were selected for this purpose. The findings indicated that the newspapers show a similarity in the use of graphic solutions, which corroborate a visual trend in communication design. The results also reveal that Spanish newspapers are more meticulous with graphic consistency. This study intended to contribute to improving knowledge of the Iberian generalist press.

Keywords: communication design, graphic resources, Iberian press, visual journalism

Procedia PDF Downloads 232
3073 Evolution of the Environmental Justice Concept

Authors: Zahra Bakhtiari

Abstract:

This article explores the development and evolution of the concept of environmental justice, which has shifted from being dominated by white and middle-class individuals to a civil struggle by marginalized communities against environmental injustices. Environmental justice aims to achieve equity in decision-making and policy-making related to the environment. The concept of justice in this context includes four fundamental aspects: distribution, procedure, recognition, and capabilities. Recent scholars have attempted to broaden the concept of justice to include dimensions of participation, recognition, and capabilities. Focusing on all four dimensions of environmental justice is crucial for effective planning and policy-making to address environmental issues. Ignoring any of these aspects can lead to the failure of efforts and the waste of resources.

Keywords: environmental justice, distribution, procedure, recognition, capabilities

Procedia PDF Downloads 69
3072 Two Concurrent Convolution Neural Networks TC*CNN Model for Face Recognition Using Edge

Authors: T. Alghamdi, G. Alaghband

Abstract:

In this paper, we develop a model that couples Two Concurrent Convolution Neural Networks with different filters (TC*CNN) for face recognition and compare its performance to an existing sequential CNN (base model). We also test and compare the quality and performance of the models on three datasets with various levels of complexity (easy, moderate, and difficult) and show that for the most complex datasets, edges produce the most accurate and efficient results. We further show that in such cases, while Support Vector Machine (SVM) models are fast, they do not produce accurate results.

Keywords: Convolution Neural Network, Edges, Face Recognition, Support Vector Machine

Procedia PDF Downloads 129
3071 Pattern Recognition Search: An Advancement Over Interpolation Search

Authors: Shahpar Yilmaz, Yasir Nadeem, Syed A. Mehdi

Abstract:

Searching for a record in a dataset is always a frequent task for any data structure-related application. Hence, a fast and efficient algorithm for the approach has its importance in yielding the quickest results and enhancing the overall productivity of the company. Interpolation search is one such technique used to search through a sorted set of elements. This paper proposes a new algorithm, an advancement over interpolation search for the application of search over a sorted array. Pattern Recognition Search or PR Search (PRS), like interpolation search, is a pattern-based divide and conquer algorithm whose objective is to reduce the sample size in order to quicken the process and it does so by treating the array as a perfect arithmetic progression series and thereby deducing the key element’s position. We look to highlight some of the key drawbacks of interpolation search, which are accounted for in the Pattern Recognition Search.
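For context, the baseline that the abstract builds on, classic interpolation search, can be sketched as follows. This is the textbook algorithm, not the proposed Pattern Recognition Search, whose details the abstract does not give; the function name and the integer test data are illustrative assumptions.

```python
def interpolation_search(arr, key):
    """Return an index of key in the sorted list arr, or -1 if it is absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= key <= arr[hi]:
        if arr[lo] == arr[hi]:
            # All remaining values are equal; this also avoids division by zero.
            return lo if arr[lo] == key else -1
        # Estimate the position by assuming values grow linearly with the index.
        pos = lo + (key - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


data = [10, 20, 30, 40, 50]
found = interpolation_search(data, 40)
missing = interpolation_search(data, 35)
```

On data that really is an arithmetic progression, as in this example, the first position estimate lands directly on the key; the estimate degrades when the values are unevenly distributed, which is the kind of drawback the abstract alludes to.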

Keywords: array, complexity, index, sorting, space, time

Procedia PDF Downloads 215
3070 Pattern Recognition Based on Simulation of Chemical Senses (SCS)

Authors: Nermeen El Kashef, Yasser Fouad, Khaled Mahar

Abstract:

No AI-complete system can model the human brain or behavior without looking at the totality of the whole situation and incorporating a combination of senses. This paper proposes a pattern recognition model based on Simulation of Chemical Senses (SCS) for the separation and classification of sign language. The model is based on the control strategy of the human taste system. The main idea of the introduced model is motivated by the fact that the tongue first clusters an input substance into its basic tastes, and then the brain recognizes its flavor. To implement this strategy, a two-level architecture inspired by the taste system is proposed. The separation level of the architecture focuses on clustering hand postures, while the classification level recognizes the sign language. The efficiency of the proposed model is demonstrated experimentally by recognizing an American Sign Language (ASL) data set. The recognition accuracy obtained for ASL numbers is 92.9 percent.

Keywords: artificial intelligence, biocybernetics, gustatory system, sign language recognition, taste sense

Procedia PDF Downloads 271
3069 Secure Message Transmission Using Meaningful Shares

Authors: Ajish Sreedharan

Abstract:

Visual cryptography encodes a secret image into shares of random binary patterns. If the shares are printed onto transparencies, the secret image can be visually decoded by superimposing a qualified subset of transparencies, but no secret information can be obtained from the superposition of a forbidden subset. The binary patterns of the shares, however, have no visual meaning and hinder the objectives of visual cryptography. In Secret Message Transmission through Meaningful Shares, a secret message to be transmitted is converted to a grey scale image. Then (2,2) visual cryptographic shares are generated from this grey scale image. The shares are encrypted using a chaos-based image encryption algorithm using wavelet transform. Two separate color images of the same size as the shares are taken as cover images, and each share is hidden in its respective cover image. Because the encrypted shares are covered by meaningful images, a potential eavesdropper won't know there is a message to be read. The meaningful shares are transmitted through two different transmission media. During decoding, the shares are fetched from the received meaningful images and decrypted using the same chaos-based image encryption algorithm using wavelet transform. The shares are combined to regenerate the grey scale image, from which the secret message is obtained.
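The (2,2) share generation step can be illustrated with the basic textbook construction for a black-and-white image, in which each secret pixel expands to two subpixels. This is only a sketch under stated assumptions: the abstract works with a grey scale image and adds encryption and cover images, none of which is modelled here, and the names `make_shares` and `stack` are hypothetical.

```python
import random

# Each share pixel is a pair of subpixels: 1 = black, 0 = transparent.
PATTERNS = [(0, 1), (1, 0)]


def make_shares(secret, rng=random):
    """Split a binary image (rows of 0 = white, 1 = black) into two shares."""
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)
            # White pixel: both shares get the same pattern.
            # Black pixel: the second share gets the complementary pattern.
            q = p if pixel == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(share1, share2):
    """Superimpose two transparencies: a subpixel is black if either share is black."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(share1, share2)]


secret = [[0, 1], [1, 0]]
s1, s2 = make_shares(secret, random.Random(0))
stacked = stack(s1, s2)
```

After stacking, a black secret pixel shows two black subpixels and a white one shows exactly one, so the image is visible as a contrast difference, while each share on its own has one black subpixel per pixel regardless of the secret.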

Keywords: visual cryptography, wavelet transform, meaningful shares, grey scale image

Procedia PDF Downloads 430
3068 Audio-Visual Co-Data Processing Pipeline

Authors: Rita Chattopadhyay, Vivek Anand Thoutam

Abstract:

Speech is the most accepted means of communication, allowing us to exchange feelings and thoughts quickly. Quite often, people can communicate orally but cannot interact or work with computers or devices. Giving speech commands is easier and quicker than typing commands into a computer. In the same way, listening to audio played from a device is easier than extracting output from computers or devices. Especially with robotics being an emerging market with applications in warehouses, the hospitality industry, consumer electronics, assistive technology, etc., speech-based human-machine interaction is emerging as a lucrative feature for robot manufacturers. Considering this factor, the objective of this paper is to design the “Audio-Visual Co-Data Processing Pipeline.” This pipeline integrates automatic speech recognition, a natural language model for text understanding, object detection, and text-to-speech modules. There are many deep learning models for each of these modules, but OpenVINO Model Zoo models are used because the OpenVINO toolkit covers both computer vision and non-computer vision workloads across Intel hardware, maximizes performance, and accelerates application development. A speech command is given as input; it contains information about the target objects to be detected and the start and end times of the interval to be extracted from the video. Speech is converted to text using the QuartzNet automatic speech recognition model. A summary is extracted from the text using the natural language model Generative Pre-Trained Transformer-3 (GPT-3). Based on the summary, the essential frames are extracted from the video, and the You Only Look Once (YOLO) object detection model detects objects in these extracted frames. Frame numbers that contain target objects (the objects specified in the speech command) are saved as text. Finally, this text (frame numbers) is converted to speech using a text-to-speech model and played from the device. This project is developed for 80 YOLO labels, and the user can extract frames based on only one or two target labels. The pipeline can easily be extended to more than two target labels by making appropriate changes in the object detection module. The project supports four different speech command formats, which are defined by including sample examples in the prompt used by the GPT-3 model. Based on user preference, a new speech command format can be added by including some examples of that format in the prompt used by the GPT-3 model. This pipeline can be used in many projects, such as human-machine interfaces, human-robot interaction, and surveillance through speech commands. All object detection projects can be upgraded using this pipeline so that the user gives speech commands and the output is played from the device.
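The frame-selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `fake_detect` stand-in for the YOLO module, and the choice to keep a frame when it contains any one of the target labels are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set


def frames_in_interval(start_s: float, end_s: float, fps: float) -> range:
    """Frame indices covering the interval [start_s, end_s] at a given frame rate."""
    first = int(start_s * fps)
    last = int(end_s * fps)
    return range(first, last + 1)


def frames_with_targets(
    frame_ids: Iterable[int],
    detect: Callable[[int], Set[str]],
    targets: Iterable[str],
) -> List[int]:
    """Keep the frames whose detected labels include at least one target label.

    `detect` is a hypothetical callable that returns the set of labels found
    in a frame; in the pipeline this role is played by the object detector.
    """
    wanted = set(targets)
    return [i for i in frame_ids if wanted & detect(i)]


def fake_detect(frame_id: int) -> Set[str]:
    """Stand-in detector (assumption): even frames show a person, odd frames a dog and a car."""
    return {"person"} if frame_id % 2 == 0 else {"dog", "car"}


# Frames 5..10 of a 5 fps video correspond to the interval from 1 s to 2 s.
hits = frames_with_targets(frames_in_interval(1.0, 2.0, 5.0), fake_detect, ["dog"])
print(hits)
```

The resulting list of frame numbers corresponds to the text that the pipeline passes on to its text-to-speech module.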

Keywords: OpenVINO, automatic speech recognition, natural language processing, object detection, text to speech

Procedia PDF Downloads 57
3067 Subtitled Based-Approach for Learning Foreign Arabic Language

Authors: Elleuch Imen

Abstract:

In this paper, we propose a new approach for learning Arabic as a foreign language via audio-visual translation, particularly subtitling. The approach consists of developing video sequences appropriate to different levels of learning (from A1 to C2) that contain conversations, quizzes, games, and other activities. Each video aims to achieve a specific objective, such as the correct pronunciation of Arabic words, the correct syntactic structuring of Arabic sentences, the recognition of the morphological characteristics of terms, or the semantic understanding of statements. The resulting subtitled videos can be incorporated into different tools for learning Arabic as a second language, such as MOOCs, websites, platforms, etc.

Keywords: Arabic foreign language, learning, audio-visual translation, subtitled videos

Procedia PDF Downloads 44
3066 Using Audio-Visual Aids and Computer-Assisted Language Instruction to Overcome Learning Difficulties of Sound System in Students of Special Needs

Authors: Sadeq Al Yaari, Ayman Al Yaari, Adham Al Yaari, Montaha Al Yaari, Aayah Al Yaari, Sajedah Al Yaari

Abstract:

Background & Objectives: Audio-visual aids and computer-assisted language instruction (CALI) have strong effects in teaching language components (sound system, grammatical structures and vocabulary) to students of special needs. To explore the effects of audio-visual aids and CALI when speech language therapists (SLTs) teach the sound system to this class of students, an experiment was undertaken to evaluate the students' performance during a sound system course. Methods: Forty students (males and females) of special needs at al-Malādh school for teaching students of special needs in Dhamar (Yemen), aged between 8 and 18 years, took part in this experimental study while they were studying a language sound system course. Pre- and posttests were administered at the beginning and end of the semester. The treatment group was compared to a similar group (control group) of the same number under the same environment. Whereas the first group was taught using audio-visual aids and CALI, the second was not. Students' performances were linguistically and statistically evaluated. Results & conclusions: Compared with the control group, the treatment group showed significantly higher scores in the posttest (72.32% vs. 31%). Compared with females, males scored higher marks (1421 vs. 1472). Thus, audio-visual aids and CALI should be taken into consideration in teaching the sound system to students of special needs.

Keywords: language components, sound system, audio-visual aids, CALI, students, special needs, SLTs

Procedia PDF Downloads 14
3065 Defect Localization and Interaction on Surfaces with Projection Mapping and Gesture Recognition

Authors: Qiang Wang, Hongyang Yu, MingRong Lai, Miao Luo

Abstract:

This paper presents a method for accurately localizing and interacting with known surface defects by overlaying patterns onto real-world surfaces using a projection system. Given the world coordinates of the defects, we project corresponding patterns onto the surfaces, providing an intuitive visualization of the specific defect locations. To enable users to interact with and retrieve more information about individual defects, we implement a gesture recognition system based on a pruned and optimized version of YOLOv6. This lightweight model achieves an accuracy of 82.8% and is suitable for deployment on low-performance devices. Our approach demonstrates the potential for enhancing defect identification, inspection processes, and user interaction in various applications.
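As a rough illustration of mapping a defect's world coordinates to a projector pixel, the sketch below treats the projector as an ideal pinhole device. This model, the function name `project_point`, and all calibration values are assumptions for illustration; the abstract does not state which projection model or calibration the authors use.

```python
from typing import Sequence, Tuple


def project_point(
    world_xyz: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Map a 3D world point to a 2D pixel with a pinhole model (assumed here).

    rotation (3x3) and translation (3) take world coordinates into the
    projector frame; fx, fy, cx, cy are hypothetical intrinsic parameters.
    """
    # Projector-frame coordinates: R * X + t
    xc, yc, zc = (
        sum(r * x for r, x in zip(row, world_xyz)) + t
        for row, t in zip(rotation, translation)
    )
    if zc <= 0:
        raise ValueError("point lies behind the projector")
    # Perspective division followed by the intrinsic mapping to pixels
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Hypothetical setup: projector aligned with the world frame, defect 2 m away.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u, v = project_point([0.1, -0.05, 2.0], identity, [0.0, 0.0, 0.0],
                     fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(u, v)
```

A pattern drawn around the pixel (u, v) in the projected image would then appear at the defect location on the surface, provided the calibration is accurate.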

Keywords: defect localization, projection mapping, gesture recognition, YOLOv6

Procedia PDF Downloads 61
3064 Comparison of Visual Acuity Outcome and Complication after Phacoemulsification between Diabetic and Non-Diabetic Patients at Burapha University Hospital, Chonburi, Thailand

Authors: Luksanaporn Krungkraipetch

Abstract:

One hundred cataract patients undergoing phacoemulsification were enrolled in the study to compare visual acuity outcomes and complications after phacoemulsification between diabetic and non-diabetic patients at Burapha University Hospital, Chonburi, Thailand. Fifty patients were in the diabetic (type II) group and 50 patients were in the non-diabetic group. All cases were operated on by one doctor with the same pre-operative care, operation (phacoemulsification), and post-operative care. Visual acuity and complications were assessed for two years after the operation. There were no significant differences in demographic data between the two groups. The visual outcome values ≥ 2 lines and ≥ 20/40 showed no significant differences between the two groups two years after surgery. Complications in the diabetic group were cystoid macular edema 16%, ruptured posterior capsule 8%, posterior capsule opacity 2%, uveitis 2%, and endophthalmitis 2%. Complications in the non-diabetic group were cystoid macular edema 12%, ruptured posterior capsule 8%, uveitis 2%, posterior capsule opacity 2%, and wound leak 2%. The comparison of visual acuity outcomes and complications after phacoemulsification showed no statistically significant differences between the two groups. Cystoid macular edema was the most common complication in both groups, and retinopathy progression was seen in 10%.

Keywords: cataract, visual acuity, cataract extraction, phacoemulsification, diabetic retinopathy

Procedia PDF Downloads 330
3063 Effect of Dimensional Reinforcement Probability on Discrimination of Visual Compound Stimuli by Pigeons

Authors: O. V. Vyazovska

Abstract:

Behavioral efficiency is one of the main principles of success in nature. The accuracy of visual discrimination is determined by attention, learning experience, and memory. Under experimental conditions, pigeons’ responses to visual stimuli presented on a monitor screen are behaviorally manifested by pecking or not pecking the stimulus, by the number of pecks, reaction time, etc. The higher the probability of reward, the more likely pigeons are to respond to the stimulus. We trained 8 pigeons (Columba livia) on a stagewise go/no-go visual discrimination task. Sixteen visual stimuli were created from all possible combinations of four binary dimensions: brightness (dark/bright), size (large/small), line orientation (vertical/horizontal), and shape (circle/square). In the first stage, we presented S+ and 4 S- stimuli: the first differed from S+ in all 4 dimensional values, the second shared the brightness dimension with S+, the third shared brightness and orientation with S+, and the fourth shared brightness, orientation, and size. Then all 16 stimuli were added. Pigeons correctly rejected 6-8 of the 11 newly added S- stimuli at the beginning of the second stage. The results revealed that pigeons’ behavior at the beginning of the second stage was controlled by the reward probabilities for the 4 dimensions learned in the first stage. The number of mistakes in dimension discrimination at the beginning of the second stage depended on the number of S- stimuli sharing the dimension with S+ in the first stage. A significant inverse correlation was found between the number of S- stimuli sharing dimension values with S+ in the first stage and the dimensional learning rate at the beginning of the second stage. Pigeons were more confident in discriminating the shape and size dimensions. The mistakes they made at the beginning of the second stage were not associated with these dimensions. Thus, the results obtained help elucidate the principles of dimensional stimulus control during the learning of compound multidimensional visual stimuli.
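The stimulus design in this abstract can be reproduced with a short Python sketch. The particular S+ chosen below is a hypothetical example, since the abstract does not say which stimulus was rewarded; the dimension names and values are taken from the abstract.

```python
from itertools import product

# The four binary dimensions named in the abstract.
DIMENSIONS = {
    "brightness": ("dark", "bright"),
    "size": ("large", "small"),
    "orientation": ("vertical", "horizontal"),
    "shape": ("circle", "square"),
}
names = list(DIMENSIONS)

# All 2^4 = 16 compound stimuli.
stimuli = [dict(zip(names, values)) for values in product(*DIMENSIONS.values())]

# Hypothetical S+ (assumption; not specified in the abstract).
s_plus = {"brightness": "dark", "size": "large",
          "orientation": "vertical", "shape": "circle"}


def flip(stimulus, dims_to_flip):
    """Return a copy of the stimulus with the given dimensions set to their other value."""
    out = dict(stimulus)
    for d in dims_to_flip:
        a, b = DIMENSIONS[d]
        out[d] = b if out[d] == a else a
    return out


# First-stage S- stimuli, described by the dimensions each one shares with S+.
shared_sets = [
    (),
    ("brightness",),
    ("brightness", "orientation"),
    ("brightness", "orientation", "size"),
]
first_stage_s_minus = [
    flip(s_plus, [d for d in names if d not in shared]) for shared in shared_sets
]

# How many first-stage S- stimuli share each dimension value with S+.
shared_counts = {
    d: sum(s[d] == s_plus[d] for s in first_stage_s_minus) for d in names
}

# S- stimuli that first appear in the second stage.
new_s_minus = [s for s in stimuli if s != s_plus and s not in first_stage_s_minus]

print(len(stimuli), len(new_s_minus), shared_counts)
```

Under this reading, brightness is shared by three first-stage S- stimuli, orientation by two, size by one, and shape by none, and 11 S- stimuli are new in the second stage, which matches the count given in the abstract.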

Keywords: visual go/no go discrimination, selective attention, dimensional stimulus control, pigeon

Procedia PDF Downloads 119
3062 SCNet: A Vehicle Color Classification Network Based on Spatial Cluster Loss and Channel Attention Mechanism

Authors: Fei Gao, Xinyang Dong, Yisu Ge, Shufang Lu, Libo Weng

Abstract:

Vehicle color recognition plays an important role in traffic accident investigation. However, due to the influence of illumination, weather, and noise, vehicle color recognition still faces challenges. In this paper, a vehicle color classification network based on spatial cluster loss and a channel attention mechanism (SCNet) is proposed for vehicle color recognition. A channel attention module is applied to extract the features of regions that are representative of the vehicle color and to reduce the channel weights of nonrepresentative color regions. The proposed loss function, called spatial clustering loss (SC-loss), consists of two channel-specific components: a concentration component and a diversity component. The concentration component forces all feature channels belonging to the same class to be concentrated through the channel cluster. The diversity component imposes additional constraints on the channels through the mean distance coefficient, making them mutually exclusive in the spatial dimensions. In the comparison experiments, the proposed method achieves state-of-the-art performance on the public datasets VCD and VeRi, reaching 96.1% and 96.2%, respectively. In addition, the ablation experiment further shows that SC-loss effectively improves the accuracy of vehicle color recognition.

Keywords: feature extraction, convolutional neural networks, intelligent transportation, vehicle color recognition

Procedia PDF Downloads 152
3061 Luxury in Fashion: Visual Analysis on Bag Advertising

Authors: Lama Ajinah

Abstract:

Luxury brands have witnessed continuous growth, which has followed women’s desire for individual distinctiveness and social glare. Bags are a woman’s best friend, for aesthetic or functional purposes, when she leaves her home for leisure or work. Handbags are one way for women to pursue their constant aspiration to be distinguished while reflecting their wealth. Consequently, consumer demand for and attraction to the dazzle of luxury brands, for personal pleasure and social status, have flourished. According to the literature review, visual analysis of luxury brands has been explored, yet a focus on bags has not been discussed in detail. Hence, an in-depth analysis is dedicated to the two segments by showcasing examples of high-end bag advertising. The research is conducted to understand the advertising strategies used in promoting luxury products. Furthermore, the paper explores the definition of the term luxury, the conditions in which it is used, and the visual language used along with the term. As luxury is an indicator of superior satisfaction, it is obtained on two levels: a personal and a social level. The examples of luxury brand ads are selected from the last five years to uncover the latest and most common strategies used to promote luxury brands. The methods employed in this paper consist of a literature review, semiotic analysis, and content analysis. The researcher concludes by revealing the methods used in advertising and categorizing them into various themes.

Keywords: advertising, brands, fashion, graphic design, luxury, semiotic analysis, semiology, visual analysis, visual communication

Procedia PDF Downloads 224
3060 Analyzing the Use of Augmented Reality and Image Recognition in Cultural Education: Use Case of Sintra Palace Treasure Hunt Application

Authors: Marek Maruszczak

Abstract:

Gamified applications have been used successfully in education for years. The rapid development of technologies such as augmented reality and image recognition increases their availability and reduces their prices. Thus, there is an increasing possibility of, and need for, wide use of such applications in education. The main purpose of this article is to present the results of work on a mobile application with augmented reality, which aims to motivate tourists to pay more attention to the attractions and to increase the likelihood of their moving from one attraction to the next while visiting the Palácio Nacional de Sintra in Portugal. Work on the application was carried out together with the employees of Parques de Sintra from 2019 to 2021. The result was a mobile application using augmented reality and image recognition. The application was tested on the palace premises by both Parques de Sintra employees and tourists visiting the Palácio Nacional de Sintra. The conclusions collected allowed for the formulation of good practices and guidelines that can be used when designing gamified apps for the purpose of cultural education.

Keywords: augmented reality, cultural education, gamification, image recognition, mobile games

Procedia PDF Downloads 173