Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 3373

Search results for: visual recognition

2893 Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition

Authors: A. Shoiynbek, K. Kozhakhmet, P. Menezes, D. Kuanyshbay, D. Bayazitov

Abstract:

Speech emotion recognition (SER) has received increasing research interest in recent years. Most research has used emotional speech collected under controlled conditions, recorded by actors imitating and artificially producing emotions in front of a microphone. There are four issues with that approach: (1) the emotions are not natural, which means that machines are learning to recognize fake emotions; (2) the emotions are limited in quantity and poor in variety of speaking; (3) SER is language dependent; and (4) consequently, each time researchers want to start work on SER, they need to find a good emotional database in their language. In this paper, we propose an approach to creating an automatic tool for speech emotion extraction based on facial emotion recognition, and we describe the sequence of actions of the proposed approach. One of the first objectives in that sequence is speech detection. The paper gives a detailed description of a speech detection model based on a fully connected deep neural network for the Kazakh and Russian languages. Although the high speech detection results were obtained for Kazakh and Russian, the described process is suitable for any language. To illustrate the working capacity of the developed model, we have performed an analysis of speech detection and extraction on real tasks.
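To make the "speech detection and extraction" step more concrete, the following is a minimal sketch of how per-frame speech probabilities could be merged into speech segments. It is not the authors' implementation: the function name `speech_segments`, the 10 ms frame length, the 0.5 threshold and the minimum segment length are all assumptions for illustration, and the probabilities are taken as given (for example, from a frame-level classifier such as the fully connected network the abstract describes).

```python
def speech_segments(frame_probs, threshold=0.5, frame_ms=10, min_frames=3):
    """Merge consecutive speech frames into (start_ms, end_ms) segments.

    frame_probs: per-frame probability of speech from any classifier
    (hypothetical input; the abstract does not specify this interface).
    Runs shorter than min_frames are discarded as likely false alarms.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    for i, prob in enumerate(frame_probs):
        is_speech = prob >= threshold
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_ms, i * frame_ms))
            start = None
    # Close a run that lasts until the end of the recording.
    if start is not None and len(frame_probs) - start >= min_frames:
        segments.append((start * frame_ms, len(frame_probs) * frame_ms))
    return segments


probs = [0.1, 0.9, 0.8, 0.95, 0.2, 0.7, 0.1, 0.9, 0.9, 0.9]
print(speech_segments(probs))
```

The extracted time ranges could then be used to cut speech out of a longer recording before any emotion labelling is attempted.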

Keywords: deep neural networks, speech detection, speech emotion recognition, Mel-frequency cepstrum coefficients, collecting speech emotion corpus, collecting speech emotion dataset, Kazakh speech dataset

Procedia PDF Downloads 82

2892 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in the spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors contain important geographical information that can be used for remote sensing applications such as region planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered a classification problem, so this task can be performed using machine-learning techniques. Among the many machine-learning algorithms available, the classification here is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We propose a classification method that considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset has been created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrate the benefits of using knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
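The neighborhood-based feature extraction mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the window size or the exact attributes, so the function name `neighborhood_features`, the square window and the choice of two features (the pixel's own intensity and the mean of its neighborhood) are assumptions made only for illustration.

```python
def neighborhood_features(image, row, col, radius=1):
    """Return (pixel intensity, mean intensity of the surrounding window).

    image: 2D list of intensity values (hypothetical single-band input).
    The window is clipped at the image border, so edge pixels simply
    average over fewer neighbors.
    """
    height, width = len(image), len(image[0])
    values = []
    for r in range(max(0, row - radius), min(height, row + radius + 1)):
        for c in range(max(0, col - radius), min(width, col + radius + 1)):
            values.append(image[r][c])
    return image[row][col], sum(values) / len(values)


band = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
print(neighborhood_features(band, 1, 1))
print(neighborhood_features(band, 0, 0))
```

Feature tuples of this kind, computed per pixel, are what a supervised classifier such as an SVM would be trained on.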

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 320

2891 Holographic Art as an Approach to Enhance Visual Communication in Egyptian Community: Experimental Study

Authors: Diaa Ahmed Mohamed Ahmedien

Abstract:

Nowadays, it cannot be denied that the most important interactive arts trends have appeared as a result of significant scientific developments in the modern sciences, and holographic art is no exception: it is considered one of the most important contemporary interactive trends in the visual arts. The holographic technique emerged from an application of modern physics in the late 1940s, when Dennis Gabor sought to improve the quality of electron microscope images. It later reached Margaret Benyon’s art exhibitions and then, over more than 70 years, passed through many procedures that enhanced its quality and its artistic applications, both technically and visually. As a modest extension of these great efforts, this research made an unusual attempt to enroll a sample of ordinary people from the Egyptian community in a holographic recording program to record their valued objects or antiques, and thereby to examine their ability to interact with modern techniques in visual communication arts. The research tried to answer three main questions: 'Can we use analog holographic techniques to unleash new theoretical and practical knowledge in interactive arts for the public in the Egyptian community?', 'To what extent can holographic art become familiar to the public and enable them to produce interactive artistic samples?', and 'Is it possible to build a holographic interactive program for ordinary people that leads them to enhance their understanding of visual communication in public and to be aware of interactive arts trends?'
The first part of this research relied on experimental methods. It was conducted in the laser lab at Cairo University, using a 532 nm Nd:YAG laser and a holographic optical layout, with selected samples of Egyptian people who were asked to record their valued objects after they had learned the recording methods. The second part relied on a series of discussion panels held to discuss the results and how participants felt about their holographic artistic products, using surveys, questionnaires, note taking and critiques of the holographic artworks. Our practical experiments and final discussions lead us to say that this experimental research enabled most participants to pass through a paradigm shift in their visual and conceptual experiences towards more interaction with contemporary visual arts trends. It is an attempt to emphasize the role of a mature relationship between art, science and technology in spreading interactive arts in our community through the latest scientific and artistic developments around the world, and the role of this relationship in our societies, particularly for those who have never been enrolled in practical arts programs before.

Keywords: Egyptian community, holographic art, laser art, visual art

Procedia PDF Downloads 465

2890 Selective Effect of Occipital Alpha Transcranial Alternating Current Stimulation in Perception and Working Memory

Authors: Andreina Giustiniani, Massimiliano Oliveri

Abstract:

Rhythmic activity in different frequencies could subserve distinct functional roles during visual perception and visual mental imagery. In particular, alpha band activity is thought to play a role in active inhibition of both task-irrelevant regions and processing of non-relevant information. In the present blind placebo-controlled study, we applied alpha transcranial alternating current stimulation (tACS) over the occipital cortex during both a basic visual perception task and a visual working memory task. To understand whether the role of alpha is more related to a general inhibition of distractors or to an inhibition of task-irrelevant regions, we added a non-visual distraction to both tasks. Sixteen adult volunteers performed both a simple perception task and a working memory task during 10 Hz tACS. The electrodes were placed over the left and right occipital cortex, and the current intensity was 1 mA peak-to-baseline. Sham stimulation was chosen as the control condition; to elicit a skin sensation similar to that of real stimulation, electrical stimulation was applied for a short period (30 s) at the beginning of the session and then turned off. The tasks were split into two sets: one set included distractors and the other did not. Motor interference was added by changing the answer key after subjects completed the first set of trials. The results show that alpha tACS improves working memory only when no motor distractors are added, suggesting a role of alpha tACS in inhibiting non-relevant regions rather than in a general inhibition of distractors. Additionally, we found that alpha tACS does not affect accuracy and hit rates during the visual perception task.
These results suggest that alpha activity in the occipital cortex plays a different role in perception and working memory and that it could optimize performance in tasks in which attention is internally directed, as in this working memory paradigm, but only when there is no motor distraction. Moreover, alpha tACS improves working memory performance by means of inhibition of task-irrelevant regions, while it does not affect perception.

Keywords: alpha activity, interference, perception, working memory

Procedia PDF Downloads 233

2889 Investigation on the Changes in the Chemical Composition and Ecological State of Soils Contaminated with Heavy Metals

Authors: Metodi Mladenov

Abstract:

Heavy metal contamination of soils is a major problem, mainly as a result of industrial production. From this point of view, processes for decontaminating soils are of interest, so that crops with a low heavy metal content, suitable for consumption by animals and people, can be produced. This article presents data on the changes established in the chemical composition and ecological state of soils contaminated by non-ferrous metallurgy manufacturing over a seven-year period. We investigated changes in pH, conductivity and the content of the following elements: As, Cd, Cu, Cr, Ni, Pb, Zn, Co, Mn and Al. Visual observations were also made of the recovery of the root-inhabited soil layer and of reforestation. The data obtained show favourable changes in the investigated indicators pH and conductivity and a decrease in the content of some of the analyzed elements. Visual observations show an expansion of plant cover and a change in species structure, with an increase in the number of shrub and tree specimens.

Keywords: conductivity, contamination of soils, chemical composition, inductively coupled plasma–optical emission spectrometry, heavy metals, visual observation

Procedia PDF Downloads 153

2888 The Effects of Adding Vibrotactile Feedback to Upper Limb Performance during Dual-Tasking and Response to Misleading Visual Feedback

Authors: Sigal Portnoy, Jason Friedman, Eitan Raveh

Abstract:

Introduction: Sensory substitution is possible due to the capacity of our brain to adapt to information transmitted by a synthetic receptor via an alternative sensory system. Practical sensory substitution systems are being developed in order to increase the functionality of individuals with sensory loss, e.g. amputees. For upper limb prosthesis users, the loss of tactile feedback compels them to allocate visual attention to their prosthesis. The effect of adding vibrotactile feedback (VTF) to the applied force has been studied; however, its effect on the allocation of visual attention during dual-tasking and the response during misleading visual feedback have not been studied. We hypothesized that VTF would improve performance and reduce visual attention during dual-task assignments in healthy individuals using a robotic hand, and improve performance in a standardized functional test despite the presence of misleading visual feedback. Methods: For the dual-task paradigm, twenty healthy subjects were instructed to toggle two keyboard arrow keys with the left hand to keep a moving virtual car on a road on a screen. During the game, instructions for various activities, e.g. mix the sugar in the glass with a spoon, appeared on the screen. The subject performed these tasks with a robotic hand attached to the right hand. The robotic hand was controlled by the activity of the flexors and extensors of the right wrist, recorded using surface EMG electrodes. Pressure sensors attached to the tips of the robotic hand triggered VTF through vibrotactile actuators attached to the right arm of the subject. An eye-tracking system tracked the visual attention of the subject during the trials. The trials were repeated twice, with and without the VTF. Additionally, the subjects performed the modified Box and Blocks test, hidden from eyesight, in a motion laboratory.
Misleading visual feedback was presented virtually on a screen so that, twice during the trial, the virtual block fell while the physical block was still held by the subject. Results: This is an ongoing study, whose current results are detailed below. We are continuing these trials with transradial myoelectric prosthesis users. In the healthy group, the VTF did not reduce visual attention or improve performance during dual-tasking for tasks of the transfer-to-target type, e.g. place the eraser on the shelf. An improvement was observed for other tasks. For example, the average±standard deviation of the time to complete the sugar-mixing task was 13.7±17.2 s and 19.3±9.1 s with and without the VTF, respectively. Also, the number of gaze shifts from the screen to the hand during this task was 15.5±23.7 and 20.0±11.6, with and without the VTF, respectively. The response of the subjects to the misleading visual feedback did not differ between the two conditions, i.e. with and without VTF. Conclusions: Our interim results suggest that the performance of certain activities of daily living may be improved by VTF. The substitution of visual sensory input by tactile feedback might require a long training period so that brain plasticity can occur and allow adaptation to the new condition.

Keywords: prosthetics, rehabilitation, sensory substitution, upper limb amputation

Procedia PDF Downloads 322

2887 Fight the Burnout: Phase Two of a NICU Nurse Wellness Bundle

Authors: Megan Weisbart

Abstract:

Background/Significance: The Intensive Care Unit (ICU) environment contributes to nurse burnout. Burnout costs include decreased employee compassion, missed workdays, worse patient outcomes, diminished job performance, high turnover, and higher organizational cost. Meaningful recognition, nurturing of interpersonal connections, and mindfulness-based interventions are associated with decreased burnout. The purpose of this quality improvement project was to decrease Neonatal ICU (NICU) nurse burnout using a Wellness Bundle that fosters meaningful recognition and interpersonal connections and includes mindfulness-based interventions. Methods: The Professional Quality of Life Scale Version 5 (ProQOL5) was used to measure burnout before Wellness Bundle implementation and after six months, and it will be given yearly for three years. Meaningful recognition bundle items include: online submission and posting of staff shoutouts, recognition events, Nurses Week and Unit Practice Council member gifts, and an employee recognition program. Fostering of interpersonal connections bundle items include: monthly staff games with prizes, social events, raffle fundraisers, a unit blog, a unit wellness basket, and a wellness resource sheet. Quick coherence techniques were implemented at staff meetings and huddles as a mindfulness-based intervention. Findings: The mean baseline burnout score of 14 NICU nurses was 20.71 (low burnout). The baseline range was 13-28, with 11 nurses experiencing low burnout, three nurses experiencing moderate burnout, and zero nurses experiencing high burnout. After six months of Wellness Bundle implementation, the mean burnout score of 39 NICU nurses was 22.28 (low burnout). The range was 14-31, with 22 nurses experiencing low burnout, 17 nurses experiencing moderate burnout, and zero nurses experiencing high burnout.
Conclusion: A NICU Wellness Bundle that incorporated meaningful recognition, fostering of interpersonal connections, and mindfulness-based activities was implemented to improve work environments and decrease nurse burnout. Participation bias and low baseline response rate may have affected the reliability of the data and necessitate another comparative measure of burnout in one year.

Keywords: burnout, NICU, nurse, wellness

Procedia PDF Downloads 67

2886 Effect of Monotonically Decreasing Parameters on Margin Softmax for Deep Face Recognition

Authors: Umair Rashid

Abstract:

Softmax loss is normally used as the supervision signal in face recognition (FR) systems, and it boosts the separability of features. In the last two years, a number of techniques have been proposed that reformulate the original softmax loss to enhance the discriminating power of Deep Convolutional Neural Networks (DCNNs) for FR systems. To learn angularly discriminative features, cosine-margin based softmax has been adjusted to be a monotonically decreasing angular function, which is the main challenge for angular based softmax. To address that issue, we propose a monotonically decreasing element for cosine-margin based softmax, and we also discuss the effect of different monotonically decreasing parameters on angular margin softmax for FR systems. We train the model on the publicly available CASIA-WebFace dataset using our proposed monotonically decreasing parameters for the cosine function, and the tests on YouTube Faces (YTF), Labeled Faces in the Wild (LFW), VGGFace1 and VGGFace2 attain state-of-the-art performance.
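For readers unfamiliar with cosine-margin softmax, the sketch below shows the commonly used fixed-margin formulation for a single sample: the target-class cosine is reduced by a margin before scaling, and ordinary softmax cross-entropy is then applied. This is only the baseline the abstract builds on, not the monotonically decreasing variant it proposes (which the abstract does not specify). The function name and the default `scale` and `margin` values are assumptions chosen for illustration.

```python
import math


def cosine_margin_loss(cosines, target, scale=30.0, margin=0.35):
    """Softmax cross-entropy with a cosine margin on the target class.

    cosines: cos(theta_j) between the normalized feature and each class
    weight vector (hypothetical precomputed input).
    The target logit is scale * (cos(theta_y) - margin); all other
    logits are scale * cos(theta_j).
    """
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


cosines = [0.8, 0.1, -0.2]
print(cosine_margin_loss(cosines, target=0, margin=0.0))
print(cosine_margin_loss(cosines, target=0, margin=0.35))
```

With the margin applied, the same sample incurs a larger loss, which is what pushes features of the same identity closer together in angle during training.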

Keywords: deep convolutional neural networks, cosine margin face recognition, softmax loss, monotonically decreasing parameter

Procedia PDF Downloads 80

2885 Image Processing of Scanning Electron Microscope Micrograph of Ferrite and Pearlite Steel for Recognition of Micro-Constituents

Authors: Subir Gupta, Subhas Ganguly

Abstract:

In this paper, we demonstrate a new area of application of image processing to metallurgical images, creating more opportunities for structure-property correlation based approaches to alloy design. The present exercise focuses on the development of image processing tools suitable for phase segmentation, grain boundary detection and recognition of micro-constituents in SEM micrographs of ferrite and pearlite steels. A comprehensive set of micrographs has been developed experimentally, encompassing the variation of ferrite and pearlite volume fractions, with images taken at different magnifications (500X, 1000X, 1500X, 2000X, 3000X and 5000X) under a scanning electron microscope. The variation in the volume fraction was achieved using four different plain carbon steels containing 0.1, 0.22, 0.35 and 0.48 wt% C, heat treated under annealing and normalizing treatments. The resulting pool of micrographs was arbitrarily divided into two parts to form training and testing sets. The statistical recognition features for the ferrite and pearlite constituents were developed by learning from the training set of micrographs. The features obtained for microstructure pattern recognition were then applied to the test set. Analysis of the results shows that the developed strategy can successfully detect the micro-constituents across the wide range of magnifications and volume fractions of the constituents in the structure, with an accuracy of about +/- 5%.

Keywords: SEM micrograph, metallurgical image processing, ferrite pearlite steel, microstructure

Procedia PDF Downloads 185

2884 Ionophore-Based Materials for Selective Optical Sensing of Iron(III)

Authors: Natalia Lukasik, Ewa Wagner-Wysiecka

Abstract:

Development of selective, fast-responsive, and economical sensors for the detection and determination of diverse ions is one of the most extensively studied areas due to its importance in clinical, environmental and industrial analysis. Among chemical sensors, ionophore-based optical sensors have gained vast popularity; in these, the analytical signal is generated by the molecular recognition of the ion by the ionophore. The change of color occurring during host-guest interactions allows for quantitative analysis and for 'naked-eye' detection without the need for sophisticated equipment. An example of the application of such sensors is the colorimetric detection of iron(III) cations. Iron, as one of the most significant trace elements, plays roles in many biochemical processes. For these reasons, reliable, fast, and selective methods for the determination of iron ions are in high demand. Taking all of the above into account, a chromogenic amide derivative of 3,4-dihydroxybenzoic acid was synthesized, and its ability to recognize iron(III) was tested. To the best of the authors' knowledge (according to Chemical Abstracts), the obtained ligand has not been described in the literature so far. The catechol moiety was introduced into the ligand structure in order to mimic the action of naturally occurring siderophores, which are iron(III)-selective receptors. The ligand–ion interactions were studied using spectroscopic methods: UV-Vis spectrophotometry and infrared spectroscopy. The spectrophotometric measurements revealed that the amide exhibits affinity for iron(III) in dimethyl sulfoxide and in fully aqueous solution, which is manifested by a change of color from yellow to green. Incorporation of the tested amide into a polymeric matrix (cellulose triacetate) ensured effective recognition of iron(III) at pH 3 with a detection limit of 1.58×10⁻⁵ M.
For the obtained sensor material, parameters such as linear response range, response time, selectivity, and possibility of regeneration were determined. In order to evaluate the effect of the size of the sensing material on iron(III) detection, nanospheres (in the form of a nanoemulsion) containing the tested amide were also prepared. According to DLS (dynamic light scattering) measurements, the size of the nanospheres is 308.02 ± 0.67 nm. Working parameters of the nanospheres were determined and compared with those of the cellulose triacetate-based material. Additionally, for fast, qualitative experiments, test strips were prepared by adsorption of the amide solution on a glass microfiber material. The visual limit of detection of iron(III) at pH 3 by the test strips was estimated at 10⁻⁴ M. In conclusion, the amide derived from 3,4-dihydroxybenzoic acid reported here proved to be an effective candidate for optical sensing of iron(III) in fully aqueous solutions. N. L. kindly acknowledges financial support from the National Science Centre, Poland, grant no. 2017/01/X/ST4/01680. The authors acknowledge financial support from Gdansk University of Technology, grant no. 032406.

Keywords: ion-selective optode, iron(III) recognition, nanospheres, optical sensor

Procedia PDF Downloads 140

2883 Aspects of the Promotional Language of Tourism in Social Media. A Case Study of Romanian Accommodation Industry

Authors: Sanda-Maria Ardeleanu, Ana Crăciunescu

Abstract:

This paper builds on our previous research on discursive strategies, which demonstrated that tourism has developed and employed a promotional language per se. We studied this concept within the framework of audio-visual advertising by analyzing its discursive structures at the level of three main strategies (textual, visual, and both textual and visual) and confirmed the applicability of the promotional language per se within the field. Tourism, at large, is a field with great interdisciplinary potential, which allowed us to use qualitative research methods such as Discourse Analysis (DA). Further research showed that, in the third phase of qualitative research methodologies, tourism scholars recognized semiotics and DA as potential paths to follow, although these were insufficiently explored at the time. We therefore realized that the natural next step was to bring together qualitative methodologies common to both fields, such as observation, triangulation and Discourse Analysis. In the light of the fast transformations of the medium that carries the message, this paper focuses on the manifestations of the promotional language in social media texts that advertise the urban accommodation industry in Romania. We shall constitute a corpus of study as the basis for our research methodology and, through the empirical method of observation and DA, we aim to recognize or discover new patterns developed at the textual (mainly) and visual levels, or in the mix of the two, known as strategies of the promotional language of tourism.

Keywords: discourse analysis, promotional language of tourism, social media, urban accommodation industry, tourism

Procedia PDF Downloads 145

2882 Using Speech Emotion Recognition as a Longitudinal Biomarker for Alzheimer’s Diseases

Authors: Yishu Gong, Liangliang Yang, Jianyu Zhang, Zhengyu Chen, Sihong He, Xusheng Zhang, Wei Zhang

Abstract:

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that affects millions of people worldwide and is characterized by cognitive decline and behavioral changes. People living with Alzheimer’s disease often find it hard to complete routine tasks. However, there are limited objective assessments that aim to quantify the difficulty of certain tasks for AD patients compared to non-AD people. In this study, we propose to use speech emotion recognition (SER), especially the frustration level, as a potential biomarker for quantifying the difficulty patients experience when describing a picture. We build an SER model using data from the IEMOCAP dataset and apply the model to the DementiaBank data to detect the AD/non-AD group difference and perform longitudinal analysis to track the AD disease progression. Our results show that the frustration level detected from the SER model can possibly be used as a cost-effective tool for objective tracking of AD progression in addition to the Mini-Mental State Examination (MMSE) score.

Keywords: Alzheimer’s disease, speech emotion recognition, longitudinal biomarker, machine learning

Procedia PDF Downloads 94

2881 A Multimodal Measurement Approach Using Narratives and Eye Tracking to Investigate Visual Behaviour in Perceiving Naturalistic and Urban Environments

Authors: Khizar Z. Choudhrya, Richard Coles, Salman Qureshi, Robert Ashford, Salim Khan, Rabia R. Mir

Abstract:

The majority of existing landscape research has been derived from heuristic evaluations, without empirical insight into real participants' visual responses. In this research, a modern multimodal measurement approach (using narratives and eye tracking) was applied to investigate visual behaviour in perceiving naturalistic and urban environments. This research is unique in exploring gaze behaviour on environmental images possessing different levels of saliency. Eye behaviour is predominantly attracted by salient locations. The methodological concept of this research on naturalistic and urban environments is drawn from approaches used in market research. Borrowing methodologies from market research that examine visual responses and qualities provided a critical and hitherto unexplored approach. The research was conducted using mixed quantitative and qualitative methods. On the whole, the results corroborated existing landscape research findings, but they also identified potential refinements. The research contributes both methodologically and empirically to human-environment interaction (HEI). This study focused on initial impressions of environmental images with the help of eye tracking. Taking into consideration the importance of the image, this study explored the factors that influence initial fixations in relation to expectations and preferences. A key finding of this research is that each participant has their own unique navigation style while surfing through the different elements of landscape images. This individual navigation style is given the name ‘visual signature’. This study adds the necessary clarity that would complete the picture and offers insight for future landscape researchers.

Keywords: human-environment interaction (HEI), multimodal measurement, narratives, eye tracking

Procedia PDF Downloads 322

2880 A Neural Network Classifier for Identifying Duplicate Image Entries in Real-Estate Databases

Authors: Sergey Ermolin, Olga Ermolin

Abstract:

A Deep Convolutional Neural Network with Triplet Loss is used to identify duplicate images in real-estate advertisements in the presence of image artifacts such as watermarking, cropping, hue/brightness adjustment, and others. The effects of batch normalization, spatial dropout, and various convergence methodologies on the resulting detection accuracy are discussed. For a comparative Return-on-Investment study (per industry request), end-to-end performance is benchmarked on both Nvidia Titan GPUs and Intel’s Xeon CPUs. A new real-estate dataset from the San Francisco Bay Area is used for this work. Sufficient duplicate detection accuracy is achieved to supplement other database-grounded methods of duplicate removal. The implemented method is used in a Proof-of-Concept project in the real-estate industry.
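The triplet loss named in this abstract can be sketched for a single (anchor, positive, negative) triple of embeddings as follows. This uses the common squared-Euclidean form; the margin of 0.2, the duplicate threshold of 0.5 and the helper `is_duplicate` are assumptions for illustration, since the abstract does not state the values or the decision rule the authors used.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for one triple of embeddings.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def is_duplicate(emb_a, emb_b, threshold=0.5):
    """Hypothetical decision rule: close embeddings count as duplicates."""
    return squared_distance(emb_a, emb_b) < threshold


original = [0.0, 0.0]
watermarked = [0.0, 1.0]   # same listing photo with an artifact
other_house = [3.0, 4.0]   # unrelated photo
print(triplet_loss(original, watermarked, other_house))
print(is_duplicate([0.1, 0.2], [0.1, 0.25]))
```

In training, the embeddings would come from the convolutional network, and the loss would be averaged over many sampled triples.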

Keywords: visual recognition, convolutional neural networks, triplet loss, spatial batch normalization with dropout, duplicate removal, advertisement technologies, performance benchmarking

Procedia PDF Downloads 320

2879 Web Page Design Optimisation Based on Segment Analytics

Authors: Varsha V. Rohini, P. R. Shreya, B. Renukadevi

Abstract:

Web analytics is the measurement, collection and analysis of web page data, used to optimize information delivery and web usage. Page statistics and user metrics are the main factors in most web analytics tools. This is a limitation of the existing tools: they do not provide design inputs for the optimization of information. This paper aims to extend the scope of web analytics to provide analysis and statistics for each segment of a web page. The click count is calculated and the concentration of links in a web page is obtained. These user metrics are used to help in the proper design of the content displayed on a web page, by means of the Vision Based Page Segmentation (VIPS) algorithm. When the algorithm is applied to a web page, it divides the entire page into a visual block tree. The generated visual block tree further divides the web page into visual blocks, or segments, which help us to understand the usage of each segment in a page and its content. Dynamic web pages and deep web pages are used to extend the scope of web page segment analytics. The concept of space optimization is applied to the output obtained from the VIPS algorithm. This technique gives us visibility into user interaction with web pages and helps us to place important links in the appropriate segments of the page and to manage effectively both the space in a page and the concentration of links.
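The per-segment click count described in this abstract could be computed along the following lines once a page has been divided into segments. The sketch does not implement VIPS itself; it assumes the segments are already available as named bounding boxes, and the function name, the `(left, top, right, bottom)` box format and the sample coordinates are hypothetical.

```python
def clicks_per_segment(segments, clicks):
    """Count how many clicks fall inside each page segment.

    segments: dict mapping a segment name to (left, top, right, bottom)
              pixel bounds, assumed to come from a page segmentation step.
    clicks:   list of (x, y) click coordinates.
    Clicks outside every segment are ignored.
    """
    counts = {name: 0 for name in segments}
    for x, y in clicks:
        for name, (left, top, right, bottom) in segments.items():
            if left <= x < right and top <= y < bottom:
                counts[name] += 1
                break  # segments are assumed not to overlap
    return counts


page_segments = {
    "header": (0, 0, 800, 100),
    "sidebar": (0, 100, 200, 600),
    "content": (200, 100, 800, 600),
}
page_clicks = [(10, 10), (300, 200), (350, 400), (100, 300), (900, 50)]
print(clicks_per_segment(page_segments, page_clicks))
```

Segments with high counts are candidates for holding the most important links, which is the design input the abstract argues existing tools lack.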

Keywords: analytics, design optimization, visual block trees, vision based technology

Procedia PDF Downloads 252

2878 English Learning Speech Assistant Speak Application in Artificial Intelligence

Authors: Albatool Al Abdulwahid, Bayan Shakally, Mariam Mohamed, Wed Almokri

Abstract:

Artificial intelligence has infiltrated every part of our lives and every field we can think of. With technical developments, artificial intelligence applications are becoming more prevalent. We chose ELSA Speak because it is a strong example of an artificial intelligence application. ELSA Speak is a smartphone application that is free to download on both iOS and Android smartphones. It utilizes artificial intelligence to help non-native English speakers pronounce words and phrases like a native speaker, as well as to enhance their English skills. It employs speech-recognition technology that helps the application improve the pronunciation of its users. This feature distinguishes ELSA from other voice recognition algorithms and increases the efficiency of the application. This study focused on evaluating the ELSA Speak application by testing its degree of effectiveness based on survey questions. The results of the questionnaire varied. The majority of the participants strongly agreed that ELSA has helped them enhance their pronunciation skills. However, a few participants were not confident about the application’s ability to assist them in their learning journey.

Keywords: ELSA Speak application, artificial intelligence, speech-recognition technology, language learning, English pronunciation

Procedia PDF Downloads 84

2877 A Method of the Semantic on Image Auto-Annotation

Authors: Lin Huo, Xianwei Liu, Jingxiong Zhou

Abstract:

Recently, because of the semantic gap between the visual features of an image and human concepts, semantic image auto-annotation has become an important topic. First, low-level visual features of the image are extracted and, using a corresponding hash method, mapped into hash codes; these are then transformed into a group of binary strings and stored. Because annotation by search is a popular approach to image auto-annotation, we use it to design and implement a method of semantic image auto-annotation. Finally, tests on the Corel image set show that this method is effective.
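
The pipeline in this abstract (features, hash coding, binary strings, annotation by search) can be sketched roughly as below. The abstract does not specify the hash method, so this example assumes a very simple one, thresholding each feature at the vector's mean; the function names and the toy index are hypothetical.

```python
def binary_hash(features):
    """Map a feature vector to a binary string (assumed scheme: threshold at the mean)."""
    mean = sum(features) / len(features)
    return "".join("1" if value > mean else "0" for value in features)


def hamming(code_a, code_b):
    """Number of positions at which two equal-length binary strings differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def annotate(query_features, index, k=1):
    """Annotate a query image with the labels of its k nearest stored codes."""
    query_code = binary_hash(query_features)
    ranked = sorted(index, key=lambda entry: hamming(query_code, entry[0]))
    labels = []
    for _, entry_labels in ranked[:k]:
        for label in entry_labels:
            if label not in labels:  # keep first-seen order, no duplicates
                labels.append(label)
    return labels


# Toy index of (hash code, labels) pairs built from made-up feature vectors.
INDEX = [
    (binary_hash([0.9, 0.1, 0.8, 0.2]), ["sunset", "sea"]),
    (binary_hash([0.1, 0.9, 0.2, 0.8]), ["forest"]),
]

print(annotate([0.7, 0.3, 0.9, 0.1], INDEX))
```

Storing compact binary strings keeps the search step cheap, since comparing two codes only requires counting differing bits.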

Keywords: image auto-annotation, color correlograms, Hash code, image retrieval

Procedia PDF Downloads 472

2876 Progress in Combining Image Captioning and Visual Question Answering Tasks

Authors: Prathiksha Kamath, Pratibha Jamkhandi, Prateek Ghanti, Priyanshu Gupta, M. Lakshmi Neelima

Abstract:

Combining Image Captioning and Visual Question Answering (VQA) tasks has emerged as a new and exciting research area. The image captioning task involves generating a textual description that summarizes the content of the image. VQA aims to answer a natural language question about the image. Both tasks involve computer vision and natural language processing (NLP) and require a deep understanding of the content of the image and the semantic relationships within it, as well as the ability to generate a response in natural language. There has been remarkable growth in both tasks with the rapid advancement of deep learning. In this paper, we present a comprehensive review of recent progress in combining image captioning and VQA tasks. We first discuss image captioning and VQA individually and then the various ways in which the two tasks can be integrated. We also analyze the challenges associated with these tasks and ways to overcome them. We finally discuss the various datasets and evaluation metrics used in these tasks. The paper concludes with the need for generating captions based on context, and captions that can answer the questions most likely to be asked about the image, so as to aid the VQA task. Overall, this review highlights the significant progress made in combining image captioning and VQA, as well as the ongoing challenges and opportunities for further research in this rapidly evolving field, which has the potential to improve the performance of real-world applications such as autonomous vehicles, robotics, and image search.

Keywords: image captioning, visual question answering, deep learning, natural language processing

Procedia PDF Downloads 58

2875 A Neuron Model of Facial Recognition and Detection of an Authorized Entity Using Machine Learning System

Authors: J. K. Adedeji, M. O. Oyekanmi

Abstract:

This paper critically examines the use of machine learning procedures in curbing unauthorized access to valuable areas of an organization. The use of passwords, PIN codes, and user identification has in recent times been only partially successful in curbing identity-related crime, hence the need for a system that incorporates biometric characteristics such as DNA and pattern recognition of variations in facial expressions. The facial model uses the OpenCV library, which relies on certain physiological features. A Raspberry Pi 3 module is used to compile the OpenCV library, which extracts the detected faces and stores them in the datasets directory through the use of a camera. The model is trained with 50 epoch runs on the database, and faces are recognized by the Local Binary Pattern Histogram (LBPH) recognizer contained in OpenCV. The training algorithm used by the neural network is backpropagation, coded in Python, with 200 epoch runs to identify specific resemblance in the exclusive OR (XOR) output neurons. The research confirmed that physiological parameters are more effective measures for curbing crimes relating to identities.
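
To illustrate what an LBPH recognizer builds on, the sketch below computes the basic 3x3 local binary pattern code and a histogram of those codes in plain Python. It is a simplified illustration under stated assumptions, not the OpenCV implementation or the paper's code: the bit ordering is one common convention, the function names are hypothetical, and a full recognizer would typically compute such histograms per grid cell and compare them between faces.

```python
# Neighbour offsets, clockwise starting at the top-left pixel (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit local binary pattern of the pixel at (row, col).

    Each neighbour contributes a 1 bit when it is at least as bright
    as the centre pixel, and a 0 bit otherwise.
    """
    center = image[row][col]
    code = 0
    for d_row, d_col in OFFSETS:
        bit = 1 if image[row + d_row][col + d_col] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a grayscale image."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram


# Tiny made-up grayscale patch (list of rows).
PATCH = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]

print(lbp_code(PATCH, 1, 1))  # bits 01011000
```

Because the code depends only on whether neighbours are brighter or darker than the centre, it is unchanged by a uniform brightness shift, which is one reason this descriptor is popular for face images.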

Keywords: biometric characters, facial recognition, neural network, OpenCV

Procedia PDF Downloads 236

2874 An Alternative Concept of Green Screen Keying

Authors: Jin Zhi

Abstract:

This study focuses on a green screen keying method developed especially for film visual effects. There are a number of ways of using existing tools to create mattes from green or blue screen plates. However, it is still a time-consuming process, and the results vary, especially when it comes to retaining tiny details such as hair and fur. This paper introduces an alternative concept and method for retaining the edge details of characters on a green screen plate, and explores a number of connected mathematical equations. At the end of this study, a simplified process for applying this method in real productions is also introduced.
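
For readers unfamiliar with how a matte is derived from a green screen plate, here is a minimal sketch of a classic colour-difference key. It is explicitly not the alternative method this abstract proposes, whose equations are not given here; the function names, the `gain` parameter, and the sample pixel values are assumptions made for illustration.

```python
def clamp01(value):
    """Clamp a number to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def color_difference_alpha(r, g, b, gain=1.0):
    """Matte value for one pixel with channels in [0, 1].

    The more green exceeds the stronger of red and blue, the more
    transparent the pixel becomes (alpha 0 = pure screen, 1 = foreground).
    """
    green_excess = g - max(r, b)
    return 1.0 - clamp01(green_excess * gain)


def composite(foreground, background, alpha):
    """Blend a foreground pixel over a background pixel using the matte."""
    return tuple(
        f * alpha + k * (1.0 - alpha)
        for f, k in zip(foreground, background)
    )


print(color_difference_alpha(0.0, 1.0, 0.0))     # pure green screen
print(color_difference_alpha(0.8, 0.6, 0.5))     # skin-like tone
print(color_difference_alpha(0.25, 0.75, 0.25))  # semi-transparent edge
```

Fine details such as hair produce intermediate matte values like the last example, which is where simple keys of this kind tend to lose detail and where the abstract's concern with edge retention comes in.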

Keywords: green screen, visual effects, compositing, matte

Procedia PDF Downloads 378

2873 Recognition and Counting Algorithm for Sub-Regional Objects in a Handwritten Image through Image Sets

Authors: Kothuri Sriraman, Mattupalli Komal Teja

Abstract:

In this paper, a novel algorithm is proposed for the recognition of hulls in handwritten images, which may be irregular or shaped like digits or characters. Objects and internal objects are difficult to extract when the image contains many clusters. Estimation results are obtained easily by identifying the sub-regional objects with the SASK algorithm. The main focus is to recognize the number of internal objects that exist in a given image, so that the result is shadow-free and error-free. Hard clustering and density clustering of the rough set obtained from the image are used to recognize the differentiated internal objects, if any. Finding the internal hull regions involves three steps: pre-processing, boundary extraction, and finally applying the hull detection system. Detecting sub-regional hulls can increase machine learning capability in character detection, and it can also be extended to hull recognition in irregularly shaped objects, such as black holes in space exploration with their intensities. Layered hulls are those that have structured layers inside; they are useful in military services and traffic applications for identifying the number of vehicles or persons. The proposed SASK algorithm helps identify such regions and can support decision processes (for example, clearing traffic or identifying the number of persons on the opposing side in a war).
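
The idea of counting internal regions can be illustrated with a simple flood fill over a binarized image. This sketch is not the SASK algorithm, whose details the abstract does not give; it only shows one straightforward way to count enclosed background regions, and the function names and toy shapes are hypothetical.

```python
from collections import deque


def parse(rows):
    """Turn strings of '0'/'1' into a grid of ints (1 = ink, 0 = background)."""
    return [[int(ch) for ch in row] for row in rows]


def count_internal_regions(grid):
    """Count background regions that are fully enclosed by ink.

    A background region is treated as internal when none of its pixels
    lies on the image border (4-connectivity is assumed).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    seen = [[False] * n_cols for _ in range(n_rows)]

    def region_touches_border(start_row, start_col):
        queue = deque([(start_row, start_col)])
        seen[start_row][start_col] = True
        touches = False
        while queue:
            row, col = queue.popleft()
            if row in (0, n_rows - 1) or col in (0, n_cols - 1):
                touches = True
            for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt_row, nxt_col = row + d_row, col + d_col
                if (0 <= nxt_row < n_rows and 0 <= nxt_col < n_cols
                        and not seen[nxt_row][nxt_col]
                        and grid[nxt_row][nxt_col] == 0):
                    seen[nxt_row][nxt_col] = True
                    queue.append((nxt_row, nxt_col))
        return touches

    internal = 0
    for row in range(n_rows):
        for col in range(n_cols):
            if grid[row][col] == 0 and not seen[row][col]:
                if not region_touches_border(row, col):
                    internal += 1
    return internal


EIGHT = parse(["11111", "10001", "11111", "10001", "11111"])
LETTER_C = parse(["1111", "1000", "1111"])

print(count_internal_regions(EIGHT), count_internal_regions(LETTER_C))
```

A shape like the digit 8 yields two internal regions and an open shape like the letter C yields none, which is the kind of count that can help distinguish handwritten characters.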

Keywords: chain code, Hull regions, Hough transform, Hull recognition, Layered Outline Extraction, SASK algorithm

Procedia PDF Downloads 323

2872 Object Detection Based on Plane Segmentation and Features Matching for a Service Robot

Authors: António J. R. Neves, Rui Garcia, Paulo Dias, Alina Trifan

Abstract:

With the aging of the world population and the continuous growth in technology, service robots are increasingly explored as alternatives to caregivers or personal assistants for elderly or disabled people. Any service robot should be capable of interacting with its human companion, receiving commands, navigating through the environment, whether known or unknown, and recognizing objects. This paper proposes an approach for object recognition based on the use of depth information and color images for a service robot. We present a study of two of the most used methods for object detection, in which 3D data is used to detect the position of the objects to be classified that are found on horizontal surfaces. Since most of the objects of interest accessible to service robots are on these surfaces, the proposed 3D segmentation reduces the processing time and simplifies the scene for object recognition. The first approach to object recognition is based on color histograms, while the second is based on the SIFT and SURF feature descriptors. We present comparative experimental results obtained with a real service robot.
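
The first recognition approach mentioned in this abstract, color histograms, can be sketched as follows. The abstract does not say which histogram layout or similarity measure was used, so this example assumes a coarse joint RGB histogram compared by histogram intersection; all names and the toy pixel data are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a list of (r, g, b) pixels in 0..255."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    histogram = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        histogram[index] += 1.0
    return [count / len(pixels) for count in histogram]


def histogram_intersection(hist_a, hist_b):
    """Similarity in [0, 1]; 1 means the normalized histograms are identical."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def classify(query_pixels, models):
    """Return the label of the stored model histogram most similar to the query."""
    query = color_histogram(query_pixels)
    return max(models, key=lambda label: histogram_intersection(query, models[label]))


# Made-up object models built from segmented pixel regions.
RED = [(255, 0, 0)] * 4
BLUE = [(0, 0, 255)] * 4
MODELS = {"red mug": color_histogram(RED), "blue box": color_histogram(BLUE)}

print(classify([(250, 10, 5)] * 3 + [(0, 0, 255)], MODELS))
```

In a pipeline like the one described, the pixels passed to `classify` would come from the regions that the 3D plane segmentation isolates on a horizontal surface, so the histogram is not polluted by background colors.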

Keywords: object detection, feature descriptors, SIFT, SURF, depth images, service robots

Procedia PDF Downloads 522

2871 Visual Working Memory, Reading Abilities, and Vocabulary in Mexican Deaf Signers

Authors: A. Mondaca, E. Mendoza, D. Jackson-Maldonado, A. García-Obregón

Abstract:

Deaf signers usually show lower scores in Auditory Working Memory (AWM) tasks and higher scores in Visual Working Memory (VWM) tasks than their hearing peers. Further, Working Memory has been correlated with reading abilities and vocabulary in Deaf and Hearing individuals. The aim of the present study is to compare the performance of Mexican Deaf signers and hearing adults in VWM, reading, and Vocabulary tasks and to observe whether the latter are correlated with the former. Fifteen Mexican Deaf signers were assessed using the Corsi block test for VWM, four different subtests of PROLEC (Batería de Evaluación de los Procesos Lectores) for reading abilities, and the LexTale in its Spanish version for vocabulary. T-tests show significant differences between groups for VWM and Vocabulary but not for all the PROLEC subtests. A significant Pearson correlation was found between VWM and Vocabulary but not between VWM and reading abilities. This work is part of a larger research study, and results are not yet conclusive. A discussion of the use of PROLEC as a tool to explore reading abilities in a Deaf population is included.
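
For reference, the Pearson correlation used in this kind of analysis can be computed as in the sketch below. The scores are invented purely to exercise the function and are not data from the study; the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(spread_x * spread_y)


# Invented scores, for illustration only.
vwm_scores = [4, 5, 6, 7, 8]
vocabulary_scores = [52, 58, 61, 70, 74]

print(round(pearson_r(vwm_scores, vocabulary_scores), 3))
```

The coefficient ranges from -1 to 1; whether a given value is statistically significant additionally depends on the sample size, which matters for a group of fifteen participants.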

Keywords: deaf signers, visual working memory, reading, Mexican sign language

Procedia PDF Downloads 147

2870 Text Emotion Recognition by Multi-Head Attention based Bidirectional LSTM Utilizing Multi-Level Classification

Authors: Vishwanath Pethri Kamath, Jayantha Gowda Sarapanahalli, Vishal Mishra, Siddhesh Balwant Bandgar

Abstract:

Recognition of emotional information is essential in any form of communication. The recent growth of HCI (Human-Computer Interaction) underlines the importance of understanding the emotions expressed, which is crucial for improving the system or the interaction itself. In this research work, textual data is used for emotion recognition. Text, being the least expressive of the multimodal resources, poses various challenges, such as capturing contextual information and the sequential nature of language construction. We propose a neural architecture that resolves at least 8 emotions from textual data derived from multiple datasets, using Google pre-trained word2vec word embeddings and a multi-head attention-based bidirectional LSTM model with a one-vs-all multi-level classification. The emotions targeted in this research are Anger, Disgust, Fear, Guilt, Joy, Sadness, Shame, and Surprise. Textual data from multiple datasets, such as ISEAR, Go Emotions, and Affect, were used to create the emotion dataset. Overlapping or conflicting data samples were handled through careful preprocessing. Our results show a significant improvement with the modeling architecture, including an improvement of as much as 10 points in recognizing some emotions.
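
The attention mechanism at the core of the architecture named in this abstract can be sketched in plain Python. This shows a single head of standard scaled dot-product attention under simplifying assumptions: there are no learned projections, the function names are hypothetical, and it is not the authors' model. A multi-head layer would run several such heads on differently projected inputs and concatenate their outputs.

```python
from math import exp, sqrt


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for query in queries:
        # Similarity of this query to every key, scaled by sqrt of the key size.
        scores = [
            sum(q * k for q, k in zip(query, key)) / sqrt(d_k)
            for key in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(weight * value[j] for weight, value in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
print(scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 2.0]],
))
```

In a text model, the queries, keys, and values would be derived from the LSTM's hidden states, letting each position weigh the other words in the sentence when forming its representation.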

Keywords: text emotion recognition, bidirectional LSTM, multi-head attention, multi-level classification, Google word2vec word embeddings

Procedia PDF Downloads 160

2869 An Accurate Computation of 2D Zernike Moments via Fast Fourier Transform

Authors: Mohammed S. Al-Rawi, J. Bastos, J. Rodriguez

Abstract:

Object detection and object recognition are essential components of every computer vision system. Despite their high computational complexity and other problems related to numerical stability and accuracy, Zernike moments of 2D images (ZMs) have shown resilience when used in object recognition and have been used in various image analysis applications. In this work, we propose a novel method for computing ZMs via Fast Fourier Transform (FFT). Notably, this is the first algorithm that can generate ZMs up to extremely high orders accurately, e.g., it can be used to generate ZMs for orders up to 1000 or even higher. Furthermore, the proposed method is also simpler and faster than the other methods due to the availability of FFT software and/or hardware. The accuracy and numerical stability of ZMs computed via FFT have been confirmed using the orthogonality property. We also introduce normalizing ZMs with the Neumann factor when the image is embedded in a larger grid, and color image reconstruction based on RGB normalization of the reconstructed images. Astonishingly, higher-order image reconstruction experiments show that the proposed methods are superior, both quantitatively and subjectively, compared to the q-recursive method.
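
As background for this abstract, the sketch below evaluates the Zernike radial polynomial R_n^m(rho) directly from its standard factorial formula. This is the textbook definition, not the FFT-based method the paper proposes; the function name is hypothetical. Direct evaluation of this kind is generally regarded as expensive and numerically fragile at high orders, which is the difficulty the abstract refers to.

```python
from math import factorial


def zernike_radial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho) for 0 <= rho <= 1.

    Uses the standard sum over s = 0 .. (n - |m|) / 2 of
    (-1)^s (n - s)! / (s! ((n + |m|)/2 - s)! ((n - |m|)/2 - s)!) * rho^(n - 2s).
    """
    m = abs(m)
    if m > n or (n - m) % 2 != 0:
        raise ValueError("requires |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coefficient = (-1) ** s * factorial(n - s) / (
            factorial(s)
            * factorial((n + m) // 2 - s)
            * factorial((n - m) // 2 - s)
        )
        total += coefficient * rho ** (n - 2 * s)
    return total


# R_2^0(rho) = 2 rho^2 - 1 and R_4^0(rho) = 6 rho^4 - 6 rho^2 + 1.
print(zernike_radial(2, 0, 0.5), zernike_radial(4, 0, 0.5))
```

The full complex Zernike basis function multiplies this radial part by exp(i m theta), and a moment is obtained by integrating the image against the conjugate basis function over the unit disk.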

Keywords: Chebyshev polynomial, Fourier transform, fast algorithms, image recognition, pseudo Zernike moments, Zernike moments

Procedia PDF Downloads 247

2868 Individualized Emotion Recognition Through Dual-Representations and Group-Established Ground Truth

Authors: Valentina Zhang

Abstract:

While facial expression is a complex and individualized behavior, all facial emotion recognition (FER) systems known to us rely on a single facial representation and are trained on universal data. We conjecture that: (i) different facial representations can provide different, sometimes complementary, views of emotions; (ii) when employed collectively in a discussion group setting, they enable more accurate emotion reading, which is highly desirable in autism care and other application contexts sensitive to errors. In this paper, we first study FER using pixel-based DL vs semantics-based DL in the context of deepfake videos. Our experiment indicates that while the semantics-trained model performs better with articulated facial feature changes, the pixel-trained model performs better on subtle or rare facial expressions. Armed with these findings, we have constructed an adaptive FER system that learns from both types of models for dyadic or small interacting groups and further leverages the synthesized group emotions as the ground truth for individualized FER training. Using a collection of group conversation videos, we demonstrate that FER accuracy and personalization can benefit from such an approach.

Keywords: neurodivergence care, facial emotion recognition, deep learning, ground truth for supervised learning

Procedia PDF Downloads 126

2867 Examining Foreign Student Visual Perceptions of Online Marketing Tools at a Hungarian University

Authors: Anita Kéri

Abstract:

Higher education marketing has been a widely researched field in recent years. Due to the increasing competition among higher education institutions worldwide, it has become crucial to target foreign students with effective marketing tools. Online marketing tools have become central to attracting, retaining, and satisfying the needs of foreign students. Therefore, the aim of the current study is to reveal how the online marketing tools of a Hungarian university are perceived visually by its first-year foreign students, with special emphasis on the university webpage content. Eye-camera tracking and retrospective think-aloud interviews were used to measure visual perceptions. Results show that first-year students better remember online marketing content that includes familiar elements. Pictures of real-life students and their experiences attract more of the students’ attention, and students also remember more of the information on these webpage elements than on designs with stock photos. This research is novel in that it uses eye-camera tracking in the field of higher education marketing, thereby providing insight into how foreign students perceive online higher education marketing.

Keywords: higher education, marketing, eye-camera, visual perceptions

Procedia PDF Downloads 83

2866 A Review on Artificial Neural Networks in Image Processing

Authors: B. Afsharipoor, E. Nazemi

Abstract:

Artificial neural networks (ANNs) are a powerful tool for prediction that can be trained on a set of examples, which makes them useful for nonlinear image processing. The present paper reviews several papers on applications of ANNs in image processing to shed light on the advantages and disadvantages of ANNs in this field. Different steps in the image processing chain, including pre-processing, enhancement, segmentation, object recognition, image understanding, and optimization using ANNs, are summarized. Furthermore, results on using multi artificial neural networks are presented.

Keywords: neural networks, image processing, segmentation, object recognition, image understanding, optimization, MANN

Procedia PDF Downloads 375

2865 Open-Ended Multi-Modal Relational Reason for Video Question Answering

Authors: Haozheng Luo, Ruiyang Qin

Abstract:

People with visual impairments urgently need assistance, not only with fundamental tasks such as guiding and retrieving objects but also with advanced ones such as picturing new environments. More than a guide dog, they might want devices that can provide linguistic interaction. Building on this idea, we aim to study the interaction between a robot agent and visually impaired people. In our research, we are going to develop a robot agent that will be able to analyze the test environment and answer the participants’ questions. We will also study the relevant issues regarding the interaction between human beings and robot agents to determine which factors affect the interaction and how.

Keywords: HRI, video question answering, visual question answering, natural language processing

Procedia PDF Downloads 203

2864 Use of Visual, Animating Narrative in an Entrepreneurial Storytelling: A Case Study of Greenesignit! Card Game, Educational and Brainstorming Tool for Development of Sustainable Products

Authors: Maja S. Todorovic

Abstract:

This paper aims to promote entrepreneurial storytelling by exploring new ideas and learning practices. An entrepreneur needs to be a ‘storyteller’, an ‘epic hero’, capable of offering an emotional connection to his audience: a character with whom the audience can identify, rejoice, suffer, celebrate, and fail, and so experience everything. In other words, a successful entrepreneur gives a tangible experience through his business story, and that is what brings his story and business to life. The use of mythology, eulogy, metaphor, epic, fairytales, and cartoons, permeated with humor and sudden twists, is a winning recipe for a business story that captures attention. In the business case of the Greenesignit! card game (an educational and brainstorming tool for the development of sustainable products), we demonstrate how an entrepreneur successfully used visual narrative to communicate his story and, at the same time, as a vehicle to turn his message into a learning tool and product development.

Keywords: animating narrative, entrepreneur, Greenesignit! card game, visual storytelling

Procedia PDF Downloads 377