Search results for: imageNet large scale visual recognition challenge (ILSVRC)
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16114

15844 Research on Detection of Web Page Visual Salience Region Based on Eye Tracker and Spectral Residual Model

Authors: Xiaoying Guo, Xiangyun Wang, Chunhua Jia

Abstract:

Web pages have become one of the most important ways of learning about the world, and people take in a great deal of information from them every day. Understanding where people look while browsing web pages is therefore important. In natural scenes, bottom-up features and top-down tasks significantly affect human eye movements. In this paper, we investigated whether conventional visual salience algorithms can properly predict the regions that attract visual attention when people view web pages. First, we used an eye tracker to record participants’ eye movements while they viewed web pages. By analyzing the eye-movement data, we studied the influence of visual saliency and of ways of thinking on eye-movement patterns. The analysis showed that ways of thinking affect eye-movement patterns much more than visual saliency does. Second, we compared the web page salience regions extracted by the Itti model and the Spectral Residual (SR) model. Compared against the heat maps derived from the eye movements, the SR model outperformed the Itti model. Considering the influence of habits of mind on the human visual region of interest, we introduced one of the most important cues of such habits, fixation position, to improve the SR model. The results showed that the improved SR model better predicts the human visual region of interest in web pages.
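For readers unfamiliar with the Spectral Residual model mentioned above, the following is a minimal sketch of the generic SR saliency recipe (log-amplitude spectrum minus its local average, recombined with the original phase). It is not the authors' improved model and does not use fixation position. The helper names `box_filter` and `spectral_residual_saliency`, the filter sizes, and the synthetic test image are assumptions made for illustration; a box filter stands in for the Gaussian smoothing usually applied at the end.

```python
import numpy as np

def box_filter(a, size=3):
    """Mean filter with edge padding (used as a simple local average)."""
    pad = size // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (size * size)

def spectral_residual_saliency(gray):
    """Return a saliency map scaled to [0, 1] for a 2-D grayscale array."""
    spectrum = np.fft.fft2(gray.astype(float))
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)  # epsilon avoids log(0)
    phase = np.angle(spectrum)
    # Spectral residual: what remains after removing the smooth spectral trend.
    residual = log_amplitude - box_filter(log_amplitude, 3)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = box_filter(saliency, 5)  # assumption: box blur instead of Gaussian
    saliency -= saliency.min()
    peak = saliency.max()
    return saliency / peak if peak > 0 else saliency

# Hypothetical input: a dark "page" with one bright block.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
saliency = spectral_residual_saliency(img)
print(saliency.shape)
```

In an evaluation like the one described, a map of this kind would be compared against a fixation heat map; how that comparison was scored is not stated in the abstract.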

Keywords: web page salience region, eye-tracker, spectral residual, visual salience

Procedia PDF Downloads 249
15843 Comparisons of Depressive Symptoms and Cognitive Appraisals in Different Age Groups under Abusive Leadership

Authors: Shao-Ying Wang, Shin-I Shih, Chi-Cheng Wu

Abstract:

Background: According to maturity theory on age, how depression manifests in different age groups under occupational stressors remains unclear. The aim of this study was therefore to examine depression across four main symptom clusters (cognition, affect, physical complaints, and interpersonal difficulty) among different age groups. In addition, this study drew on stress appraisal theory: by examining challenge and hindrance appraisals, the effects of cognitive factors were expected to offer therapeutic indications for the future treatment of depression under abusive leadership. Methods (Participants and Procedure): The data were collected in two waves from employees of local companies in Taiwan. The participants (58 males and 167 females) were native Chinese speakers, ranging in age from 20 to 59 years (M = 36.51). Up to 80% of participants had an educational level above senior high school, and approximately 43% were married. Measures: 1. Abusive Leadership: To measure abusive leadership, we used a 15-item abusive supervision scale anchored on a 7-point Likert-type scale (α = .96). 2. Depression: We used the Taiwanese Depression Scale to measure the four symptom clusters (cognition, affect, physical complaints, and interpersonal difficulty). Participants responded on a 7-point Likert-type scale (α = .96). 3. Stress Appraisal Scale: To measure challenge and hindrance appraisals, participants responded to a 33-item measure anchored on a 7-point Likert-type scale (challenge appraisal α = .90; hindrance appraisal α = .87). Results: Correlation analysis showed a significant negative correlation between abusive leadership and age (r = -.21, p < .01). Abusive leadership was significantly and positively correlated with hindrance appraisal (r = .52, p < .01) and depression (r = .20, p < .01). Hindrance appraisal was also positively correlated with depression (r = .36, p < .01). A one-way ANOVA was conducted to compare the effect of age group (younger/middle/older) on each cluster of depressive symptoms. The effect of age group on cognition was significant, F(2, 157) = 3.66, p < .05. The older age group (M = 13.43, SD = 6.84) reported fewer cognitive symptoms of depression than the middle (M = 16.77, SD = 7.49) and younger (M = 16.91, SD = 6.97) age groups. The effect of age group on affect was also significant, F(2, 157) = 4.09, p < .05. The older age group (M = 18.68, SD = 8.98) reported fewer affective symptoms of depression than the middle (M = 22.01, SD = 7.96) and younger (M = 23.56, SD = 7.67) age groups. Moreover, a main effect was found for hindrance appraisal, F(2, 157) = 3.81, p < .05. The older age group (M = 9.44, SD = 2.89) reported lower scores on hindrance appraisal than the middle (M = 11.06, SD = 4.02) and younger (M = 9.62, SD = 3.17) age groups. To conclude, the severity of depression symptoms varies across age groups. Maturity appears to be a protective factor against depression, accompanied by lower hindrance appraisals.
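As a reading aid for the F statistics reported above, here is a minimal sketch of how a one-way ANOVA F value and its degrees of freedom are computed. The function name `one_way_anova_f` and the toy scores are hypothetical; they are not the study's data. Degrees of freedom of (2, 157) correspond to three groups and 160 observations in total.

```python
import numpy as np

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of scores around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return float(f_stat), df_between, df_within

# Hypothetical toy scores for three age groups (not taken from the study).
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The p value would then be read from the F distribution with those degrees of freedom, which this sketch leaves out to stay dependency-free.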

Keywords: abusive leadership, affective commitment, depression symptoms, psychological well-being

Procedia PDF Downloads 176
15842 Single Atom Manipulation with 4 Scanning Tunneling Microscope Technique

Authors: Jianshu Yang, Delphine Sordes, Marek Kolmer, Christian Joachim

Abstract:

Nanoelectronic devices, such as computing circuits that integrate logic gates at the molecular scale and atomic-scale circuits, have recently been constructed and investigated. A major challenge is characterizing their functional properties, because of the difficulty of connecting the atomic scale to the micrometer scale. New experimental instruments and new processes have therefore been proposed. Techniques continue to be improved to allow precise measurement at the atomic scale and connection to a micrometer-scale electrical integration controller. Our new machine, a low-temperature, high-vacuum four scanning tunneling microscope custom-built by Omicron GmbH, is expected to scale characterization down to the atomic level. Here, we present our first test results on the performance of this new instrument. The sample we selected is the Au(111) surface, and the measurements were taken at 4.2 K. The atomic-resolution surface structure was observed with each of the four scanners, with a noise level better than 3 pm. With the tip-sample distance calibrated by I-z spectra, the sample conductance was derived from its atomically local I-V spectra. Furthermore, the surface conductance was measured using two methods: (1) landing two STM tips on the surface with the sample floating; and (2) keeping the sample floating and grounding one of the landed tips. In addition, single-atom manipulation was achieved with a modified tip design, which is comparable to a conventional LT-STM.

Keywords: low temperature ultra-high vacuum four scanning tunneling microscope, nanoelectronics, point contact, single atom manipulation, tunneling resistance

Procedia PDF Downloads 258
15841 Multiscale Computational Approach to Enhance the Understanding, Design and Development of CO₂ Catalytic Conversion Technologies

Authors: Agnieszka S. Dzielendziak, Lindsay-Marie Armstrong, Matthew E. Potter, Robert Raja, Pier J. A. Sazio

Abstract:

Reducing carbon dioxide (CO₂) is one of the greatest global challenges. Converting CO₂ for use across the synthetic fuel, pharmaceutical, and agrochemical industries offers a promising option, yet significant research is required to understand the complex multiscale processes involved. Understanding and optimizing such processes experimentally at the catalytic sites, and exploring their impact at reactor scale, is too expensive. Computational methods offer significant insight and flexibility but require a more detailed multi-scale approach, which is a significant challenge in itself. This work introduces a computational approach that incorporates detailed catalytic models, taken from experimental investigations, into a larger-scale computational flow dynamics framework. The reactor-scale species transport approach is modified near the catalytic walls to determine the influence of catalytic clustering regions. This coupling approach enables more accurate modelling of velocity, pressures, temperatures, species concentrations and near-wall surface characteristics, which will ultimately make it possible to assess the impact of overall reactor design on chemical conversion performance.

Keywords: catalysis, CCU, CO₂, multi-scale model

Procedia PDF Downloads 228
15840 A Network of Nouns and Their Features: A Neurocomputational Study

Authors: Skiker Kaoutar, Mounir Maouene

Abstract:

Neuroimaging studies indicate that a large fronto-parieto-temporal network supports nouns and their features: some areas store semantic knowledge (visual, auditory, olfactory, gustatory, …), other areas store lexical representations, and others are implicated in general semantic processing. However, it is not well understood how this fronto-parieto-temporal network is modulated by different semantic tasks and different semantic relations between nouns. In this study, we combine a behavioral semantic network, functional MRI studies involving object-related nouns, and brain network studies to explain how different semantic tasks and different semantic relations between nouns modulate activity within the brain network of nouns and their features. We first describe how nouns and their features form a large-scale brain network. To this end, we examine the connectivity between areas recruited during the processing of nouns to determine which configurations of interacting areas are possible. We can thus identify whether, for example, brain areas that store semantic knowledge communicate via functional or structural links with areas that store lexical representations. Second, we examine how this network is modulated by different semantic tasks involving nouns. Finally, we examine how category-specific activation may result from the semantic relations among nouns. The results indicate that the brain network of nouns and their features is flexible and highly modulated by different semantic tasks and semantic relations. Ultimately, this study can serve as a guide to help neuroscientists interpret the patterns of fMRI activation detected during the semantic processing of nouns. Specifically, it can help to interpret the category-specific activations observed in a large number of neuroimaging and clinical studies.

Keywords: nouns, features, network, category specificity

Procedia PDF Downloads 490
15839 Numerical and Experimental Studies on the Characteristic of the Air Distribution in the Wind-Box of a Circulating Fluidized Bed Boiler

Authors: Xiaozhou Liu, Guangyu Zhu, Yu Zhang, Hongwei Wu

Abstract:

The wind-box is one of the important components of a Circulating Fluidized Bed (CFB) boiler. The uniformity of air flow in the wind-box is very important for highly efficient operation of the CFB boiler. Non-uniform air flow distribution within the wind-box can reduce the boiler's thermal efficiency, leading to higher energy consumption. An effective measure to solve this problem is to install an air flow distributing device in the wind-box. To validate the effectiveness of the air flow distributing device, visual and velocity distribution uniformity experiments were carried out under five different test conditions using a 1:64 scale model of a 220t/hr CFB boiler. The z component of flow velocity was shown to remain almost the same at control cross-sections of the wind-box, with a maximum variation of less than 10%. Moreover, the same methodology was applied to a full-scale 220t/hr CFB boiler. The hot test results show that the thermal efficiency of the boiler increased from 85.71% to 88.34% when tested with an air flow distributing device in place, which is equivalent to a saving of 5,000 tons of coal per year. The economic benefits of this energy-saving technology have been shown to be very significant, which clearly demonstrates that the technology is worth applying and popularizing.

Keywords: circulating fluidized bed, CFB, wind-box, air flow distributing device, visual experiment, velocity distribution uniformity experiment, hot test

Procedia PDF Downloads 146
15838 Reed: An Approach Towards Quickly Bootstrapping Multilingual Acoustic Models

Authors: Bipasha Sen, Aditya Agarwal

Abstract:

A multilingual automatic speech recognition (ASR) system is a single entity capable of transcribing multiple languages that share a common phone space. The performance of such a system is highly dependent on the compatibility of the languages. State-of-the-art speech recognition systems are built using sequential architectures based on recurrent neural networks (RNNs), which limits computational parallelization in training. This poses a significant challenge in terms of the time taken to bootstrap and validate the compatibility of multiple languages for building a robust multilingual system. Complex architectural choices based on self-attention networks have been made to improve parallelization and thereby reduce training time. In this work, we propose Reed, a simple system based on 1D convolutions which uses very short context to improve the training time. To improve the performance of our system, we use raw time-domain speech signals directly as input. This enables the convolutional layers to learn feature representations rather than relying on handcrafted features such as MFCC. We report improvements in training and inference times by factors of at least 4x and 7.4x, respectively, with comparable WERs against standard RNN-based baseline systems on SpeechOcean's multilingual low-resource dataset.
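To illustrate why 1D convolutions over a raw waveform parallelize well, the sketch below applies a bank of short filters to time-domain samples. Each output frame depends only on a short window of input, so all frames can be computed at once, unlike the step-by-step recurrence of an RNN. This is not the Reed architecture: the function name `conv1d`, the random filters, and the kernel length and stride (400 and 160 samples) are assumptions chosen for the example.

```python
import numpy as np

def conv1d(signal, kernels, stride):
    """Valid 1-D cross-correlation followed by ReLU.

    signal:  shape (T,)   raw time-domain samples
    kernels: shape (C, K) C filters of length K
    returns: shape (C, n_frames)
    """
    k = kernels.shape[1]
    n_frames = (len(signal) - k) // stride + 1
    # Every frame sees only K consecutive samples, so frames are independent.
    windows = np.stack([signal[i * stride:i * stride + k] for i in range(n_frames)])
    return np.maximum(kernels @ windows.T, 0.0)

# Hypothetical input: one second of noise at 16 kHz standing in for speech.
rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)
filters = rng.standard_normal((8, 400)) * 0.01   # in a real model these are learned
features = conv1d(waveform, filters, stride=160)
print(features.shape)
```

In a trained system the filters would be learned jointly with the rest of the network, which is what lets the model replace handcrafted features such as MFCC.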

Keywords: convolutional neural networks, language compatibility, low resource languages, multilingual automatic speech recognition

Procedia PDF Downloads 93
15837 Automatic Music Score Recognition System Using Digital Image Processing

Authors: Yuan-Hsiang Chang, Zhong-Xian Peng, Li-Der Jeng

Abstract:

Music has always been an integral part of daily human life. For most people, however, reading a musical score and turning it into a melody is not easy. This study aims to develop an automatic music score recognition system using digital image processing, which can be used to read and analyze musical score images automatically. The technical approach included: (1) staff region segmentation; (2) image preprocessing; (3) note recognition; and (4) accidental and rest recognition. Digital image processing techniques (e.g., horizontal/vertical projections, connected component labeling, morphological processing, template matching, etc.) were applied according to the musical notes, accidentals, and rests in staff notation. Preliminary results showed that our system could achieve detection and recognition rates of 96.3% and 91.7%, respectively. In conclusion, we presented an effective automated musical score recognition system that could be integrated with a media player to play music or songs given input images of a musical score. Ultimately, this system could also be incorporated in applications for mobile devices as a learning tool, such that a music player could learn to play music or songs.
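The horizontal projection mentioned among the techniques can be sketched as follows: summing ink pixels along each row makes staff lines stand out, because they span nearly the full width of the image. This is only an illustration of the general idea, not the authors' pipeline; the function name `staff_line_rows`, the 0.5 threshold, and the synthetic score image are assumptions.

```python
import numpy as np

def staff_line_rows(binary, threshold=0.5):
    """Return indices of rows whose ink fraction exceeds `threshold`.

    binary: 2-D array with 1 for ink and 0 for background.
    """
    projection = binary.sum(axis=1) / binary.shape[1]  # ink fraction per row
    return np.flatnonzero(projection > threshold)

# Hypothetical 20x40 score fragment: five staff lines plus a small note head.
score = np.zeros((20, 40), dtype=int)
for row in (2, 5, 8, 11, 14):
    score[row, :] = 1
score[6:8, 10:14] = 1   # note head covers too little of the row to pass the threshold

print(staff_line_rows(score).tolist())
```

A vertical projection (summing along columns instead) could be used in the same way to separate symbols once the staff rows are known.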

Keywords: connected component labeling, image processing, morphological processing, optical music recognition

Procedia PDF Downloads 390
15836 A Recognition Method of Ancient Yi Script Based on Deep Learning

Authors: Shanxiong Chen, Xu Han, Xiaolong Wang, Hui Ma

Abstract:

The Yi are an ethnic group living mainly in mainland China, with their own spoken and written language systems developed over thousands of years. Ancient Yi is one of the six ancient languages in the world; it records the history of the Yi people and offers documents valuable for research into human civilization. Recognizing the characters of ancient Yi helps to transform these documents into electronic form, making them convenient to store and disseminate. Owing to historical and regional limitations, research on the recognition of ancient characters is still inadequate. Deep learning was therefore applied to the recognition of such characters. Five models were developed on the basis of a four-layer convolutional neural network (CNN). Alpha-Beta divergence was taken as a penalty term to re-encode the output neurons of the five models. Two fully connected layers compressed the features. Finally, at the softmax layer, the orthographic features of ancient Yi characters were re-evaluated, their probability distributions were obtained, and the characters whose features had the highest probability were recognized. Tests show that the method achieves higher precision than the traditional CNN model for handwriting recognition of ancient Yi.

Keywords: recognition, CNN, Yi character, divergence

Procedia PDF Downloads 137
15835 Characterising the Processes Underlying Emotion Recognition Deficits in Adolescents with Conduct Disorder

Authors: Nayra Martin-Key, Erich Graf, Wendy Adams, Graeme Fairchild

Abstract:

Children and adolescents with Conduct Disorder (CD) have been shown to demonstrate impairments in emotion recognition, but it is currently unclear whether this deficit is related to specific emotions or whether it represents a global deficit in emotion recognition. An emotion recognition task with concurrent eye-tracking was employed to further explore this relationship in a sample of male and female adolescents with CD. Participants made emotion categorization judgements for presented dynamic and morphed static facial expressions. The results demonstrated that males with CD, and to a lesser extent, females with CD, displayed impaired facial expression recognition in general, whereas callous-unemotional (CU) traits were linked to specific problems in sadness recognition in females with CD. A region-of-interest analysis of the eye-tracking data indicated that males with CD exhibited reduced fixation times for the eye-region of the face compared to typically-developing (TD) females, but not TD males. Females with CD did not show reduced fixation to the eye-region of the face relative to TD females. In addition, CU traits did not influence CD subjects’ attention to the eye-region of the face. These findings suggest that the emotion recognition deficits found in CD males, the worst performing group in the behavioural tasks, are partly driven by reduced attention to the eyes.

Keywords: attention, callous-unemotional traits, conduct disorder, emotion recognition, eye-region, eye-tracking, sex differences

Procedia PDF Downloads 279
15834 A Motion Dictionary to Real-Time Recognition of Sign Language Alphabet Using Dynamic Time Warping and Artificial Neural Network

Authors: Marcio Leal, Marta Villamil

Abstract:

Computational recognition of sign languages aims to allow greater social and digital inclusion of deaf people through the interpretation of their language by computer. This article presents a model for recognizing two of the global parameters of sign languages: hand configurations and hand movements. Hand motion is captured through infrared technology, and the hand's joints are reconstructed in a virtual three-dimensional space. A Multilayer Perceptron (MLP) neural network was used to classify hand configurations, and Dynamic Time Warping (DTW) was used to recognize hand motion. Beyond the sign recognition method, we provide a dataset of hand configurations and motion capture built with the help of professionals fluent in sign languages. Although this technology can be used to translate any sign from any sign dictionary, Brazilian Sign Language (Libras) was used as a case study. Finally, the model presented in this paper achieved a recognition rate of 80.4%.
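Dynamic Time Warping compares two motion sequences that may differ in speed by finding the cheapest monotonic alignment between them. The sketch below is the textbook dynamic-programming formulation, not the authors' implementation; the function name `dtw_distance` and the toy trajectories are hypothetical, and no windowing or normalization is applied.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of points.

    a, b: sequences of scalars or of equal-dimension vectors (e.g. 3-D joints).
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local distance
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Hypothetical 1-D trajectories: the second is a slower version of the first.
fast = [0, 1, 2]
slow = [0, 1, 1, 2]
print(dtw_distance(fast, slow))
```

A motion could then be labeled with the dictionary entry that has the smallest DTW cost, which is one common way to use such a distance for recognition.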

Keywords: artificial neural network, computer vision, dynamic time warping, infrared, sign language recognition

Procedia PDF Downloads 184
15833 Investigation of New Gait Representations for Improving Gait Recognition

Authors: Chirawat Wattanapanich, Hong Wei

Abstract:

This study presents new gait representations for improving gait recognition accuracy across gait appearances, such as normal walking, wearing a coat, and carrying a bag. Based on the Gait Energy Image (GEI), two ideas are implemented to generate new gait representations. One is to append lower knee regions to the original GEI, and the other is to apply convolutional operations to the GEI and its variants. A set of new gait representations is created and used for training multi-class Support Vector Machines (SVMs). Tests are conducted on the CASIA dataset B. Various combinations of the gait representations, with different convolutional kernel sizes and different numbers of kernels used in the convolutional processes, are examined. Both entire images used as features and features reduced in dimension by Principal Component Analysis (PCA) are tested in gait recognition. Interestingly, both new techniques, appending the lower knee regions to the original GEI and the convolutional GEI, contribute significantly to improved gait recognition performance. The experimental results show that the average recognition rate can be improved from 75.65% to 87.50%.
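A Gait Energy Image is conventionally the pixel-wise average of aligned binary silhouettes over a gait cycle. The sketch below shows only that baseline representation, which the two proposed extensions build on; it does not reproduce the lower-knee or convolutional variants. The function name `gait_energy_image` and the tiny 2x2 frames are hypothetical.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes (frames x height x width) into one image."""
    frames = np.asarray(silhouettes, dtype=float)
    return frames.mean(axis=0)

# Hypothetical two-frame cycle of 2x2 silhouettes (1 = body pixel).
cycle = [
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
gei = gait_energy_image(cycle)
print(gei.tolist())

# Flattening gives a feature vector that a classifier such as an SVM could consume.
feature_vector = gei.ravel()
```

Pixels that are always part of the body come out as 1, pixels that move in and out of the silhouette take intermediate values, and the background stays at 0.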

Keywords: convolutional image, lower knee, gait

Procedia PDF Downloads 178
15832 Investigation of Different Machine Learning Algorithms in Large-Scale Land Cover Mapping within the Google Earth Engine

Authors: Amin Naboureh, Ainong Li, Jinhu Bian, Guangbin Lei, Hamid Ebrahimy

Abstract:

Large-scale land cover mapping has become a new challenge in the land change and remote sensing fields because it involves a large volume of data. Moreover, selecting the right classification method is quite difficult, especially when the study area contains different types of landscapes. This paper compares the performance of different machine learning (ML) algorithms for generating a land cover map of the China-Central Asia–West Asia Corridor, which is considered one of the main parts of the Belt and Road Initiative (BRI). The cloud-based Google Earth Engine (GEE) platform was used to generate a land cover map of the study area from Landsat-8 images (2017) by applying three frequently used ML algorithms: random forest (RF), support vector machine (SVM), and artificial neural network (ANN). The selected ML algorithms were trained and tested using reference data obtained from the MODIS yearly land cover product and very high-resolution satellite images. The findings showed that, among the three algorithms, RF gave the best result in producing a land cover map for the China-Central Asia–West Asia Corridor, with 91% overall accuracy, whereas ANN gave the worst, with 85% overall accuracy. The strong performance of GEE in applying different ML algorithms and handling a huge volume of remotely sensed data in the present study showed that it could also help researchers generate reliable long-term land cover change maps. The findings of this research are of great importance to decision-makers and BRI authorities in strategic land use planning.
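Overall accuracy, the metric used above to rank the classifiers, is the share of reference samples that were assigned the correct class. A minimal sketch of computing it from a confusion matrix follows; the function names and the two-class matrix are hypothetical and are not the study's results.

```python
import numpy as np

def overall_accuracy(confusion):
    """Correctly classified samples divided by all samples."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())

def per_class_accuracy(confusion):
    """Fraction of each reference class (row) that was labeled correctly."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)

# Hypothetical matrix: rows are reference classes, columns are predicted classes.
matrix = [[50, 2],
          [3, 45]]
print(overall_accuracy(matrix))
print(per_class_accuracy(matrix).round(3).tolist())
```

Per-class figures are worth checking alongside the overall value, since a classifier can score well overall while performing poorly on a rare land cover class.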

Keywords: land cover, google earth engine, machine learning, remote sensing

Procedia PDF Downloads 91
15831 Acceptability and Challenges Experienced by Homosexual Indigenous Peoples in Southern Palawan

Authors: Crisanto H. Ecaldre

Abstract:

Gender perception represents how an individual perceives the gender identity of a person. Since this is a subjective assessment, it paves the way to various social reactions, either in the form of acceptance or discrimination. Reports across the world show that lesbian, gay, bisexual, or transgender (LGBT) people often face discrimination, stigmatization, and targeted violence because of their sexual orientation or gender identity. However, the challenges faced by those who belong to both a sexual minority and a marginalized ethnic, religious, linguistic, or indigenous community are even more complex. Specifically, in the Palaw’an community, members acknowledge those who identify themselves as gays or lesbians and use the term “bantut” to refer to them. Various scholarly works have also been introduced to facilitate dialogues that promote visibility and inclusivity across sectors in terms of gender preferences; however, there are still gaps that need to be addressed in terms of recognition and visibility. Though local research initiatives are slowly increasing in number, gender studies appropriately situated within the cultural context of indigenous communities are still lacking. Indigenous community-based discourses on gender, or indigenizing gender discourses, remain a challenge; hence, this study aimed to contribute to addressing these identified gaps. The research objectives were realized through a qualitative approach following an exploratory design. Findings revealed that the Palaw’an indigenous cultural community has an existing concept of homosexuality, which they term “bantut.” This notion was culturally defined by the participants as (a) kaloob ng diwata (a gift of the spirits); (b) a manifestation of physical inferiority; (c) hindi nakapag-asawa (did not marry) or hindi nagka-anak (did not have children); and (d) based on the roles ascribed by the community. These were recognized and valued by the community. However, despite this recognition and visibility within the community, people outside it view them otherwise. The challenges experienced by Palaw’an homosexuals are imposed by people outside their community, and these include prejudice, discrimination, and double marginalization. Because of these struggles, they are forced to cope. They deal with the limitations, biases, and burdens imposed by non-Palaw’an through self-acceptance, strong self-perception, and the option to leave the community to seek a more open and progressive environment for LGBT people. While these are indications of their ‘resilience’ amidst difficult situations, this reality poses an important concern: how the recognition and visibility of indigenous homosexuals can be attained from the mainstream perspective.

Keywords: gender preference, acceptability, challenge, recognition, visibility, coping

Procedia PDF Downloads 31
15830 Offline Signature Verification in Punjabi Based On SURF Features and Critical Point Matching Using HMM

Authors: Rajpal Kaur, Pooja Choudhary

Abstract:

Biometrics, which refers to identifying an individual based on his or her physiological or behavioral characteristics, can reliably distinguish between an authorized person and an imposter. Signature recognition systems can be categorized as offline (static) or online (dynamic). This paper presents a SURF-feature-based offline signature recognition system that is trained with low-resolution scanned signature images. A person's signature is an important biometric attribute that can be used to authenticate identity. A signature can be handled as an image and recognized using computer vision and HMM techniques. With modern computers, there is a need to develop fast algorithms for signature recognition. Multiple techniques have been defined for signature recognition, leaving considerable scope for research. In this paper, offline (static) signature recognition and verification using SURF features with an HMM is proposed, where the signature is captured and presented to the user in an image format. Signatures are verified based on parameters extracted from the signature using various image processing techniques. The offline signature verification and recognition system is implemented on the MATLAB platform. This work has been tested and found suitable for its purpose. The proposed method performs better than other recently proposed methods.

Keywords: offline signature verification, offline signature recognition, signatures, SURF features, HMM

Procedia PDF Downloads 358
15829 Visual Thinking Routines: A Mixed Methods Approach Applied to Student Teachers at the American University in Dubai

Authors: Alain Gholam

Abstract:

Visual thinking routines are principles based on several theories, approaches, and strategies. Such routines promote thinking skills, call for collaboration and sharing of ideas, and above all, make thinking and learning visible. Visual thinking routines were implemented in the teaching methodology graduate course at the American University in Dubai. The study used mixed methods and was guided by the following two research questions: (1) To what extent do visual thinking routines inspire learning in the classroom and make time for students’ questions, contributions, and thinking? (2) How do visual thinking routines inspire learning in the classroom and make time for students’ questions, contributions, and thinking? Eight student teachers enrolled in the teaching methodology course at the American University in Dubai (Spring 2017) participated in the study. First, they completed a survey that measured to what degree they believed visual thinking routines inspired learning in the classroom and made time for students’ questions, contributions, and thinking. To build on the results from the quantitative phase, the student teachers next took part in a qualitative data collection phase, in which they answered the question: How do visual thinking routines inspire learning in the classroom and make time for students’ questions, contributions, and thinking? Results revealed that the implementation of visual thinking routines in the classroom strongly inspires learning and makes time for students’ questions, contributions, and thinking. In addition, student teachers explained how visual thinking routines allow for organization, variety, thinking, and documentation. As with all original, new, and unique resources, visual thinking routines are not free of challenges. To make the most of this useful and valued resource, educators need to comprehend, model, and spread awareness of the effective ways of using such routines in the classroom. It is crucial that such routines become part of the curriculum to allow for and document students’ questions, contributions, and thinking.

Keywords: classroom display, student engagement, thinking classroom, visual thinking routines

Procedia PDF Downloads 201
15828 Probing Language Models for Multiple Linguistic Information

Authors: Bowen Ding, Yihao Kuang

Abstract:

In recent years, large-scale pre-trained language models have achieved state-of-the-art performance on a variety of natural language processing tasks. The word vectors produced by these language models can be viewed as dense encoded representations of natural language in text form. However, it is unknown how much linguistic information is encoded and how. In this paper, we construct several probing tasks for multiple kinds of linguistic information to clarify the encoding capabilities of different language models, and we visualize the results. We first obtain word representations in vector form from different language models, including BERT, ELMo, RoBERTa and GPT. Classifiers with a small number of parameters and unsupervised tasks are then applied to these word vectors to assess their capability to encode the corresponding linguistic information. The constructed probing tasks cover both semantic and syntactic aspects. The semantic aspect includes the ability of the model to understand semantic entities such as numbers, time, and characters, and the syntactic aspect includes the ability of the language model to understand grammatical structures such as dependency relationships and reference relationships. We also compare the encoding capabilities of different layers in the same language model to infer how linguistic information is encoded in the model.

Keywords: language models, probing task, text presentation, linguistic information

Procedia PDF Downloads 70
15827 Similar Script Character Recognition on Kannada and Telugu

Authors: Gurukiran Veerapur, Nytik Birudavolu, Seetharam U. N., Chandravva Hebbi, R. Praneeth Reddy

Abstract:

This work presents a robust approach for the recognition of characters in Telugu and Kannada, two South Indian scripts with structural similarities in characters. To recognize the characters, exhaustive datasets are required, but there are only a few publicly available datasets. As a result, we decided to create a dataset for one language (the source language), train the model with it, and then test it with the target language. Telugu is the target language in this work, whereas Kannada is the source language. The suggested method makes use of Canny edge features to increase character identification accuracy on pictures with noise and different lighting. A dataset of 45,150 images containing printed Kannada characters was created. The Nudi software was used to automatically generate printed Kannada characters with different writing styles and variations. Manual labelling was employed to ensure the accuracy of the character labels. Deep learning models such as a CNN (Convolutional Neural Network) and a Visual Attention neural network (VAN) were used to experiment with the dataset. A VAN architecture was adopted, incorporating additional channels for Canny edge features, as the results obtained were good with this approach. The model's accuracy on the combined Telugu and Kannada test dataset was an outstanding 97.3%. Performance was better with Canny edge characteristics applied than with a model that solely used the original grayscale images. The accuracy of the model was found to be 80.11% for Telugu characters and 98.01% for Kannada words when it was tested with these languages. This model, which makes use of cutting-edge machine learning techniques, shows excellent accuracy when identifying and categorizing characters from these scripts.
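
Feeding an edge map to a network as an additional input channel can be sketched as below. The sketch deliberately swaps in a thresholded Sobel gradient magnitude as a simple stand-in for the Canny detector named in the abstract. The function names, the threshold, and the toy image are assumptions for illustration only.

```python
def sobel_edges(gray, threshold=1.0):
    """Binary edge map from Sobel gradient magnitude (a stand-in for Canny)."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left as non-edges
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


def stack_channels(gray, edges):
    """Build an H x W x 2 input: [grayscale value, edge flag] for each pixel."""
    return [[[g, e] for g, e in zip(gray_row, edge_row)]
            for gray_row, edge_row in zip(gray, edges)]


# Toy 5x5 image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = sobel_edges(image)
stacked = stack_channels(image, edges)
print(edges[2], stacked[2][2])
```

The stacked array keeps the original intensity in channel 0, so the network still sees the grayscale image while getting the edge structure explicitly in channel 1.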

Keywords: base characters, modifiers, guninthalu, aksharas, vattakshara, VAN

Procedia PDF Downloads 26
15826 Investigation of the Functional Impact of Amblyopia on Visual Skills in Children

Authors: Chinmay V. Deshpande

Abstract:

Purpose: To assess the efficiency of visual functions and visual skills in strabismic and anisometropic amblyopes, and to assess visual acuity and contrast sensitivity in anisometropic amblyopes with spectacles and contact lenses. Method: In a prospective clinical study, 32 children aged 5 to 15 years presenting with amblyopia in a pediatric department of Shri Ganapati Netralaya, Jalna, India, were assessed over a period of three and a half months. Visual acuity was measured with Snellen’s and Bailey-Lovie logMAR charts, whereas contrast sensitivity was measured with the Pelli-Robson chart with spectacles and contact lenses. Saccadic movements were assessed with SCCO scoring criteria, and accommodative facility was checked with ±1.50 DS flippers. Stereopsis was assessed with the TNO test. Results: Using the Wilcoxon signed-rank test, p-value < 0.05 (< 0.001), the mean linear visual acuity was 0.29 (≈ 6/21) and the mean single optotype visual acuity was found to be 0.36 (≈ 6/18). Mean visual acuity of 0.27 (≈ 6/21) with spectacles improved to 0.33 (≈ 6/18) with contact lenses in amblyopic eyes. The mean logMAR visual acuity with spectacles and contact lenses was found to be 0.602 (≈ 6/24) and 0.531 (≈ 6/21), respectively. Of 20 amblyopic eyes, the mean contrast threshold changed in 9 patients from 0.27 with spectacles to 0.19 with contact lenses. The mean accommodative facility assessed was 5.31 (± 2.37). 24 subjects (75%) revealed marked saccadic defects on the test applied. 78% of subjects did not show even gross stereoscopic ability on the TNO test. Conclusion: This study supports the facts about amblyopia and associated deficits in visual skills that are claimed in previous studies. In addition, anisometropic amblyopia can be managed better with contact lenses.
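
The pairing of logMAR values with approximate Snellen fractions follows from the standard relation decimal acuity = 10^(-logMAR). The short sketch below applies that relation. The function names and the 6 m test distance default are assumptions for illustration, and the abstract's Snellen equivalents are rounded to common chart lines.

```python
def logmar_to_decimal(logmar):
    """Decimal visual acuity from a logMAR score (higher logMAR = worse acuity)."""
    return 10 ** (-logmar)


def logmar_to_snellen_denominator(logmar, test_distance_m=6):
    """Denominator d of the Snellen fraction test_distance/d for a logMAR score."""
    return test_distance_m * 10 ** logmar


for score in (0.602, 0.531):
    print(score,
          round(logmar_to_decimal(score), 3),
          round(logmar_to_snellen_denominator(score), 1))
```

For example, a logMAR score of 0.602 corresponds to a Snellen denominator of about 24, which matches the ≈ 6/24 given for spectacles.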

Keywords: strabismus, anisometropia, amblyopia, contrast sensitivity, saccades, stereopsis

Procedia PDF Downloads 399
15825 A Review: Detection and Classification Defects on Banana and Apples by Computer Vision

Authors: Zahow Muoftah

Abstract:

Traditional manual visual grading of fruits has been one of the agricultural industry’s major challenges due to its laborious nature as well as inconsistency in the inspection and classification process. The main requirements for computer vision and visual processing are effective techniques for identifying defects and estimating defect areas. Automated defect detection using computer vision and machine learning has emerged as a promising area of research with a high and direct impact on the visual inspection domain. Grading, sorting, and disease detection are important factors in determining the quality of fruits after harvest. Many studies have used computer vision to evaluate the quality level of fruits during post-harvest. Many studies have also been conducted to identify diseases and pests that affect the fruits of agricultural crops. However, most previous studies concentrated solely on the diagnosis of a lesion or disease. This review focuses on identifying pests and diseases of apple and banana fruits through computer-vision-based defect detection and classification. As a result, the current article includes research from these domains as well. Finally, various pattern recognition techniques for detecting apple and banana defects are discussed.

Keywords: computer vision, banana, apple, detection, classification

Procedia PDF Downloads 70
15824 Fast Adjustable Threshold for Uniform Neural Network Quantization

Authors: Alexander Goncharenko, Andrey Denisov, Sergey Alyamkin, Evgeny Terentev

Abstract:

Neural network quantization is a highly desired procedure to perform before running neural networks on mobile devices. Quantization without fine-tuning leads to an accuracy drop of the model, whereas commonly used training with quantization is done on the full set of labeled data and is therefore both time- and resource-consuming. Real-life applications require simplification and acceleration of the quantization procedure that will maintain the accuracy of the full-precision neural network, especially for modern mobile neural network architectures like MobileNet-v1, MobileNet-v2 and MNAS. Here we present a method to significantly optimize the training-with-quantization procedure by introducing trained scale factors for discretization thresholds that are separate for each filter. Using the proposed technique, we quantize modern mobile neural network architectures with a training set of only ∼ 10% of the total ImageNet 2012 sample. Such a reduction of the training dataset size and the small number of trainable parameters allow the network to be fine-tuned in several hours while maintaining the high accuracy of the quantized model (the accuracy drop was less than 0.5%). Ready-for-use models and code are available in the GitHub repository.
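
A per-filter scale factor on the quantization threshold can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' released code: the threshold is taken as alpha times the largest absolute weight in the filter, quantization is symmetric and uniform, and alpha is a fixed number here, whereas the abstract describes it as trained. All names are hypothetical.

```python
def quantize_filter(weights, alpha=1.0, bits=8):
    """Uniformly quantize one filter's weights with a scaled clipping threshold.

    The threshold alpha * max|w| is an assumed form. In the described method
    the scale factor would be learned per filter during fine-tuning.
    """
    max_abs = max(abs(w) for w in weights)
    threshold = alpha * max_abs
    levels = 2 ** (bits - 1) - 1                  # 127 for signed 8-bit
    step = threshold / levels if threshold > 0 else 1.0
    quantized = []
    for w in weights:
        clipped = max(-threshold, min(threshold, w))   # saturate outliers
        quantized.append(int(round(clipped / step)))
    return quantized, step


def dequantize(quantized, step):
    """Map integer codes back to approximate real-valued weights."""
    return [q * step for q in quantized]


# Two toy filters, each with its own (hypothetical) scale factor.
toy_filters = [[0.5, -0.25, 0.1], [2.0, -1.0, 0.05]]
toy_alphas = [1.0, 0.9]
for filter_weights, alpha in zip(toy_filters, toy_alphas):
    codes, step = quantize_filter(filter_weights, alpha)
    print(codes, [round(v, 3) for v in dequantize(codes, step)])
```

Choosing alpha below 1 clips the largest weights but gives finer resolution to the rest, which is the trade-off that a trainable per-filter factor can tune.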

Keywords: distillation, machine learning, neural networks, quantization

Procedia PDF Downloads 291
15823 Vision-Based Hand Segmentation Techniques for Human-Computer Interaction

Authors: M. Jebali, M. Jemni

Abstract:

This work is part of a vision-based hand gesture recognition system for a Natural Human Computer Interface. Hand tracking and segmentation are the primary steps for any hand gesture recognition system. The aim of this paper is to develop a robust and efficient hand segmentation algorithm to serve as input to another system, which attempts to bring HCI performance close to that of human-human interaction by modeling an intelligent sign language recognition system based on prediction in the context of dialogue between the system (avatar) and the interlocutor. For the purpose of hand segmentation, an approach that overcomes occlusion is proposed to achieve superior results in detecting the hand in an image.

Keywords: HCI, sign language recognition, object tracking, hand segmentation

Procedia PDF Downloads 380
15822 An Erudite Technique for Face Detection and Recognition Using Curvature Analysis

Authors: S. Jagadeesh Kumar

Abstract:

Face detection and recognition is an authoritative technology for image database management, video surveillance, and human computer interface (HCI). Face recognition is a rapidly emerging method that has been extensively used in forensics, for example in criminal identification, secure access, and custodial security. This paper recommends an erudite technique using curvature analysis (CA) that has a lower incidence of false positives, operates in different light environments, and uses the ring correction in polar coordinate (RCP) method to remove the artifacts introduced during image acquisition. This technique affronts mean and median filtering techniques to remove the artifacts, but it works in polar coordinates during image acquisition. Experimental results for face detection and recognition confirm decent performance even under diagonal orientation and pose variation.

Keywords: curvature analysis, ring correction in polar coordinate method, face detection, face recognition, human computer interaction

Procedia PDF Downloads 254
15821 An Approach in Design of Large-Scale Hydrogen Plants

Authors: Hamidreza Sahaleh

Abstract:

Because of the stringent requirement for low sulfur and the heavier raw oil feedstock, more hydrogen will be consumed in refineries. In particular, if large-scale capacities are the response to increased hydrogen demand, certain design and engineering experience is required, which is described in this paper with an example. Selected process design requirements are listed and described with reference to the flowsheet. In addition, a selection of innovative design features, such as process condensate reuse, safe reformer start-up, and prerequisites, is highlighted.

Keywords: low sulfur, raw oil, refineries, flowsheet

Procedia PDF Downloads 265
15820 An Analysis of the Temporal Aspects of Visual Attention Processing Using Rapid Series Visual Processing (RSVP) Data

Authors: Shreya Borthakur, Aastha Vartak

Abstract:

This Electroencephalogram (EEG) project on the Rapid Serial Visual Presentation (RSVP) paradigm explores the temporal dynamics of visual attention processing in response to rapidly presented visual stimuli. The study builds upon previous research that used real-world images in RSVP tasks to understand the emergence of object representations in the human brain. The objectives of the research include investigating the differences in accuracy and reaction times between 5 Hz and 20 Hz presentation rates, as well as examining the prominent brain waves, particularly alpha and beta waves, associated with the attention task. The pre-processing and data analysis involve filtering EEG data, creating epochs for target stimuli, and conducting statistical tests using MATLAB, EEGLAB, Chronux toolboxes, and R. The results support the hypotheses, revealing higher accuracy at a slower presentation rate, faster reaction times for less complex targets, and the involvement of alpha and beta waves in attention and cognitive processing. This research sheds light on how short-term memory and cognitive control affect visual processing and could have practical implications in fields like education.
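
Creating epochs for target stimuli amounts to cutting fixed-length windows out of the continuous recording around each target onset. The sketch below shows that step in plain Python for a single channel. The study itself used MATLAB and EEGLAB, so the function names, window lengths, and toy signal here are assumptions for illustration.

```python
def stimulus_duration_ms(rate_hz):
    """Time each image stays on screen at a given presentation rate."""
    return 1000.0 / rate_hz


def extract_epochs(signal, event_samples, pre, post):
    """Cut windows of `pre` samples before and `post` samples after each event.

    Events too close to either end of the recording are skipped so that all
    epochs have the same length.
    """
    epochs = []
    for onset in event_samples:
        if onset - pre >= 0 and onset + post <= len(signal):
            epochs.append(signal[onset - pre:onset + post])
    return epochs


# Toy single-channel recording and hypothetical target onsets (sample indices).
toy_signal = list(range(100))
target_onsets = [10, 50, 98]
epochs = extract_epochs(toy_signal, target_onsets, pre=5, post=10)

print(stimulus_duration_ms(5), stimulus_duration_ms(20))
print(len(epochs), [len(e) for e in epochs])
```

At 5 Hz each image is shown for 200 ms and at 20 Hz for 50 ms, so the same epoch window spans very different numbers of stimuli in the two conditions.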

Keywords: RSVP, attention, visual processing, attentional blink, EEG

Procedia PDF Downloads 37
15819 An Automatic Large Classroom Attendance Conceptual Model Using Face Counting

Authors: Sirajdin Olagoke Adeshina, Haidi Ibrahim, Akeem Salawu

Abstract:

Large lecture theatres cannot be covered by a single camera but rather require a multicamera setup because of their size, shape, and seating arrangements, although ordinary classroom capture is achievable with a single camera. Therefore, the design and implementation of a multicamera setup for a large lecture hall were considered. Researchers have emphasized the impact of class attendance on the academic performance of students. However, the traditional method of taking attendance is below standard, especially for large lecture theatres, because of the student population, the time required, sophistication, exhaustiveness, and manipulative influence. An automated large classroom attendance system is, therefore, imperative. The common approach in such a system is face detection and recognition, where known student faces are captured and stored for recognition purposes. This approach requires constant face database updates due to constant changes in facial features. Alternatively, face counting can be performed by cropping the localized faces in the video or image into a folder and then counting them. This research aims to develop a face localization-based approach to detect student faces in classroom images captured using a multicamera setup. A selected Haar-like feature cascade face detector trained with an asymmetric goal to minimize the False Rejection Rate (FRR) relative to the False Acceptance Rate (FAR) was applied on a Raspberry Pi 4B. A relationship between the two factors (FRR and FAR) was established using a constant (λ) as a trade-off between the two factors for automatic adjustment during training. An evaluation of the proposed approach and the conventional AdaBoost on classroom datasets shows an improvement of 8% in TPR (the result of a low FRR) and a 7% reduction of the FRR. The average learning speed of the proposed approach was improved, with a 1.19 s execution time per image compared to 2.38 s for the improved AdaBoost. Consequently, the proposed approach achieved 97% TPR with an overhead constraint time of 22.9 s compared to 46.7 s for the improved AdaBoost when evaluated on images obtained from a large lecture hall (DK5) at USM.
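
The relationship between FRR, FAR, and TPR can be made concrete with a few lines of arithmetic. The rate definitions below are standard. The weighted cost FRR + λ·FAR is only one plausible way to express an asymmetric trade-off and is an assumption, since the abstract does not give the exact form. The counts are invented for the example.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (FRR, FAR, TPR) from confusion-matrix counts."""
    frr = fn / (tp + fn)   # share of real faces the detector rejected
    far = fp / (fp + tn)   # share of non-face windows it accepted
    tpr = 1.0 - frr        # a low FRR directly yields a high TPR
    return frr, far, tpr


def asymmetric_cost(frr, far, lam):
    """Hypothetical objective: lam < 1 makes a missed face cost more than a false alarm."""
    return frr + lam * far


# Made-up counts for illustration only (not results from the paper).
frr, far, tpr = detection_rates(tp=90, fn=10, fp=20, tn=180)
print(frr, far, tpr)
print(asymmetric_cost(frr, far, lam=0.5))
```

For attendance counting a missed face is an uncounted student, which is why a detector might reasonably accept a somewhat higher FAR in exchange for a lower FRR.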

Keywords: automatic attendance, face detection, haar-like cascade, manual attendance

Procedia PDF Downloads 48
15818 Pictorial Multimodal Analysis of Selected Paintings of Salvador Dali

Authors: Shaza Melies, Abeer Refky, Nihad Mansoor

Abstract:

Multimodality involves the communication between verbal and visual components in various discourses. A painting represents a form of communication between the artist and the viewer in terms of colors, shades, objects, and the title. This paper aims to present how multimodality can be used to decode the verbal and visual dimensions a painting holds. For that purpose, this study uses Kress and van Leeuwen’s theoretical framework of visual grammar for the analysis of the multimodal semiotic resources of selected paintings of Salvador Dali. This study investigates the visual decoding of the selected paintings and analyzes their social and political meanings using this framework. The paper attempts to answer the following questions: 1. How far can multimodality decode the verbal and non-verbal meanings of surrealistic art? 2. How can Kress and van Leeuwen’s theoretical framework of visual grammar be applied to analyze Dali’s paintings? 3. To what extent is Kress and van Leeuwen’s theoretical framework of visual grammar apt to deliver the political and social messages of Dali? The paper reached the following findings: the framework’s descriptive tools (representational, interactive, and compositional meanings) can be used to analyze the paintings’ titles and their visual elements. Social and political messages were delivered by appropriate usage of color, gesture, vectors, modality, and the way social actors were represented.

Keywords: multimodal analysis, painting analysis, Salvador Dali, visual grammar

Procedia PDF Downloads 91
15817 Alphabet Recognition Using Pixel Probability Distribution

Authors: Vaidehi Murarka, Sneha Mehta, Dishant Upadhyay

Abstract:

Our project topic is “Alphabet Recognition using pixel probability distribution”. The project uses techniques of Image Processing and Machine Learning in Computer Vision. Alphabet recognition is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files. An alphabet recognition-based OCR application is sometimes used in signature recognition, which is used in banks and other high-security buildings. One popular mobile application reads a visiting card and stores it directly in the contacts. OCR systems are also known to be used in radar systems for reading speeders’ license plates, among many other uses. The implementation of our project has been done using Visual Studio and OpenCV (Open Source Computer Vision). Our algorithm is based on Neural Networks (machine learning). The project was implemented in three modules: (1) Training: This module aims at “Database Generation”. The database was generated using two methods: (a) Run-time generation: the database is generated at compilation time using the inbuilt fonts of the OpenCV library. Human intervention is not necessary for generating this database. (b) Contour-detection: a ‘jpeg’ template containing different fonts of an alphabet is converted to the weighted matrix using specialized functions (contour detection and blob detection) of OpenCV. The main advantage of this type of database generation is that the algorithm becomes self-learning and the final database requires little memory to be stored (119 KB precisely). (2) Preprocessing: The input image is pre-processed using image processing concepts such as adaptive thresholding, binarizing, dilating etc. and is made ready for segmentation. “Segmentation” includes extraction of lines, words, and letters from the processed text image.
(3) Testing and prediction: The extracted letters are classified and predicted using the neural networks algorithm. The algorithm recognizes an alphabet based on certain mathematical parameters calculated using the database and weight matrix of the segmented image.
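
The preprocessing and segmentation module can be illustrated with a stripped-down version of two of its steps: binarizing a grayscale text line and splitting it into letters wherever a column contains no ink. This is a sketch under stated assumptions, not the project's OpenCV pipeline. It uses a fixed global threshold instead of adaptive thresholding, skips dilation, and all names and the toy image are hypothetical.

```python
def binarize(gray, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def segment_columns(binary):
    """Split one text line into letters at ink-free columns.

    Returns (start, end) column pairs with `end` exclusive.
    """
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a letter begins
        elif not ink and start is not None:
            spans.append((start, x))       # a letter ends at the first gap
            start = None
    if start is not None:                  # letter touching the right border
        spans.append((start, width))
    return spans


# Toy 3x7 grayscale line: a two-column glyph, a gap, then a one-column glyph.
toy_line = [
    [255,   0,   0, 255, 255,   0, 255],
    [255,   0, 255, 255, 255,   0, 255],
    [255,   0,   0, 255, 255,   0, 255],
]
spans = segment_columns(binarize(toy_line))
print(spans)
```

The same projection idea applied to rows instead of columns separates text lines, and a wider gap threshold separates words from letters.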

Keywords: contour-detection, neural networks, pre-processing, recognition coefficient, runtime-template generation, segmentation, weight matrix

Procedia PDF Downloads 357
15816 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation

Authors: Ksenia Meshkova

Abstract:

With the increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining metadata. This process means not just recognition of some object or its movement, but recognition of the entire scene rather than separate frames, with timeline segmentation as the final result. Labeling datasets is time-consuming; besides, attributing characteristics to particular scenes is clearly difficult due to their nature. In this article, we consider the application of autoencoders to unsupervised scene recognition and clustering based on interpretable features. Further, we focus on the particular types of autoencoders that are relevant to our study. We take a look at the specifics of deep learning related to information theory and rate-distortion theory and describe solutions that address the poor interpretability of deep learning in media content processing. In conclusion, we present the results of a custom framework, based on autoencoders, capable of the scene recognition studied above, with highlights generated from this recognition. We do not describe in detail the mathematics of how neural networks work but clarify the necessary concepts and pay attention to important nuances.

Keywords: neural networks, computer vision, representation learning, autoencoders

Procedia PDF Downloads 95
15815 Development of a Computer Vision System for the Blind and Visually Impaired Person

Authors: Rodrigo C. Belleza, Jr., Roselyn A. Maaño, Karl Patrick E. Camota, Darwin Kim Q. Bulawan

Abstract:

Eyes are an essential and conspicuous organ of the human body. Human eyes are outward and inward portals of the body that allow us to see the outside world and provide glimpses into one's inner thoughts and feelings. Inevitable blindness and visual impairments may result from eye-related disease, trauma, or congenital or degenerative conditions that cannot be corrected by conventional means. The study emphasizes innovative tools that will serve as an aid to blind and visually impaired (VI) individuals. The researchers fabricated a prototype that utilizes the Microsoft Kinect for Windows and an Arduino microcontroller board. The prototype facilitates advanced gesture recognition, voice recognition, obstacle detection and indoor environment navigation. Open Computer Vision (OpenCV) performs image analysis and gesture tracking to transform Kinect data into the desired output. A computer vision technology device provides greater accessibility for those with vision impairments.

Keywords: algorithms, blind, computer vision, embedded systems, image analysis

Procedia PDF Downloads 288