Search results for: 3D computer vision
3070 Framework for Socio-Technical Issues in Requirements Engineering for Developing Resilient Machine Vision Systems Using Levels of Automation through the Lifecycle
Authors: Ryan Messina, Mehedi Hasan
Abstract:
This research examines the impact of using data to generate performance requirements for automated visual inspection with machine vision. The focus is on design situations and on how projects can smooth the transfer of tacit knowledge into an algorithm. We propose a framework for specifying machine vision systems that uses varying levels of automation as contingency planning to reduce data-processing complexity. Using data helps extract tacit knowledge from the people who can perform the manual tasks and involves them in designing the system; real data from the system is therefore always referenced, minimizing errors between the participating parties. We propose three indicators of whether a project is at high risk of failing to meet requirements related to accuracy and reliability. All systems tested achieved better integration into operations after the framework was applied.
Keywords: automation, contingency planning, continuous engineering, control theory, machine vision, system requirements, systems thinking
Procedia PDF Downloads 205
3069 Saudi and U.S. Newspaper Coverage of Saudi Vision 2030 Concerning Women in Online Newspapers
Authors: Ziyad Alghamdi
Abstract:
This research investigates how issues concerning Saudi women have been represented in selected U.S. and Saudi publications. Saudi Vision 2030 is the Kingdom of Saudi Arabia's development strategy, unveiled on April 25, 2016. The study used a sample of 115 news items from four newspapers: The New York Times and the Washington Post were chosen to represent U.S. newspapers, alongside two Saudi newspapers, Al Jazirah and Al Watan. The research examines how these issues were covered before and during the implementation of Saudi Vision 2030. The news pieces were analyzed using both quantitative and qualitative methodologies, with the qualitative study employing an inductive technique to uncover frames. Furthermore, this work looked at how American and Saudi publications framed Saudi women visually by reviewing the photographs used in news reports about Saudi women's issues. The primary conclusion is that the human-interest frame was more prevalent in American media, whereas the economic frame was more prevalent in Saudi publications. A variety of diverse topics were considered.
Keywords: Saudi newspapers, Saudi Vision 2030, framing theory, Saudi women
Procedia PDF Downloads 88
3068 Automatic Identification and Monitoring of Wildlife via Computer Vision and IoT
Authors: Bilal Arshad, Johan Barthelemy, Elliott Pilton, Pascal Perez
Abstract:
Reliable, informative, and up-to-date information about the location, mobility, and behavioural patterns of animals enhances our ability to research and preserve biodiversity. The fusion of infra-red sensors and camera traps offers an inexpensive way to collect wildlife data in the form of images. However, extracting useful data from these images, such as identifying and counting animals, remains a manual, time-consuming, and costly process. In this paper, we demonstrate that such information can be retrieved automatically using state-of-the-art deep learning methods. Another major challenge ecologists face is counting a single animal multiple times because it reappears in other images taken by the same or other camera traps; yet such information can be extremely useful for tracking wildlife and understanding its behaviour. To tackle the multiple-count problem, we have designed a meshed network of camera traps that share the captured images along with timestamps, cumulative counts, and the dimensions of the animal. The proposed method leverages edge computing to support real-time tracking and monitoring of wildlife. It has been validated in the field and can easily be extended to other wildlife monitoring and management applications where traditional monitoring is expensive and time-consuming.
Keywords: computer vision, ecology, internet of things, invasive species management, wildlife management
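The multiple-count suppression described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sighting fields, the size tolerance, and the 30-minute window are all assumptions.

```python
from datetime import datetime, timedelta

def deduplicate_sightings(sightings, window_minutes=30):
    """Suppress re-counts of one animal across a mesh of camera traps.

    Each sighting is a dict with 'species', 'timestamp' (datetime) and
    'size' (estimated animal dimension). Sightings of the same species
    with a similar size inside the time window are treated as the same
    animal, so only the first one increments the total.
    """
    seen = []
    total = 0
    for s in sorted(sightings, key=lambda s: s["timestamp"]):
        duplicate = any(
            c["species"] == s["species"]
            and abs(c["size"] - s["size"]) < 0.1
            and s["timestamp"] - c["timestamp"] < timedelta(minutes=window_minutes)
            for c in seen
        )
        if not duplicate:
            total += 1
        seen.append(s)
    return total
```

In the real system this logic would run at the edge, with the traps exchanging the timestamp, count, and dimension fields mentioned in the abstract.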
Procedia PDF Downloads 138
3067 Perceptual Learning with Hand-Eye Coordination as an Effective Tool for Managing Amblyopia: A Prospective Study
Authors: Anandkumar S. Purohit
Abstract:
Introduction: Amblyopia is a serious condition resulting in monocular impairment of vision. Although traditional treatment improves vision, in this study we evaluated the results of perceptual learning (PL). Methods: This prospective cohort study included all patients with amblyopia who underwent perceptual learning. Presenting data on vision, stereopsis, and contrast sensitivity were documented in a pretested online format, and pre- and post-treatment information was compared using descriptive, cross-tabulation, and comparative methods in SPSS 22. Results: The cohort consisted of 47 patients (23 females and 24 males) with a mean age of 14.11 ± 7.13 years. A significant improvement in visual acuity was detected after the PL sessions, and the median follow-up period was 17 days. Stereopsis improved significantly in all age groups. Conclusion: PL with hand-eye coordination is an effective method for managing amblyopia and can improve vision in all age groups.
Keywords: amblyopia, perceptual learning, hand-eye coordination, visual acuity, stereopsis, contrast sensitivity, ophthalmology
Procedia PDF Downloads 27
3066 An Exponential Field Path Planning Method for Mobile Robots Integrated with Visual Perception
Authors: Magdy Roman, Mostafa Shoeib, Mostafa Rostom
Abstract:
Global vision, whether provided by overhead fixed cameras, on-board aerial vehicle cameras, or satellite images, can provide detailed information on the environment around mobile robots. In this paper, an intelligent vision-based method for path planning and obstacle avoidance for mobile robots is presented. The method integrates visual perception with a newly proposed field-based path-planning method to overcome common path-planning problems such as local minima, unreachable destinations, and unnecessarily long paths around obstacles. The method proposes an exponential angle-deviation field around each obstacle that affects the orientation of a nearby robot. As the robot moves toward the goal point, obstacles are classified into right and left groups, and a deviation angle is exponentially added to or subtracted from the robot's orientation. The exponential field parameters are chosen based on the Lyapunov stability criterion to guarantee convergence of the robot to the destination. The proposed method uses obstacle shape and location, extracted from the global vision system, in a collision-prediction mechanism that decides whether to activate or deactivate each obstacle's field. In addition, a search mechanism is developed for cases where the robot or the goal point is trapped among obstacles, to find a suitable exit or entrance. The proposed algorithm is validated both in simulation and through experiments, and proves effective at avoiding obstacles and converging to the destination, overcoming common path-planning problems found in classical methods.
Keywords: path planning, collision avoidance, convergence, computer vision, mobile robots
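The exponential angle-deviation idea can be sketched as below, assuming a simple planar robot state. The gains k and a, the side-labelling convention, and the function interface are illustrative assumptions; the paper selects the actual field parameters via a Lyapunov stability criterion.

```python
import math

def deviation_angle(robot_xy, goal_heading, obstacles, k=1.0, a=1.5):
    """Bend the robot's heading with exponential deviations from obstacles.

    Each obstacle is (x, y, side), where side is +1 if the obstacle lies
    to the left of the robot-goal line (deviate clockwise) and -1 if to
    the right. Each contribution k*exp(-a*d) decays with the distance d
    between robot and obstacle, so far obstacles barely bend the path.
    """
    heading = goal_heading
    for (ox, oy, side) in obstacles:
        d = math.hypot(ox - robot_xy[0], oy - robot_xy[1])
        heading -= side * k * math.exp(-a * d)
    return heading
```

With no obstacles the heading is simply the goal direction, and a distant obstacle contributes almost nothing, which matches the locality of the field described in the abstract.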
Procedia PDF Downloads 195
3065 FloodNet: Classification for Post-Flood Scenes with a High-Resolution Aerial Imagery Dataset
Authors: Molakala Mourya Vardhan Reddy, Kandimala Revanth, Koduru Sumanth, Beena B. M.
Abstract:
Emergency response and recovery operations are severely hampered by natural catastrophes, especially floods. Understanding post-flood scenes is essential to disaster management because it facilitates quick evaluation and decision-making. To this end, we introduce FloodNet, a new high-resolution aerial image collection created specifically for understanding post-flood scenes. FloodNet comprises a varied collection of high-quality aerial photos taken during and after flood events, offering comprehensive representations of flooded landscapes, damaged infrastructure, and altered topographies. Covering a variety of environmental conditions and geographic regions, the dataset provides a thorough resource for training and assessing computer vision models designed to handle the complexity of post-flood scenes. The images in FloodNet are labelled with pixel-level semantic segmentation masks, allowing a detailed examination of flood-related features including debris, water bodies, and damaged structures. Furthermore, temporal and positional metadata improve the dataset's usefulness for longitudinal research and spatiotemporal analysis. For tasks like flood-extent mapping, damage assessment, and infrastructure recovery projection, we provide baselines and evaluation metrics to promote research and development in post-flood scene comprehension. Integrating FloodNet into machine learning pipelines will make it easier to create reliable algorithms that help policymakers, urban planners, and first responders make decisions both before and after floods. The FloodNet dataset aims to support advances in computer vision, remote sensing, and disaster response technologies by providing a useful resource for researchers. By tackling the particular problems posed by post-flood situations, FloodNet helps create innovative solutions for boosting communities' resilience in the face of natural catastrophes.
Keywords: image classification, segmentation, computer vision, natural disaster, unmanned aerial vehicle (UAV), machine learning
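Pixel-level segmentation masks of the kind FloodNet provides are commonly scored with mean intersection-over-union. A minimal sketch, where the integer class encoding is an assumption:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for pixel-level semantic masks.

    pred and target are integer arrays of class indices (e.g. water,
    debris, damaged structure); classes absent from both arrays are
    skipped so they do not distort the mean.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```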
Procedia PDF Downloads 79
3064 Variation of Refractive Errors among Right and Left Eyes in Jos, Plateau State, Nigeria
Authors: F. B. Masok, S. S Songdeg, R. R. Dawam
Abstract:
Vision is an important process for learning and communication, as man depends greatly on vision to sense his environment. A study of the prevalence and variation of refractive errors conducted between December 2010 and May 2011 in Jos revealed that 735 (77.50%) out of 950 subjects examined for refractive error had refractive errors of various kinds. Myopia was observed in 373 (49.79%) of the subjects; it occurred in 263 right eyes (55.60%) and 210 left eyes (44.39%), and the mean myopic error was -1.54 ± 3.32. Hyperopia was observed in 385 (40.53%) of the sampled population, comprising 203 right eyes (52.73%) and 182 left eyes (47.27%); the mean hyperopic error was +1.74 ± 3.13. Astigmatism accounted for 359 (38.84%) of the subjects, of whom 193 (53.76%) were affected in the right eye and 168 (46.79%) in the left eye. Presbyopia was found in 404 (42.53%) of the subjects; of this figure, 164 (40.59%) were in the right eyes and 240 (59.41%) in the left eyes. In some age groups, the number of right and left eyes with refractive errors increased with age, peaking in the 60-69 age group. This pattern of refractive errors could be attributed to exposure to various forms of light, particularly ultraviolet rays (e.g. rays from television and computer screens). There were no remarkable differences between the mean myopic and mean hyperopic errors in the right eyes and the left eyes, which suggests that the two eyes are similar.
Keywords: left eye, refractive errors, right eye, variation
Procedia PDF Downloads 433
3063 RV-YOLOX: Object Detection on Inland Waterways Based on Optimized YOLOX Through Fusion of Vision and 3+1D Millimeter Wave Radar
Authors: Zixian Zhang, Shanliang Yao, Zile Huang, Zhaodong Wu, Xiaohui Zhu, Yong Yue, Jieming Ma
Abstract:
Unmanned Surface Vehicles (USVs) are valuable for performing dangerous and time-consuming tasks on the water, and object detection is a significant task in these applications. However, inherent challenges such as the complex distribution of obstacles, reflections from shore structures, and water-surface fog hinder the object detection performance of USVs. To address these problems, this paper provides a fusion method for USVs to effectively detect objects in inland-waterway environments using vision sensors and 3+1D millimeter-wave (MMW) radar. MMW radar is complementary to vision sensors, providing robust environmental information. The radar 3D point cloud is transformed into a 2D radar pseudo-image using a point transformer, unifying the radar and vision information formats. We propose a multi-source object detection network (RV-YOLOX) based on radar-vision fusion for the inland-waterway environment. Performance is evaluated on our self-recorded waterways dataset. Compared with the YOLOX network, our fusion network significantly improves detection accuracy, especially for objects in poor lighting conditions.
Keywords: inland waterways, YOLO, sensor fusion, self-attention
Procedia PDF Downloads 125
3062 Machine Learning and Deep Learning Approach for People Recognition and Tracking in Crowd for Safety Monitoring
Authors: A. Degale Desta, Cheng Jian
Abstract:
Deep learning applications in computer vision are rapidly advancing, making it possible to monitor public spaces and quickly identify potentially anomalous behaviour in crowd scenes. The purpose of the current work is therefore to improve the safety of people in crowd events by detecting panic behaviour, through the innovative idea of Aggregation of Ensembles (AOE), which uses pre-trained ConvNets and a pool of classifiers to find anomalies in video data with packed scenes. Building on algorithms such as K-means, KNN, SVD, Faster R-CNN, and YOLOv5, whose architectures learn different levels of semantic representation from crowd videos, the proposed approach leverages an ensemble of fine-tuned convolutional neural networks (CNNs), allowing the extraction of enriched feature sets. In addition, a long short-term memory network forecasts future feature values, and a handcrafted feature that takes the peculiarities of the crowd into consideration helps model human behaviour. Experiments on well-known datasets of panic situations assess the effectiveness and precision of the suggested method. Results reveal that, compared to state-of-the-art methodologies, the system produces better and more promising results in terms of accuracy and processing speed.
Keywords: action recognition, computer vision, crowd detecting and tracking, deep learning
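The aggregation step of an ensemble like AOE can be sketched as a weighted average of per-frame anomaly scores. The score format, uniform default weights, and threshold here are illustrative assumptions, not the paper's exact scheme:

```python
def aggregate_ensemble(scores_per_model, weights=None, threshold=0.5):
    """Pool per-frame anomaly scores from several classifiers.

    scores_per_model: list of score lists, one per classifier, aligned
    by frame index. Weights (e.g. per-model validation accuracy) are
    uniform by default. Frames whose pooled score exceeds the threshold
    are flagged as anomalous.
    """
    n_models = len(scores_per_model)
    weights = weights or [1.0 / n_models] * n_models
    pooled = [
        sum(w * s[i] for w, s in zip(weights, scores_per_model))
        for i in range(len(scores_per_model[0]))
    ]
    return [score > threshold for score in pooled], pooled
```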
Procedia PDF Downloads 163
3061 Resisting Adversarial Assaults: A Model-Agnostic Autoencoder Solution
Authors: Massimo Miccoli, Luca Marangoni, Alberto Aniello Scaringi, Alessandro Marceddu, Alessandro Amicone
Abstract:
The susceptibility of deep neural networks (DNNs) to adversarial manipulation is a recognized challenge within the computer vision domain. Adversarial examples, crafted by adding subtle yet malicious alterations to benign images, exploit this vulnerability. Various defense strategies, stemming from diverse research hypotheses, have been proposed to safeguard DNNs against such attacks. Building upon prior work, our approach utilizes autoencoder models. Autoencoders are neural networks trained to learn representations of training data and to reconstruct inputs from those representations, typically by minimizing a reconstruction error such as the mean squared error (MSE). Our autoencoder was trained on a dataset of benign examples, learning features specific to them. Consequently, when presented with significantly perturbed adversarial examples, the autoencoder exhibits high reconstruction errors. The architecture of the autoencoder was tailored to the dimensions of the images under evaluation; we considered various image sizes, constructing models differently for 256x256 and 512x512 images. Moreover, the choice of computer vision model is crucial, as most adversarial attacks are designed with specific AI structures in mind. To mitigate this, we propose replacing image-specific dimensions with a structure independent of both image dimensions and neural network models, thereby enhancing robustness. Our multi-modal autoencoder reconstructs the spectral representation of images across the red-green-blue (RGB) color channels. To validate our approach, we conducted experiments on diverse datasets and subjected them to adversarial attacks using models such as ResNet50 and ViT_L_16 from the torchvision library. The features extracted by the autoencoder were used in a classification model, resulting in an MSE (RGB) of 0.014, a classification accuracy of 97.33%, and a precision of 99%.
Keywords: adversarial attacks, malicious images detector, binary classifier, multimodal transformer autoencoder
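The core detection principle, using reconstruction error as an adversarial signal, can be sketched as follows. The callable interface and the idea of calibrating the threshold on held-out benign data are assumptions, not the paper's exact pipeline:

```python
import numpy as np

def flag_adversarial(autoencoder, images, threshold):
    """Flag inputs whose reconstruction error exceeds a threshold.

    The autoencoder is assumed to have been trained on benign images
    only, so adversarially perturbed inputs reconstruct poorly.
    `autoencoder` is any callable mapping a batch to its reconstruction;
    the threshold would be calibrated on held-out benign data.
    Returns a boolean flag and the per-image MSE.
    """
    recon = autoencoder(images)
    # Per-image MSE: average the squared error over all non-batch axes.
    mse = np.mean((images - recon) ** 2, axis=tuple(range(1, images.ndim)))
    return mse > threshold, mse
```

A well-reconstructed benign batch yields near-zero MSE and no flags, while a poorly reconstructed batch is flagged, mirroring the high reconstruction errors the abstract reports for perturbed inputs.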
Procedia PDF Downloads 113
3060 Vision Aided INS for Soft Landing
Authors: R. Sri Karthi Krishna, A. Saravana Kumar, Kesava Brahmaji, V. S. Vinoj
Abstract:
The lunar surface may contain rough and non-uniform terrain with dips and peaks. Soft landing is a method of landing the lander on the lunar surface without any damage to the vehicle. This project focuses on finding a safe landing site for the vehicle by developing a method for determining the lateral velocity of the lunar lander. This is done by processing real-time images obtained from an on-board vision sensor. The hazard-avoidance phase of the soft landing starts when the vehicle is about 200 m above the lunar surface. Here, the lander has a very low velocity of about 10 cm/s (vertical) and 5 m/s (horizontal). On detection of a hazard, the lander is navigated by controlling the vertical and lateral velocity. To find an appropriate landing site and navigate accordingly, image processing is performed continuously; images are taken until the landing site is determined and the lander lands safely on the lunar surface. By integrating this vision-based navigation with the INS, better accuracy can be obtained for the soft landing of the lunar lander.
Keywords: vision aided INS, image processing, lateral velocity estimation, materials engineering
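The geometric relation underlying lateral-velocity estimation from image shifts can be sketched with a pinhole-camera model. The function and its parameters are illustrative assumptions, and a real pipeline would also compensate for attitude changes using the INS:

```python
def lateral_velocity(shift_px, altitude_m, focal_px, dt):
    """Estimate lateral velocity from the image shift of tracked
    surface features between two consecutive frames.

    Under the pinhole model, a pixel shift at a known altitude maps to
    a ground displacement: ground_shift = shift_px * altitude / focal.
    Dividing by the frame interval dt gives the lateral velocity.
    """
    ground_shift_m = shift_px * altitude_m / focal_px
    return ground_shift_m / dt
```

For example, a 10-pixel feature shift at 200 m altitude with a 1000-pixel focal length over 0.5 s corresponds to 4 m/s of lateral motion.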
Procedia PDF Downloads 467
3059 Shared Vision System Support for Maintenance Tasks of Wind Turbines
Authors: Buket Celik Ünal, Onur Ünal
Abstract:
Communication is the most challenging part of maintenance operations. Communication between an expert and a fieldworker is crucial for effective maintenance, and it also affects the safety of the fieldworkers. To support a machine user in a remote collaborative physical task, both a mobile and a stationary device are needed. Such a system is called a shared vision system, and it supports two people in solving a problem from different places. The system reduces errors and provides reliable support for qualified and less qualified users. This research aimed to validate the effectiveness of using a shared vision system to facilitate communication between on-site workers and those issuing instructions for maintenance or inspection work over long distances. The system is designed around a head-worn display. As part of this study, a substitute system was implemented using a shared vision system for a maintenance operation. The benefits of using a shared vision system are analyzed, and the results are adapted to wind turbines to improve occupational safety and health for maintenance technicians. The motivation for the research effort can be summarized in the following questions: How can an expert support a technician over long distances during a maintenance operation? What are the advantages of using a shared vision system? Experience from the experiment shows that a shared vision system is an advantage for both electrical and mechanical system failures, and the results support its use for wind turbine maintenance and repair tasks, because wind turbine generators/gearboxes and the substitute system exhibit similar failures: electrical failures such as voltage irregularities and wiring faults, and mechanical failures such as misalignment, vibration, and over-speed conditions, are common to both.
Furthermore, the effectiveness of the shared vision system was analyzed using smart glasses in connection with a maintenance task performed on the substitute system under four conditions: using the shared vision system, audio communication, a smartphone, and working unassisted. The chi-square test was used to determine dependencies between measured factors, the chi-square test for independence to detect relationships between two qualitative variables, and the Mann-Whitney U test to compare pairs of data sets. Based on this experiment, no relation was found between the results and gender. Participants' responses confirmed that the shared vision system is efficient and helpful for maintenance operations. The results show a statistically significant difference in the average time taken by subjects using the shared vision system compared with the other conditions. Additionally, this study confirmed that a shared vision system reduces the time to diagnose and resolve maintenance issues, reduces diagnosis errors, lowers travel costs for experts, and increases reliability in service.
Keywords: communication support, maintenance and inspection tasks, occupational health and safety, shared vision system
Procedia PDF Downloads 261
3058 Multimodal Deep Learning for Human Activity Recognition
Authors: Ons Slimene, Aroua Taamallah, Maha Khemaja
Abstract:
In recent years, human activity recognition (HAR) has been a key area of research due to its diverse applications, and it has garnered increasing attention in the field of computer vision. HAR plays an important role in people's daily lives, as it can learn advanced knowledge about human activities from data. In HAR, activities are usually represented by exploiting different types of sensors, such as embedded sensors or visual sensors. However, these sensors have limitations, such as local obstacles, image-related obstacles, sensor unreliability, and consumer concerns. Recently, several deep-learning-based approaches have been proposed for HAR; they fall into two categories based on the type of data used: vision-based approaches and sensor-based approaches. This paper highlights the importance of multimodal fusion of skeleton data obtained from videos and data generated by embedded sensors, using deep neural networks, for achieving HAR. We propose a deep multimodal fusion network based on a two-stream architecture. The two streams use a Convolutional Neural Network combined with a Bidirectional LSTM (CNN-BiLSTM) to process the skeleton data and the data generated by embedded sensors, and fusion at the feature level is considered. The proposed model was evaluated on the public OPPORTUNITY++ dataset and produced an accuracy of 96.77%.
Keywords: human activity recognition, action recognition, sensors, vision, human-centric sensing, deep learning, context-awareness
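A hedged PyTorch sketch of a two-stream CNN-BiLSTM with feature-level fusion. The layer sizes, channel counts, and last-timestep pooling are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class StreamCNNBiLSTM(nn.Module):
    """One stream: a 1D convolution over time, then a bidirectional LSTM."""
    def __init__(self, in_channels, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):            # x: (batch, time, channels)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(h)
        return out[:, -1]            # last timestep: (batch, 2*hidden)

class TwoStreamHAR(nn.Module):
    """Feature-level fusion of a skeleton stream and a sensor stream."""
    def __init__(self, skel_ch, sens_ch, n_classes, hidden=64):
        super().__init__()
        self.skel = StreamCNNBiLSTM(skel_ch, hidden)
        self.sens = StreamCNNBiLSTM(sens_ch, hidden)
        self.head = nn.Linear(4 * hidden, n_classes)

    def forward(self, skel_seq, sens_seq):
        fused = torch.cat([self.skel(skel_seq), self.sens(sens_seq)], dim=1)
        return self.head(fused)
```

Concatenating the two stream outputs before the classifier is one simple realization of the feature-level fusion the abstract describes.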
Procedia PDF Downloads 101
3057 How Envisioning Process Is Constructed: An Exploratory Research Comparing Three International Public Televisions
Authors: Alexandre Bedard, Johane Brunet, Wendellyn Reid
Abstract:
Public television is constantly trying to maintain and develop its audience, and to achieve those goals it needs a strong and clear vision. Vision, or envisioning, is a multidimensional process: it is simultaneously a conduit that orients and fixes the future, an idea that precedes strategy, and a means by which action is accomplished, from a business perspective. Vision is also often studied in a prescriptive and instrumental manner. Based on our understanding of the literature, we explain how envisioning, as a process, is a creative one; it takes place in the mind and uses wisdom and intelligence through a process of evaluation, analysis, and creation. Through an aggregation of the literature, we build a model of the envisioning process based on past experiences, perceptions, and knowledge, and influenced by the context: the individual, the organization, and the environment. In exploratory research in which vision was deciphered through discourse, using a qualitative and abductive approach from a grounded-theory perspective, we explored three extreme cases, with eighteen interviews of experts, leaders, politicians, and industry actors, amounting to more than twenty hours of interviews in three different countries. We compared the strategy, the business model, and the political and legal forces, and also looked at the history of each industry from an inertial point of view. Our analysis of the data revealed that a legitimacy effect, driven by the audience and by the innovation and creativity of the institutions, was the cornerstone of what influences the envisioning process. This allowed us to identify how different the process is for the Canadian, French, and UK public broadcasters, although we concluded that all three had a socially constructed vision of their future, based on stakeholder management and an emerging role for managers as idea brokers.
Keywords: envisioning process, international comparison, television, vision
Procedia PDF Downloads 132
3056 Monocular Depth Estimation Benchmarking with Thermal Dataset
Authors: Ali Akyar, Osman Serdar Gedik
Abstract:
Depth estimation is a challenging computer vision task that involves estimating the distance between objects in a scene and the camera: predicting how far each pixel in the 2D image is from the capturing point. Several important monocular depth estimation (MDE) studies are based on Vision Transformers (ViT), and we benchmark three major ones. The first aims to build a simple and powerful foundation model that deals with any image under any condition. The second proposes mixing multiple datasets during training together with a robust training objective. The third combines generalization performance with state-of-the-art results on specific datasets. Although there are studies using thermal images as well, we benchmark these three non-thermal, state-of-the-art studies on a hybrid image dataset captured with Multi-Spectral Dynamic Imaging (MSX) technology. MSX produces detailed thermal images by bringing together the thermal and visual spectrums. With this technology, our dataset images are not blurry and poorly detailed like typical thermal images; on the other hand, they are not taken under the ideal lighting conditions of RGB images. We compared the three methods under test on our thermal dataset, which had not been done before. Additionally, we propose an image-enhancement deep learning model for thermal data that helps extract the features required for monocular depth estimation. The experimental results demonstrate that, after applying our proposed model, the performance of the three methods under test increased significantly for thermal-image depth prediction.
Keywords: monocular depth estimation, thermal dataset, benchmarking, vision transformers
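MDE benchmarks of this kind are usually scored with standard metrics such as absolute relative error and the δ < 1.25 accuracy; the abstract does not name its metrics, so the following is a generic sketch:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Absolute relative error and delta<1.25 accuracy for depth maps.

    Both metrics are computed only over valid pixels (gt > 0), since
    thermal or stereo ground truth is often sparse.
    """
    mask = gt > 0
    p, g = pred[mask], gt[mask]
    abs_rel = float(np.mean(np.abs(p - g) / g))
    ratio = np.maximum(p / g, g / p)       # symmetric in over/under-prediction
    delta1 = float(np.mean(ratio < 1.25))
    return abs_rel, delta1
```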
Procedia PDF Downloads 32
3055 Deprivation of Visual Information Affects Differently the Gait Cycle in Children with Different Level of Motor Competence
Authors: Miriam Palomo-Nieto, Adrian Agricola, Rudolf Psotta, Reza Abdollahipour, Ludvik Valtr
Abstract:
The importance of vision and the visual control of movement is well established in the motor-control literature, and many studies have demonstrated that children with low motor competence may rely more heavily on vision to perform movements than their typically developing peers. The aim of this study was to highlight the effects of different visual conditions on motor performance during walking in children with different levels of motor coordination. Participants (n = 32, mean age = 8.5 years, sd ± 0.5) were divided into two groups, typical development (TD) and low motor coordination (LMC), based on the scores of the Movement Assessment Battery for Children (MABC-2). They were asked to walk along a 10-meter walkway where the Optojump-Next instrument was installed in a portable laboratory (15 x 3 m), which ensured that all participants had the same visual information. They walked at a self-selected speed under four visual conditions: full vision (FV), limited vision 100 ms (LV-100), limited vision 150 ms (LV-150), and non-vision (NV). For visual occlusion, participants were equipped with Plato goggles that shut for 100 and 150 ms, respectively, within each 2 s. Data were analyzed with a two-way mixed-effects ANOVA, 2 (TD vs. LMC) x 4 (FV, LV-100, LV-150, NV), with repeated measures on the last factor (p ≤ .05). Results indicated that TD children walked faster and with longer normalized step lengths and strides than LMC children. For TD children, the percentages of single support and swing time were higher than for LMC children, whereas the percentages of load response and pre-swing were higher in the LMC children. These findings indicate that walking analysis may be able to distinguish different levels of motor coordination in children. Likewise, LMC children showed shorter percentages in the parameters involving single-leg support, supporting the idea of balance problems.
Keywords: visual information, motor performance, walking pattern, optojump
Procedia PDF Downloads 574
3054 Amorphous Silicon-Based PINIP Structure for Human-Like Photosensor
Authors: Sheng-Chuan Hsu
Abstract:
Because existing ambient light sensors are mostly silicon photodiode devices, they are extremely sensitive in the red and infrared regions. Even when an IR-cut filter is added, the influence of infrared light cannot be completely eliminated, and the spectral response to infrared light remains stronger than that of the human eye. The sensor is therefore unable to reproduce how the visual spectrum of the human eye reacts to ambient light. One must also consider that the spectra perceived by the human eye differ significantly between light and dark places. Consequently, in practical applications, we must create and develop an advanced human-like photosensor that solves these problems of ambient light sensors and lets a cognitive lighting system provide suitable light, achieving the goals of matching the visual spectrum of the human eye and saving energy.
Keywords: ambient light sensor, vision spectrum, cognitive lighting system, human eye
Procedia PDF Downloads 335
3053 Refined Edge Detection Network
Authors: Omar Elharrouss, Youssef Hmamouche, Assia Kamal Idrissi, Btissam El Khamlichi, Amal El Fallah-Seghrouchni
Abstract:
Edge detection is one of the most challenging tasks in computer vision, due to the difficulty of detecting edges or boundaries in real-world images that contain objects of different types and scales, like trees and buildings, against varied backgrounds. It is also a key task for many computer vision applications. Using a set of backbones as well as attention modules, deep-learning-based methods have improved edge detection compared with traditional methods like Sobel and Canny. However, images of complex scenes still represent a challenge for these methods, and the edges detected by existing approaches suffer from unrefined results, with output images containing many erroneous edges. To overcome this, in this paper a refined edge detection network (RED-Net) is proposed using the mechanism of residual learning. By maintaining the high resolution of edges during the training process and conserving the resolution of the edge image through the network stages, we connect the pooling outputs at each stage with the output of the previous layer. Also, after each layer, we use an affine batch normalization layer as an erosion operation for the homogeneous regions in the image. The proposed method is evaluated on the most challenging datasets, including BSDS500, NYUD, and Multicue. The obtained results outperform existing edge detection networks in terms of performance metrics and the quality of the output images.
Keywords: edge detection, convolutional neural networks, deep learning, scale-representation, backbone
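A hedged PyTorch sketch of a resolution-preserving residual block with affine batch normalization, in the spirit of the residual learning described above; channel counts and layer ordering are assumptions, not RED-Net's actual design:

```python
import torch
import torch.nn as nn

class ResidualEdgeBlock(nn.Module):
    """Residual block that preserves the spatial resolution of the edge
    map: two 3x3 convolutions with affine batch normalization, plus an
    identity skip connection so each stage refines rather than replaces
    the previous layer's output."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels, affine=True),
        )

    def forward(self, x):
        # Residual learning: output = activation(input + refinement).
        return torch.relu(x + self.body(x))
```

Because every convolution uses padding 1 and stride 1, the block's output has exactly the input's resolution, which is the property the abstract emphasizes.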
Procedia PDF Downloads 102
3052 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services
Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme
Abstract:
Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.
Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing
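As a hedged illustration of the schema-based record extraction step, a minimal pipeline might map field names to extraction patterns (the filing text, field names, and patterns below are invented for illustration; a production system would use learned models rather than fixed regular expressions):

```python
import re

# Hypothetical filing snippet; the field names and patterns are illustrative only.
filing = "Total revenue: USD 4,250,000 for fiscal year 2021."

schema = {
    "revenue_usd": r"Total revenue:\s*USD\s*([\d,]+)",
    "fiscal_year": r"fiscal year\s*(\d{4})",
}

def extract_record(text, schema):
    """Fill one machine-readable record from unstructured text, field by field."""
    record = {}
    for field, pattern in schema.items():
        m = re.search(pattern, text)
        record[field] = m.group(1).replace(",", "") if m else None
    return record

print(extract_record(filing, schema))
# {'revenue_usd': '4250000', 'fiscal_year': '2021'}
```

The same record-shaped interface lets the upstream extractor be swapped for an ML model without changing how results are stored downstream.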
Procedia PDF Downloads 113
3051 Causes of Blindness and Low Vision among Visually Impaired Population Supported by Welfare Organization in Ardabil Province in Iran
Authors: Mohammad Maeiyat, Ali Maeiyat Ivatlou, Rasul Fani Khiavi, Abouzar Maeiyat Ivatlou, Parya Maeiyat
Abstract:
Purpose: Given that visual impairment is still one of the country's health problems, this study was conducted to determine the causes of blindness and low vision among visually impaired members of the Ardabil Province welfare organization. Methods: The present study, a descriptive national census, was carried out among the visually impaired population supported by the welfare organization in all urban and rural areas of Ardabil Province in 2013; sample collection lasted 7 months. The subjects were examined by an optometrist to determine their visual status (blindness or low vision) and then referred to an ophthalmologist to identify the main cause of visual impairment, based on the International Classification of Diseases, version 10. Statistical analysis of the collected data was performed using SPSS software, version 18. Results: Overall, 403 subjects with a mean age of years participated in this study. 73.2% were blind and 26.8% had low vision; by gender, 60.50% were male and 39.50% female, divided into three age groups: under 15 (11.2%), 15 to 49 (76.7%), and 50 and over (12.1%). The age range was 1 to 78 years. The causes of blindness and low vision were, in descending order: optic atrophy (18.4%), retinitis pigmentosa (16.8%), corneal diseases (12.4%), chorioretinal diseases (9.4%), cataract (8.9%), glaucoma (8.2%), phthisis bulbi (7.2%), degenerative myopia (6.9%), microphthalmos (4%), amblyopia (3.2%), albinism (2.5%), and nystagmus (2%). Conclusion: In this study, the main causes of visual impairment were optic atrophy and retinitis pigmentosa; specific prevention plans can therefore be effective in reducing the incidence of visual disabilities.
Keywords: blindness, low vision, welfare, Ardabil
Procedia PDF Downloads 440
3050 Enhancer: An Effective Transformer Architecture for Single Image Super Resolution
Authors: Pitigalage Chamath Chandira Peiris
Abstract:
Single image super-resolution, which aims to restore a high-resolution image from a single low-resolution input, has been a widely researched domain in image processing in recent times. Many single image super-resolution efforts have been completed using both traditional and deep learning methodologies, and deep-learning-based methods in particular have received significant interest. To date, the most advanced image restoration approaches are based on convolutional neural networks; only a few efforts have used Transformers, which have demonstrated excellent performance on high-level vision tasks. While the effectiveness of CNN-based algorithms in image super-resolution has been impressive, these methods cannot completely capture the non-local features of the data. Enhancer is a simple yet powerful Transformer-based approach for enhancing the resolution of images. A method for single image super-resolution was developed in this study, built on an efficient and effective transformer design. The proposed architecture makes use of a locally enhanced window transformer block to alleviate the enormous computational load associated with non-overlapping window-based self-attention, and incorporates depth-wise convolution in the feed-forward network to enhance its ability to capture local context. This study is assessed by comparing the results obtained for popular datasets to those obtained by other techniques in the domain.
Keywords: single image super resolution, computer vision, vision transformers, image restoration
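The non-overlapping window partitioning that window-based self-attention relies on can be sketched as follows (a toy sketch on a plain 2-D list; the real block operates on multi-channel feature maps, with attention computed inside each window):

```python
def window_partition(image, win):
    """Split an H x W grid into non-overlapping win x win windows."""
    h, w = len(image), len(image[0])
    assert h % win == 0 and w % win == 0, "toy sketch: sizes must divide evenly"
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([row[left:left + win] for row in image[top:top + win]])
    return windows

# A 4x4 'image' split into four 2x2 windows. Attention is then computed per
# window, so its cost grows with the window size, not the full image size.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
wins = window_partition(img, 2)
print(len(wins), wins[0])  # 4 [[0, 1], [4, 5]]
```

This is what makes window-based attention tractable for super-resolution, where feature maps are large; the depth-wise convolution in the feed-forward network then reintroduces local context across window boundaries.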
Procedia PDF Downloads 105
3049 Optimization of Fused Deposition Modeling 3D Printing Process via Preprocess Calibration Routine Using Low-Cost Thermal Sensing
Authors: Raz Flieshman, Adam Michael Altenbuchner, Jörg Krüger
Abstract:
This paper presents an approach to optimizing the Fused Deposition Modeling (FDM) 3D printing process through a preprocess calibration routine of printing parameters. The core of this method involves the use of a low-cost thermal sensor capable of measuring temperatures within the range of -20 to 500 degrees Celsius for detailed process observation. The calibration process is conducted by printing a predetermined path while varying the process parameters through machine instructions (g-code). This enables the extraction of critical thermal, dimensional, and surface properties along the printed path. The calibration routine utilizes computer vision models to extract features and metrics from the thermal images, including temperature distribution, layer adhesion quality, surface roughness, and dimensional accuracy and consistency. These extracted properties are then analyzed to optimize the process parameters to achieve the desired qualities of the printed material. A significant benefit of this calibration method is its potential to create printing parameter profiles for new polymer and composite materials, thereby enhancing the versatility and application range of FDM 3D printing. The proposed method demonstrates significant potential in enhancing the precision and reliability of FDM 3D printing, making it a valuable contribution to the field of additive manufacturing.
Keywords: FDM 3D printing, preprocess calibration, thermal sensor, process optimization, additive manufacturing, computer vision, material profiles
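The per-layer thermal metrics extracted along the calibration path might be aggregated along these lines (a hedged sketch; the sample temperatures and two-layer structure below are invented for illustration, not taken from the paper):

```python
# Hypothetical per-layer nozzle-region temperatures (degrees Celsius) sampled
# from thermal frames along the printed calibration path; values are made up.
layer_temps = {
    1: [212.0, 214.5, 213.0],
    2: [209.5, 208.0, 210.5],
}

def layer_metrics(samples):
    """Summarize one layer's temperature samples for the calibration report."""
    mean = sum(samples) / len(samples)
    return {"mean": round(mean, 1), "max": max(samples), "min": min(samples)}

report = {layer: layer_metrics(t) for layer, t in layer_temps.items()}
print(report[1])  # {'mean': 213.2, 'max': 214.5, 'min': 212.0}
```

Summaries like these could then feed the parameter-optimization step, e.g. flagging layers whose mean temperature drifts outside a material's target window.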
Procedia PDF Downloads 41
3048 Facial Expression Recognition Using Sparse Gaussian Conditional Random Field
Authors: Mohammadamin Abbasnejad
Abstract:
The analysis of facial expressions and the detection of facial Action Units (AUs) are very important tasks in computer vision and Human-Computer Interaction (HCI), owing to their wide range of applications in human life. Many works in recent years have addressed them, each with its own advantages and disadvantages. In this work, we present a new model based on a Gaussian Conditional Random Field. We solve the objective problem using ADMM and show how well the proposed model works. We train and test our model on two facial expression datasets, CK+ and RU-FACS. Experimental evaluation shows that the proposed approach outperforms state-of-the-art expression recognition methods.
Keywords: Gaussian Conditional Random Field, ADMM, convergence, gradient descent
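The ADMM splitting mentioned in the abstract can be illustrated on a one-dimensional toy problem (a sketch of the generic ADMM update pattern only, not the authors' GCRF objective): minimize 0.5*(x - a)**2 + lam*|x| by introducing a copy z of x and alternating a quadratic step, a proximal step, and a dual update.

```python
def soft_threshold(v, k):
    """Proximal operator of k*|.| : shrink v toward zero by k."""
    return max(v - k, 0.0) - max(-v - k, 0.0)

def admm_scalar_lasso(a, lam, rho=1.0, iters=100):
    """Toy ADMM for min 0.5*(x-a)**2 + lam*|z| subject to x = z."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # quadratic sub-problem
        z = soft_threshold(x + u, lam / rho)   # proximal step for |.|
        u += x - z                             # scaled dual update
    return z

print(admm_scalar_lasso(3.0, 1.0))  # ≈ 2.0, the soft-threshold of 3 at 1
```

The same split-then-alternate pattern scales to structured objectives like a Gaussian CRF, where each sub-problem stays tractable even though the joint problem is not.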
Procedia PDF Downloads 356
3047 Exploring Dynamics of Regional Creative Economy
Authors: Ari Lindeman, Melina Maunula, Jani Kiviranta, Ronja Pölkki
Abstract:
The aim of this paper is to build a vision for utilizing creative industry competences in industrial and service firms connected to the smart specialization focus areas of the Kymenlaakso region, Finland. Research indicates that creativity and the use of creative industry inputs can enhance innovation and competitiveness. Currently, creative methods and services are underutilized in regional businesses, and the added value they provide is not well grasped. Methodologically, the research adopts a qualitative exploratory approach. Data is collected in multiple ways, including a survey, focus groups, and interviews. Theoretically, the paper contributes to the discussion about the use of creative industry competences in regional development, and argues for building regional creative economy ecosystems in close co-operation with regional strategies and traditional industries, rather than treating regional creative industry ecosystem initiatives as separate from them. The practical contribution of the paper is a creative vision for regional authorities to use in updating the smart specialization strategy and in boosting the competitiveness of the industrial and the creative and cultural sectors. The paper also illustrates a research-based model of vision building.
Keywords: business, cooperation, creative economy, regional development, vision
Procedia PDF Downloads 131
3046 Alternative Approach to the Machine Vision System Operating for Solving Industrial Control Issue
Authors: M. S. Nikitenko, S. A. Kizilov, D. Y. Khudonogov
Abstract:
The paper considers an approach to a machine vision operating system combined with a grid of light markers. This approach is used to solve several scientific and technical problems, such as measuring the capacity of an apron feeder delivering coal from a lining return port to a conveyor, in the mining technology of releasing high coal onto a conveyor, and prototyping an obstacle detection system for an autonomous vehicle. Primary verification of a method for calculating bulk material volume using three-dimensional modeling, with validation in laboratory conditions and calculation of relative errors, was carried out. A method for calculating apron feeder capacity based on a machine vision system is offered, together with a simplified technique for three-dimensional modeling of the examined measuring area with machine vision. The proposed method allows the volume of rock mass moved by an apron feeder to be measured using machine vision. This approach solves the problem of controlling the volume of coal produced by a feeder during high-coal extraction by longwall complexes with release onto a conveyor, with accuracy sufficient for practical application. The developed mathematical apparatus for measuring feeder productivity in kg/s uses only basic mathematical functions such as addition, subtraction, multiplication, and division. This simplifies software development and expands the variety of microcontrollers and microcomputers suitable for calculating feeder capacity. A feature of the obstacle detection problem is correcting distortions of the laser grid, which simplifies obstacle detection. The paper presents algorithms for video camera image processing and for controlling an autonomous vehicle model based on an obstacle detection machine vision system. A sample fragment of obstacle detection at the moment of laser grid distortion is demonstrated.
Keywords: machine vision, machine vision operating system, light markers, measuring capability, obstacle detection system, autonomous transport
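Since the productivity formula uses only the four basic operations, the computation the abstract describes can be sketched roughly as follows (all grid heights, cell area, density, and frame rate below are made-up values for illustration, not taken from the paper):

```python
# Hypothetical heights (m) of the coal pile reconstructed from the light-marker
# grid over a 4-cell cross-section; cell area and density are assumed values.
cell_heights = [0.10, 0.25, 0.30, 0.15]  # metres per grid cell
cell_area = 0.05                         # m^2 per cell
density = 900.0                          # kg/m^3, loose coal (assumed)
frames_per_second = 2.0                  # thermal/vision frame rate (assumed)

# Only +, -, *, / are needed, as the abstract notes, which keeps the routine
# friendly to small microcontrollers without math libraries.
volume_per_frame = 0.0
for h in cell_heights:
    volume_per_frame = volume_per_frame + h * cell_area  # m^3 per frame

mass_flow = volume_per_frame * density * frames_per_second  # kg/s
print(mass_flow)  # ≈ 72 kg/s for these assumed values
```

In the real system, the per-cell heights would come from triangulating the displaced light-marker grid rather than from a hand-written list.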
Procedia PDF Downloads 114
3045 Drastic Improvement in Vision Following Surgical Excision of Juvenile Nasopharyngeal Angiofibroma with Compressive Optic Neuropathy
Authors: Sweta Das
Abstract:
This case report describes a 15-year-old male who presented with painless unilateral vision loss from left optic nerve compression due to juvenile nasopharyngeal angiofibroma (JNA). JNA is a rare, benign neoplasm that causes intracranial and intraorbital bone destruction and extends aggressively into surrounding soft tissues. It accounts for <1% of all head and neck tumors, is predominantly found in pediatric males, and tends to affect indigenous populations disproportionately. The most common presenting symptoms of JNA are epistaxis and nasal obstruction; however, it can invade the orbit, chiasm, and pituitary gland, causing loss of vision and visual field. In this case, visual acuity and function near-normalized following surgical excision. Optometry plays an important role in the diagnosis and co-management of JNA with optic nerve compression by closely monitoring afferent optic nerve function and structure, as well as extraocular motility. As this case demonstrates, visual function and acuity in patients with short-term compressive neuropathy may improve drastically following surgical resection.
Keywords: orbital mass, painless monocular vision loss, compressive optic neuropathy, pediatric tumor
Procedia PDF Downloads 60
3044 American Sign Language Recognition System
Authors: Rishabh Nagpal, Riya Uchagaonkar, Venkata Naga Narasimha Ashish Mernedi, Ahmed Hambaba
Abstract:
The rapid evolution of technology in the communication sector continually seeks to bridge the gap between different communities, notably between the deaf community and the hearing world. This project develops a comprehensive American Sign Language (ASL) recognition system, leveraging the advanced capabilities of convolutional neural networks (CNNs) and vision transformers (ViTs) to interpret and translate ASL in real time. The primary objective of this system is to provide an effective communication tool that enables seamless interaction through accurate sign language interpretation. The architecture of the proposed system integrates dual networks: VGG16 for precise spatial feature extraction and a vision transformer for contextual understanding of the sign language gestures. The system processes live input, extracts critical features through these sophisticated neural network models, and combines them to enhance gesture recognition accuracy. This integration facilitates a robust understanding of ASL by capturing both detailed nuances and broader gesture dynamics. The system is evaluated through a series of tests that measure its efficiency and accuracy in real-world scenarios. Results indicate a high level of precision in recognizing diverse ASL signs, substantiating the potential of this technology in practical applications. Challenges, such as enhancing the system's ability to operate in varied environmental conditions and further expanding the training dataset, were identified and discussed. Future work will refine the model's adaptability and incorporate haptic feedback to enhance the interactivity and richness of the user experience. This project demonstrates the feasibility of an advanced ASL recognition system and lays the groundwork for future innovations in assistive communication technologies.
Keywords: sign language, computer vision, vision transformer, VGG16, CNN
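The dual-network combination described above can be caricatured as late fusion of two feature vectors (a toy sketch; the feature values and classifier weights below are made up, and the real system extracts its features with VGG16 and a ViT rather than hand-set lists):

```python
# Stand-ins for features from the two branches of the dual network:
spatial = [0.2, 0.9, 0.1]    # hypothetical CNN (VGG16-like) features
contextual = [0.7, 0.3]      # hypothetical transformer (ViT-like) features

# Simple concatenation fusion: the classifier sees both views of the gesture.
fused = spatial + contextual

# A linear scoring head over the fused vector (weights are invented):
weights = [0.5, 1.0, -0.2, 0.8, 0.4]
score = sum(w * f for w, f in zip(weights, fused))
print(len(fused), round(score, 2))  # 5 1.66
```

Fusing after feature extraction lets each branch specialize (spatial detail vs. gesture context) while keeping a single downstream classifier.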
Procedia PDF Downloads 43
3043 Using Computer Vision and Machine Learning to Improve Facility Design for Healthcare Facility Worker Safety
Authors: Hengameh Hosseini
Abstract:
The design of large healthcare facilities - such as hospitals, multi-service-line clinics, and nursing facilities - that can accommodate patients with wide-ranging disabilities is a challenging endeavor, and one poorly understood among healthcare facility managers, administrators, and executives. An even less-understood extension of this problem is the implication of weakly or insufficiently accommodative facility design for healthcare workers in physically intensive jobs, who may also suffer from a range of disabilities and who are therefore at increased risk of workplace accident and injury. Combine this reality with the vast range of facility types, ages, and designs, and the problem of universal accommodation becomes even more daunting and complex. In this study, we focus on the implications of facility design for healthcare workers with low vision who also have physically active jobs. The points of difficulty are myriad: health service infrastructure, the equipment used in health facilities, and transport to and from appointments and other services can all pose a barrier to health care if they are inaccessible, less accessible, or simply less comfortable for people with various disabilities. We conducted a series of surveys and interviews with employees and administrators of 7 facilities of a range of sizes and ownership models in the Northeastern United States, and combined that corpus with in-facility observations and data collection to identify five major points of failure, common to all the facilities, that we concluded could pose safety threats - ranging from very minor to severe - to employees with vision impairments. We determined that a lack of design empathy is a major commonality among facility management and ownership.
We subsequently propose three methods for remedying this lack of empathy-informed design and the dangers it poses to employees: using an existing open-sourced augmented reality application to simulate the low-vision experience for designers and managers; using a machine learning model we develop to automatically infer facility shortcomings from large datasets of recorded patient and employee reviews and feedback; and using a computer vision model, fine-tuned on images of each facility, to infer and predict facility features, locations, and workflows that could pose meaningful dangers to visually impaired employees. After conducting a series of real-world comparative experiments with each of these approaches, we conclude that each is a viable solution under particular sets of conditions, and we characterize the range of facility types, workforce composition profiles, and work conditions under which each method would be most apt and successful.
Keywords: artificial intelligence, healthcare workers, facility design, disability, visually impaired, workplace safety
Procedia PDF Downloads 116
3042 Classical Myths in Modern Drama: A Study of the Vision of Jean Anouilh in Antigone
Authors: Azza Taha Zaki
Abstract:
Modern drama was characterised by realism and naturalism as dominant literary movements that focused on contemporary people and their issues, reflecting the status of modern man and his environment. However, some modern dramatists have often drawn on classical mythology and ancient Greek tragedy to create a sense of the universality of the human experience. The tragic overtones of classical myths have helped modern dramatists in their attempts to create enduring pieces by evoking the majestic grandeur of the ancient myths and the heroic struggle of man against forces he cannot fight. Myths have continued to appeal to modern playwrights not only for plot and narrative material but also for their vision of and insight into the human experience and the human condition. This paper studies how the reworking of Sophocles' Antigone by Jean Anouilh in his Antigone - written in 1942, at the height of the Second World War and during the German occupation of his country, France - fits his own purpose and his own time. The paper also offers an analysis of the vision in both plays to show how Anouilh used the classical Antigone freely to produce a modern vision of man's dilemma when faced with personal and national conflicts.
Keywords: Anouilh, Antigone, drama, Greek tragedy, modern, myth, Sophocles
Procedia PDF Downloads 182
3041 Glaucoma Detection in Retinal Tomography Using the Vision Transformer
Authors: Sushish Baral, Pratibha Joshi, Yaman Maharjan
Abstract:
Glaucoma is a chronic eye condition that causes irreversible vision loss. Because it can be asymptomatic, early detection and treatment are critical to prevent vision loss. Multiple deep learning algorithms are used for the identification of glaucoma. Transformer-based architectures, which use the self-attention mechanism to encode long-range dependencies and acquire highly expressive representations, have recently become popular. Convolutional architectures, on the other hand, lack knowledge of long-range dependencies in the image due to their intrinsic inductive biases. These observations motivate this thesis to examine transformer-based solutions and investigate the viability of adopting transformer-based network designs for glaucoma detection. Developing a viable algorithm to assess the severity of glaucoma from retinal fundus images of the optic nerve head necessitates a large number of well-curated images. Initially, data is generated by augmenting the ocular images, which are then pre-processed to make them ready for further processing. The system is trained on the pre-processed images and classifies input images as normal or glaucomatous based on the features retrieved during training. The Vision Transformer (ViT) architecture is well suited to this task, as its self-attention mechanism supports structural modeling. Extensive experiments are run on a common dataset, and the results are thoroughly validated and visualized.
Keywords: glaucoma, vision transformer, convolutional architectures, retinal fundus images, self-attention, deep learning
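The self-attention mechanism at the core of the ViT can be sketched on a tiny token sequence (illustrative only; real models apply learned query/key/value projections over many image-patch tokens and use multiple heads):

```python
import math

def attention(queries, keys, values):
    """Scaled dot-product attention over a tiny token sequence."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this token to every token, scaled by sqrt(d):
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)                               # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted mix of *all* value vectors:
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Two 'patch' tokens; each attends over both, so distant retinal regions can
# influence one another within a single layer (values here are illustrative).
tokens = [[1.0, 0.0], [0.0, 1.0]]
mixed = attention(tokens, tokens, tokens)
print(round(mixed[0][0], 3))  # ≈ 0.67: mostly itself, partly the other token
```

This global mixing is exactly the long-range dependency modeling that, as the abstract notes, convolutional architectures lack within a single layer.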
Procedia PDF Downloads 191