Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 27

Computer Vision Related Abstracts

27 Development of a Computer Vision System for the Blind and Visually Impaired Person

Authors: Jr., Roselyn A. Maaño, Rodrigo C. Belleza, Karl Patrick E. Camota, Darwin Kim Q. Bulawan


Eyes are an essential and conspicuous organ of the human body. Human eyes are outward and inward portals of the body that allow one to see the outside world and provide glimpses into one's inner thoughts and feelings. Inevitable blindness and visual impairments may result from eye-related disease, trauma, or congenital or degenerative conditions that cannot be corrected by conventional means. The study emphasizes innovative tools that serve as an aid to blind and visually impaired (VI) individuals. The researchers fabricated a prototype that utilizes the Microsoft Kinect for Windows and an Arduino microcontroller board. The prototype facilitates advanced gesture recognition, voice recognition, obstacle detection and indoor environment navigation. Open Computer Vision (OpenCV) performs image analysis and gesture tracking to transform Kinect data into the desired output. A computer vision device of this kind provides greater accessibility for those with vision impairments.

Keywords: Image Analysis, Computer Vision, Algorithms, Embedded Systems, blind

Procedia PDF Downloads 150
26 Vision Based People Tracking System

Authors: Boukerch Haroun, Luo Qing Sheng, Li Hua Shi, Boukraa Sebti


In this paper we present the design and implementation of a target tracking system in which the target is a moving person in a video sequence. The system can easily be applied as a vision system for a mobile robot. It is composed of two major parts. The first is the detection of the person in the video frame using an SVM classifier based on HOG descriptors. The second is the tracking of the moving person, which combines the Kalman filter with a modified version of the Camshift tracking algorithm that adds the target motion feature to the color feature. The experimental results show that the new algorithm outperforms the traditional Camshift algorithm in robustness and in cases of occlusion.
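The role of the Kalman filter in such a tracker can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a constant-velocity model for a single image coordinate, and the class name, noise values `q` and `r`, and initial covariance are hypothetical choices. When the detector or Camshift loses the person (for example during occlusion), only `predict` is called and the estimate coasts along the last velocity.

```python
class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""

    def __init__(self, q=1e-3, r=1.0):
        self.p, self.v = 0.0, 0.0                    # state estimate
        self.P = [[1000.0, 0.0], [0.0, 1000.0]]      # state covariance (assumed large)
        self.q, self.r = q, r                        # process / measurement noise

    def predict(self, dt=1.0):
        """Propagate state and covariance: x = F x, P = F P F^T + Q."""
        P = self.P
        self.p += dt * self.v
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]
        return self.p

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        P = self.P
        s = P[0][0] + self.r                         # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s            # Kalman gain
        y = z - self.p                               # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
```

In a two-dimensional tracker one such filter per axis (or a single four-state filter) would be fed with the centre of the window returned by the detector or by Camshift.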

Keywords: Computer Vision, Kalman Filter, camshift algorithm, object tracking

Procedia PDF Downloads 267
25 Objects Tracking in Catadioptric Images Using Spherical Snake

Authors: Mohammed Rziza, Khald Anisse, Amina Radgui


Tracking objects in video sequences is a very challenging task in many computer vision applications. However, no article treats this topic in catadioptric vision. This paper describes a new approach to omnidirectional image processing based on inverse stereographic projection onto the half-sphere. We use the spherical model proposed by Geyer et al. For object tracking, our work is based on the snake method, optimized with the greedy algorithm, with its operators adapted. The algorithm respects the deformed geometry of omnidirectional images through a spherical neighborhood, a spherical gradient and a reformulation of the optimization algorithm on the spherical domain. This tracking method, which we call "spherical snake", makes it possible to follow changes in the shape and size of an object at different positions in the spherical image.

Keywords: Computer Vision, object tracking, spherical snake, omnidirectional image, inverse stereographic projection

Procedia PDF Downloads 235
24 Improved Dynamic Bayesian Networks Applied to Arabic On Line Characters Recognition

Authors: Redouane Tlemsani, Abdelkader Benyettou


This work addresses online Arabic character recognition; the principal motivation is to study Arabic handwriting with online technology. The system is a Markovian system, which can be viewed as a Dynamic Bayesian Network (DBN). One of the major interests of these systems lies in training the complete model (topology and parameters) from training data. Our approach is based on the dynamic Bayesian network formalism. DBN theory generalizes Bayesian networks to dynamic processes. Among our objectives is finding better parameters, which represent the links (dependencies) between the dynamic network variables. In pattern recognition applications, the structure is fixed in advance, which obliges us to admit some strong assumptions (for example, independence between some variables). Our application concerns online recognition of isolated Arabic characters using our laboratory database, NOUN. A neural tester is proposed for external optimization of the DBN. The DBN and mixed DBN scores are 70.24% and 62.50%, respectively, which suggests room for further development; other approaches that take time into account were considered and implemented, reaching a significant recognition rate of 94.79%.

Keywords: Pattern Recognition, Computer Vision, Arabic on line character recognition, dynamic Bayesian network

Procedia PDF Downloads 313
23 Anisotropic Approach for Discontinuity Preserving in Optical Flow Estimation

Authors: Sanjeev Kumar, Pushpendra Kumar, R. Balasubramanian


Estimation of optical flow from a sequence of images using variational methods is one of the most successful approaches. Discontinuity between different motions is one of the challenging problems in flow estimation. In this paper, we design a new anisotropic diffusion operator, which is able to provide smooth flow over a region and efficiently preserve discontinuity in optical flow. This operator is designed on the basis of the intensity differences of the pixels and an isotropic operator using an exponential function. The combination of these is used to control the propagation of flow. Experimental results on different datasets verify the robustness and accuracy of the algorithm and also validate the effect of the anisotropic operator in preserving discontinuities.

Keywords: Computer Vision, Variational Methods, optical flow, anisotropic operator

Procedia PDF Downloads 534
22 High Level Synthesis of Canny Edge Detection Algorithm on Zynq Platform

Authors: Hanaa M. Abdelgawad, Mona Safar, Ayman M. Wahba


Real-time image and video processing is a demand in many computer vision applications, e.g. video surveillance, traffic management and medical imaging. The processing of those video applications requires high computational power. Therefore, the optimal solution is the collaboration of CPU and hardware accelerators. In this paper, a Canny edge detection hardware accelerator is proposed. Canny edge detection is one of the common blocks in the pre-processing phase of image and video processing pipeline. Our presented approach targets offloading the Canny edge detection algorithm from processing system (PS) to programmable logic (PL) taking the advantage of High Level Synthesis (HLS) tool flow to accelerate the implementation on Zynq platform. The resulting implementation enables up to a 100x performance improvement through hardware acceleration. The CPU utilization drops down and the frame rate jumps to 60 fps of 1080p full HD input video stream.

Keywords: Computer Vision, Hardware accelerators, high level synthesis, canny edge detection

Procedia PDF Downloads 339
21 Human Motion Capture: New Innovations in the Field of Computer Vision

Authors: Najm Alotaibi


Human motion capture has become one of the major areas of interest in the field of computer vision. Some of the major application areas that have been rapidly evolving include advanced human interfaces, virtual reality and security/surveillance systems. This study provides a brief overview of the techniques and applications used for markerless human motion capture, which deals with analyzing human motion in the form of mathematical formulations. The major contribution of this research is that it classifies the computer vision based techniques of human motion capture based on the taxonomy, and then breaks them down into four systematically different categories: tracking, initialization, pose estimation and recognition. Detailed descriptions and the relationships between them are given for the techniques of tracking and pose estimation. The subcategories of each process are further described. Various hypotheses that have been used by researchers in this domain are surveyed, and the evolution of these techniques is explained. The survey concludes that most researchers have focused on using mathematical body models for markerless motion capture.

Keywords: Computer Vision, Tracking, human motion capture, vision-based

Procedia PDF Downloads 194
20 Inspection of Railway Track Fastening Elements Using Artificial Vision

Authors: Abdelkrim Belhaoua, Jean-Pierre Radoux


In France, the railway network is one of the main transport infrastructures and is the second largest European network. Railway inspection is therefore an important task in railway maintenance to ensure passenger safety, and it requires significant means in personnel and technical facilities. Artificial vision has recently been applied to several railway applications due to its potential to improve efficiency and accuracy when analyzing large databases of acquired images. In this paper, we present a vision system able to detect fastening elements based on an artificial vision approach. This system acquires railway images using a CCD camera installed under a control carriage. These images are stitched together before being processed. Experimental results are presented to show that the proposed method is robust for detecting fasteners in a complex environment.

Keywords: Image Processing, Computer Vision, Neural Network, image stitching, railway inspection, fastener recognition

Procedia PDF Downloads 251
19 Robust and Real-Time Traffic Counting System

Authors: Hossam M. Moftah, Aboul Ella Hassanien


In recent years, the importance of automatic traffic control for signal control and efficient traffic management has increased due to the problem of traffic jams, especially in big cities. Traffic counting, as a kind of traffic control, is important for knowing the road traffic density in real time. This paper presents a fast and robust traffic counting system using different image processing techniques. The proposed system is composed of the following four fundamental phases: image acquisition, pre-processing, object detection, and finally counting the connected objects. The object detection phase comprises the following five steps: subtracting the background, converting the image to binary, closing gaps and connecting nearby blobs, smoothing the image to remove noise and very small objects, and detecting the connected objects. Experimental results show the great success of the proposed approach.
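The object detection steps listed above can be sketched on a tiny grayscale grid. This is a minimal sketch and not the authors' code: the difference threshold, the 3x3 structuring element, the use of a minimum blob area in place of the smoothing step, and all function names are assumptions made for illustration.

```python
from collections import deque

def subtract_and_binarize(frame, background, thresh=30):
    """Background subtraction followed by thresholding to a 0/1 mask."""
    return [[1 if abs(f - b) > thresh else 0 for f, b in zip(fr, br)]
            for fr, br in zip(frame, background)]

def _morph(mask, op):
    """Apply op (max = dilation, min = erosion) over each 3x3 neighbourhood.
    Pixels outside the image are ignored rather than treated as zero."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = op(window)
    return out

def close_gaps(mask):
    """Morphological closing: dilation, then erosion."""
    return _morph(_morph(mask, max), min)

def count_blobs(mask, min_area=2):
    """Count 4-connected components, skipping blobs smaller than min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                area, queue = 0, deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    area += 1
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if area >= min_area:
                    count += 1
    return count
```

A vehicle that the threshold splits into two nearby fragments is counted twice on the raw mask and once after `close_gaps`, which is the purpose of the closing step.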

Keywords: Image Processing, Computer Vision, Traffic Management, Object Detection, traffic counting

Procedia PDF Downloads 147
18 Paddy/Rice Singulation for Determination of Husking Efficiency and Damage Using Machine Vision

Authors: A. Jafari, S. Minaei, M. Shaker, M. H. Khoshtaghaza, A. Banakar


In this study, a system of machine vision and singulation was developed to separate paddy from rice and determine paddy husking and rice breakage percentages. The machine vision system consists of three main components: an imaging chamber, a digital camera, and a computer equipped with image processing software. The singulation device consists of a kernel holding surface, a motor with vacuum fan, and a dimmer. To separate paddy from rice in the image, it was necessary to set a threshold. Therefore, some images of paddy and rice were sampled and the RGB values of the images were extracted using MATLAB software. Then the mean and standard deviation of the data were determined. An image processing algorithm was developed using MATLAB to determine paddy/rice separation, rice breakage and paddy husking percentages, using the blue to red ratio. Tests showed that a threshold of 0.75 is suitable for separating paddy from rice kernels. Results from the evaluation of the image processing algorithm showed that the accuracies obtained with the algorithm were 98.36% and 91.81% for paddy husking and rice breakage percentage, respectively. Analysis also showed that a suction of 45 mmHg to 50 mmHg, yielding 81.3% separation efficiency, is appropriate for operation of the kernel singulation system.
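The blue to red ratio test can be sketched as follows. The abstract gives only the threshold of 0.75; which side of the threshold corresponds to paddy is an assumption here (white rice is taken to have the higher ratio), and the function names and sample pixel values are hypothetical. The original work used MATLAB; Python is used purely for illustration.

```python
def blue_red_ratio(pixels):
    """Mean blue divided by mean red over an iterable of (R, G, B) pixels.
    Assumes the kernel region has a non-zero mean red value."""
    pixels = list(pixels)
    mean_r = sum(p[0] for p in pixels) / len(pixels)
    mean_b = sum(p[2] for p in pixels) / len(pixels)
    return mean_b / mean_r

def classify_kernel(pixels, threshold=0.75):
    """Assumed convention: ratio at or above the threshold means husked rice."""
    return "rice" if blue_red_ratio(pixels) >= threshold else "paddy"

def husked_percentage(kernels, threshold=0.75):
    """Share of singulated kernels classified as rice, in percent."""
    labels = [classify_kernel(k, threshold) for k in kernels]
    return 100.0 * labels.count("rice") / len(labels)
```

Each element of `kernels` is the list of pixels belonging to one singulated kernel, which is why the singulation device matters: touching kernels would mix the two colour populations.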

Keywords: Computer Vision, breakage, husking, rice kernel

Procedia PDF Downloads 189
17 Telecontrolled Service Robots for Increasing the Quality of Life of Elderly and Disabled

Authors: Nayden Chivarov, Denis Chikurtev, Kaloyan Yovchev, Nedko Shivarov


This paper presents methods for improving the efficiency and precision of a service mobile robot. The robot is used to increase the quality of life of elderly and disabled people. The key concept of the proposed Intelligent Service Mobile Robot is its easy adaptability to a wide range of elderly or disabled persons' needs, performing different tasks in support of their care. We developed autonomous navigation and computer vision systems for the robot so that it can recognize different objects and bring them to people. A web-based user interface was developed to provide easy access to, and tele-control of, the robot from any device over the internet. In this study, algorithms for object recognition and localization are proposed to provide successful object recognition and accurate positioning. Different methods for sending movement commands to the mobile robot system are proposed and evaluated. Based on experiments carried out to show the results of the research, we conclude that these systems and algorithms provide good control of the service mobile robot and make it more useful in helping elderly and disabled persons.

Keywords: Computer Vision, Mobile robot, autonomous navigation, ROS, service robot, web user interface

Procedia PDF Downloads 216
16 Detecting Tomato Flowers in Greenhouses Using Computer Vision

Authors: Yael Edan, Dor Oppenheim, Guy Shani


This paper presents an image analysis algorithm to detect and count yellow tomato flowers in a greenhouse with uneven illumination conditions, complex growth conditions and different flower sizes. The algorithm is designed to be employed on a drone that flies in greenhouses to accomplish several tasks such as pollination and yield estimation. Detecting the flowers can provide useful information for the farmer, such as the number of flowers in a row, and the number of flowers that were pollinated since the last visit to the row. The developed algorithm is designed to handle the real-world difficulties in a greenhouse, which include varying lighting conditions, shadowing, and occlusion, while considering the computational limitations of the simple processor in the drone. The algorithm identifies flowers using an adaptive global threshold, segmentation over the HSV color space, and morphological cues. The adaptive threshold divides the images into darker and lighter images. Then, segmentation on the hue, saturation and value is performed accordingly, and classification is done according to size and location of the flowers. A total of 1069 images of greenhouse tomato flowers were acquired in a commercial greenhouse in Israel, using two different RGB cameras: an LG G4 smartphone and a Canon PowerShot A590. The images were acquired from multiple angles and distances and were sampled manually at various periods along the day to obtain varying lighting conditions. Ground truth was created by manually tagging approximately 25,000 individual flowers in the images. Sensitivity analyses on the acquisition angle of the images, periods throughout the day, different cameras and thresholding types were performed. Precision, recall and their derived F1 score were calculated. Results indicate better performance for the view angle facing the flowers than for any other angle. Acquiring images in the afternoon resulted in the best precision and recall. Applying a global adaptive threshold improved the median F1 score by 3%. Results showed no difference between the two cameras used. Using hue values of 0.12-0.18 in the segmentation process provided the best results in precision and recall, and the best F1 score. The precision and recall average for all the images when using these values was 74% and 75% respectively with an F1 score of 0.73. Further analysis showed a 5% increase in precision and recall when analyzing images acquired in the afternoon and from the front viewpoint.
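The hue-based segmentation and the evaluation metrics can be illustrated with a minimal sketch. It assumes that the reported hue range of 0.12-0.18 is expressed on a 0-1 scale, as returned by Python's `colorsys`; the saturation and value minimums and the function names are hypothetical, and the adaptive threshold and morphological cues of the described algorithm are not reproduced.

```python
import colorsys

def is_flower_pixel(r, g, b, hue_range=(0.12, 0.18), min_sat=0.4, min_val=0.4):
    """True if an 8-bit RGB pixel falls in the yellow hue band (assumed bounds)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return hue_range[0] <= h <= hue_range[1] and s >= min_sat and v >= min_val

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Here a detection counts as a true positive when it matches a manually tagged flower, a false positive when it does not, and a missed tagged flower is a false negative.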

Keywords: Image Processing, Agricultural Engineering, Computer Vision, flower detection

Procedia PDF Downloads 168
15 Comparative Analysis of Feature Extraction and Classification Techniques

Authors: Abhishek Jain, R. L. Ujjwal


In the field of computer vision, most facial variations, such as identity, expression, emotion and gender, have been extensively studied, whereas automatic age estimation has rarely been explored. As a person ages, the features of the face change. This paper provides a new comparative study of different types of algorithms for feature extraction (hybrid features using HAAR cascade and HOG features) and classification (KNN and SVM) on a training dataset. Using these algorithms, we try to identify the best classification algorithm. We do the same for feature selection, extracting the features using HAAR cascade and HOG. This work is carried out in the context of an age group classification model.

Keywords: Computer Vision, Face Detection, age group

Procedia PDF Downloads 209
14 Applying Big Data to Understand Urban Design Quality: The Correlation between Social Activities and Automated Pedestrian Counts in Dilworth Park, Philadelphia

Authors: Jae Min Lee


Presence of people and intensity of activities have been widely accepted as an indicator for successful public spaces in urban design literature. This study attempts to predict the qualitative indicators, presence of people and intensity of activities, with the quantitative measurements of pedestrian counting. We conducted participant observation in Dilworth Park, Philadelphia to collect the total number of people and activities in the park. Then, the participant observation data is compared with detailed pedestrian counts at 10 exit locations to estimate the number of park users. The study found that there is a clear correlation between the intensity of social activities and automated pedestrian counts.

Keywords: Computer Vision, urban Design, Public space, automated pedestrian count

Procedia PDF Downloads 236
13 Human Computer Interaction Using Computer Vision and Speech Processing

Authors: Shreyansh Jain Jeetmal, Shobith P. Chadaga, Shreyas H. Srinivas


The Internet of Things (IoT) is seen as the next major step in the ongoing revolution of the Information Age. It is predicted that in the near future billions of embedded devices will be communicating with each other to perform a plethora of tasks with or without human intervention. One of the major hotbeds of ongoing research activity in IoT is Human Computer Interaction (HCI). HCI is used to facilitate communication between an intelligent system and a user. An intelligent system typically comprises various sensors, actuators and embedded controllers which communicate with each other to monitor data collected from the environment. Communication from the user to the system is typically done using voice. One of the major ongoing applications of HCI is in home automation, as a personal assistant. The prime objective of our project is to implement a use case of HCI for home automation. Our system is designed to detect and recognize the users and personalize the appliances in the house according to their individual preferences. Our HCI system is also capable of speaking with the user when certain commands are spoken, such as searching the web for information and controlling appliances. Our system can also monitor the environment in the house, such as air quality and gas leakages, for added safety.

Keywords: Sensor Networks, Computer Vision, Internet of Things, Human Computer Interaction, android, speech to text, text to speech

Procedia PDF Downloads 232
12 Using Deep Learning Real-Time Object Detection Convolution Neural Networks for Fast Fruit Recognition in the Tree

Authors: K. Bresilla, L. Manfrini, B. Morandi, A. Boini, G. Perulli, L. C. Grappadelli


Image/video processing for fruit in the tree using hard-coded feature extraction algorithms has shown high accuracy in recent years. While accurate, these approaches, even with high-end hardware, are computationally intensive and too slow for real-time systems. This paper details the use of deep convolution neural networks (CNNs), specifically an algorithm (YOLO - You Only Look Once) with 24+2 convolution layers. Using deep-learning techniques eliminated the need for hard-coding specific features for specific fruit shapes, colors and/or other attributes. This CNN was trained on more than 5000 images of apple and pear fruits on a 960-core GPU (Graphical Processing Unit). The testing set showed an accuracy of 90%. After this, the trained data were transferred to an embedded device (Raspberry Pi gen.3) with a camera for more portability. Based on the correlation between the number of visible or detected fruits in one frame and the real number of fruits on one tree, a model was created to accommodate this error rate. The processing and detection speed of the whole platform was higher than 40 frames per second. This speed is fast enough for any grasping/harvesting robotic arm or other real-time applications.

Keywords: Artificial Intelligence, Computer Vision, Precision Agriculture, Deep learning, fruit recognition, harvesting robot

Procedia PDF Downloads 259
11 An Exponential Field Path Planning Method for Mobile Robots Integrated with Visual Perception

Authors: Magdy Roman, Mostafa Shoeib, Mostafa Rostom


Global vision, whether provided by overhead fixed cameras, on-board aerial vehicle cameras, or satellite images, can always provide detailed information on the environment around mobile robots. In this paper, an intelligent vision-based method of path planning and obstacle avoidance for mobile robots is presented. The method integrates visual perception with a newly proposed field-based path-planning method to overcome common path-planning problems such as local minima, unreachable destinations and unnecessarily lengthy paths around obstacles. The method proposes an exponential angle deviation field around each obstacle that affects the orientation of a nearby robot. As the robot heads toward the goal point, obstacles are classified into right and left groups, and a deviation angle is exponentially added to or subtracted from the orientation of the robot. Exponential field parameters are chosen based on the Lyapunov stability criterion to guarantee robot convergence to the destination. The proposed method uses the obstacles' shape and location, extracted from the global vision system, through a collision prediction mechanism to decide whether to activate or deactivate each obstacle's field. In addition, a search mechanism is developed to find a suitable exit or entrance in case the robot or goal point is trapped among obstacles. The proposed algorithm is validated both in simulation and through experiments. The algorithm shows effectiveness in obstacle avoidance and destination convergence, overcoming common path-planning problems found in classical methods.
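The idea of an exponential angle deviation field can be sketched as below. The abstract does not give the field equation, so the form `amplitude * exp(-distance / decay)`, the parameter values and the function names are all assumptions; the Lyapunov-based parameter selection, the collision prediction mechanism and the search mechanism are not reproduced.

```python
import math

def wrap_angle(a):
    """Wrap an angle into the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))

def steered_heading(robot, goal, obstacles, amplitude=1.0, decay=1.0):
    """Heading toward the goal, deviated away from each obstacle.

    Obstacles to the left of the goal direction subtract a deviation angle,
    obstacles to the right add one; the deviation decays exponentially with
    the robot-to-obstacle distance (assumed form).
    """
    rx, ry = robot
    goal_bearing = math.atan2(goal[1] - ry, goal[0] - rx)
    heading = goal_bearing
    for ox, oy in obstacles:
        dist = math.hypot(ox - rx, oy - ry)
        deviation = amplitude * math.exp(-dist / decay)
        relative = wrap_angle(math.atan2(oy - ry, ox - rx) - goal_bearing)
        heading += -deviation if relative > 0 else deviation
    return wrap_angle(heading)
```

With no obstacles the function returns the bearing to the goal; a nearby obstacle on the left turns the robot to the right, and a distant obstacle has a negligible effect.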

Keywords: Computer Vision, Convergence, Mobile Robots, Path Planning, Collision Avoidance

Procedia PDF Downloads 53
10 Non-Targeted Adversarial Image Classification Attack-Region Modification Methods

Authors: Bandar Alahmadi, Lethia Jackson


Machine learning models are used today in many real-life applications. The safety and security of such models is important so that their results are as accurate as possible. One challenge to machine learning model security is the adversarial example attack. Adversarial examples are designed by the attacker to cause the machine learning model to misclassify the input. We propose a method to generate adversarial examples to attack image classifiers. We modify successfully classified images so that a classifier misclassifies them after the modification. In our method, we do not update the whole image; instead, we detect the important region, modify it, place it back into the original image, and then run it through a classifier. The algorithm modifies the detected region using two methods. First, it adds an abstract image matrix behind the detected image matrix. Then, it performs a rotation attack that rotates the detected region around its axes and embeds the trace of the image in the image background. Finally, the attacked region is placed in its original position, from where it was removed, and a smoothing filter is applied to blend the background with the foreground. We tested our method on a cascade classifier, where the algorithm proved efficient: the classifier confidence dropped to almost zero. We also tried it on a CNN (convolutional neural network) with a higher setting, and the algorithm worked successfully.

Keywords: Image Processing, Computer Vision, attack, adversarial examples

Procedia PDF Downloads 230
9 A Motion Dictionary to Real-Time Recognition of Sign Language Alphabet Using Dynamic Time Warping and Artificial Neural Network

Authors: Marcio Leal, Marta Villamil


Computational recognition of sign languages aims to allow greater social and digital inclusion of deaf people through interpretation of their language by computer. This article presents a model for recognizing two global parameters of sign languages: hand configurations and hand movements. Hand motion is captured through infrared technology and its joints are built into a virtual three-dimensional space. A Multilayer Perceptron Neural Network (MLP) was used to classify hand configurations, and Dynamic Time Warping (DTW) recognizes hand motion. Beyond the method of sign recognition, we provide a dataset of hand configurations and motion capture built with the help of professionals fluent in sign languages. Although this technology can be used to translate any sign from any sign dictionary, Brazilian Sign Language (Libras) was used as a case study. Finally, the model presented in this paper achieved a recognition rate of 80.4%.
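Dynamic time warping compares two motion trajectories that may be performed at different speeds. The following is a minimal sketch of the classic DTW recurrence with nearest-template classification; the template dictionary, its labels and the function names are hypothetical, and the paper's actual motion dictionary and features are not reproduced.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two sequences of points (tuples)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of the first i points of seq_a
    # with the first j points of seq_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # seq_a advances
                                 cost[i][j - 1],       # seq_b advances
                                 cost[i - 1][j - 1])   # both advance
    return cost[n][m]

def classify_motion(query, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

A query that traces the same path as a template but lingers on some points has a DTW cost of zero to that template, which a frame-by-frame Euclidean comparison would not give.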

Keywords: Computer Vision, Infrared, Artificial Neural Network, dynamic time warping, sign language recognition

Procedia PDF Downloads 88
8 Data Collection Techniques for Robotics to Identify the Facial Expressions of Traumatic Brain Injured Patients

Authors: Chaudhary Muhammad Aqdus Ilyas, Matthias Rehm, Kamal Nasrollahi, Thomas B. Moeslund


This paper investigates data collection procedures associated with robots placed with traumatic brain injured (TBI) patients for rehabilitation purposes through facial expression and mood analysis. Rehabilitation after TBI is crucial due to the nature of the injury and the variation in recovery time. It is advantageous to analyze these emotional signals in a contactless manner, due to the non-supportive behavior of patients, limited muscle movements and an increase in negative emotional expressions. This work aims at the development of a framework in which robots can recognize the emotions of TBI patients through facial expressions in order to perform rehabilitation tasks through physical, cognitive or interactive activities. The results of these studies show that, with customized data collection strategies, the proposed framework identifies facial and emotional expressions more accurately, which can be utilized to enhance recovery treatment and social interaction in a robotic context.

Keywords: Computer Vision, Rehabilitation, Robots, convolution neural network- long short term memory network (CNN-LSTM), facial expression and mood recognition, multimodal (RGB-thermal) analysis, traumatic brain injured patients

Procedia PDF Downloads 25
7 Non-Targeted Adversarial Object Detection Attack: Fast Gradient Sign Method

Authors: Bandar Alahmadi, Lethia Jackson, Manohar Mareboyana


Today, many applications use computer vision models, such as face recognition, image classification, and object detection. The accuracy of these models is very important for the performance of these applications. One challenge facing computer vision models is the adversarial example attack. In computer vision, an adversarial example is an image that is intentionally designed to cause the machine learning model to misclassify it. One very well-known method used to attack Convolutional Neural Networks (CNNs) is the Fast Gradient Sign Method (FGSM). The goal of this method is to find the perturbation that can fool the CNN using the gradient of the cost function of the CNN. In this paper, we introduce a novel model that can attack a Regional-Convolution Neural Network (R-CNN) using FGSM. We first extract the regions that are detected by the R-CNN, and then we resize these regions to the size of regular images. Then, we find the best perturbation of the regions that can fool the CNN using FGSM. Next, we add the resulting perturbation to the attacked region to get a new region image that looks similar to the original image to human eyes. Finally, we place the regions back into the original image and test the R-CNN with the attacked images. Our model reduced the accuracy of the R-CNN when tested on the Pascal VOC 2012 dataset.
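The core FGSM step perturbs the input by a small amount `eps` in the direction of the sign of the loss gradient with respect to the input. The sketch below shows that step on a toy logistic-regression classifier, which stands in for the CNN purely for illustration; the weights, the input and `eps` are hypothetical, and the region extraction and R-CNN pipeline described in the abstract are not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model (toy stand-in for a CNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """FGSM step: x_adv = x + eps * sign(d loss / d x)."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

For image inputs the perturbed values would normally also be clipped back to the valid pixel range, which this sketch omits.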

Keywords: Image Processing, Computer Vision, attack, adversarial examples

Procedia PDF Downloads 13
6 Online Pose Estimation and Tracking Approach with Siamese Region Proposal Network

Authors: Cunyue Lu, Lingwei Quan, Cheng Fang


Human pose estimation and tracking aim to accurately identify and locate the positions of human joints in video. This is a computer vision task of great significance for human motion recognition, behavior understanding and scene analysis. There has been remarkable progress in human pose estimation in recent years. However, more research is needed on human pose tracking, especially online tracking. In this paper, a framework called PoseSRPN is proposed for online single-person pose estimation and tracking. We attach a pose estimation branch to a Siamese network to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution network structure with deconvolution. By augmenting the loss of the fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN relies only on a single bounding box initialization to produce human joint locations. The experimental results show that, while maintaining good pose estimation accuracy on the COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frames/s, which is superior to other pose tracking frameworks.

Keywords: Computer Vision, pose estimation, pose tracking, Siamese network

Procedia PDF Downloads 1
5 Automatic Identification and Monitoring of Wildlife via Computer Vision and IoT

Authors: Bilal Arshad, Johan Barthelemy, Elliott Pilton, Pascal Perez


Getting reliable, informative, and up-to-date information about the location, mobility, and behavioural patterns of animals will enhance our ability to research and preserve biodiversity. The fusion of infra-red sensors and camera traps offers an inexpensive way to collect wildlife data in the form of images. However, extracting useful data from these images, such as the identification and counting of animals, remains a manual, time-consuming, and costly process. In this paper, we demonstrate that such information can be automatically retrieved by using state-of-the-art deep learning methods. Another major challenge that ecologists face is the recounting of a single animal multiple times because it reappears in other images taken by the same or other camera traps. Nonetheless, such information can be extremely useful for tracking wildlife and understanding its behaviour. To tackle the multiple count problem, we have designed a meshed network of camera traps that share the captured images along with timestamps, cumulative counts, and dimensions of the animal. The proposed method leverages edge computing to support real-time tracking and monitoring of wildlife. This method has been validated in the field and can be easily extended to other applications focusing on wildlife monitoring and management, where the traditional way of monitoring is expensive and time-consuming.

Keywords: Computer Vision, Internet of Things, Ecology, Wildlife management, Invasive Species Management

Procedia PDF Downloads 1
4 High Fidelity Interactive Video Segmentation Using Tensor Decomposition, Boundary Loss, Convolutional Tessellations, and Context-Aware Skip Connections

Authors: Anthony D. Rhodes, Manan Goel


We provide a high fidelity deep learning algorithm (HyperSeg) for interactive video segmentation tasks using a dense convolutional network with context-aware skip connections and compressed 'hypercolumn' image features combined with a convolutional tessellation procedure. In order to maintain high output fidelity, our model crucially processes and renders all image features in high resolution, without utilizing downsampling or pooling procedures. We maintain this consistent, high grade fidelity efficiently in our model chiefly through two means: (1) we use a statistically principled tensor decomposition procedure to modulate the number of hypercolumn features and (2) we render these features in their native resolution using a convolutional tessellation technique. For improved pixel-level segmentation results, we introduce a boundary loss function; for improved temporal coherence in video data, we include temporal image information in our model. Through experiments, we demonstrate the improved accuracy of our model against baseline models for interactive segmentation tasks using high resolution video data. We also introduce a benchmark video segmentation dataset, the VFX Segmentation Dataset, which contains over 27,046 high resolution video frames, including green screen and various composited scenes with corresponding, hand-crafted, pixel-level segmentations. Our work improves on state-of-the-art segmentation fidelity with high resolution data and can be used across a broad range of application domains, including VFX pipelines and medical imaging disciplines.

Keywords: Computer Vision, Object segmentation, interactive segmentation, model compression

Procedia PDF Downloads 1
3 Image Ranking to Assist Object Labeling for Training Detection Models

Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman


Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation, where a person proceeds sequentially through a list of images, labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.
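The iterative select-then-label loop described above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the U-shaped deep network that scores unseen data is replaced here by a plain stand-in (distance from an image's feature vector to the nearest already-labeled feature), and the names `novelty`, `select_next_batch`, `active_labeling`, and `label_fn` are hypothetical.

```python
import math


def novelty(feature, labeled_features):
    """Stand-in novelty score: distance to the nearest labeled feature.

    The abstract's method uses a U-shaped deep network instead; this
    nearest-neighbour distance is only an assumption for illustration.
    """
    if not labeled_features:
        return math.inf
    return min(math.dist(feature, f) for f in labeled_features)


def select_next_batch(unlabeled, labeled_features, batch_size):
    """Rank unlabeled image ids by novelty (highest first) and keep the top ones."""
    ranked = sorted(
        unlabeled,
        key=lambda image_id: novelty(unlabeled[image_id], labeled_features),
        reverse=True,
    )
    return ranked[:batch_size]


def active_labeling(pool, seed_ids, batch_size, rounds, label_fn):
    """Alternate algorithmic image selection with manual labeling.

    pool:     dict mapping image id -> feature vector (tuple of floats)
    label_fn: callable standing in for the human annotator
    """
    labeled = {image_id: label_fn(image_id) for image_id in seed_ids}
    for _ in range(rounds):
        unlabeled = {i: f for i, f in pool.items() if i not in labeled}
        if not unlabeled:
            break
        labeled_features = [pool[i] for i in labeled]
        for image_id in select_next_batch(unlabeled, labeled_features, batch_size):
            labeled[image_id] = label_fn(image_id)
    return labeled


# Toy pool: "b" and "d" nearly repeat the seed image "a"; "c" and "e" are novel.
pool = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (0.0, 0.2), "e": (9.0, 0.0)}
labeled = active_labeling(pool, seed_ids=["a"], batch_size=1, rounds=2,
                          label_fn=lambda image_id: "labeled")
```

In this toy pool the loop skips the near-duplicates of the seed image and spends its labeling budget on the two outlying images, which is the behaviour the ranking is meant to encourage.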

Keywords: Computer Vision, Semiconductor, Deep learning, Object Detection

Procedia PDF Downloads 1
2 Deep Learning Based Unsupervised Sport Scene Recognition and Highlights Generation

Authors: Ksenia Meshkova


With the increasing amount of multimedia data, it is very important to automate and speed up the process of obtaining metadata. This process means not just recognition of some object or its movement, but recognition of the entire scene rather than separate frames, with timeline segmentation as the final result. Labeling datasets is time-consuming; moreover, attributing characteristics to particular scenes is difficult due to their nature. In this article, we consider the application of autoencoders to unsupervised scene recognition and clusterization based on interpretable features. Further, we focus on the particular types of autoencoders that are relevant to our study. We look at the specifics of deep learning related to information theory and rate-distortion theory, and describe solutions that address the poor interpretability of deep learning in media content processing. In conclusion, we present the results of a custom framework, based on autoencoders, that is capable of the scene recognition studied above, with highlights generated from this recognition. We do not describe the mathematics of neural networks in detail, but clarify the necessary concepts and pay attention to important nuances.
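The step from per-frame recognition to timeline segmentation can be illustrated with a minimal sketch. It assumes that each frame has already been assigned a scene cluster label (for example, by clustering autoencoder features, which is not shown here); the function name `timeline_segments` and the `fps` parameter are hypothetical.

```python
from itertools import groupby


def timeline_segments(frame_labels, fps=25.0):
    """Merge runs of identical per-frame cluster labels into timeline segments.

    Returns a list of (start_seconds, end_seconds, label) tuples.
    """
    segments = []
    start = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((start / fps, (start + length) / fps, label))
        start += length
    return segments


# Six frames at 1 frame/s: scene 0, then scene 1, then scene 0 again.
segments = timeline_segments([0, 0, 0, 1, 1, 0], fps=1.0)
```

A practical system would likely smooth out very short runs before segmenting, since single mislabeled frames would otherwise split a scene.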

Keywords: Neural Networks, Computer Vision, Representation Learning, autoencoders

Procedia PDF Downloads 1
1 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme


Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, and user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: Computer Vision, Finance, Machine Learning, Information Retrieval, natural language processing, Entity Recognition

Procedia PDF Downloads 1