Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 76

Search results for: 3D convolution

76 A Time-Varying and Non-Stationary Convolution Spectral Mixture Kernel for Gaussian Process

Authors: Kai Chen, Shuguang Cui, Feng Yin


Gaussian process (GP) with spectral mixture (SM) kernel demonstrates flexible non-parametric Bayesian learning ability in modeling unknown function. In this work a novel time-varying and non-stationary convolution spectral mixture (TN-CSM) kernel with a significant enhancing of interpretability by using process convolution is introduced. A way decomposing the SM component into an auto-convolution of base SM component and parameterizing it to be input dependent is outlined. Smoothly, performing a convolution between two base SM component yields a novel structure of non-stationary SM component with much better generalized expression and interpretation. The TN-CSM perfectly allows compatibility with the stationary SM kernel in terms of kernel form and spectral base ignored and confused by previous non-stationary kernels. On synthetic and real-world datatsets, experiments show the time-varying characteristics of hyper-parameters in TN-CSM and compare the learning performance of TN-CSM with popular and representative non-stationary GP.

Keywords: Gaussian process, spectral mixture, non-stationary, convolution

Procedia PDF Downloads 38
75 Compensation of Cable Attenuation in Step Current Generators to Enable the Convolution Method for Calibration of Current Transducers

Authors: P. Treyer, M. Kujda, H. Urs


The purpose of this paper is to digitally compensate for the apparent discharge time constant of the coaxial cable so that the current step response is flat and can be used to calibrate current transducers using the convolution method. For proper use of convolution, the step response record length is required to be at least the same as the waveform duration to be evaluated. The current step generator based on the cable discharge is compared to the Blumlein generator. Moreover, the influence of each component of the system on the performance of the step is described, which allows building the appropriate measurement set-up. In the end, the calibration of current viewing resistors dedicated to high current impulse is computed.

Keywords: Blumlein generator, cable attenuation, convolution, current step generator

Procedia PDF Downloads 36
74 Error Estimation for the Reconstruction Algorithm with Fan Beam Geometry

Authors: Nirmal Yadav, Tanuja Srivastava


Shannon theory is an exact method to recover a band limited signals from its sampled values in discrete implementation, using sinc interpolators. But sinc based results are not much satisfactory for band-limited calculations so that convolution with window function, having compact support, has been introduced. Convolution Backprojection algorithm with window function is an approximation algorithm. In this paper, the error has been calculated, arises due to this approximation nature of reconstruction algorithm. This result will be defined for fan beam projection data which is more faster than parallel beam projection.

Keywords: computed tomography, convolution backprojection, radon transform, fan beam

Procedia PDF Downloads 387
73 Two Concurrent Convolution Neural Networks TC*CNN Model for Face Recognition Using Edge

Authors: T. Alghamdi, G. Alaghband


In this paper we develop a model that couples Two Concurrent Convolution Neural Network with different filters (TC*CNN) for face recognition and compare its performance to an existing sequential CNN (base model). We also test and compare the quality and performance of the models on three datasets with various levels of complexity (easy, moderate, and difficult) and show that for the most complex datasets, edges will produce the most accurate and efficient results. We further show that in such cases while Support Vector Machine (SVM) models are fast, they do not produce accurate results.

Keywords: Convolution Neural Network, Edges, Face Recognition , Support Vector Machine.

Procedia PDF Downloads 46
72 Performance Analysis of MIMO-OFDM Using Convolution Codes with QAM Modulation

Authors: I Gede Puja Astawa, Yoedy Moegiharto, Ahmad Zainudin, Imam Dui Agus Salim, Nur Annisa Anggraeni


Performance of Orthogonal Frequency Division Multiplexing (OFDM) system can be improved by adding channel coding (error correction code) to detect and correct the errors that occur during data transmission. One can use the convolution code. This paper presents performance of OFDM using Space Time Block Codes (STBC) diversity technique use QAM modulation with code rate 1/2. The evaluation is done by analyzing the value of Bit Error Rate (BER) vs. Energy per Bit to Noise Power Spectral Density Ratio (Eb/No). This scheme is conducted 256 sub-carrier which transmits Rayleigh multipath channel in OFDM system. To achieve a BER of 10-3 is required 30 dB SNR in SISO-OFDM scheme. For 2x2 MIMO-OFDM scheme requires 10 dB to achieve a BER of 10-3. For 4x4 MIMO-OFDM scheme requires 5 dB while adding convolution in a 4x4 MIMO-OFDM can improve performance up to 0 dB to achieve the same BER. This proves the existence of saving power by 3 dB of 4x4 MIMO-OFDM system without coding, power saving 7 dB of 2x2 MIMO-OFDM system without coding and significant power savings from SISO-OFDM system.

Keywords: convolution code, OFDM, MIMO, QAM, BER

Procedia PDF Downloads 293
71 Simultaneous Determination of Methotrexate and Aspirin Using Fourier Transform Convolution Emission Data under Non-Parametric Linear Regression Method

Authors: Marwa A. A. Ragab, Hadir M. Maher, Eman I. El-Kimary


Co-administration of methotrexate (MTX) and aspirin (ASP) can cause a pharmacokinetic interaction and a subsequent increase in blood MTX concentrations which may increase the risk of MTX toxicity. Therefore, it is important to develop a sensitive, selective, accurate and precise method for their simultaneous determination in urine. A new hybrid chemometric method has been applied to the emission response data of the two drugs. Spectrofluorimetric method for determination of MTX through measurement of its acid-degradation product, 4-amino-4-deoxy-10-methylpteroic acid (4-AMP), was developed. Moreover, the acid-catalyzed degradation reaction enables the spectrofluorimetric determination of ASP through the formation of its active metabolite salicylic acid (SA). The proposed chemometric method deals with convolution of emission data using 8-points sin xi polynomials (discrete Fourier functions) after the derivative treatment of these emission data. The first and second derivative curves (D1 & D2) were obtained first then convolution of these curves was done to obtain first and second derivative under Fourier functions curves (D1/FF) and (D2/FF). This new application was used for the resolution of the overlapped emission bands of the degradation products of both drugs to allow their simultaneous indirect determination in human urine. Not only this chemometric approach was applied to the emission data but also the obtained data were subjected to non-parametric linear regression analysis (Theil’s method). The proposed method was fully validated according to the ICH guidelines and it yielded linearity ranges as follows: 0.05-0.75 and 0.5-2.5 µg mL-1 for MTX and ASP respectively. It was found that the non-parametric method was superior over the parametric one in the simultaneous determination of MTX and ASP after the chemometric treatment of the emission spectra of their degradation products. The work combines the advantages of derivative and convolution using discrete Fourier function together with the reliability and efficacy of the non-parametric analysis of data. The achieved sensitivity along with the low values of LOD (0.01 and 0.06 µg mL-1) and LOQ (0.04 and 0.2 µg mL-1) for MTX and ASP respectively, by the second derivative under Fourier functions (D2/FF) were promising and guarantee its application for monitoring the two drugs in patients’ urine samples.

Keywords: chemometrics, emission curves, derivative, convolution, Fourier transform, human urine, non-parametric regression, Theil’s method

Procedia PDF Downloads 337
70 Satellite Imagery Classification Based on Deep Convolution Network

Authors: Zhong Ma, Zhuping Wang, Congxin Liu, Xiangzeng Liu


Satellite imagery classification is a challenging problem with many practical applications. In this paper, we designed a deep convolution neural network (DCNN) to classify the satellite imagery. The contributions of this paper are twofold — First, to cope with the large-scale variance in the satellite image, we introduced the inception module, which has multiple filters with different size at the same level, as the building block to build our DCNN model. Second, we proposed a genetic algorithm based method to efficiently search the best hyper-parameters of the DCNN in a large search space. The proposed method is evaluated on the benchmark database. The results of the proposed hyper-parameters search method show it will guide the search towards better regions of the parameter space. Based on the found hyper-parameters, we built our DCNN models, and evaluated its performance on satellite imagery classification, the results show the classification accuracy of proposed models outperform the state of the art method.

Keywords: satellite imagery classification, deep convolution network, genetic algorithm, hyper-parameter optimization

Procedia PDF Downloads 206
69 Facial Emotion Recognition Using Deep Learning

Authors: Ashutosh Mishra, Nikhil Goyal


A 3D facial emotion recognition model based on deep learning is proposed in this paper. Two convolution layers and a pooling layer are employed in the deep learning architecture. After the convolution process, the pooling is finished. The probabilities for various classes of human faces are calculated using the sigmoid activation function. To verify the efficiency of deep learning-based systems, a set of faces. The Kaggle dataset is used to verify the accuracy of a deep learning-based face recognition model. The model's accuracy is about 65 percent, which is lower than that of other facial expression recognition techniques. Despite significant gains in representation precision due to the nonlinearity of profound image representations.

Keywords: facial recognition, computational intelligence, convolutional neural network, depth map

Procedia PDF Downloads 36
68 Advances of Image Processing in Precision Agriculture: Using Deep Learning Convolution Neural Network for Soil Nutrient Classification

Authors: Halimatu S. Abdullahi, Ray E. Sheriff, Fatima Mahieddine


Agriculture is essential to the continuous existence of human life as they directly depend on it for the production of food. The exponential rise in population calls for a rapid increase in food with the application of technology to reduce the laborious work and maximize production. Technology can aid/improve agriculture in several ways through pre-planning and post-harvest by the use of computer vision technology through image processing to determine the soil nutrient composition, right amount, right time, right place application of farm input resources like fertilizers, herbicides, water, weed detection, early detection of pest and diseases etc. This is precision agriculture which is thought to be solution required to achieve our goals. There has been significant improvement in the area of image processing and data processing which has being a major challenge. A database of images is collected through remote sensing, analyzed and a model is developed to determine the right treatment plans for different crop types and different regions. Features of images from vegetations need to be extracted, classified, segmented and finally fed into the model. Different techniques have been applied to the processes from the use of neural network, support vector machine, fuzzy logic approach and recently, the most effective approach generating excellent results using the deep learning approach of convolution neural network for image classifications. Deep Convolution neural network is used to determine soil nutrients required in a plantation for maximum production. The experimental results on the developed model yielded results with an average accuracy of 99.58%.

Keywords: convolution, feature extraction, image analysis, validation, precision agriculture

Procedia PDF Downloads 213
67 Using Deep Learning Real-Time Object Detection Convolution Neural Networks for Fast Fruit Recognition in the Tree

Authors: K. Bresilla, L. Manfrini, B. Morandi, A. Boini, G. Perulli, L. C. Grappadelli


Image/video processing for fruit in the tree using hard-coded feature extraction algorithms have shown high accuracy during recent years. While accurate, these approaches even with high-end hardware are computationally intensive and too slow for real-time systems. This paper details the use of deep convolution neural networks (CNNs), specifically an algorithm (YOLO - You Only Look Once) with 24+2 convolution layers. Using deep-learning techniques eliminated the need for hard-code specific features for specific fruit shapes, color and/or other attributes. This CNN is trained on more than 5000 images of apple and pear fruits on 960 cores GPU (Graphical Processing Unit). Testing set showed an accuracy of 90%. After this, trained data were transferred to an embedded device (Raspberry Pi gen.3) with camera for more portability. Based on correlation between number of visible fruits or detected fruits on one frame and the real number of fruits on one tree, a model was created to accommodate this error rate. Speed of processing and detection of the whole platform was higher than 40 frames per second. This speed is fast enough for any grasping/harvesting robotic arm or other real-time applications.

Keywords: artificial intelligence, computer vision, deep learning, fruit recognition, harvesting robot, precision agriculture

Procedia PDF Downloads 294
66 Unsupervised Images Generation Based on Sloan Digital Sky Survey with Deep Convolutional Generative Neural Networks

Authors: Guanghua Zhang, Fubao Wang, Weijun Duan


Convolution neural network (CNN) has attracted more and more attention on recent years. Especially in the field of computer vision and image classification. However, unsupervised learning with CNN has received less attention than supervised learning. In this work, we use a new powerful tool which is deep convolutional generative adversarial networks (DCGANs) to generate images from Sloan Digital Sky Survey. Training by various star and galaxy images, it shows that both the generator and the discriminator are good for unsupervised learning. In this paper, we also took several experiments to choose the best value for hyper-parameters and which could help to stabilize the training process and promise a good quality of the output.

Keywords: convolution neural network, discriminator, generator, unsupervised learning

Procedia PDF Downloads 150
65 Defect Detection for Nanofibrous Images with Deep Learning-Based Approaches

Authors: Gaokai Liu


Automatic defect detection for nanomaterial images is widely required in industrial scenarios. Deep learning approaches are considered as the most effective solutions for the great majority of image-based tasks. In this paper, an edge guidance network for defect segmentation is proposed. First, the encoder path with multiple convolution and downsampling operations is applied to the acquisition of shared features. Then two decoder paths both are connected to the last convolution layer of the encoder and supervised by the edge and segmentation labels, respectively, to guide the whole training process. Meanwhile, the edge and encoder outputs from the same stage are concatenated to the segmentation corresponding part to further tune the segmentation result. Finally, the effectiveness of the proposed method is verified via the experiments on open nanofibrous datasets.

Keywords: deep learning, defect detection, image segmentation, nanomaterials

Procedia PDF Downloads 38
64 Keyframe Extraction Using Face Quality Assessment and Convolution Neural Network

Authors: Rahma Abed, Sahbi Bahroun, Ezzeddine Zagrouba


Due to the huge amount of data in videos, extracting the relevant frames became a necessity and an essential step prior to performing face recognition. In this context, we propose a method for extracting keyframes from videos based on face quality and deep learning for a face recognition task. This method has two steps. We start by generating face quality scores for each face image based on the use of three face feature extractors, including Gabor, LBP, and HOG. The second step consists in training a Deep Convolutional Neural Network in a supervised manner in order to select the frames that have the best face quality. The obtained results show the effectiveness of the proposed method compared to the methods of the state of the art.

Keywords: keyframe extraction, face quality assessment, face in video recognition, convolution neural network

Procedia PDF Downloads 45
63 Edge Detection Using Multi-Agent System: Evaluation on Synthetic and Medical MR Images

Authors: A. Nachour, L. Ouzizi, Y. Aoura


Recent developments on multi-agent system have brought a new research field on image processing. Several algorithms are used simultaneously and improved in deferent applications while new methods are investigated. This paper presents a new automatic method for edge detection using several agents and many different actions. The proposed multi-agent system is based on parallel agents that locally perceive their environment, that is to say, pixels and additional environmental information. This environment is built using Vector Field Convolution that attract free agent to the edges. Problems of partial, hidden or edges linking are solved with the cooperation between agents. The presented method was implemented and evaluated using several examples on different synthetic and medical images. The obtained experimental results suggest that this approach confirm the efficiency and accuracy of detected edge.

Keywords: edge detection, medical MRImages, multi-agent systems, vector field convolution

Procedia PDF Downloads 301
62 ICanny: CNN Modulation Recognition Algorithm

Authors: Jingpeng Gao, Xinrui Mao, Zhibin Deng


Aiming at the low recognition rate on the composite signal modulation in low signal to noise ratio (SNR), this paper proposes a modulation recognition algorithm based on ICanny-CNN. Firstly, the radar signal is transformed into the time-frequency image by Choi-Williams Distribution (CWD). Secondly, we propose an image processing algorithm using the Guided Filter and the threshold selection method, which is combined with the hole filling and the mask operation. Finally, the shallow convolutional neural network (CNN) is combined with the idea of the depth-wise convolution (Dw Conv) and the point-wise convolution (Pw Conv). The proposed CNN is designed to complete image classification and realize modulation recognition of radar signal. The simulation results show that the proposed algorithm can reach 90.83% at 0dB and 71.52% at -8dB. Therefore, the proposed algorithm has a good classification and anti-noise performance in radar signal modulation recognition and other fields.

Keywords: modulation recognition, image processing, composite signal, improved Canny algorithm

Procedia PDF Downloads 56
61 Convolution Neural Network Based on Hypnogram of Sleep Stages to Predict Dosages and Types of Hypnotic Drugs for Insomnia

Authors: Chi Wu, Dean Wu, Wen-Te Liu, Cheng-Yu Tsai, Shin-Mei Hsu, Yin-Tzu Lin, Ru-Yin Yang


Background: The results of previous studies compared the benefits and risks of receiving insomnia medication. However, the effects between hypnotic drugs used and enhancement of sleep quality were still unclear. Objective: The aim of this study is to establish a prediction model for hypnotic drugs' dosage used for insomnia subjects and associated the relationship between sleep stage ratio change and drug types. Methodologies: According to American Academy of Sleep Medicine (AASM) guideline, sleep stages were classified and transformed to hypnogram via the polysomnography (PSG) in a hospital in New Taipei City (Taiwan). The subjects with diagnosis for insomnia without receiving hypnotic drugs treatment were be set as the comparison group. Conversely, hypnotic drugs dosage within the past three months was obtained from the clinical registration for each subject. Furthermore, the collecting subjects were divided into two groups for training and testing. After training convolution neuron network (CNN) to predict types of hypnotics used and dosages are taken, the test group was used to evaluate the accuracy of classification. Results: We recruited 76 subjects in this study, who had been done PSG for transforming hypnogram from their sleep stages. The accuracy of dosages obtained from confusion matrix on the test group by CNN is 81.94%, and accuracy of hypnotic drug types used is 74.22%. Moreover, the subjects with high ratio of wake stage were correctly classified as requiring medical treatment. Conclusion: CNN with hypnogram was potentially used for adjusting the dosage of hypnotic drugs and providing subjects to pre-screening the types of hypnotic drugs taken.

Keywords: convolution neuron network, hypnotic drugs, insomnia, polysomnography

Procedia PDF Downloads 89
60 Convolutional Neural Network Based on Random Kernels for Analyzing Visual Imagery

Authors: Ja-Keoung Koo, Kensuke Nakamura, Hyohun Kim, Dongwha Shin, Yeonseok Kim, Ji-Su Ahn, Byung-Woo Hong


The machine learning techniques based on a convolutional neural network (CNN) have been actively developed and successfully applied to a variety of image analysis tasks including reconstruction, noise reduction, resolution enhancement, segmentation, motion estimation, object recognition. The classical visual information processing that ranges from low level tasks to high level ones has been widely developed in the deep learning framework. It is generally considered as a challenging problem to derive visual interpretation from high dimensional imagery data. A CNN is a class of feed-forward artificial neural network that usually consists of deep layers the connections of which are established by a series of non-linear operations. The CNN architecture is known to be shift invariant due to its shared weights and translation invariance characteristics. However, it is often computationally intractable to optimize the network in particular with a large number of convolution layers due to a large number of unknowns to be optimized with respect to the training set that is generally required to be large enough to effectively generalize the model under consideration. It is also necessary to limit the size of convolution kernels due to the computational expense despite of the recent development of effective parallel processing machinery, which leads to the use of the constantly small size of the convolution kernels throughout the deep CNN architecture. However, it is often desired to consider different scales in the analysis of visual features at different layers in the network. Thus, we propose a CNN model where different sizes of the convolution kernels are applied at each layer based on the random projection. We apply random filters with varying sizes and associate the filter responses with scalar weights that correspond to the standard deviation of the random filters. We are allowed to use large number of random filters with the cost of one scalar unknown for each filter. The computational cost in the back-propagation procedure does not increase with the larger size of the filters even though the additional computational cost is required in the computation of convolution in the feed-forward procedure. The use of random kernels with varying sizes allows to effectively analyze image features at multiple scales leading to a better generalization. The robustness and effectiveness of the proposed CNN based on random kernels are demonstrated by numerical experiments where the quantitative comparison of the well-known CNN architectures and our models that simply replace the convolution kernels with the random filters is performed. The experimental results indicate that our model achieves better performance with less number of unknown weights. The proposed algorithm has a high potential in the application of a variety of visual tasks based on the CNN framework. Acknowledgement—This work was supported by the MISP (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW (20170001000011001) supervised by IITP, and NRF-2014R1A2A1A11051941, NRF2017R1A2B4006023.

Keywords: deep learning, convolutional neural network, random kernel, random projection, dimensionality reduction, object recognition

Procedia PDF Downloads 170
59 The Convolution Recurrent Network of Using Residual LSTM to Process the Output of the Downsampling for Monaural Speech Enhancement

Authors: Shibo Wei, Ting Jiang


Convolutional-recurrent neural networks (CRN) have achieved much success recently in the speech enhancement field. The common processing method is to use the convolution layer to compress the feature space by multiple upsampling and then model the compressed features with the LSTM layer. At last, the enhanced speech is obtained by deconvolution operation to integrate the global information of the speech sequence. However, the feature space compression process may cause the loss of information, so we propose to model the upsampling result of each step with the residual LSTM layer, then join it with the output of the deconvolution layer and input them to the next deconvolution layer, by this way, we want to integrate the global information of speech sequence better. The experimental results show the network model (RES-CRN) we introduce can achieve better performance than LSTM without residual and overlaying LSTM simply in the original CRN in terms of scale-invariant signal-to-distortion ratio (SI-SNR), speech quality (PESQ), and intelligibility (STOI).

Keywords: convolutional-recurrent neural networks, speech enhancement, residual LSTM, SI-SNR

Procedia PDF Downloads 67
58 Multimodal Convolutional Neural Network for Musical Instrument Recognition

Authors: Yagya Raj Pandeya, Joonwhoan Lee


The dynamic behavior of music and video makes it difficult to evaluate musical instrument playing in a video by computer system. Any television or film video clip with music information are rich sources for analyzing musical instruments using modern machine learning technologies. In this research, we integrate the audio and video information sources using convolutional neural network (CNN) and pass network learned features through recurrent neural network (RNN) to preserve the dynamic behaviors of audio and video. We use different pre-trained CNN for music and video feature extraction and then fine tune each model. The music network use 2D convolutional network and video network use 3D convolution (C3D). Finally, we concatenate each music and video feature by preserving the time varying features. The long short term memory (LSTM) network is used for long-term dynamic feature characterization and then use late fusion with generalized mean. The proposed network performs better performance to recognize the musical instrument using audio-video multimodal neural network.

Keywords: multimodal, 3D convolution, music-video feature extraction, generalized mean

Procedia PDF Downloads 94
57 Lean Comic GAN (LC-GAN): a Light-Weight GAN Architecture Leveraging Factorized Convolution and Teacher Forcing Distillation Style Loss Aimed to Capture Two Dimensional Animated Filtered Still Shots Using Mobile Phone Camera and Edge Devices

Authors: Kaustav Mukherjee


In this paper we propose a Neural Style Transfer solution whereby we have created a Lightweight Separable Convolution Kernel Based GAN Architecture (SC-GAN) which will very useful for designing filter for Mobile Phone Cameras and also Edge Devices which will convert any image to its 2D ANIMATED COMIC STYLE Movies like HEMAN, SUPERMAN, JUNGLE-BOOK. This will help the 2D animation artist by relieving to create new characters from real life person's images without having to go for endless hours of manual labour drawing each and every pose of a cartoon. It can even be used to create scenes from real life images.This will reduce a huge amount of turn around time to make 2D animated movies and decrease cost in terms of manpower and time. In addition to that being extreme light-weight it can be used as camera filters capable of taking Comic Style Shots using mobile phone camera or edge device cameras like Raspberry Pi 4,NVIDIA Jetson NANO etc. Existing Methods like CartoonGAN with the model size close to 170 MB is too heavy weight for mobile phones and edge devices due to their scarcity in resources. Compared to the current state of the art our proposed method which has a total model size of 31 MB which clearly makes it ideal and ultra-efficient for designing of camera filters on low resource devices like mobile phones, tablets and edge devices running OS or RTOS. .Owing to use of high resolution input and usage of bigger convolution kernel size it produces richer resolution Comic-Style Pictures implementation with 6 times lesser number of parameters and with just 25 extra epoch trained on a dataset of less than 1000 which breaks the myth that all GAN need mammoth amount of data. Our network reduces the density of the Gan architecture by using Depthwise Separable Convolution which does the convolution operation on each of the RGB channels separately then we use a Point-Wise Convolution to bring back the network into required channel number using 1 by 1 kernel.This reduces the number of parameters substantially and makes it extreme light-weight and suitable for mobile phones and edge devices. The architecture mentioned in the present paper make use of Parameterised Batch Normalization Goodfellow etc al. (Deep Learning OPTIMIZATION FOR TRAINING DEEP MODELS page 320) which makes the network to use the advantage of Batch Norm for easier training while maintaining the non-linear feature capture by inducing the learnable parameters

Keywords: comic stylisation from camera image using GAN, creating 2D animated movie style custom stickers from images, depth-wise separable convolutional neural network for light-weight GAN architecture for EDGE devices, GAN architecture for 2D animated cartoonizing neural style, neural style transfer for edge, model distilation, perceptual loss

Procedia PDF Downloads 33
56 Detecting Manipulated Media Using Deep Capsule Network

Authors: Joseph Uzuazomaro Oju


The ease at which manipulated media can be created, and the increasing difficulty in identifying fake media makes it a great threat. Most of the applications used for the creation of these high-quality fake videos and images are built with deep learning. Hence, the use of deep learning in creating a detection mechanism cannot be overemphasized. Any successful fake media that is being detected before it reached the populace will save people from the self-doubt of either a content is genuine or fake and will ensure the credibility of videos and images. The methodology introduced in this paper approaches the manipulated media detection challenge using a combo of VGG-19 and a deep capsule network. In the case of videos, they are converted into frames, which, in turn, are resized and cropped to the face region. These preprocessed images/videos are fed to the VGG-19 network to extract the latent features. The extracted latent features are inputted into a deep capsule network enhanced with a 3D -convolution dynamic routing agreement. The 3D –convolution dynamic routing agreement algorithm helps to reduce the linkages between capsules networks. Thereby limiting the poor learning shortcoming of multiple capsule network layers. The resultant output from the deep capsule network will indicate a media to be either genuine or fake.

Keywords: deep capsule network, dynamic routing, fake media detection, manipulated media

Procedia PDF Downloads 26
55 A Case Study of Deep Learning for Disease Detection in Crops

Authors: Felipe A. Guth, Shane Ward, Kevin McDonnell


In the precision agriculture area, one of the main tasks is the automated detection of diseases in crops. Machine Learning algorithms have been studied in recent decades for such tasks in view of their potential for improving economic outcomes that automated disease detection may attain over crop fields. The latest generation of deep learning convolution neural networks has presented significant results in the area of image classification. In this way, this work has tested the implementation of an architecture of deep learning convolution neural network for the detection of diseases in different types of crops. A data augmentation strategy was used to meet the requirements of the algorithm implemented with a deep learning framework. Two test scenarios were deployed. The first scenario implemented a neural network under images extracted from a controlled environment while the second one took images both from the field and the controlled environment. The results evaluated the generalisation capacity of the neural networks in relation to the two types of images presented. Results yielded a general classification accuracy of 59% in scenario 1 and 96% in scenario 2.

Keywords: convolutional neural networks, deep learning, disease detection, precision agriculture

Procedia PDF Downloads 150
54 Non-Targeted Adversarial Object Detection Attack: Fast Gradient Sign Method

Authors: Bandar Alahmadi, Manohar Mareboyana, Lethia Jackson


Today, there are many applications that are using computer vision models, such as face recognition, image classification, and object detection. The accuracy of these models is very important for the performance of these applications. One challenge that facing the computer vision models is the adversarial examples attack. In computer vision, the adversarial example is an image that is intentionally designed to cause the machine learning model to misclassify it. One of very well-known method that is used to attack the Convolution Neural Network (CNN) is Fast Gradient Sign Method (FGSM). The goal of this method is to find the perturbation that can fool the CNN using the gradient of the cost function of CNN. In this paper, we introduce a novel model that can attack Regional-Convolution Neural Network (R-CNN) that use FGSM. We first extract the regions that are detected by R-CNN, and then we resize these regions into the size of regular images. Then, we find the best perturbation of the regions that can fool CNN using FGSM. Next, we add the resulted perturbation to the attacked region to get a new region image that looks similar to the original image to human eyes. Finally, we placed the regions back to the original image and test the R-CNN with the attacked images. Our model could drop the accuracy of the R-CNN when we tested with Pascal VOC 2012 dataset.

Keywords: adversarial examples, attack, computer vision, image processing

Procedia PDF Downloads 61
53 Embedded Semantic Segmentation Network Optimized for Matrix Multiplication Accelerator

Authors: Jaeyoung Lee


Autonomous driving systems require high reliability to provide people with a safe and comfortable driving experience. However, despite the development of a number of vehicle sensors, it is difficult to always provide high perceived performance in driving environments that vary from time to season. The image segmentation method using deep learning, which has recently evolved rapidly, provides high recognition performance in various road environments stably. However, since the system controls a vehicle in real time, a highly complex deep learning network cannot be used due to time and memory constraints. Moreover, efficient networks are optimized for GPU environments, which degrade performance in embedded processor environments equipped simple hardware accelerators. In this paper, a semantic segmentation network, matrix multiplication accelerator network (MMANet), optimized for matrix multiplication accelerator (MMA) on Texas instrument digital signal processors (TI DSP) is proposed to improve the recognition performance of autonomous driving system. The proposed method is designed to maximize the number of layers that can be performed in a limited time to provide reliable driving environment information in real time. First, the number of channels in the activation map is fixed to fit the structure of MMA. By increasing the number of parallel branches, the lack of information caused by fixing the number of channels is resolved. Second, an efficient convolution is selected depending on the size of the activation. Since MMA is a fixed, it may be more efficient for normal convolution than depthwise separable convolution depending on memory access overhead. Thus, a convolution type is decided according to output stride to increase network depth. In addition, memory access time is minimized by processing operations only in L3 cache. Lastly, reliable contexts are extracted using the extended atrous spatial pyramid pooling (ASPP). The suggested method gets stable features from an extended path by increasing the kernel size and accessing consecutive data. In addition, it consists of two ASPPs to obtain high quality contexts using the restored shape without global average pooling paths since the layer uses MMA as a simple adder. To verify the proposed method, an experiment is conducted using perfsim, a timing simulator, and the Cityscapes validation sets. The proposed network can process an image with 640 x 480 resolution for 6.67 ms, so six cameras can be used to identify the surroundings of the vehicle as 20 frame per second (FPS). In addition, it achieves 73.1% mean intersection over union (mIoU) which is the highest recognition rate among embedded networks on the Cityscapes validation set.

Keywords: edge network, embedded network, MMA, matrix multiplication accelerator, semantic segmentation network

Procedia PDF Downloads 36
52 Fuzzy Inference-Assisted Saliency-Aware Convolution Neural Networks for Multi-View Summarization

Authors: Tanveer Hussain, Khan Muhammad, Amin Ullah, Mi Young Lee, Sung Wook Baik


The Big Data generated from distributed vision sensors installed on large scale in smart cities create hurdles in its efficient and beneficial exploration for browsing, retrieval, and indexing. This paper presents a three-folded framework for effective video summarization of such data and provide a compact and representative format of Big Video Data. In the first fold, the paper acquires input video data from the installed cameras and collect clues such as type and count of objects and clarity of the view from a chunk of pre-defined number of frames of each view. The decision of representative view selection for a particular interval is based on fuzzy inference system, acquiring a precise and human resembling decision, reinforced by the known clues as a part of the second fold. In the third fold, the paper forwards the selected view frames to the summary generation mechanism that is supported by a saliency-aware convolution neural network (CNN) model. The new trend of fuzzy rules for view selection followed by CNN architecture for saliency computation makes the multi-view video summarization (MVS) framework a suitable candidate for real-world practice in smart cities.

Keywords: big video data analysis, fuzzy logic, multi-view video summarization, saliency detection

Procedia PDF Downloads 77
51 Deep Learning Application for Object Image Recognition and Robot Automatic Grasping

Authors: Shiuh-Jer Huang, Chen-Zon Yan, C. K. Huang, Chun-Chien Ting


Since the vision system application in industrial environment for autonomous purposes is required intensely, the image recognition technique becomes an important research topic. Here, deep learning algorithm is employed in image system to recognize the industrial object and integrate with a 7A6 Series Manipulator for object automatic gripping task. PC and Graphic Processing Unit (GPU) are chosen to construct the 3D Vision Recognition System. Depth Camera (Intel RealSense SR300) is employed to extract the image for object recognition and coordinate derivation. The YOLOv2 scheme is adopted in Convolution neural network (CNN) structure for object classification and center point prediction. Additionally, image processing strategy is used to find the object contour for calculating the object orientation angle. Then, the specified object location and orientation information are sent to robotic controller. Finally, a six-axis manipulator can grasp the specific object in a random environment based on the user command and the extracted image information. The experimental results show that YOLOv2 has been successfully employed to detect the object location and category with confidence near 0.9 and 3D position error less than 0.4 mm. It is useful for future intelligent robotic application in industrial 4.0 environment.

Keywords: deep learning, image processing, convolution neural network, YOLOv2, 7A6 series manipulator

Procedia PDF Downloads 39
50 Coding and Decoding versus Space Diversity for ‎Rayleigh Fading Radio Frequency Channels ‎

Authors: Ahmed Mahmoud Ahmed Abouelmagd


The diversity is the usual remedy of the transmitted signal level variations (Fading phenomena) in radio frequency channels. Diversity techniques utilize two or more copies of a signal and combine those signals to combat fading. The basic concept of diversity is to transmit the signal via several independent diversity branches to get independent signal replicas via time – frequency - space - and polarization diversity domains. Coding and decoding processes can be an alternative remedy for fading phenomena, it cannot increase the channel capacity, but it can improve the error performance. In this paper we propose the use of replication decoding with BCH code class, and Viterbi decoding algorithm with convolution coding; as examples of coding and decoding processes. The results are compared to those obtained from two optimized selection space diversity techniques. The performance of Rayleigh fading channel, as the model considered for radio frequency channels, is evaluated for each case. The evaluation results show that the coding and decoding approaches, especially the BCH coding approach with replication decoding scheme, give better performance compared to that of selection space diversity optimization approaches. Also, an approach for combining the coding and decoding diversity as well as the space diversity is considered, the main disadvantage of this approach is its complexity but it yields good performance results.

Keywords: Rayleigh fading, diversity, BCH codes, Replication decoding, ‎convolution coding, viterbi decoding, space diversity

Procedia PDF Downloads 307
49 Performance Comparison of Deep Convolutional Neural Networks for Binary Classification of Fine-Grained Leaf Images

Authors: Kamal KC, Zhendong Yin, Dasen Li, Zhilu Wu


Intra-plant disease classification based on leaf images is a challenging computer vision task due to similarities in texture, color, and shape of leaves with a slight variation of leaf spot; and external environmental changes such as lighting and background noises. Deep convolutional neural network (DCNN) has proven to be an effective tool for binary classification. In this paper, two methods for binary classification of diseased plant leaves using DCNN are presented; model created from scratch and transfer learning. Our main contribution is a thorough evaluation of 4 networks created from scratch and transfer learning of 5 pre-trained models. Training and testing of these models were performed on a plant leaf images dataset belonging to 16 distinct classes, containing a total of 22,265 images from 8 different plants, consisting of a pair of healthy and diseased leaves. We introduce a deep CNN model, Optimized MobileNet. This model with depthwise separable CNN as a building block attained an average test accuracy of 99.77%. We also present a fine-tuning method by introducing the concept of a convolutional block, which is a collection of different deep neural layers. Fine-tuned models proved to be efficient in terms of accuracy and computational cost. Fine-tuned MobileNet achieved an average test accuracy of 99.89% on 8 pairs of [healthy, diseased] leaf ImageSet.

Keywords: deep convolution neural network, depthwise separable convolution, fine-grained classification, MobileNet, plant disease, transfer learning

Procedia PDF Downloads 75
48 On Fourier Type Integral Transform for a Class of Generalized Quotients

Authors: A. S. Issa, S. K. Q. AL-Omari


In this paper, we investigate certain spaces of generalized functions for the Fourier and Fourier type integral transforms. We discuss convolution theorems and establish certain spaces of distributions for the considered integrals. The new Fourier type integral is well-defined, linear, one-to-one and continuous with respect to certain types of convergences. Many properties and an inverse problem are also discussed in some details.

Keywords: Boehmian, Fourier integral, Fourier type integral, generalized quotient

Procedia PDF Downloads 209
47 Computational Modeling of Load Limits of Carbon Fibre Composite Laminates Subjected to Low-Velocity Impact Utilizing Convolution-Based Fast Fourier Data Filtering Algorithms

Authors: Farhat Imtiaz, Umar Farooq


In this work, we developed a computational model to predict ply level failure in impacted composite laminates. Data obtained from physical testing from flat and round nose impacts of 8-, 16-, 24-ply laminates were considered. Routine inspections of the tested laminates were carried out to approximate ply by ply inflicted damage incurred. Plots consisting of load–time, load–deflection, and energy–time history were drawn to approximate the inflicted damages. Impact test generated unwanted data logged due to restrictions on testing and logging systems were also filtered. Conventional filters (built-in, statistical, and numerical) reliably predicted load thresholds for relatively thin laminates such as eight and sixteen ply panels. However, for relatively thick laminates such as twenty-four ply laminates impacted by flat nose impact generated clipped data which can just be de-noised using oscillatory algorithms. The literature search reveals that modern oscillatory data filtering and extrapolation algorithms have scarcely been utilized. This investigation reports applications of filtering and extrapolation of the clipped data utilising fast Fourier Convolution algorithm to predict load thresholds. Some of the results were related to the impact-induced damage areas identified with Ultrasonic C-scans and found to be in acceptable agreement. Based on consistent findings, utilizing of modern data filtering and extrapolation algorithms to data logged by the existing machines has efficiently enhanced data interpretations without resorting to extra resources. The algorithms could be useful for impact-induced damage approximations of similar cases.

Keywords: fibre reinforced laminates, fast Fourier algorithms, mechanical testing, data filtering and extrapolation

Procedia PDF Downloads 52