Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 435

Search results for: pose normalization

435 Pose Normalization Network for Object Classification

Authors: Bingquan Shen


Convolutional Neural Networks (CNN) have demonstrated their effectiveness in synthesizing 3D views of object instances at various viewpoints. Given the problem where one has limited viewpoints of a particular object for classification, we present a pose normalization architecture that transforms the object to existing viewpoints in the training dataset before classification, yielding better classification performance. We demonstrate that this Pose Normalization Network (PNN) can capture the style of the target object and re-render it at a desired viewpoint. Moreover, we show that the PNN improves the classification results on the 3D chairs dataset and the ShapeNet airplanes dataset when given only images at limited viewpoints, as compared to a CNN baseline.

Keywords: convolutional neural networks, object classification, pose normalization, viewpoint invariant

Procedia PDF Downloads 184
434 Online Pose Estimation and Tracking Approach with Siamese Region Proposal Network

Authors: Cheng Fang, Lingwei Quan, Cunyue Lu


Human pose estimation and tracking aim to accurately identify and locate the positions of human joints in video. It is a computer vision task of great significance for human motion recognition, behavior understanding, and scene analysis. There has been remarkable progress on human pose estimation in recent years. However, more research is needed on human pose tracking, especially online tracking. In this paper, a framework called PoseSRPN is proposed for online single-person pose estimation and tracking. We attach a pose estimation branch to a Siamese network to incorporate Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) into one framework. The pose estimation branch has a simple network structure that replaces the complex upsampling and convolution network structure with deconvolution. By augmenting the loss of the fully convolutional Siamese network with the pose estimation task, pose estimation and tracking can be trained in one stage. Once trained, PoseSRPN relies only on a single bounding-box initialization and produces human joint locations. The experimental results show that, while maintaining good pose estimation accuracy on the COCO and PoseTrack datasets, the proposed method achieves a speed of 59 frames/s, which is superior to other pose tracking frameworks.

Keywords: computer vision, pose estimation, pose tracking, Siamese network

Procedia PDF Downloads 40
433 Applying Spanning Tree Graph Theory for Automatic Database Normalization

Authors: Chetneti Srisa-an


In the field of Knowledge and Data Engineering, the relational database is the best repository for storing real-world data. It has been in use around the world for several decades. Normalization is the most important process in the analysis and design of relational databases. It aims at creating a set of relational tables with minimum data redundancy that preserve consistency and facilitate correct insertion, deletion, and modification. Despite its importance, very few algorithms have been developed for use in commercial automatic normalization tools, and normalization is rarely done automatically rather than manually. Moreover, for the large and complex databases of today, manual normalization is even harder. This paper presents a new, completely automated relational database normalization method. It first produces the directed graph and spanning tree. It then proceeds with generating the 2NF, 3NF, and BCNF normal forms. The benefit of this new algorithm is that it can cope with a large set of complex functional dependencies.
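As an illustration of the functional-dependency reasoning that any automatic normalizer needs, the following minimal sketch computes an attribute closure and flags dependencies that violate BCNF. It is the standard textbook closure algorithm, not the directed-graph and spanning-tree method of the paper, and the relation and the function names (`closure`, `is_superkey`, `bcnf_violations`) are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, read as lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency once its whole left side is already determined.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A set of attributes is a superkey if its closure covers the relation."""
    return closure(attrs, fds) >= set(all_attrs)


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not is_superkey(lhs, all_attrs, fds)
    ]


# Hypothetical relation: each instructor teaches exactly one course.
ATTRS = frozenset({"student", "course", "instructor"})
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

if __name__ == "__main__":
    for lhs, rhs in bcnf_violations(ATTRS, FDS):
        print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

In this hypothetical relation, `instructor -> course` is reported because `instructor` alone does not determine `student`, so a decomposition step would be needed to reach BCNF.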

Keywords: relational database, functional dependency, automatic normalization, primary key, spanning tree

Procedia PDF Downloads 257
432 Deep Learning Based 6D Pose Estimation for Bin-Picking Using 3D Point Clouds

Authors: Hesheng Wang, Haoyu Wang, Chungang Zhuang


Estimating the 6D pose of objects is a core step for robot bin-picking tasks. The problem is that various objects are usually randomly stacked with heavy occlusion in real applications. In this work, we propose a method to regress 6D poses by predicting three points for each object in the 3D point cloud through deep learning. To solve the ambiguity of symmetric pose, we propose a labeling method to help the network converge better. Based on the predicted pose, an iterative method is employed for pose optimization. In real-world experiments, our method outperforms the classical approach in both precision and recall.

Keywords: pose estimation, deep learning, point cloud, bin-picking, 3D computer vision

Procedia PDF Downloads 34
431 Basic Calibration and Normalization Techniques for Time Domain Reflectometry Measurements

Authors: Shagufta Tabassum


The study of dielectric properties in a binary mixture of liquids is very useful for understanding the liquid structure, molecular interaction, dynamics, and kinematics of the mixture. Time-domain reflectometry (TDR) is a powerful tool for studying the cooperation and molecular dynamics of H-bonded systems. In this paper, we discuss the basic calibration and normalization procedure for time-domain reflectometry measurements. Our approach is to explain the different types of errors that occur during TDR measurements and how these errors can be eliminated or minimized.

Keywords: time domain reflectometry measurement technique, cable and connector loss, oscilloscope loss, normalization technique

Procedia PDF Downloads 74
430 Normalizing Scientometric Indicators of Individual Publications Using Local Cluster Detection Methods on Citation Networks

Authors: Levente Varga, Dávid Deritei, Mária Ercsey-Ravasz, Răzvan Florian, Zsolt I. Lázár, István Papp, Ferenc Járai-Szabó


One of the major shortcomings of widely used scientometric indicators is that different disciplines cannot be compared with each other. The issue of cross-disciplinary normalization has long been discussed, but even the classification of publications into scientific domains poses problems. Structural properties of citation networks offer new possibilities; however, the large size and constant growth of these networks call for caution. Here we present a new tool that relies on the structural properties of citation networks to perform cross-field normalization of scientometric indicators of individual publications. Due to the large size of the networks, a systematic procedure for identifying scientific domains based on a local community detection algorithm is proposed. The algorithm is tested on different benchmark and real-world networks. Then, using this algorithm, the mechanism of the scientometric indicator normalization process is shown for a few indicators, such as the citation number, the P-index, and a local version of the PageRank indicator. The fat-tail trend of the article indicator distribution enables us to successfully perform the indicator normalization process.

Keywords: citation networks, cross-field normalization, local cluster detection, scientometric indicators

Procedia PDF Downloads 99
429 Facial Pose Classification Using Hilbert Space Filling Curve and Multidimensional Scaling

Authors: Mekamı Hayet, Bounoua Nacer, Benabderrahmane Sidahmed, Taleb Ahmed


Pose estimation is an important task in computer vision. Though the majority of existing solutions provide good accuracy, they are often overly complex and computationally expensive. From this perspective, we propose the use of dimensionality reduction techniques to address the problem of facial pose estimation. First, a face image is converted into a one-dimensional time series using a Hilbert space-filling curve; the approach then converts this time series into a symbolic representation. Next, a distance matrix is calculated between the symbolic series of an input learning dataset of images to generate classifiers of frontal vs. profile face pose. The proposed method is evaluated on three public datasets. Experimental results show that our approach achieves a correct classification rate exceeding 97% with the K-NN algorithm.
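The first step described above, scanning an image along a Hilbert curve to obtain a one-dimensional series, can be sketched as follows. This is a minimal illustration using the well-known iterative index-to-coordinate conversion and assuming a square image whose side is a power of two. The function names are hypothetical, and the later symbolic conversion and distance-matrix steps are not shown.

```python
def hilbert_d2xy(n, d):
    """Map index d along the Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two; d runs from 0 to n*n - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_series(image):
    """Flatten a square image (list of rows) into a 1D series in Hilbert order."""
    n = len(image)
    if n == 0 or n & (n - 1) != 0 or any(len(row) != n for row in image):
        raise ValueError("image must be square with a power-of-two side")
    series = []
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        series.append(image[y][x])
    return series
```

Unlike a row-by-row scan, consecutive samples in the resulting series always come from neighboring pixels, which is why such a curve is a natural choice when turning an image into a time series.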

Keywords: machine learning, pattern recognition, facial pose classification, time series

Procedia PDF Downloads 236
428 Investigating Data Normalization Techniques in Swarm Intelligence Forecasting for Energy Commodity Spot Price

Authors: Yuhanis Yusof, Zuriani Mustaffa, Siti Sakira Kamaruddin


Data mining is a fundamental technique for identifying patterns in large data sets. The extracted facts and patterns contribute to various domains such as marketing, forecasting, and medicine. Prior to mining, data are consolidated so that the resulting mining process may be more efficient. This study investigates the effect of different data normalization techniques, namely Min-max, Z-score, and decimal scaling, on swarm-based forecasting models. The recent swarm intelligence algorithms employed include the Grey Wolf Optimizer (GWO) and Artificial Bee Colony (ABC). Forecasting models are then developed to predict the daily spot price of crude oil and gasoline. Results showed that GWO works better with the Z-score normalization technique, while ABC produces better accuracy with Min-max. Nevertheless, GWO is superior to ABC, as its model generates the highest accuracy for both crude oil and gasoline prices. Such a result indicates that GWO is a promising competitor in the family of swarm intelligence algorithms.
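For reference, the three normalization techniques compared in the study can be sketched with their usual textbook definitions. This is a minimal illustration, not the authors' code; the function names and the sample values are assumptions.

```python
import math
import statistics


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant series: nothing to rescale
    return [(v - lo) / span for v in values]


def z_score(values):
    """Centre values on the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer giving max(|v|) < 1."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]
    j = math.floor(math.log10(max_abs)) + 1
    return [v / 10 ** j for v in values]


if __name__ == "__main__":
    prices = [50.0, 75.0, 100.0]  # made-up values, not data from the study
    print(min_max(prices))
    print(z_score(prices))
    print(decimal_scaling(prices))
```

Min-max and decimal scaling bound the output range, whereas Z-score output is unbounded but has zero mean and unit variance, which may partly explain why different optimizers respond differently to each.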

Keywords: artificial bee colony, data normalization, forecasting, Grey Wolf optimizer

Procedia PDF Downloads 393
427 A New Criterion Using Pose and Shape of Objects for Collision Risk Estimation

Authors: DoHyeung Kim, DaeHee Seo, ByungDoo Kim, ByungGil Lee


As much recent research is being carried out in the aviation and maritime domains, strong doubts have been raised concerning the reliability of collision risk estimation. It has been shown that using the position and velocity of objects can lead to imprecise results. In this paper, therefore, a new approach to the estimation of collision risk using the pose and shape of objects is proposed. Simulation results are presented that validate the accuracy of the new criterion when applied to a collision risk algorithm based on fuzzy logic.

Keywords: collision risk, pose, shape, fuzzy logic

Procedia PDF Downloads 396
426 Normalizing Logarithms of Realized Volatility in an ARFIMA Model

Authors: G. L. C. Yap


Modelling realized volatility with high-frequency returns is popular, as it is an unbiased and efficient estimator of return volatility. A computationally simple model fits the logarithms of the realized volatilities with a fractionally integrated long-memory Gaussian process. The Gaussianity assumption simplifies the parameter estimation using the Whittle approximation. Nonetheless, this assumption may not be met in finite samples, and there may be a need to normalize the financial series. Based on the empirical indices S&P500 and DAX, this paper examines the performance of the linear volatility model pre-treated with normalization compared to its existing counterpart. The empirical results show that, with normalization included as a pre-treatment procedure, the model outperforms the existing model in forecast performance in terms of statistical and economic evaluations.

Keywords: Gaussian process, long-memory, normalization, value-at-risk, volatility, Whittle estimator

Procedia PDF Downloads 252
425 Adversarial Disentanglement Using Latent Classifier for Pose-Independent Representation

Authors: Hamed Alqahtani, Manolya Kavakli-Thorne


Large pose discrepancy is one of the critical challenges in face recognition during video surveillance. Due to the entanglement of pose attributes with identity information, conventional approaches to pose-independent representation fail to provide quality results when recognizing faces with large pose variations. In this paper, we propose a practical approach to disentangle the pose attribute from the identity information, followed by synthesis of a face using a classifier network in latent space. The proposed approach employs a modified generative adversarial network framework consisting of an encoder-decoder structure embedded with a classifier in manifold space for carrying out factorization on the latent encoding. It can be further generalized to other face and non-face attributes for real-life video frames containing faces with significant attribute variations. Experimental results and comparison with the state of the art in the field show that the learned representation of the proposed approach synthesizes more compelling perceptual images through a combination of adversarial and classification losses.

Keywords: disentanglement, face detection, generative adversarial networks, video surveillance

Procedia PDF Downloads 31
424 Spatiotemporal Neural Network for Video-Based Pose Estimation

Authors: Bin Ji, Kai Xu, Shunyu Yao, Jingjing Liu, Ye Pan


Human pose estimation is a popular research area in computer vision because of its important applications in human-machine interfaces. In recent years, 2D human pose estimation based on convolutional neural networks has made great progress. However, a growing number of practical applications require handling video-based tasks. It is therefore natural to consider how to combine spatial and temporal information to achieve a balance between computing cost and accuracy. To address this issue, this study proposes a new spatiotemporal model, namely Spatiotemporal Net (STNet), to combine temporal and spatial information more rationally. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. While preserving recognition accuracy, the algorithm deals with spatiotemporal series in a decoupled way, which greatly reduces the computation of the model and thus its resource consumption. This study demonstrates the effectiveness of our network on the Penn Action Dataset, and the results indicate superior performance of our network over existing methods.

Keywords: convolutional long short-term memory, deep learning, human pose estimation, spatiotemporal series

Procedia PDF Downloads 34
423 Pose-Dependency of Machine Tool Structures: Appearance, Consequences, and Challenges for Lightweight Large-Scale Machines

Authors: S. Apprich, F. Wulle, A. Lechler, A. Pott, A. Verl


Large-scale machine tools for the manufacturing of large work pieces, e.g. blades, casings or gears for wind turbines, feature pose-dependent dynamic behavior. Small structural damping coefficients lead to long decay times for structural vibrations that have negative impacts on the production process. Typically, these vibrations are handled by increasing the stiffness of the structure by adding mass. That is counterproductive to the needs of sustainable manufacturing, as it leads to higher resource consumption both in material and in energy. Recent research activities have led to higher resource efficiency through radical mass reduction that relies on control-integrated active vibration avoidance and damping methods. These control methods depend on information describing the dynamic behavior of the controlled machine tools in order to tune the avoidance or reduction method parameters according to the current state of the machine. The paper presents the appearance, consequences and challenges of the pose-dependent dynamic behavior of lightweight large-scale machine tool structures in production. The paper starts with a theoretical introduction to the challenges of lightweight machine tool structures resulting from reduced stiffness. The statement of the pose-dependent dynamic behavior is corroborated by the results of the experimental modal analysis of a lightweight test structure. Afterwards, the consequences of the pose-dependent dynamic behavior of lightweight machine tool structures for the use of active control and vibration reduction methods are explained. Based on the state of the art on pose-dependent dynamic machine tool models and the modal investigation of an FE-model of the lightweight test structure, the criteria for a pose-dependent model for use in vibration reduction are derived. As an outlook, the paper describes an approach for a general pose-dependent model of the dynamic behavior of large lightweight machine tools that provides the necessary input to the aforementioned vibration avoidance and reduction methods to properly tackle machine vibrations.

Keywords: dynamic behavior, lightweight, machine tool, pose-dependency

Procedia PDF Downloads 370
422 A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single Color Camera

Authors: Huy Hieu Pham, Houssam Salmane, Louahdi Khoudour, Alain Crouzil, Pablo Zegers, Sergio Velastin


We present a deep learning-based multitask framework for joint 3D human pose estimation and action recognition from color video sequences. Our approach proceeds in two stages. In the first, we run a real-time 2D pose detector to determine the precise pixel locations of important keypoints of the body. A two-stream neural network is then designed and trained to map the detected 2D keypoints into 3D poses. In the second, we deploy the Efficient Neural Architecture Search (ENAS) algorithm to find an optimal network architecture that is used for modeling the spatio-temporal evolution of the estimated 3D poses via an image-based intermediate representation and for performing action recognition. Experiments on the Human3.6M, Microsoft Research Redmond (MSR) Action3D, and Stony Brook University (SBU) Kinect Interaction datasets verify the effectiveness of the proposed method on the targeted tasks. Moreover, we show that our method requires a low computational budget for training and inference.

Keywords: human action recognition, pose estimation, D-CNN, deep learning

Procedia PDF Downloads 34
421 Real Time Multi Person Action Recognition Using Pose Estimates

Authors: Aishrith Rao


Human activity recognition is an important aspect of video analytics, and many approaches have been proposed to enable action recognition. In this approach, the model is used to identify the actions of multiple people in the frame and classify them accordingly. A few approaches use RNNs and 3D CNNs, which are computationally expensive and cannot be trained with the small datasets that are currently available. Multi-person action recognition is performed in order to understand the positions and actions of the people present in the video frame. The size of the video frame can be adjusted as a hyper-parameter depending on the hardware resources available. OpenPose is used to calculate pose estimates, using a CNN to produce heatmaps, one of which provides skeleton features, which are essentially joint features. The features are then extracted, and a classification algorithm can be applied to classify the action.

Keywords: human activity recognition, computer vision, pose estimates, convolutional neural networks

Procedia PDF Downloads 39
420 A New Scheme for Chain Code Normalization in Arabic and Farsi Scripts

Authors: Reza Shakoori


This paper presents a structural correction of Arabic and Persian strokes through manipulation of their chain codes in order to improve the rate and performance of Persian and Arabic handwritten word recognition systems. It collects pure and effective features to represent a character with one consolidated feature vector and reduces variations in order to decrease the number of training samples and increase the chance of successful classification. Our results also show how the proposed approaches can simplify classification, and consequently recognition, by reducing variations and possible noise in the chain code while keeping the orientation of characters and their backbone structures.

Keywords: Arabic, chain code normalization, OCR systems, image processing

Procedia PDF Downloads 306
419 Light-Weight Network for Real-Time Pose Estimation

Authors: Jianghao Hu, Hongyu Wang


An effective and efficient algorithm is important for real-time human pose estimation on mobile devices. This paper proposes a light-weight human keypoint detection algorithm, Light-Weight Network for Real-Time Pose Estimation (LWPE). LWPE uses a light-weight backbone network and depthwise separable convolutions to reduce parameters and lower latency. LWPE uses the feature pyramid network (FPN) to fuse the high-resolution, semantically weak features with the low-resolution, semantically strong features. In addition, with multi-scale prediction, the result predicted from the low-resolution feature map is stacked onto the adjacent higher-resolution feature map to supervise the network at intermediate stages and continuously refine the results. In the last step, the keypoint coordinates predicted at the highest resolution are used as the final output of the network. For keypoints that are difficult to predict, LWPE adopts the online hard keypoint mining strategy to focus on them. The proposed algorithm achieves excellent performance on the single-person dataset selected from the AI (artificial intelligence) challenge dataset. The algorithm maintains high-precision performance even though the model contains only 3.9M parameters, and it can run at 225 frames per second (FPS) on a generic graphics processing unit (GPU).
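To make the parameter saving from depthwise separable convolutions concrete, the short sketch below compares weight counts for a single layer. The layer size (3x3 kernel, 128 input channels, 256 output channels) is a hypothetical example and is not taken from LWPE; biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 128, 256  # hypothetical layer, for illustration only
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For this layer the standard convolution needs 294,912 weights and the separable pair needs 33,920, roughly an 8.7-fold reduction; in general the ratio of separable to standard weights is 1/c_out + 1/k².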

Keywords: depthwise separable convolutions, feature pyramid network, human pose estimation, light-weight backbone

Procedia PDF Downloads 42
418 Extensions of Schwarz Lemma in the Half-Plane

Authors: Nicolae Pascu


Aside from being a fundamental tool in complex analysis, the Schwarz Lemma, which was finalized in its most complete form at the beginning of the last century, generated an important area of research in various fields of mathematics, which continues to advance even today. We present some properties of analytic functions in the half-plane which satisfy the conditions of the classical Schwarz Lemma (Carathéodory functions) and obtain a generalization of the well-known Aleksandrov-Sobolev Lemma for analytic functions in the half-plane (the counterpart of the Schwarz-Pick Lemma from the unit disk). Using this Schwarz-type lemma, we obtain a characterization for the entire class of Carathéodory functions, which might be of independent interest. We prove two monotonicity properties for Carathéodory functions that do not depend upon their normalization at infinity (the hydrodynamic normalization). The method is based on conformal mapping arguments for analytic functions in the half-plane satisfying appropriate conditions, in the spirit of the Schwarz Lemma. Our main results give estimates for the modulus and the argument for the entire class of Carathéodory functions. As applications, we give several extensions of the Julia-Wolff-Carathéodory Lemma in a half-strip and show that our results are sharp.

Keywords: Schwarz lemma, Julia-Wolff-Carathéodory lemma, analytic function, normalization condition, Carathéodory function

Procedia PDF Downloads 51
417 Disentangling Audio Content and Emotion with Adaptive Instance Normalization for Expressive Facial Animation Synthesis

Authors: Che-Jui Chang, Long Zhao, Mubbasir Kapadia


3D facial animation synthesis from audio has been a focus in recent years. However, most existing works in the literature are designed for the mapping between audio and visual content, providing limited knowledge regarding the relationship between emotion in audio and expressive facial animation. In this paper, we aim to generate audio-matching facial animations with a specified emotion label. In such a task, we argue that separating the content from the audio is indispensable: the proposed model must learn to generate facial content from the audio content and expressions from the specified emotion. We achieve this with an adaptive instance normalization (AdaIN) module that isolates the content in the audio and combines it with the emotion embedding from the specified label. The joint content-emotion embedding is then used to generate 3D facial vertices and texture maps. We compare our method with state-of-the-art baselines, including facial segmentation-based and voice conversion-based disentanglement approaches. We also conducted a user study to evaluate the performance of emotion conditioning, and the results indicate that our proposed method outperforms the baselines in both animation quality and accuracy of expression categorization.

Keywords: adaptive instance normalization, audio-driven animation, content-emotion disentanglement, emotion-conditioning, expressive facial animation synthesis

Procedia PDF Downloads 23
416 Single-Camera Basketball Tracker through Pose and Semantic Feature Fusion

Authors: Adrià Arbués-Sangüesa, Coloma Ballester, Gloria Haro


Tracking sports players is a challenging scenario, especially in single-feed videos recorded in tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features to detect and track basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with deep learning features, without the need to extract contextual ones, such as proximity or color similarity, or to apply camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players' poses, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k instances.

Keywords: basketball, deep learning, feature extraction, single-camera, tracking

Procedia PDF Downloads 41
415 Evaluating the Performance of Color Constancy Algorithm

Authors: Damanjit Kaur, Avani Bhatia


Color constancy is significant for human vision, since color is a pictorial cue that helps in solving different vision tasks such as tracking, object recognition, or categorization. Therefore, several computational methods have tried to simulate human color constancy abilities to stabilize machine color representations. Two different kinds of methods have been used, i.e., normalization and constancy. While color normalization creates a new representation of the image by canceling illuminant effects, color constancy directly estimates the color of the illuminant in order to map the image colors to a canonical version. Color constancy is the capability to determine the colors of objects independent of the color of the light source. This research work studies most of the well-known color constancy algorithms, such as white patch and gray world.
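The two algorithms named in this abstract can be sketched in their simplest textbook form. The code below is an illustrative assumption rather than the implementation evaluated by the authors: an image is represented as a flat list of (R, G, B) tuples in the range 0 to 255, and the function names are hypothetical.

```python
def channel_means(pixels):
    """Mean of each of the three colour channels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def apply_gains(pixels, gains, max_value=255.0):
    """Scale each channel by its gain and clip to the valid range."""
    return [
        tuple(min(max_value, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]


def gray_world(pixels):
    """Gray world: assume the average scene colour is gray.

    Each channel is scaled so that its mean matches the overall mean.
    """
    means = channel_means(pixels)
    gray = sum(means) / 3.0
    gains = tuple(gray / m if m > 0 else 1.0 for m in means)
    return apply_gains(pixels, gains)


def white_patch(pixels, max_value=255.0):
    """White patch: assume the brightest value in each channel is white.

    The per-channel maximum is taken as the illuminant estimate.
    """
    maxima = tuple(max(p[c] for p in pixels) for c in range(3))
    gains = tuple(max_value / m if m > 0 else 1.0 for m in maxima)
    return apply_gains(pixels, gains, max_value)
```

Both functions estimate the illuminant from image statistics and then apply a per-channel gain; they differ only in which statistic (mean or maximum) stands in for the light source color.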

Keywords: color constancy, gray world, white patch, modified white patch

Procedia PDF Downloads 171
414 Author Name Disambiguation for Biomedical Literature

Authors: Parthiban Srinivasan


PubMed provides online access to the National Library of Medicine database (MEDLINE) and other publications, which contain close to 25 million scientific citations from 1865 to the present. There are close to 80 million author name instances in those close to 25 million citations. For any work of literature, a fundamental issue is to identify the individual(s) who wrote it, and conversely, to identify all of the works that belong to a given individual. Due to the lack of universal standards for name information, there are two aspects of name ambiguity: name synonymy (a single author with multiple name representations), and name homonymy (multiple authors sharing the same name representation). In this talk, we present some results from our extensive work in author name disambiguation for PubMed citations. Information will be presented on the effectiveness and shortcomings of different aspects of successful name disambiguation such as parsing, validation, standardization and normalization.

Keywords: disambiguation, normalization, parsing, PubMed

Procedia PDF Downloads 186
413 Assessment of Pre-Processing Influence on Near-Infrared Spectra for Predicting the Mechanical Properties of Wood

Authors: Aasheesh Raturi, Vimal Kothiyal, P. D. Semalty


We studied the mechanical properties of Eucalyptus tereticornis using FT-NIR spectroscopy. First, spectra were pre-processed to eliminate useless information. Then, a prediction model was constructed by partial least squares regression. To study the influence of pre-processing on the prediction of mechanical properties for NIR analysis of wood samples, we applied various pretreatment methods: straight line subtraction, constant offset elimination, vector normalization, min-max normalization, multiplicative scattering correction, first derivative, second derivative, and their combinations with other treatments, such as first derivative + straight line subtraction, first derivative + vector normalization, and first derivative + multiplicative scattering correction. The combinations of pre-processing methods with different NIR regions, along with RMSECV, RMSEP, and the optimum number of factors (rank), were obtained through the optimization process of model development. More than 350 combinations were obtained during the optimization process. More than one pre-processing method gave good calibration/cross-validation and prediction/test models, but only the best calibration/cross-validation and prediction/test models are reported here. The results show that one can safely use the NIR region between 4000 and 7500 cm⁻¹ with straight line subtraction, constant offset elimination, first derivative, and second derivative pre-processing, which were found to be the most appropriate for model development.

Keywords: FT-NIR, mechanical properties, pre-processing, PLS

Procedia PDF Downloads 252
412 Enhancement of Underwater Haze Image with Edge Reveal Using Pixel Normalization

Authors: M. Dhana Lakshmi, S. Sakthivel Murugan


As light passes from source to observer in the water medium, it is scattered by the suspended particulate matter. This scattering effect plagues the captured images with non-uniform illumination, blurred details, halo artefacts, weak edges, etc. To overcome this, pixel normalization with an Amended Unsharp Mask (AUM) filter is proposed to enhance the degraded image. To validate the robustness of the proposed technique irrespective of atmospheric light, the datasets considered were collected at two locations. For those images, the maximum and minimum pixel intensity values are computed and normalized; then the AUM filter is applied to strengthen the blurred edges. Finally, the enhanced image is obtained with good illumination and contrast. Thus, the proposed technique removes the effect of scattering (de-hazing) and restores the perceptual information with enhanced edge detail. Both qualitative and quantitative analyses are carried out: the standard no-reference metric called the underwater image sharpness measure (UISM) is considered, and the underwater image quality measure (UIQM) is used to measure color, sharpness, and contrast for the images from both locations. It is observed that the proposed technique shows superior performance compared to other deep-learning-based enhancement networks and traditional techniques, in an adaptive manner.
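The two-stage idea described above (normalize pixel intensities using the image minimum and maximum, then sharpen edges) can be sketched as follows. The Amended Unsharp Mask filter itself is not specified in the abstract, so this sketch substitutes a classic unsharp mask with a 3x3 box blur; the function names, the `amount` parameter, and the grayscale list-of-rows image format are all assumptions.

```python
def stretch(image, out_max=255.0):
    """Min-max normalize pixel intensities to the range [0, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in image]  # flat image: no contrast
    scale = out_max / (hi - lo)
    return [[(p - lo) * scale for p in row] for row in image]


def box_blur(image):
    """3x3 mean filter; border pixels are handled by edge replication."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            row.append(total / 9.0)
        out.append(row)
    return out


def unsharp_mask(image, amount=1.0, out_max=255.0):
    """Classic unsharp mask: add back the detail removed by blurring."""
    blurred = box_blur(image)
    return [
        [min(out_max, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(image, blurred)
    ]


def enhance(image):
    """Normalize first, then sharpen, mirroring the order in the abstract."""
    return unsharp_mask(stretch(image))
```

Stretching first matters in this sketch because a hazy image occupies a narrow intensity band; sharpening that band directly would amplify very small differences, while sharpening after normalization acts on the full range.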

Keywords: underwater drone imagery, pixel normalization, thresholding, masking, unsharp mask filter

Procedia PDF Downloads 47
411 Surface Geodesic Derivative Pattern for Deformable Textured 3D Object Comparison: Application to Expression and Pose Invariant 3D Face Recognition

Authors: Farshid Hajati, Soheila Gheisari, Ali Cheraghian, Yongsheng Gao


This paper presents a new Surface Geodesic Derivative Pattern (SGDP) for matching textured deformable 3D surfaces. SGDP encodes micro-pattern features based on local surface higher-order derivative variation. It extracts local information by encoding various distinctive textural relationships contained in a geodesic neighborhood, hence fusing texture and range information of a surface at the data level. Geodesic texture rings are encoded into local patterns for similarity measurement between non-rigid 3D surfaces. The performance of the proposed method is evaluated extensively on the Bosphorus and FRGC v2 face databases. Compared to existing benchmarks, experimental results show the effectiveness and superiority of combining the texture and 3D shape data at the earliest level in recognizing typical deformable faces under expression, illumination, and pose variations.

Keywords: 3D face recognition, pose, expression, surface matching, texture

Procedia PDF Downloads 276
410 Deep Learning Based Fall Detection Using Simplified Human Posture

Authors: Kripesh Adhikari, Hamid Bouchachia, Hammadi Nait-Charif


Falls are one of the major causes of injury and death among elderly people aged 65 and above. Support systems that identify such abnormal activities have become extremely important as the ageing population grows. Pose estimation is a challenging task, and it becomes even harder for the unusual poses that may occur during a fall. The location of the body provides a clue to where the person is at the time of a fall. This paper presents a vision-based tracking strategy in which the available joints are grouped into three feature points depending on the section of the body in which they are located. The three feature points, derived from different joint combinations, represent the upper region (head), the mid-region (torso), and the lower region (legs). Tracking is always challenging when motion is involved. Hence, the idea is to locate these body regions in every frame and use them as the tracking strategy. Grouping the joints can help achieve a stable region for tracking. The location of the body parts provides crucial information for distinguishing normal activities from falls.
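
One way to picture the grouping of available joints into three feature points is sketched below. The abstract does not say which joints belong to which region or how a feature point is derived from them, so the joint names, the region assignment, and the use of a simple centroid are all assumptions made for illustration.

```python
# Hypothetical assignment of joint names to body regions.
REGIONS = {
    "head": ["nose", "left_eye", "right_eye", "left_ear", "right_ear"],
    "torso": ["left_shoulder", "right_shoulder", "left_hip", "right_hip"],
    "legs": ["left_knee", "right_knee", "left_ankle", "right_ankle"],
}


def region_points(joints):
    """Return one (x, y) feature point per region from the joints detected in a frame.

    `joints` maps a joint name to its (x, y) position; missing joints are simply
    absent. A region with no detected joint yields None.
    """
    points = {}
    for region, names in REGIONS.items():
        found = [joints[name] for name in names if name in joints]
        if not found:
            points[region] = None
            continue
        x = sum(p[0] for p in found) / len(found)
        y = sum(p[1] for p in found) / len(found)
        points[region] = (x, y)
    return points
```

Calling `region_points` on every frame gives three points whose positions over time could feed a fall classifier. Averaging several joints is one reason such a point may stay usable when an individual joint detection drops out.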

Keywords: fall detection, machine learning, deep learning, pose estimation, tracking

Procedia PDF Downloads 70
409 A Normalized Non-Stationary Wavelet Based Analysis Approach for a Computer Assisted Classification of Laryngoscopic High-Speed Video Recordings

Authors: Mona K. Fehling, Jakob Unger, Dietmar J. Hecker, Bernhard Schick, Joerg Lohscheller


Voice disorders originate from disturbances of the vibration patterns of the two vocal folds located within the human larynx. Consequently, the visual examination of vocal fold vibrations is an integral part of the clinical diagnostic process. For an objective analysis of the vocal fold vibration patterns, the two-dimensional vocal fold dynamics are captured during sustained phonation using an endoscopic high-speed camera. In this work, we present an approach that allows a fully automatic analysis of the high-speed video data, including a computerized classification of healthy and pathological voices. The approach is based on a wavelet-based analysis of so-called phonovibrograms (PVG), which are extracted from the high-speed videos and comprise the entire two-dimensional vibration pattern of each vocal fold individually. Using a principal component analysis (PCA) strategy, a low-dimensional feature set is computed from each phonovibrogram. From the PCA-space, clinically relevant measures can be derived that objectively quantify vibration abnormalities. In the first part of the work, it is shown that, using a machine learning approach, the derived measures are suitable for automatically distinguishing between healthy and pathological voices. Within the approach, the formation of the PCA-space, and consequently the extracted quantitative measures, depend on the clinical data that were used to compute the principal components. Therefore, in the second part of the work, we propose a strategy to normalize the PCA-space by registering it to a coordinate system using a set of synthetically generated vibration patterns. The results show that, owing to the normalization step, potential ambiguity of the parameter space can be eliminated. The normalization further allows a direct comparison of research results that are based on PCA-spaces obtained from different clinical subjects.

Keywords: Wavelet-based analysis, Multiscale product, normalization, computer assisted classification, high-speed laryngoscopy, vocal fold analysis, phonovibrogram

Procedia PDF Downloads 162
408 A New 3D Shape Descriptor Based on Multi-Resolution and Multi-Block CS-LBP

Authors: Nihad Karim Chowdhury, Mohammad Sanaullah Chowdhury, Muhammed Jamshed Alam Patwary, Rubel Biswas


In content-based 3D shape retrieval systems, achieving high search performance has become an important research problem. A challenging aspect of this problem is finding an effective shape descriptor that can adequately discriminate between similar shapes. To address this problem, we propose a new shape descriptor for 3D shape models that combines multi-resolution analysis with a multi-block center-symmetric local binary pattern (CS-LBP) operator. Given an arbitrary 3D shape, we first apply pose normalization and generate a set of multi-view 2D rendered images. Second, we apply a Gaussian multi-resolution filter to generate several image levels from each 2D rendered image. Then, overlapped sub-images are computed for each level of a multi-resolution image. Our multi-block CS-LBP is applied next. It allows the center to be composed of m-by-n rectangular pixels instead of a single pixel. This process is repeated for all the 2D rendered images, derived from both ‘depth-buffer’ and ‘silhouette’ rendering. Finally, we concatenate all the feature vectors into a one-dimensional histogram, which forms the proposed 3D shape descriptor. Through several experiments on a benchmark dataset, we demonstrate that the proposed 3D shape descriptor outperforms previous methods.
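
For readers unfamiliar with the operator this descriptor builds on, here is a minimal sketch of the plain single-pixel CS-LBP over a 3x3 neighbourhood. It compares the four center-symmetric pairs of neighbours and packs the results into a 4-bit code. The authors' multi-block extension, the multi-resolution filtering, and the overlapped sub-images are not reproduced; function names and the `threshold` parameter are hypothetical.

```python
# Offsets (dy, dx) of the 8 neighbours in circular order; neighbour i is
# diametrically opposite neighbour i + 4.
NEIGHBOURS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def cs_lbp_code(image, y, x, threshold=0.0):
    """4-bit CS-LBP code at an interior pixel (y, x) of a 2D grayscale image."""
    code = 0
    for i in range(4):
        dy1, dx1 = NEIGHBOURS[i]
        dy2, dx2 = NEIGHBOURS[i + 4]
        diff = image[y + dy1][x + dx1] - image[y + dy2][x + dx2]
        if diff > threshold:
            code |= 1 << i
    return code


def cs_lbp_histogram(image, threshold=0.0):
    """16-bin histogram of CS-LBP codes over all interior pixels."""
    hist = [0] * 16
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[cs_lbp_code(image, y, x, threshold)] += 1
    return hist
```

Because only four comparisons are made per pixel, the histogram has 16 bins rather than the 256 of the basic LBP, which keeps a descriptor concatenated from many sub-images and views compact.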

Keywords: 3D shape retrieval, 3D shape descriptor, CS-LBP, overlapped sub-images

Procedia PDF Downloads 329
407 Improved Pitch Detection Using Fourier Approximation Method

Authors: Balachandra Kumaraswamy, P. G. Poonacha


Automatic Music Information Retrieval has been a challenging research topic for a few decades, with several interesting approaches reported in the literature. In this paper, we develop a pitch extraction method based on a finite Fourier series approximation to a given window of samples. Pitch is estimated as the fundamental period of this approximation. The method analyzes the strength of the harmonics present in the signal to reduce octave and harmonic errors. Its performance is compared with that of three well-known pitch extraction methods: Yin, Windowed Special Normalization of the Auto-Correlation Function, and Harmonic Product Spectrum. Our study with artificially created signals as well as music files shows that the Fourier Approximation method gives a much better estimate of pitch, with fewer octave and harmonic errors.
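
The idea of estimating pitch as the fundamental period of a finite Fourier series fitted to a window can be sketched as below. This is not the authors' algorithm, whose details (including the harmonic strength analysis) are not given in the abstract. It is a simplified illustration that assumes integer candidate periods in samples, truncates the window to a whole number of periods, and picks the candidate with the lowest mean square error. All names and parameters are hypothetical.

```python
import math


def fourier_fit_mse(samples, period, n_harmonics):
    """Mean square error left after fitting n_harmonics harmonics of `period`."""
    if period <= 2 * n_harmonics:
        raise ValueError("period too short for the requested number of harmonics")
    # A whole number of periods makes the sine/cosine basis orthogonal.
    n = (len(samples) // period) * period
    if n == 0:
        raise ValueError("window shorter than one period")
    window = samples[:n]
    mean = sum(window) / n
    centred = [v - mean for v in window]
    energy = sum(v * v for v in centred)
    explained = 0.0
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a = (2.0 / n) * sum(v * math.cos(w * i) for i, v in enumerate(centred))
        b = (2.0 / n) * sum(v * math.sin(w * i) for i, v in enumerate(centred))
        explained += (n / 2.0) * (a * a + b * b)
    return max(energy - explained, 0.0) / n


def estimate_period(samples, min_period, max_period, n_harmonics=3):
    """Candidate period (in samples) whose Fourier fit has the lowest error."""
    candidates = range(min_period, max_period + 1)
    return min(candidates, key=lambda p: fourier_fit_mse(samples, p, n_harmonics))
```

The pitch in hertz would be the sample rate divided by the returned period. Note that a period twice the true one can fit equally well once enough harmonics are allowed, which is the kind of octave error the abstract says the harmonic strength analysis is meant to reduce.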

Keywords: pitch, Fourier series, Yin, normalization of the auto-correlation function, harmonic product, mean square error

Procedia PDF Downloads 324
406 A Neural Network Classifier for Identifying Duplicate Image Entries in Real-Estate Databases

Authors: Sergey Ermolin, Olga Ermolin


A Deep Convolutional Neural Network with Triplet Loss is used to identify duplicate images in real-estate advertisements in the presence of image artifacts such as watermarking, cropping, hue/brightness adjustment, and others. The effects of batch normalization, spatial dropout, and various convergence methodologies on the resulting detection accuracy are discussed. For a comparative Return-on-Investment study (per industry request), end-to-end performance is benchmarked on both Nvidia Titan GPUs and Intel’s Xeon CPUs. A new real-estate dataset from the San Francisco Bay Area is used for this work. Sufficient duplicate detection accuracy is achieved to supplement other database-grounded methods of duplicate removal. The implemented method is used in a Proof-of-Concept project in the real-estate industry.
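
The triplet loss mentioned here can be written down in a few lines. The sketch below uses the common formulation with squared Euclidean distances on embedding vectors. The margin value, the function names, and the choice of distance are assumptions, since the abstract gives no details of the authors' setup.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is farther than the positive by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)
```

For duplicate detection, the anchor and positive would be embeddings of the same photo under different artifacts (for example a watermarked and a cropped copy), and the negative an embedding of a different photo. After training, two images can be flagged as duplicates when the distance between their embeddings falls below a chosen threshold.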

Keywords: visual recognition, convolutional neural networks, triplet loss, spatial batch normalization with dropout, duplicate removal, advertisement technologies, performance benchmarking

Procedia PDF Downloads 229