Temporally Coherent 3D Animation Reconstruction from RGB-D Video Data
Authors: Salam Khalifa, Naveed Ahmed
Abstract:
We present a new method to reconstruct a temporally coherent 3D animation from single- or multi-view RGB-D video data using unbiased feature point sampling. Given RGB-D video data in the form of a 3D point cloud sequence, our method first extracts feature points using both color and depth information. In the subsequent steps, these feature points are used to match two 3D point clouds in consecutive frames, independent of their resolution. Our new motion-vector-based dynamic alignment method then fully reconstructs a spatio-temporally coherent 3D animation. We perform extensive quantitative validation using novel error functions to analyze the results. We show that despite the temporal and spatial noise associated with RGB-D data, it is possible to establish temporal coherence and faithfully reconstruct a 3D animation from RGB-D video data.
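The pipeline outlined in the abstract (feature extraction from color and depth, matching across consecutive frames, and per-point motion vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes OpenCV SIFT keypoints stand in for the paper's unbiased feature sampling, and the camera intrinsics fx, fy, cx, cy are hypothetical placeholders.

```python
# Minimal sketch (not the paper's method): detect feature points on the color image,
# lift them to 3D with the depth map, and match two consecutive RGB-D frames to obtain
# 3D motion vectors. Assumes OpenCV SIFT and hypothetical pinhole intrinsics.
import cv2
import numpy as np

def rgbd_keypoints(color_bgr, depth_m, fx, fy, cx, cy):
    """Detect 2D keypoints on the color image and back-project them to 3D using depth."""
    gray = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kps, desc = sift.detectAndCompute(gray, None)
    if desc is None:
        return np.empty((0, 3), np.float32), np.empty((0, 128), np.float32)
    pts3d, kept_desc = [], []
    for kp, d in zip(kps, desc):
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if not (0 <= v < depth_m.shape[0] and 0 <= u < depth_m.shape[1]):
            continue
        z = depth_m[v, u]
        if z <= 0:  # skip keypoints with no valid depth sample
            continue
        pts3d.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
        kept_desc.append(d)
    return np.array(pts3d, dtype=np.float32), np.array(kept_desc, dtype=np.float32)

def motion_vectors(frame_t, frame_t1, intrinsics):
    """Match feature points of two consecutive RGB-D frames; return points and motion vectors."""
    p0, d0 = rgbd_keypoints(*frame_t, *intrinsics)
    p1, d1 = rgbd_keypoints(*frame_t1, *intrinsics)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(d0, d1, k=2)
    good = []
    for pair in matches:
        # Lowe's ratio test keeps only distinctive matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    src = p0[[m.queryIdx for m in good]]
    dst = p1[[m.trainIdx for m in good]]
    return src, dst - src  # anchor points and their 3D motion vectors between frames
```

In this sketch the returned motion vectors would feed a subsequent alignment step; the paper's dynamic alignment and unbiased sampling are not reproduced here.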
Keywords: 3D video, 3D animation, RGB-D video, temporally coherent 3D animation.
Digital Object Identifier (DOI): doi.org/10.5281/zenodo.1096095
References:
[1] Joel Carranza, Christian Theobalt, Marcus A. Magnor, and Hans-Peter Seidel. Free-viewpoint video of human actors. ACM Trans. Graph., 22(3):569–577, 2003.
[2] Jonathan Starck and Adrian Hilton. Surface capture for performance-based animation. IEEE Computer Graphics and Applications, 27(3):21–31, 2007.
[3] Paul E. Debevec, Tim Hawkins, Chris Tchou, Haarm-Pieter Duiker, Westley Sarokin, and Mark Sagar. Acquiring the reflectance field of a human face. In SIGGRAPH, pages 145–156, 2000.
[4] Tim Hawkins, Per Einarsson, and Paul E. Debevec. A dual light stage. In EGSR, pages 91–98, 2005.
[5] Christian Theobalt, Naveed Ahmed, Gernot Ziegler, and Hans-Peter Seidel. High-quality reconstruction of virtual actors from multi-view video streams. IEEE Signal Processing Magazine, 24(6):45–57, 2007.
[6] Edilson de Aguiar, Carsten Stoll, Christian Theobalt, Naveed Ahmed, Hans-Peter Seidel, and Sebastian Thrun. Performance capture from sparse multi-view video. ACM Trans. Graph., 27(3), 2008.
[7] Daniel Vlasic, Ilya Baran, Wojciech Matusik, and Jovan Popovic. Articulated mesh animation from multi-view silhouettes. ACM Trans. Graph., 27(3), 2008.
[8] Naveed Ahmed, Christian Theobalt, Christian Rössl, Sebastian Thrun, and Hans-Peter Seidel. Dense correspondence finding for parametrization-free animation reconstruction from video. In CVPR, 2008.
[9] Art Tevs, Alexander Berner, Michael Wand, Ivo Ihrke, and Hans-Peter Seidel. Intrinsic shape matching by planned landmark sampling. In Eurographics, 2011.
[10] Peng Huang, Adrian Hilton, and Jonathan Starck. Shape similarity for 3d video sequences of people. International Journal of Computer Vision, 89(2-3):362–381, 2010.
[11] Masaki Hilaga, Yoshihisa Shinagawa, Taku Kohmura, and Tosiyasu L. Kunii. Topology matching for fully automatic similarity estimation of 3d shapes. In SIGGRAPH ’01, pages 203–212, New York, NY, USA, 2001. ACM.
[12] Cedric Cagniart, Edmond Boyer, and Slobodan Ilic. Iterative mesh deformation for dense surface tracking. In ICCV Workshops, ICCV’09, 2009.
[13] Kiran Varanasi, Andrei Zaharescu, Edmond Boyer, and Radu Horaud. Temporal surface tracking using mesh evolution. In ECCV’08, pages 30–43, Berlin, Heidelberg, 2008.
[14] Microsoft. Kinect for Microsoft Windows and Xbox 360. http://www.kinectforwindows.org/, November 2010.
[15] Y. M. Kim, D. Chan, Christian Theobalt, and S. Thrun. Design and calibration of a multi-view tof sensor fusion system. In CVPR Workshop, 2008.
[16] Y. M. Kim, Christian Theobalt, J. Diebel, J. Kosecka, B. Micusik, and S. Thrun. Multi-view image and tof sensor fusion for dense 3d reconstruction. In 3DIM, pages 1542–1549, Kyoto, Japan, 2009. IEEE.
[17] Victor Castaneda, Diana Mateus, and Nassir Navab. Stereo time-of-flight. In ICCV, 2011.
[18] Alexander Weiss, David Hirshberg, and Michael J. Black. Home 3d body scans from noisy image and range data. In ICCV, 2011.
[19] Andreas Baak, Meinard Muller, Gaurav Bharaj, Hans-Peter Seidel, and Christian Theobalt. A data-driven approach for real-time full body pose reconstruction from a depth camera. In ICCV, 2011.
[20] R. Girshick, J. Shotton, P. Kohli, A. Criminisi, and A. Fitzgibbon. Efficient regression of general-activity human poses from depth images. In ICCV, 2011.
[21] Kai Berger, Kai Ruhl, Yannic Schroeder, Christian Bruemmer, Alexander Scholz, and Marcus A. Magnor. Markerless motion capture using multiple color-depth sensors. In VMV, pages 317–324, 2011.
[22] Naveed Ahmed. A system for 360 degree acquisition and 3d animation reconstruction using multiple rgb-d cameras. In Proceedings of the 25th International Conference on Computer Animation and Social Agents (CASA), 2012.
[23] Radu Bogdan Rusu and Steve Cousins. 3D is here: Point Cloud Library (PCL). In ICRA, 2011.
[24] David G. Lowe. Object recognition from local scale-invariant features. In ICCV, pages 1150–1157, 1999.