We propose an entirely data-driven approach to estimating the 3D pose of a hand from a depth image. We show that a feedback loop can correct the mistakes made by a Convolutional Neural Network trained to predict an estimate of the 3D pose. The components of this feedback loop are themselves Deep Networks, optimized using training data. They remove the need to fit a 3D model to the input data, which would require both a carefully designed fitting function and a fitting algorithm. Our approach outperforms state-of-the-art methods and is efficient: our implementation runs at over 400 fps on a single GPU.
Results
Material
Presentation: ICCV’15 presentation, ICCV’15 poster
Our results: Each line is the estimated hand pose for one frame. The pose is parametrized by the locations of the joints in (u, v, d) coordinates, i.e., image coordinates and depth. The coordinates of the joints are stored in sequential order.
- NYU dataset of J. Tompson: ICCV’15 Init, ICCV’15 Feedback
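A line in this format can be read back as a list of per-joint (u, v, d) triples. The sketch below is a minimal example, assuming whitespace-separated float values and inferring the joint count from the line length (the exact delimiter and joint count in the released files may differ):

```python
# Parse one line of a results file: the frame's pose is stored as
# J joints * 3 values, in (u, v, d) order for each joint in sequence.
def parse_pose_line(line):
    vals = [float(v) for v in line.split()]
    assert len(vals) % 3 == 0, "expected a multiple of 3 values per line"
    # Group the flat list into (u, v, d) triples, one per joint.
    return [tuple(vals[i:i + 3]) for i in range(0, len(vals), 3)]

# Hypothetical two-joint line for illustration:
example = "310.5 220.1 745.0 295.2 210.8 750.3"
print(parse_pose_line(example))  # [(310.5, 220.1, 745.0), (295.2, 210.8, 750.3)]
```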
Citation
@inproceedings{Oberweger2015a,
author = {M.~Oberweger and P.~Wohlhart and V.~Lepetit},
title = {Training a Feedback Loop for Hand Pose Estimation},
booktitle = {Proc.~of International Conference on Computer Vision},
year = 2015
}