We propose an entirely data-driven approach to estimating the 3D pose of a hand from a depth image. We show that a feedback loop can correct the mistakes made by a Convolutional Neural Network trained to predict an initial estimate of the 3D pose. The components of this feedback loop are also Deep Networks, optimized using training data. This removes the need to fit a 3D model to the input data, which would require both a carefully designed fitting function and a carefully designed fitting algorithm. We show that our approach outperforms state-of-the-art methods and is efficient: our implementation runs at over 400 fps on a single GPU.
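The abstract describes the feedback loop only at a high level. The sketch below shows one way such a loop could be wired, assuming three hypothetical callables: `predictor` returns an initial pose from the depth image, `synthesizer` renders a depth image from a pose, and `updater` compares the input image with the synthesized one and returns a correction to the pose. In the actual system these components are trained Deep Networks; here they are plain placeholder functions, and the additive update and fixed iteration count are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def refine_pose(
    depth_image: Vector,
    predictor: Callable[[Vector], Vector],
    synthesizer: Callable[[Vector], Vector],
    updater: Callable[[Vector, Vector], Vector],
    num_iters: int,
) -> Vector:
    """Refine an initial pose estimate with a feedback loop (hypothetical sketch)."""
    # Initial estimate from the (hypothetical) predictor network.
    pose = predictor(depth_image)
    for _ in range(num_iters):
        # Render what the input would look like if the current pose were correct.
        synthesized = synthesizer(pose)
        # Compare input and synthesized images to obtain a pose correction.
        update = updater(depth_image, synthesized)
        # Apply the correction additively (an assumption of this sketch).
        pose = [p + du for p, du in zip(pose, update)]
    return pose


if __name__ == "__main__":
    # Toy stand-ins: the "image" is the pose itself, and each update halves the error.
    def predictor(img: Vector) -> Vector:
        return [0.0] * len(img)

    def synthesizer(pose: Vector) -> Vector:
        return list(pose)

    def updater(img: Vector, syn: Vector) -> Vector:
        return [0.5 * (i - s) for i, s in zip(img, syn)]

    print(refine_pose([2.0, 4.0], predictor, synthesizer, updater, num_iters=3))
    # prints [1.75, 3.5]
```

With these toy stand-ins, each iteration halves the remaining discrepancy between the input and the synthesized image, which illustrates why iterating the loop can progressively correct an inaccurate initial prediction.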



Presentation: ICCV’15 presentation, ICCV’15 poster

Our results: each line contains the estimated hand pose for one frame. The pose is parametrized by the locations of the joints in (u, v, d) coordinates, i.e., image coordinates and depth. The joints are stored in sequential order, with the three coordinates of each joint stored consecutively.
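A minimal sketch for reading such a results file, assuming the values on each line are whitespace-separated numbers laid out as u, v, d for the first joint, then u, v, d for the next, and so on. The separator, the function names `parse_pose_line` and `load_results`, and the file path are hypothetical; the number of joints is not stated here, so it is inferred from the line length.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (u, v, d): image coordinates and depth


def parse_pose_line(line: str) -> List[Joint]:
    """Split one line (one frame) into a list of (u, v, d) joint triplets."""
    values = [float(x) for x in line.split()]
    if len(values) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 values, got {len(values)}")
    # Group consecutive values into per-joint triplets.
    return [
        (values[i], values[i + 1], values[i + 2])
        for i in range(0, len(values), 3)
    ]


def load_results(path: str) -> List[List[Joint]]:
    """Return one list of joints per frame, skipping blank lines."""
    with open(path) as f:
        return [parse_pose_line(line) for line in f if line.strip()]
```

For example, `parse_pose_line("120 85 640 130 90 655")` yields two joints, `(120.0, 85.0, 640.0)` and `(130.0, 90.0, 655.0)`. If the actual files use a different separator, only the `split()` call needs to change.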


  author = {M.~Oberweger and P.~Wohlhart and V.~Lepetit},
  title = {Training a Feedback Loop for Hand Pose Estimation},
  booktitle = {Proc.~of International Conference on Computer Vision},
  year = 2015