We introduce and evaluate several Convolutional Neural Network architectures for predicting the 3D joint locations of a hand given a depth map. We first show that a prior on the 3D pose can be easily introduced and significantly improves the accuracy and reliability of the predictions. We also show how to use context efficiently to deal with ambiguities between fingers. These two contributions allow us to significantly outperform the state-of-the-art on several challenging benchmarks, in both accuracy and computation time.

Details

We introduce a prior model for predicting the 3D joint locations of a hand given a depth map using Convolutional Neural Networks (CNNs). In particular, we show that 1) a prior model improves accuracy by constraining the predicted hand pose to plausible ones; 2) our non-linear, compressed model can be learnt in a data-driven manner and explicitly benefits from a holistic hand pose representation; 3) our prior model seamlessly integrates into a standard CNN architecture, creating an unusual “bottleneck”. We show that our contributions allow us to significantly outperform the state-of-the-art on several challenging benchmarks, in both accuracy and computation time.
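The bottleneck idea can be sketched with plain NumPy: a data-driven linear pose basis is learnt from training poses (here via SVD, as in PCA), the network regresses only a low-dimensional embedding, and a fixed linear layer expands it back to the full pose. All dimensions and the random training data below are placeholders, not the values used in the paper.

```python
import numpy as np

# Hypothetical dimensions: J = 14 joints with 3 coordinates each,
# compressed to a 30-D pose embedding (placeholder values).
J = 14
D_POSE = 3 * J
D_PRIOR = 30

rng = np.random.default_rng(0)
# Placeholder "training poses"; in practice these are ground-truth poses.
train_poses = rng.standard_normal((1000, D_POSE))

# Learn the linear prior from data: principal components of the poses.
mean = train_poses.mean(axis=0)
_, _, Vt = np.linalg.svd(train_poses - mean, full_matrices=False)
E = Vt[:D_PRIOR]  # (D_PRIOR, D_POSE) basis of the pose subspace

def embedding_to_pose(embedding):
    """Expand a low-dimensional embedding (the CNN's output) to a full pose.

    This plays the role of the fixed linear "bottleneck" layer: the CNN
    only predicts D_PRIOR values instead of all D_POSE joint coordinates.
    """
    return embedding @ E + mean

emb = rng.standard_normal(D_PRIOR)   # stand-in for a CNN prediction
pose = embedding_to_pose(emb)
assert pose.shape == (D_POSE,)
```

Because `E` spans only the subspace of poses seen in training, any embedding the network outputs maps to a pose consistent with that prior, which is what constrains the predictions.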

This ICCV’15 paper (Depth-based hand pose estimation: data, methods, and challenges) independently shows that our approach outperforms the others :)

Results

Material

Presentation: CVWW’15 presentation

Our results: Each line contains the estimated hand pose for one frame. The pose is parametrized by the locations of the joints in (u, v, d) coordinates, i.e., image coordinates and depth. The coordinates of each joint are stored in sequential order.
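Given that format, a results line can be parsed into per-joint (u, v, d) triples along these lines. The whitespace separator and the joint count are assumptions here; the actual joint count depends on the dataset.

```python
def parse_pose_line(line, n_joints):
    """Parse one results line into a list of (u, v, d) joint tuples.

    Assumes whitespace-separated floats, three values per joint,
    joints stored in sequential order (per the description above).
    """
    values = [float(v) for v in line.split()]
    if len(values) != 3 * n_joints:
        raise ValueError(f"expected {3 * n_joints} values, got {len(values)}")
    return [tuple(values[3 * i : 3 * i + 3]) for i in range(n_joints)]

# Toy example with 2 joints (real files have one line per frame):
joints = parse_pose_line("10.0 20.0 300.0 11.0 21.0 301.0", n_joints=2)
# joints[0] is the first joint's (u, v, d) triple
```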

Code

Here you can find the code for our CVWW’15 paper “Hands Deep in Deep Learning for Hand Pose Estimation”. It is distributed as a single package, DeepPrior, under GPLv3. It also includes two pretrained models, for the NYU and ICVL datasets. There is no proper documentation yet, but a basic readme file is included. If you have questions, please do not hesitate to contact us. If you use the code, please cite us (see below).

Citation

@inproceedings{Oberweger2015,
  author = {M.~Oberweger and P.~Wohlhart and V.~Lepetit},
  title = {Hands Deep in Deep Learning for Hand Pose Estimation},
  booktitle = {Proc.~of Computer Vision Winter Workshop},
  year = 2015
}