DensePose is Facebook’s real-time approach for mapping all human pixels in 2D RGB images to a 3D surface-based model of the body.

Research in human understanding aims primarily at localizing a sparse set of joints, such as the wrists or elbows. This may suffice for applications like gesture or action recognition, but it delivers a reduced image interpretation. We wanted to go further. Imagine trying on new clothes via a photo or adding costumes to photos of your friends. For these types of tasks, a more complete, surface-based image interpretation is required.

The DensePose project addresses this and aims at understanding humans in images in terms of such surface-based models. Learn more about the work in this blog and our CVPR 2018 paper DensePose: Dense Human Pose Estimation In The Wild.

The DensePose project introduces:

  • DensePose-COCO, a large-scale ground-truth dataset with image-to-surface correspondences manually annotated on 50K COCO images.
  • DensePose-RCNN, a variant of Mask-RCNN that densely regresses part-specific UV coordinates within every human region at multiple frames per second.
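
To make "part-specific UV coordinates" concrete, here is a minimal sketch of how a dense per-pixel prediction could be used to paint a texture onto a person, in the spirit of the clothes try-on example above. It is an illustration only and not the DensePose-RCNN code: the (part, u, v) tuple layout, the use of part index 0 for background, and the names sample_texture and transfer_texture are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: one prediction per pixel, stored as (part index, u, v).
# Part index 0 is treated as background; u and v are expected in [0, 1].
IUV = Tuple[int, float, float]
Color = Tuple[int, int, int]
Texture = List[List[Color]]


def sample_texture(texture: Texture, u: float, v: float) -> Color:
    """Nearest-neighbour lookup of a (u, v) coordinate in a texture grid."""
    rows, cols = len(texture), len(texture[0])
    r = min(int(v * rows), rows - 1)  # clamp so v == 1.0 stays in range
    c = min(int(u * cols), cols - 1)  # clamp so u == 1.0 stays in range
    return texture[r][c]


def transfer_texture(
    iuv_map: List[List[IUV]],
    textures: Dict[int, Texture],
) -> List[List[Optional[Color]]]:
    """Map every human pixel to a color from its body part's texture.

    Pixels that are background, or whose part has no texture supplied,
    are returned as None so the caller can keep the original image there.
    """
    output: List[List[Optional[Color]]] = []
    for row in iuv_map:
        out_row: List[Optional[Color]] = []
        for part, u, v in row:
            if part == 0 or part not in textures:
                out_row.append(None)
            else:
                out_row.append(sample_texture(textures[part], u, v))
        output.append(out_row)
    return output


if __name__ == "__main__":
    # A hypothetical 2x2 prediction: one background pixel, two pixels on
    # part 1, and one pixel on part 2 (for which no texture is supplied).
    iuv_map = [
        [(0, 0.0, 0.0), (1, 0.0, 0.0)],
        [(1, 0.99, 0.99), (2, 0.5, 0.5)],
    ]
    textures = {1: [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 0)]]}
    print(transfer_texture(iuv_map, textures))
```

Because each pixel carries its own surface coordinate, the same lookup works regardless of the person's pose in the image; a sparse set of joints would not provide enough information for this kind of per-pixel transfer.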

DensePose is available under the Creative Commons license on GitHub. We’re also releasing performance baselines for multiple pre-trained models, along with the ground-truth information for DensePose-COCO.