Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

arXiv

Abstract

With the recent surge in popularity of AR/VR applications, realistic and accurate control of 3D full-body avatars is in high demand. A particular challenge is that standalone HMDs (Head Mounted Devices) provide only a sparse tracking signal, often limited to the user’s head and wrists. While this signal is informative for reconstructing upper-body motion, the lower body is not tracked and must be synthesized from the limited information provided by the upper-body joints. In this paper, we present AGRoL, a novel conditional diffusion model specifically designed to track full bodies given sparse upper-body tracking signals. Our model uses a simple multi-layer perceptron (MLP) architecture and a novel conditioning scheme for motion data. It can predict accurate and smooth full-body motion, especially the challenging lower-body movement. Contrary to common diffusion architectures, our compact architecture can run in real time, making it usable for online body-tracking applications. We train and evaluate our model on the AMASS motion capture dataset and show that our approach outperforms state-of-the-art methods in generated motion accuracy and smoothness. We further justify our design choices through extensive experiments and ablations.
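To make the input and output shapes of such a model concrete, the following is a minimal sketch of an MLP denoiser conditioned on a sparse tracking signal. It is not the AGRoL architecture: the abstract does not specify the conditioning scheme, so plain feature concatenation is used here as a stand-in, and every name and dimension (`ConditionalMLPDenoiser`, `SPARSE_DIM`, `FULL_DIM`, and so on) is a hypothetical choice for illustration. The weights are random and untrained, so the output only demonstrates the data flow.

```python
import numpy as np

# All sizes below are illustrative assumptions, not values from the paper.
NUM_FRAMES = 8    # frames in the motion window
SPARSE_DIM = 18   # per-frame features for the tracked head and wrists
FULL_DIM = 66     # per-frame features for the full-body pose
HIDDEN_DIM = 64   # width of the hidden layer
TIME_DIM = 16     # size of the diffusion-step embedding


def timestep_embedding(t, dim):
    """Sinusoidal embedding of an integer diffusion step t."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


class ConditionalMLPDenoiser:
    """Per-frame MLP: (noisy pose, sparse signal, step) -> full-body pose estimate.

    Conditioning by concatenation is an assumption made for this sketch.
    """

    def __init__(self, rng):
        in_dim = FULL_DIM + SPARSE_DIM + TIME_DIM
        # Random, untrained weights: enough to show shapes and data flow.
        self.w1 = rng.normal(0.0, 0.02, size=(in_dim, HIDDEN_DIM))
        self.b1 = np.zeros(HIDDEN_DIM)
        self.w2 = rng.normal(0.0, 0.02, size=(HIDDEN_DIM, FULL_DIM))
        self.b2 = np.zeros(FULL_DIM)

    def __call__(self, noisy_motion, sparse_signal, t):
        # Repeat the step embedding so every frame receives it.
        emb = np.tile(timestep_embedding(t, TIME_DIM), (noisy_motion.shape[0], 1))
        # Concatenate noisy pose, sparse condition, and step embedding per frame.
        x = np.concatenate([noisy_motion, sparse_signal, emb], axis=1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU
        return h @ self.w2 + self.b2


rng = np.random.default_rng(0)
model = ConditionalMLPDenoiser(rng)
sparse_signal = rng.normal(size=(NUM_FRAMES, SPARSE_DIM))  # stand-in for HMD tracking
noisy_motion = rng.normal(size=(NUM_FRAMES, FULL_DIM))     # stand-in for a noised pose window
predicted_motion = model(noisy_motion, sparse_signal, t=10)
print(predicted_motion.shape)
```

In a trained diffusion model, a denoiser of this kind would be applied repeatedly over decreasing steps, with the sparse signal held fixed as the condition throughout.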

