Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Deep Video Inpainting Detection
Peng Zhou, Ning Yu, Zuxuan Wu, Larry Davis, Abhinav Shrivastava, Ser-Nam Lim
British Machine Vision Conference (BMVC)
Video inpainting has matured into an increasingly capable manipulation technique and has become a source of visual misinformation on social media. Yet countermeasures that detect inpainted regions in videos have received little attention, leaving such threats unchecked. As a first step toward mitigation, we introduce VIDNet, the first study of learning-based video inpainting detection, which uses a two-stream encoder-decoder architecture with an attention module. To reveal artifacts introduced by compression, VIDNet augments the RGB frames with Error Level Analysis (ELA) frames, and its encoder produces multimodal features at multiple levels. These features are then decoded by a Convolutional LSTM, which exploits spatial and temporal relationships to predict masks of inpainted regions. In addition, to decide whether a pixel is inpainted, we present a quad-directional local attention module that aggregates information from surrounding pixels along four directions. Extensive experiments validate the significant advantages of VIDNet over alternative inpainting detection baselines, as well as its generalization to unseen videos. Our code is available at: https://github.com/pengzhou1108/VIDNet.
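
The ELA frames mentioned in the abstract follow the standard error-level-analysis recipe: recompress a frame as JPEG at a fixed quality and take the per-pixel difference against the original, so regions whose compression history differs from the rest of the frame (e.g. inpainted pixels) stand out. Below is a minimal sketch using Pillow; the quality setting (75) and the final rescaling are illustrative assumptions, not values taken from the paper.

import io

from PIL import Image, ImageChops


def ela_frame(frame: Image.Image, quality: int = 75) -> Image.Image:
    """Return an Error Level Analysis (ELA) map for one RGB frame.

    `quality` (75) is an assumed JPEG setting, not a value from the
    paper; the rescaling at the end is purely for visualization.
    """
    original = frame.convert("RGB")

    # Recompress the frame as JPEG in memory at a fixed quality.
    buf = io.BytesIO()
    original.save(buf, "JPEG", quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf).convert("RGB")

    # Per-pixel absolute difference between the original and the
    # recompressed frame: regions with a different compression history
    # (e.g. inpainted pixels) show a different error level.
    diff = ImageChops.difference(original, recompressed)

    # Stretch the strongest difference to 255 so the map is visible.
    max_diff = max(hi for _, hi in diff.getextrema()) or 1
    return diff.point(lambda v: min(255, v * 255 // max_diff))


# Example: ela = ela_frame(Image.open("frame_0001.png"))

Running this per frame yields the ELA stream that, in the architecture described above, is fed alongside the RGB frames into the two-stream encoder.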