Deep Video Inpainting Detection

British Machine Vision Conference (BMVC)

Abstract

Video inpainting has matured into a capable editing technique and has been used to spread visual misinformation on social media. Yet countermeasures that detect inpainted regions in videos have received little attention, leaving such threats unchecked. As a first step toward mitigation, we introduce VIDNet in the first study of learning-based video inpainting detection; it uses a two-stream encoder-decoder architecture with an attention module. To reveal artifacts encoded in compression, VIDNet takes Error Level Analysis (ELA) frames alongside RGB frames, and an encoder produces multimodal features at different levels. To exploit spatial and temporal relationships, a Convolutional LSTM decodes these features to predict masks of inpainted regions. In addition, when deciding whether a pixel is inpainted, a quad-directional local attention module borrows information from the surrounding pixels in four directions. Extensive experiments validate the significant advantages of VIDNet over alternative inpainting detection baselines, as well as its generalization to unseen videos. We have released our code.
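For readers unfamiliar with Error Level Analysis, the sketch below shows the general idea in Python with the Pillow library: a frame is re-compressed as JPEG and the per-pixel absolute difference from the original is kept, so regions with a different compression history tend to stand out. This is a generic illustration, not the VIDNet preprocessing code; the function name `error_level_analysis` and the default `quality=90` are assumptions made for this example.

```python
import io

from PIL import Image, ImageChops


def error_level_analysis(frame: Image.Image, quality: int = 90) -> Image.Image:
    """Return the absolute difference between a frame and its JPEG re-compression.

    Hypothetical helper for illustration; the quality setting is an assumed value.
    """
    rgb = frame.convert("RGB")

    # Re-compress the frame in memory at the chosen JPEG quality.
    buffer = io.BytesIO()
    rgb.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")

    # Per-pixel, per-channel absolute difference: the "error level" image.
    return ImageChops.difference(rgb, recompressed)


# Usage on a synthetic frame: a flat background with a pasted square.
frame = Image.new("RGB", (64, 64), (120, 180, 60))
frame.paste((255, 0, 0), (16, 16, 48, 48))
ela = error_level_analysis(frame)
```

In a two-stream setup such as the one described above, an ELA image like `ela` would be supplied to the network together with the corresponding RGB frame; how the two are fused is specific to the model and is not shown here.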
