Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model
Yuming Du, Robin Kips, Albert Pumarola, Sebastian Starke, Ali Thabet, Artsiom Sanakoyeu
International Joint Conference on Neural Networks (IJCNN)
Access to large corpora with strongly labelled sound events is expensive and difficult in engineering applications. Consequently, much research has turned to detecting both the types and the timestamps of sound events from weak labels that specify only the types. This task can be treated as a multiple instance learning (MIL) problem, and a key component for the sound event detection (SED) task is the design of the pooling function. The linear softmax pooling function achieves state-of-the-art performance because it can vary both the signs and the magnitudes of gradients. However, linear softmax pooling cannot flexibly handle sound events of different time scales. In this paper, we propose a power pooling function that can automatically adapt to various sound events. By adding a trainable parameter for each event class, power pooling provides more accurate gradients for the frames in a clip than other pooling functions. On both weakly supervised and semi-supervised SED datasets, the proposed power pooling function outperforms linear softmax pooling on both coarse-grained and fine-grained metrics; specifically, it improves the event-based F1 score by 11.4% and 10.2% relative on the two datasets. Although this paper focuses on SED applications, the proposed method can be applied to MIL tasks in other domains.
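To make the pooling idea concrete, here is a minimal numpy sketch, under the assumption that clip-level probabilities are computed as a frame-probability-weighted average with a per-class trainable exponent `n` (names `linear_softmax_pool`, `power_pool`, and the exact form of the exponent are illustrative, not taken from the paper); with `n = 1` this reduces to linear softmax pooling:

```python
import numpy as np

def linear_softmax_pool(probs):
    # probs: (T, C) frame-level probabilities for T frames and C event classes.
    # Linear softmax pooling weights each frame by its own probability,
    # so confident frames dominate the clip-level prediction.
    return (probs ** 2).sum(axis=0) / probs.sum(axis=0)

def power_pool(probs, n):
    # n: (C,) per-class exponent (trainable in the paper; fixed here for
    # illustration). Larger n pushes the pooling toward max pooling,
    # smaller n toward average pooling; n = 1 recovers linear softmax.
    weights = probs ** n
    return (weights * probs).sum(axis=0) / weights.sum(axis=0)

# Example: a short event that is active in only one of four frames.
probs = np.array([[0.9], [0.1], [0.1], [0.1]])
clip_linear = linear_softmax_pool(probs)
clip_power = power_pool(probs, n=np.array([3.0]))  # emphasizes the active frame
```

With a larger exponent, the pooled score for a short, sharply localized event is pulled closer to the peak frame probability, which is how a per-class `n` could let the model adapt to events of different time scales.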
Bilge Acun, Benjamin Lee, Fiodar Kazhamiaka, Kiwan Maeng, Manoj Chakkaravarthy, Udit Gupta, David Brooks, Carole-Jean Wu
Ilkan Esiyok, Pascal Berrang, Katriel Cohn-Gordon, Robert Künnemann