A Method for Animating Children’s Drawings of the Human Figure
Harrison Jesse Smith, Qingyuan Zheng, Yifei Li, Somya Jain, Jessica K. Hodgins
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
A keyword spotting (KWS) engine continuously running on the device is exposed to various speech signals that are usually unseen beforehand. It is a challenging problem to build a small-footprint and high-performing KWS model with robustness under different acoustic environments. In this paper, we explore how to effectively apply adversarial examples to improve KWS robustness. We propose data source-aware disentangled learning with adversarial examples to reduce the mismatch between the original and adversarial data as well as the mismatch across original training data sources. The KWS model architecture is based on depth-wise separable convolution and a simple attention module. Experimental results demonstrate that the proposed learning strategy improves false reject rate by 40.31% at 1% false accept rate on the internal dataset, compared to the strongest baseline without adversarial examples. Our best performing system achieves 98.06% accuracy on the Google Speech Commands V1 dataset.
Harrison Jesse Smith, Qingyuan Zheng, Yifei Li, Somya Jain, Jessica K. Hodgins
Yunbo Zhang, Deepak Gopinath, Yuting Ye, Jessica Hodgins, Greg Turk, Jungdam Won
Simran Arora, Patrick Lewis, Angela Fan, Jacob Kahn, Christopher Ré