Weighted Pointer: Error-aware Gaze-based Interaction through Fallback Modalities
Ludwig Sidenmark, Mark Parent, Chi-Hao Wu, Joannes Chan, Michael Glueck, Daniel Wigdor, Tovi Grossman, Marcello Giordano
Interspeech
We present an upper bound for the single-channel speech separation task, which is based on an assumption regarding the nature of short segments of speech. Using the bound, we show that while recent methods have made great progress for a small number of speakers, there is room for improvement for five and ten speakers. We then introduce a deep neural network, SepIt, that iteratively improves the estimates of the different speakers. At test time, SepIt uses a varying number of iterations per test sample, based on a mutual information criterion that arises from our analysis. In an extensive set of experiments, SepIt outperforms state-of-the-art neural networks for 2, 3, 5, and 10 speakers.
Simon Vandenhende, Dhruv Mahajan, Filip Radenovic, Deepti Ghadiyaram
Justin Theiss, Jay Leverett, Daeil Kim, Aayush Prakash