Research from Meta

All Publications

December 16, 2019
Meixu Chen, Yize Jin, Todd Goodall, Xiangxu Yu, Alan C. Bovik

Virtual Reality (VR) and its applications have attracted significant and increasing attention. However, the requirements of much larger file sizes, different storage formats, and immersive viewing conditions pose significant challenges to acquiring, transmitting, compressing, and displaying high-quality VR content. Meeting these challenges requires understanding the distortions that arise and affect the perceived quality of displayed VR content, and developing ways to automatically predict VR picture quality. It also requires basic tools in the form of large, representative subjective VR quality databases on which VR quality models can be developed and against which VR quality prediction algorithms can be benchmarked. Towards making progress in this direction, we present the results of an immersive 3D subjective image quality assessment study.

December 15, 2019
Jennifer L. Sullivan, Nathan Dunkelberger, Joshua Bradley, Joseph Young, Ali Israr, Frances Lau, Keith Klumb, Freddy Abnousi, Marcia K. O’Malley

We present experimental results that demonstrate that rendering haptic cues with multi-sensory components—specifically, lateral skin stretch, radial squeeze, and vibrotactile stimuli—improved perceptual distinguishability in comparison to similar cues with all-vibrotactile components. These results support the incorporation of diverse stimuli, both vibrotactile and non-vibrotactile, for applications requiring large haptic cue sets.

December 15, 2019
Lawrence H. Kim, Pablo Castillo, Sean Follmer, Ali Israr

One of the challenges in the field of haptics is to provide meaningful and realistic sensations to users. While most real-world tactile sensations are composed of multiple dimensions, most commercial products include only vibration, as it is the most cost-effective solution. To improve on this, we introduce the VPS (Vibration, Pressure, Shear) display, a multi-dimensional tactile array that increases information transfer by combining Vibration, Pressure, and Shear, much as an RGB LED combines red, green, and blue to create new colors.
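To make the RGB analogy concrete, here is a minimal sketch of how combining discrete intensity levels across three tactile channels multiplies the number of distinguishable cues. The class and function names are illustrative, not the paper's interface:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class TactileCue:
    """One composite tactile 'color': an intensity (0..1) per channel."""
    vibration: float
    pressure: float
    shear: float

def cue_set(levels_per_channel: int) -> list:
    """Enumerate every composite cue for a given number of intensity levels."""
    levels = [i / max(levels_per_channel - 1, 1) for i in range(levels_per_channel)]
    return [TactileCue(v, p, s) for v, p, s in product(levels, repeat=3)]

# With 3 levels per channel, three channels yield 3**3 = 27 distinct cues,
# versus only 3 for a vibration-only display -- the same multiplication that
# lets an RGB LED mix many colors from three primaries.
print(len(cue_set(3)))  # 27
```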

December 14, 2019
Duc Le, Xiaohui Zhang, Weiyi Zhang, Christian Fuegen, Geoffrey Zweig, Michael L. Seltzer

There is an implicit assumption that traditional hybrid approaches for automatic speech recognition (ASR) cannot directly model graphemes and need to rely on phonetic lexicons to get competitive performance, especially on English, which has poor grapheme-phoneme correspondence. In this work, we show for the first time that, on English, hybrid ASR systems can in fact model graphemes effectively by leveraging tied context-dependent graphemes, i.e., chenones.
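As a rough picture of the units involved, the toy sketch below expands a word into context-dependent graphemes (left-center+right triples, analogous to triphones) and then "ties" them into a fixed number of shared states. The hash-based tying is a stand-in for illustration, not how the actual clustering works:

```python
def cd_graphemes(word: str):
    """Expand a word into context-dependent grapheme units (left-center+right)."""
    padded = f"#{word.lower()}#"  # '#' marks the word boundary
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

def tie(units, num_tied_states):
    """Toy tying: collapse context-dependent graphemes into shared states.
    (A stand-in: real systems tie units with learned clustering.)"""
    return [hash(u) % num_tied_states for u in units]

units = cd_graphemes("speech")
print(units)  # ['#-s+p', 's-p+e', 'p-e+e', 'e-e+c', 'e-c+h', 'c-h+#']
print(tie(units, 400))
```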

December 13, 2019
David Novotny, Benjamin Graham, Jeremy Reizenstein

Given a set of reference RGBD views of an indoor environment and a new viewpoint, our goal is to predict the view from that location. Prior work on new-view generation has predominantly focused on significantly constrained scenarios, typically involving artificially rendered views of isolated CAD models. Here we tackle a much more challenging version of the problem. We devise an approach that exploits known geometric properties of the scene (per-frame camera extrinsics and depth) to warp the reference views to the new viewpoint.
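A minimal numpy sketch of the depth-based warp this describes, assuming both cameras share the intrinsics K and the relative pose is known; a complete system would also need to resample colors at the warped coordinates and fill disoccluded regions:

```python
import numpy as np

def warp_to_new_view(depth, K, T_ref_to_new):
    """Warp pixel coordinates of a reference RGBD view into a new camera.

    depth:        (H, W) depth map of the reference view
    K:            (3, 3) camera intrinsics (assumed shared by both views)
    T_ref_to_new: (4, 4) relative extrinsics, reference frame -> new frame
    Returns (H, W, 2) pixel coordinates in the new view and (H, W) new depths.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Unproject: back-project each pixel to a 3D point in the reference frame.
    rays = pix @ np.linalg.inv(K).T
    pts_ref = rays * depth[..., None]                               # (H, W, 3)

    # Rigid transform into the new camera's frame.
    pts_h = np.concatenate([pts_ref, np.ones((H, W, 1))], axis=-1)  # (H, W, 4)
    pts_new = (pts_h @ T_ref_to_new.T)[..., :3]

    # Project onto the new image plane (perspective divide by depth).
    proj = pts_new @ K.T
    z = np.clip(proj[..., 2:3], 1e-8, None)
    return proj[..., :2] / z, pts_new[..., 2]
```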

December 13, 2019
Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze

The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors. Usually, either a quality control stage discards transcriptions with too many errors, or the noisy transcriptions are used as is. We introduce Lead2Gold, a method to train an ASR system that exploits the full potential of noisy transcriptions.
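The excerpt does not spell out the objective, but the core idea of exploiting noisy labels can be pictured as marginalizing over plausible clean transcriptions, each weighted by a noise model. The toy loss below is our illustration of that idea over a small explicit candidate set with an unnormalized edit-distance noise model; it is not the Lead2Gold algorithm itself:

```python
import math

def edit_distance(a, b):
    """Word-level Levenshtein distance."""
    d = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, wb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (wa != wb))
    return d[-1]

def noisy_label_loss(candidate_logprobs, observed, noise_scale=1.0):
    """-log sum over y' of p_model(y'|x) * p_noise(observed | y').

    Toy noise model: p_noise proportional to exp(-noise_scale * edit_distance),
    unnormalized, purely for illustration.
    """
    scores = [lp - noise_scale * edit_distance(c.split(), observed.split())
              for c, lp in candidate_logprobs.items()]
    m = max(scores)  # stabilized log-sum-exp
    return -(m + math.log(sum(math.exp(s - m) for s in scores)))

# Hypothesized clean transcriptions with model log-probabilities:
candidates = {"the cat sat": math.log(0.6),
              "the cat sang": math.log(0.3),
              "a cat sat": math.log(0.1)}
# The observed (possibly erroneous) transcription still provides signal:
print(noisy_label_loss(candidates, "the cat sag"))
```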

December 12, 2019
Brenden M. Lake

People can learn a new concept and use it compositionally, understanding how to “blicket twice” after learning how to “blicket.” In contrast, powerful sequence-to-sequence (seq2seq) neural networks fail such tests of compositionality, especially when composing new concepts together with existing concepts. In this paper, I show how memory-augmented neural networks can be trained to generalize compositionally through meta seq2seq learning.
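As an illustration of the episode structure such meta-training uses (a made-up data generator in the spirit of the "blicket twice" example; the word and action inventories are invented), each episode reassigns meanings to the primitive words, so the network must read the support set rather than rely on fixed weights:

```python
import random

PRIMITIVES = ["dax", "wif", "blicket", "zup"]
ACTIONS = ["RED", "GREEN", "BLUE", "YELLOW"]

def make_episode(seed=None):
    """One meta seq2seq episode: the support set shows how each (meaningless)
    word maps to an action in this episode; the query asks the model to
    compose a word with a modifier it never saw combined with that word."""
    rng = random.Random(seed)
    mapping = dict(zip(PRIMITIVES, rng.sample(ACTIONS, len(ACTIONS))))
    support = [([w], [mapping[w]]) for w in PRIMITIVES]
    word = rng.choice(PRIMITIVES)
    query = ([word, "twice"], [mapping[word], mapping[word]])
    return support, query

support, query = make_episode(seed=0)
print(support)  # e.g. [(['dax'], ['BLUE']), (['wif'], ['GREEN']), ...]
print(query)    # e.g. (['blicket', 'twice'], ['RED', 'RED'])
```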

December 11, 2019
Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli

Despite the phenomenal success of deep neural networks in a broad range of learning tasks, there is a lack of theory to understand the way they work. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs benefits from prior knowledge on the features of the data, for instance their translation invariance. The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.
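One construction that puts both architectures in the same landscape, shown below as an illustrative sketch rather than the paper's exact protocol, is that a CNN's parameter space embeds into an FCN's: a convolution is a fully-connected layer whose weight matrix is banded and weight-tied. With that embedding, an FCN can be initialized at a CNN and its subsequent training dynamics compared:

```python
import numpy as np

def conv_as_fc_matrix(kernel, input_len):
    """Embed a 1D convolution (stride 1, no padding) into the weight matrix of
    a fully-connected layer: a banded Toeplitz matrix with tied weights."""
    k = len(kernel)
    out_len = input_len - k + 1
    W = np.zeros((out_len, input_len))
    for i in range(out_len):
        W[i, i:i + k] = kernel  # same kernel on every row, shifted by one
    return W

x = np.arange(6.0)
kernel = np.array([1.0, -2.0, 1.0])
W = conv_as_fc_matrix(kernel, len(x))
# The FC layer with this matrix computes exactly the (cross-)correlation a
# conv layer would:
assert np.allclose(W @ x, np.convolve(x, kernel[::-1], mode="valid"))
```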