Machine Learning and AI experts from around the world will gather in Long Beach, CA, next week at NIPS 2017 to present the latest advances in machine learning and computational neuroscience. Research from Facebook will be presented in 10 peer-reviewed publications and posters. Our researchers and engineers will also be leading and presenting numerous workshops, symposiums and tutorials throughout the week.
For the first time ever, we’ll be hosting a Facebook LIVE and streaming many of the NIPS sessions from our Facebook page, starting Monday at 5:30 pm. Be sure to tune in if you can’t be there in person.
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu, Anitha Kannan, Jianwei Yang, Devi Parikh, and Dhruv Batra
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. For instance, consider a blind user on social media whose friend uploads a picture. It would be helpful if the AI could describe it, saying “John just uploaded a picture from his vacation in Hawaii.” The user might then ask, “Great, is he at the beach?”, and we would want the AI to respond naturally and accurately with something like “No, on a mountain.” Or suppose you are interacting with an AI assistant: You: “Can you see the baby in the baby monitor?” AI: “Yes, I can.” You: “Is he sleeping or playing?” Again, we would want an accurate answer. Or consider a human-robot team on a search-and-rescue mission: Human: “Is there smoke in any room around you?” AI: “Yes, in one room.” Human: “Go there and look for people.” We would want the AI to understand this instruction in reference to the earlier conversation. In this paper, we develop state-of-the-art neural models that, given an image, a dialog history, and a follow-up question about the image, answer the question.
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, and Larry Zitnick
ELF is an Extensive, lightweight and flexible platform that supports parallel simulations of game environments. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with several games. One of them, Mini-RTS, is a miniature version of StarCraft that captures key game dynamics and runs at 165K frames per second (FPS) on a laptop, an order of magnitude faster than other platforms. With only a single GPU and several CPUs, we train an AI for Mini-RTS in an end-to-end manner to beat a rule-based system with a win rate of over 70%.
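To illustrate the core idea of stepping many concurrent game instances and consuming their observations as a single batch, here is a minimal sketch with a hypothetical toy environment. This is not ELF’s actual API, and ELF performs this loop in C++ threads rather than sequential Python:

```python
# Illustrative batched-simulation loop (hypothetical toy environment,
# not ELF's real interface).
import numpy as np

class ToyEnv:
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        return self.rng.random(4)  # next observation (toy 4-dim state)

envs = [ToyEnv(i) for i in range(64)]  # 64 concurrent game instances

def step_all(actions):
    # ELF runs this loop in parallel C++ threads; here it is sequential.
    return np.stack([env.step(a) for env, a in zip(envs, actions)])

obs_batch = step_all(np.zeros(64, dtype=int))  # shape (64, 4): one model call
```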
Fader Networks: Manipulating Images by Sliding Attributes
Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, and Marc’Aurelio Ranzato
We introduce a new encoder-decoder architecture that is trained to reconstruct images by disentangling their salient information from the values of particular attributes directly in a latent space using adversarial training. This disentanglement allows us to manipulate these attributes and generate variations of pictures of faces, such as imagining a younger or older version of a person, while preserving their naturalness. Compared to the state of the art, which mostly relies on training adversarial networks in pixel space by altering attribute values at training time, our approach results in much simpler training schemes and scales nicely to manipulating multiple attributes jointly.
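A minimal sketch of this training scheme, assuming toy fully connected modules and an illustrative loss weight (the paper’s models are convolutional): a discriminator tries to recover the attributes from the latent code, while the encoder is trained to make that impossible, so the decoder is forced to take the attributes from its conditioning input.

```python
# Sketch of the Fader Networks objective with stand-in modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, n_attrs = 128, 2
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim + n_attrs, 256), nn.ReLU(), nn.Linear(256, 784))
discriminator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_attrs))

x = torch.rand(32, 784)                           # batch of flattened images
y = torch.randint(0, 2, (32, n_attrs)).float()    # binary attributes

z = encoder(x)

# Discriminator: recover the attributes from the (detached) latent code.
d_loss = F.binary_cross_entropy_with_logits(discriminator(z.detach()), y)

# Autoencoder: reconstruct the image while fooling the discriminator,
# so z carries no attribute information and y must come from the input.
recon = decoder(torch.cat([z, y], dim=1))
recon_loss = F.mse_loss(recon, x)
adv_loss = F.binary_cross_entropy_with_logits(discriminator(z), 1 - y)
ae_loss = recon_loss + 0.1 * adv_loss  # 0.1 is an illustrative weight
```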
Gradient Episodic Memory for Continual Learning
David Lopez-Paz and Marc’Aurelio Ranzato
Machine learning models struggle to learn new problems without forgetting previously learned tasks. In this paper, we propose new metrics to evaluate how models transfer knowledge across a sequence of learning tasks. We also propose a novel state-of-the-art algorithm, Gradient Episodic Memory (GEM), that allows learning machines to learn new tasks without forgetting past skills.
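The heart of GEM is a gradient projection: if an update on the current task would increase the loss on examples stored from a past task, the gradient is projected so that it no longer does. The sketch below shows the single-past-task special case; the paper solves a small quadratic program over all past tasks.

```python
# Sketch of GEM's gradient projection for one past task.
import torch

def gem_project(g, g_mem):
    """Project current-task gradient g so it does not increase the loss
    on the episodic memory whose gradient is g_mem."""
    dot = torch.dot(g, g_mem)
    if dot < 0:  # the proposed update would hurt the old task
        g = g - (dot / torch.dot(g_mem, g_mem)) * g_mem
    return g

g = torch.randn(10)      # gradient on the current task
g_mem = torch.randn(10)  # gradient on samples stored from a past task
g_new = gem_project(g, g_mem)
assert torch.dot(g_new, g_mem) >= -1e-6  # no longer conflicts with memory
```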
Houdini: Fooling Deep Structured Prediction Models
Moustapha Cisse, Yossi Adi, Natalia Neverova, and Joseph Keshet
Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel, flexible approach named Houdini for generating adversarial examples specifically tailored to the final performance measure of the task considered, even when that measure is combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models, while using a less perceptible adversarial perturbation.
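For context, here is a generic gradient-based (FGSM-style) attack loop with a stand-in model and loss. Houdini’s contribution is what goes in the loss slot: a smooth proxy of the task’s true performance measure rather than the usual training surrogate.

```python
# Generic gradient-based adversarial perturbation; the model and loss
# are stand-ins, not the paper's structured predictors.
import torch
import torch.nn as nn

model = nn.Linear(20, 5)          # stand-in for a structured predictor
loss_fn = nn.CrossEntropyLoss()   # Houdini swaps in a task-loss surrogate here

x = torch.randn(1, 20, requires_grad=True)
y = torch.tensor([3])
eps = 0.05                        # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()
x_adv = x + eps * x.grad.sign()   # small step that increases the task loss
```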
One-Sided Unsupervised Domain Mapping
Sagie Benaim and Lior Wolf
One of the major discoveries in 2017 was that it is possible to learn analogies between two visual domains without any matching training samples. For example, given an image of a handbag, these methods can find matching shoes, even though the model has never seen such matched pairs. The recent methods all need to learn to map from one domain to the other and back. In our work, we propose a solution that does not require completing this cycle and is therefore more efficient. We also demonstrate that the resulting mappings are significantly more accurate.
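One way to drop the cycle, sketched below under simplified assumptions (a linear stand-in generator, and none of the paper’s normalization details), is to ask the mapping to preserve pairwise distances between samples of the source domain:

```python
# Sketch of a one-sided distance-preservation constraint.
import torch
import torch.nn as nn

G = nn.Linear(64, 64)  # stand-in for the generator

x = torch.randn(8, 64)        # batch from the source domain
dx = torch.cdist(x, x)        # pairwise distances before mapping
dy = torch.cdist(G(x), G(x))  # pairwise distances after mapping

# Penalize changes in (mean-normalized) pairwise distances; no inverse
# mapping is needed, unlike cycle-based approaches.
dist_loss = (dx / dx.mean() - dy / dy.mean()).abs().mean()
```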
On the Optimization Landscape of Tensor Decompositions
Rong Ge and Tengyu Ma
We analyze the optimization landscape of the random over-complete tensor decomposition problem, which has many applications in unsupervised learning, especially in learning latent variable models. In practice, it can be efficiently solved by gradient ascent on a non-convex objective. Our theoretical results show that for any small constant ϵ > 0, among the set of points with function values a (1+ϵ)-factor larger than the expectation of the function, all the local maxima are approximate global maxima.
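For concreteness, one standard formulation of this non-convex objective for an order-4 tensor with random components $a_i$ (our notation, not a quotation from the paper) is:

```latex
% Gradient ascent is run on the unit sphere; the over-complete regime
% means many more components than dimensions (n >> d).
\[
  \max_{\|x\|_2 = 1} \; f(x) \;=\; \sum_{i=1}^{n} \langle a_i, x \rangle^{4}
\]
```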
Poincaré Embeddings for Learning Hierarchical Representations
Maximilian Nickel and Douwe Kiela
Representation learning has become an invaluable approach for modeling symbolic data such as text and graphs. This type of data often exhibits a latent hierarchical structure: for instance, all dolphins are mammals, all mammals are animals, all animals are living things, and so forth. Capturing this kind of hierarchical structure can be beneficial for many core problems in artificial intelligence, such as reasoning about entailment or modeling complex relationships. We introduce a new approach for learning representations that simultaneously capture information about both hierarchy and similarity. We achieve this by changing the underlying geometry of the embedding space, and we introduce an efficient algorithm to learn these hierarchical embeddings. We show experimentally that the proposed model significantly outperforms standard methods on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
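Concretely, the embeddings live in the Poincaré ball, where distances grow rapidly near the boundary: general concepts sit near the origin and specific ones near the rim, giving the space its tree-like capacity. The sketch below computes the Poincaré distance the paper uses (the toy points are chosen only for illustration):

```python
# Geodesic distance in the Poincare ball (all points have norm < 1).
import torch

def poincare_distance(u, v):
    sq = torch.sum((u - v) ** 2, dim=-1)
    nu = torch.sum(u ** 2, dim=-1)
    nv = torch.sum(v ** 2, dim=-1)
    return torch.acosh(1 + 2 * sq / ((1 - nu) * (1 - nv)))

u = torch.tensor([0.0, 0.0])    # root-like point near the origin
v = torch.tensor([0.9, 0.0])    # leaf-like point near the boundary
print(poincare_distance(u, v))  # much larger than the Euclidean 0.9
```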
Unbounded Cache Model for Online Language Modeling with Open Vocabulary
Edouard Grave, Moustapha Cisse, and Armand Joulin
Modern machine learning methods are not robust to changes in the data distribution between training and testing. This problem arises, for example, when training a model on Wikipedia and testing it on news data. In this work, we propose a large-scale non-parametric memory component used to dynamically adapt models to new data distributions. We apply this technique to language modeling, where the train and test data come from two different domains (such as Wikipedia and news).
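A toy sketch of such a memory for language modeling, under simplifying assumptions (exact, dense nearest-neighbor search over random stand-in data; the paper scales to millions of entries with approximate search): cache every hidden state together with the word that followed it, retrieve the nearest cached states at test time, and interpolate their vote with the parametric model.

```python
# Toy non-parametric cache for adapting a language model at test time.
import torch

# cache: hidden states seen so far and the word that followed each
cache_keys = [torch.randn(16) for _ in range(100)]
cache_words = [int(torch.randint(0, 50, (1,))) for _ in range(100)]

def cache_probs(h, vocab_size=50, k=5, temp=0.1):
    """Next-word distribution from the k nearest cached hidden states."""
    keys = torch.stack(cache_keys)
    dists = torch.cdist(h.unsqueeze(0), keys).squeeze(0)
    top = (-dists).topk(min(k, len(cache_keys)))
    weights = torch.softmax(top.values / temp, dim=0)
    p = torch.zeros(vocab_size)
    for w, i in zip(weights, top.indices):
        p[cache_words[int(i)]] += w   # each neighbor votes for its word
    return p

h = torch.randn(16)                               # current hidden state
lam = 0.2                                         # illustrative mixing weight
p_model = torch.softmax(torch.randn(50), dim=0)   # stand-in model probabilities
p = (1 - lam) * p_model + lam * cache_probs(h)    # final prediction
```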
VAIN: Attentional Multi-agent Predictive Modeling
Yedid Hoshen
Predicting the behavior of a large social or physical system requires modeling the interactions between its agents. Recent progress has significantly improved predictions by modeling each interaction with a neural network, but this approach comes at a prohibitive cost. In this work, we replace the expensive interaction model with a simple attentional mechanism, achieving similar accuracy at a much lower cost. The linear complexity of our method enables accurate multi-agent predictive models to operate at much larger scales.
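A simplified sketch of this attentional pooling (shapes and the kernel are illustrative, and unlike the paper it does not exclude self-interactions): a shared encoder is run once per agent to produce a feature vector and an attention vector, and interactions are cheap kernel weights between attention vectors rather than a network per pair.

```python
# Sketch of attentional multi-agent pooling in the spirit of VAIN.
import torch
import torch.nn as nn

n_agents, in_dim, feat_dim, att_dim = 6, 10, 32, 8
encoder = nn.Linear(in_dim, feat_dim + att_dim)  # shared, run once per agent

states = torch.randn(n_agents, in_dim)
enc = encoder(states)
e, a = enc[:, :feat_dim], enc[:, feat_dim:]      # features and attention vectors

# w[i, j] is proportional to exp(-||a_i - a_j||^2): soft attention over agents
w = torch.softmax(-(torch.cdist(a, a) ** 2), dim=1)
pooled = w @ e   # each agent's weighted summary of the others, shape (6, 32)
```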
Geometric Deep Learning on Graphs and Manifolds
Arthur Szlam and Yann LeCun
Black in AI Workshop
Moustapha Cisse, Facebook organizer
Deep Learning at Supercomputer Scale Workshop
ImageNet training in 1 hour – Priya Goyal, Invited talk
Deep Reinforcement Learning Symposium
Joelle Pineau, Invited talk
Emergent Communication Workshop
Kyunghyun Cho and Douwe Kiela, Facebook organizers
Interpretable Machine Learning Symposium
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Marcus Rohrbach, Paper
Learning Disentangled Representations: from Perception to Control Workshop
Diane Bouchacourt, Facebook organizer
Learning in the Presence of Strategic Behavior Workshop
Alexander Peysakhovich, Invited talk
Machine Learning on the Phone and other Consumer Devices Workshop
Joaquin Quiñonero Candela, Facebook organizer
Machine Learning Systems Workshop
Sarah Bird and Aparna Lakshmi Ratan, Facebook organizers
Optimization for Machine Learning Workshop
Non-Uniform Stochastic Dual Coordinate Ascent for Conditional Random Fields
Remi Le Priol, Ahmed Touati, and Simon Lacoste-Julien, Poster
Women in Machine Learning (WiML) Workshop
Joelle Pineau, Invited talk, and Joelle Pineau and Devi Parikh, Round table session
Workshop on Automated Knowledge Base Construction (AKBC)
Learning Hierarchical Representations of Relational Data
Maximilian Nickel, Invited talk
Workshop on Conversational AI: Today’s Practice and Tomorrow’s Potential
Antoine Bordes, Facebook organizer
Joelle Pineau, Invited talk
Workshop on Visually-Grounded Interaction and Language (ViGIL)
Devi Parikh, Dhruv Batra, Facebook organizers
Embodied Question Answering – Devi Parikh, Invited talk