Virtual Reality (VR) and Augmented Reality (AR) have garnered mainstream attention with products such as the Oculus Rift and Oculus Go. However, these products have yet to find broad adoption among consumers. Mass-market appeal may require revolutions in comfort, utility, and performance, as well as careful consideration of user awareness and privacy around eye-tracking features. These revolutions can, in part, be enabled by measuring where an individual is looking, where their pupils are, and what their eye expression is, a set of measurements colloquially known as eye tracking. For example, foveated rendering greatly reduces the power required to render realistic scenes in a virtual environment by rendering full detail only where the user is looking.
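As a rough illustration of why gaze information saves rendering power, the sketch below scales the rendering resolution of a screen region by its angular distance from the tracked gaze point: full detail in the fovea, progressively coarser detail in the periphery. It is a minimal, assumed-parameter example, not any product's actual pipeline; the function name, falloff, and thresholds are illustrative only.

```python
def foveated_scale(eccentricity_deg, fovea_deg=5.0, min_scale=0.25):
    """Resolution scale for a screen tile, given its angular distance from
    the current gaze point. All thresholds here are illustrative assumptions,
    not values from a shipping headset."""
    if eccentricity_deg <= fovea_deg:
        return 1.0  # render the foveal region at full resolution
    # Linear falloff with eccentricity; real systems use perceptual acuity models.
    falloff = 1.0 - (eccentricity_deg - fovea_deg) / 30.0
    return max(min_scale, falloff)

# Tiles at 2, 10, and 40 degrees from the tracked gaze direction:
for ecc in (2.0, 10.0, 40.0):
    print(f"{ecc:5.1f} deg -> render at {foveated_scale(ecc):.2f}x resolution")
```

Shading most of the field of view at a fraction of full resolution is where the power savings come from, and the quality of that trade-off depends directly on how accurately and quickly the eye tracker reports gaze.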
The goal of this workshop is to engage the broader community of computer vision and machine learning scientists in a discussion about the importance of eye-tracking solutions for VR and AR that work for all individuals, under all environmental conditions.
This workshop will host two challenges structured around 2D eye-image datasets that we have collected using a prototype VR head-mounted device. More information about these challenges is located here. Entries to these challenges will address some outstanding questions relevant to the application of eye tracking on VR and AR platforms. We anticipate that the dataset released as part of the challenges will also serve as a benchmark dataset for future research in eye tracking for VR and AR.
Below is the list of topics that are of particular interest for this workshop:
ICCV Workshop: Eye Tracking for VR and AR
November 2, 9:00 am to 5:00 pm
Submissions must be written in English and submitted in PDF format. Each submitted paper must be no longer than four (4) pages, excluding references. Please refer to the ICCV submission guidelines for instructions regarding formatting, templates, and policies. Submissions will be reviewed by the program committee, and selected papers will be published in the ICCV Workshop proceedings.
Submit your paper using this link before the August 31st deadline.
Oleg Komogortsev
Texas State University
Talk Title: Eye Movement Detection Sensors, User Authentication, and Health Assessment
Abstract: The availability of eye movement detection sensors is set to explode, with billions of units available in future Virtual Reality (VR) and Augmented Reality (AR) platforms. In my talk I will discuss the past, present, and future of such sensors and their applications. I will discuss both the applications that initially necessitate the presence of such sensors in VR/AR devices and the additional uses those sensors would enable, such as eye-movement-driven biometrics and health assessment.
Jeff Pelz
Rochester Institute of Technology
Talk Title: The Convergence of Computer Graphics, Computer Vision, and Machine Learning in Eyetracking
Abstract: Video-based eyetracking became practical over 50 years ago with the development of analog Pupil-Corneal Reflection systems. Those systems evolved rapidly, taking advantage of the miniaturization of video cameras and the ability to perform simple video operations such as thresholding and digitization in real time on small computers. The advent of compact, efficient computer-vision modules enabled more complex eye-tracking algorithms to be implemented in the collection and analysis of eyetracking data. More recently, computer graphics has been leveraged to generate artificial images to model eyetracking systems and create images with known properties for training, simplifying and speeding the development of new systems and algorithms. The recent explosion of machine learning has brought similar advances in the difficult problems of eye-image segmentation and event detection. I will discuss how the convergence of advances in computer graphics, computer vision, and machine learning is revolutionizing eyetracking by supporting machine learning-based systems that can be trained on computer-generated ‘ground-truth’ data.
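For context on the "simple video operations such as thresholding" mentioned in this abstract, here is a minimal dark-pupil detection sketch in Python with OpenCV. The threshold value and file path are assumptions for illustration, and, as the abstract notes, modern systems increasingly replace this step with learned segmentation.

```python
import cv2

def detect_pupil(eye_image_path, threshold=40):
    """Rough dark-pupil localization by intensity thresholding, in the spirit
    of early video-based trackers. `threshold` is an assumed value that must
    be tuned per camera and illumination."""
    gray = cv2.imread(eye_image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(eye_image_path)
    blurred = cv2.GaussianBlur(gray, (7, 7), 0)
    # In an IR eye image the pupil is typically the darkest large region.
    _, mask = cv2.threshold(blurred, threshold, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pupil = max(contours, key=cv2.contourArea)
    (x, y), radius = cv2.minEnclosingCircle(pupil)
    return (x, y), radius  # estimated pupil center and radius, in pixels

# Hypothetical usage:
# center, radius = detect_pupil("eye_frame.png")
```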
Matthias Kümmerer
University of Tübingen
Talk Title: DeepGaze III: Deep Learning for predicting and understanding human free-viewing scanpaths
Abstract: Many animals gather high-resolution visual information only in the fovea and therefore must make eye movements to explore the visual world. How fixation locations are selected has been debated for decades in neuroscience and psychology. Because different observers fixate similar image locations, it has been proposed that fixations are driven by a spatial priority or “saliency” map. The saliency map hypothesis states that priority values are assigned locally to image locations, independent of saccade history, and are only later combined with saccade history and other constraints (e.g. task demands) to select the next fixation location. A second hypothesis is that there are interactions between saccade history and image content that cannot be summarized by a single value. For example, if after long saccades different content drives the next fixation than after short saccades, then it is impossible to assign a single saliency value to image locations. Here we discriminate between these possibilities in a data-driven manner. Using human free-viewing scanpath data, we train a new model, “DeepGaze III”. Given a prior scanpath history, the model predicts the next fixation location using either a single saliency map or multiple saliency maps that allow for more complicated interactions. DeepGaze III achieves state-of-the-art performance compared to previous scanpath models and reproduces key statistics of human scanpaths, such as the distribution of saccade lengths and of angles between saccades. We find that using multiple saliency maps gives no advantage in scanpath prediction compared to a single saliency map. Since the number of saliency maps the network can use imposes strong qualitative constraints on what the model is able to predict, this suggests that – at least for free-viewing – a single saliency map may exist that does not depend on either current or previous gaze locations. This provides evidence that in VR selective rendering might be helpful even in settings without eye tracking. In AR/VR settings with eye tracking, DeepGaze III could be used for even more applications. Selective rendering or rendering prioritization could be substantially improved by conditioning on the previous gaze path. Also, multiple models trained on different tasks could be used for task inference.
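To make the single-saliency-map hypothesis concrete, the following toy sketch (not the actual DeepGaze III model) combines one gaze-independent saliency map with a simple saccade-length prior centered on the previous fixation and samples the next fixation from the result. The Gaussian prior and its width are illustrative assumptions.

```python
import numpy as np

def next_fixation(saliency, prev_fix, sigma=80.0, rng=None):
    """Sample the next fixation from a single, gaze-independent saliency map
    modulated by a Gaussian saccade-length prior around `prev_fix` (row, col).
    Illustrative only; `sigma` (in pixels) is an assumed value."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (ys - prev_fix[0]) ** 2 + (xs - prev_fix[1]) ** 2
    prior = np.exp(-dist2 / (2.0 * sigma ** 2))  # favors shorter saccades
    prob = saliency * prior
    prob /= prob.sum()
    idx = rng.choice(h * w, p=prob.ravel())
    return divmod(idx, w)  # (row, col) of the sampled fixation

# Example on a random stand-in "saliency map", starting from the image center:
sal = np.random.rand(480, 640)
print(next_fixation(sal, prev_fix=(240, 320)))
```

Whether one such gaze-independent map suffices, or whether multiple history-dependent maps are needed, is exactly the question the experiments in this abstract address.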
Ming-Yu Liu
NVIDIA
Talk Title: Few-Shot Unsupervised Image-to-Image Translation
Abstract: Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to many images in both source and destination classes at training time. We argue this greatly limits their use. Drawing inspiration from the human capability of picking up the essence of a novel object from a small number of examples and generalizing from there, we seek a few-shot, unsupervised image-to-image translation algorithm that works on previously unseen target classes that are specified, at test time, only by a few example images. Our model achieves this few-shot generation capability by coupling an adversarial training scheme with a novel network design. Through extensive experimental validation and comparisons to several baseline methods on benchmark datasets, we verify the effectiveness of the proposed framework.
Kaan Akşit
NVIDIA
Talk Title: Eye tracking for next generation displays
Abstract: Next-generation Virtual and Augmented Reality near-eye displays promise an immersive visual experience with the help of an eye tracker. In this talk, I will give an overview of such near-eye display architectures, with a specific focus on eye tracking, and provide guidance on the remaining challenges.
Satya Mallick
Interim-CEO, OpenCV.org; CEO & Founder, Big Vision LLC
Talk Title: Gaze Estimation Overview: A Computer Vision Scientist’s perspective
Abstract: In this talk, we will give an overview of several gaze-tracking algorithms and datasets. We will learn the conditions under which these algorithms and architectures can be employed, along with their limitations. One of the challenges in gaze tracking is the limited availability of real datasets; we will learn how synthetic data is being used to produce state-of-the-art results. Our goal is to cover a breadth of ideas without going into extreme depth on any one algorithm. This talk will be useful for people who are interested in gaze tracking and want an overview before diving deep into the problem.
Email: openedschallenge@fb.com