Facebook researchers are transforming photography into entirely new consumer experiences—infusing still portraits with emotion and movement, and bringing 3D photography to the casual photographer. The Facebook team will present its latest research and also place the work into a broader context of the evolution of computer graphics and photography in the keynote address at SIGGRAPH Asia 2017 in Bangkok.
Michael Cohen, Facebook’s Director of the Computational Photography Group, has been selected to deliver the SIGGRAPH Asia keynote.
In the keynote address, Michael Cohen, Facebook’s Director of the Computational Photography Group, will trace his work from the early days of computer graphics, through the incorporation of computer vision, to the present, in which the two technologies blend seamlessly into new consumer-level experiences. The talk will focus on how the lines are now blurred between simple photographs and enhanced experiences that capture the moment in more creative ways. The two papers Facebook researchers are presenting at the conference exemplify this shift.
Facebook researchers Michael F. Cohen and Johannes Kopf, together with Hadar Averbuch-Elor and Daniel Cohen-Or from Tel-Aviv University, present their paper Bringing Portraits to Life, inspired by the Harry Potter movies, in which images on walls or in newspapers could spring to life and even react to the viewer.
Given a single image (top row), our method automatically generates photo-realistic videos that express various emotions. We use driving videos of a different subject and mimic the expressiveness of the subject in the driving video. Representative frames from the videos are displayed above. ©Unsplash photographers Lauren Ferstl, Brooke Cagle, Guillaume Bolduc, Ilya Yakover, Drew Graham and Ryan Holloway.
The research presents a technique to automatically animate a still portrait so that the subject in the photo comes to life and expresses various emotions. Using a driving video of a different subject, the researchers transfer the expressiveness of that subject to the target portrait. In contrast to previous work, which requires an input video of the target face to reenact a facial performance, the new technique uses only a single target image.
The target image is animated through 2D warps that imitate the facial transformations in the driving video. As warps alone do not carry the full expressiveness of the face, the researchers add fine-scale dynamic details commonly associated with facial expressions, such as creases and wrinkles. They also hallucinate regions that are hidden in the input target face, most notably the inner mouth.
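One simple way to picture how a driving video can steer such a 2D warp is to move the target's facial landmarks by the same offsets the driving subject's landmarks undergo, rescaled to the target's face size. The sketch below is illustrative only and is not the paper's implementation: the function name, the bounding-box size normalization, and the toy landmark coordinates are all assumptions.

```python
import numpy as np


def transfer_landmark_offsets(target_pts, driving_neutral, driving_frame):
    """Shift target landmarks by the driving subject's landmark offsets.

    All inputs are (N, 2) arrays of corresponding 2D landmark positions.
    Hypothetical helper for illustration, not the authors' method.
    """
    target_pts = np.asarray(target_pts, dtype=float)
    driving_neutral = np.asarray(driving_neutral, dtype=float)
    driving_frame = np.asarray(driving_frame, dtype=float)

    def face_size(pts):
        # Assumption: face size is the diagonal of the landmark bounding box.
        return np.linalg.norm(pts.max(axis=0) - pts.min(axis=0))

    # Rescale so that a small driving face does not under-animate a large target.
    scale = face_size(target_pts) / face_size(driving_neutral)

    # Motion of each driving landmark relative to its neutral pose.
    offsets = driving_frame - driving_neutral
    return target_pts + scale * offsets


# Toy data: three landmarks; the driving subject moves the third one by 1 pixel.
target = [[0, 0], [20, 0], [10, 20]]
neutral = [[0, 0], [10, 0], [5, 10]]
frame = [[0, 0], [10, 0], [5, 11]]
warped_landmarks = transfer_landmark_offsets(target, neutral, frame)
print(warped_landmarks)
```

The displaced landmarks would then drive an image warp; that step, along with the added wrinkles and the inner-mouth synthesis, is outside this sketch.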
The primary application is in creating reactive profiles, where people in still images can automatically interact with their viewers and express emotions. If you send a reactive portrait to someone or include it as part of a comment to someone else’s post, your portrait reflects your emotions, rather than just your static profile picture.
Fig. 9. We generated reactive profiles in the context of a mocked-up Facebook page. The video presents a complete demonstration.
In the future, the team will consider combining their technique with 3D methods. For example, if the face departs from a frontal facing pose, they can map the face over a template 3D face and use the rotated 3D model to allow a wider motion of the facial region. They expect other fun applications to emerge from this work. For example, one can imagine coupling this work with an AI to create an interactive avatar starting from a single photograph that could come more to life by reacting with various facial expressions.
Also being presented at SIGGRAPH Asia is the paper Casual 3D Photography by Peter Hedman of University College London and Facebook’s Suhib Alsisan, Richard Szeliski and Johannes Kopf, which introduces technology to make 3D photography accessible to the masses.
Imagine if you could capture any place and digitally preserve it in a way that allows you or your friends to virtually immerse yourselves in the scene and re-experience the sensation of being there. Now imagine this was nearly as easy as taking a picture today, using a phone or camera you already own. You could capture and share spaces you visit with your friends, so they feel more connected to you, or preserve your personally treasured places forever as digital memories.
In this paper, the team presents technology that enables casual 3D photography, moving us closer towards fulfilling this vision. A person captures a scene by moving a hand-held camera sideways at about a half arm’s length, while taking a series of still images, possibly with the help of a dedicated capture app. The capture is unstructured, i.e., the motion does not have to be precisely executed, and takes just seconds to a few minutes, depending on the desired amount of coverage. Given this input, the algorithm automatically reconstructs a 3D photo, i.e., a textured, panoramic, multi-layered geometric mesh representation.
Our algorithm reconstructs a 3D photo, i.e., a multi-layered panoramic mesh with reconstructed surface color, depth, and normals, from casually captured cell phone or DSLR images. It can be viewed with full binocular and motion parallax in VR as well as on a regular mobile device or in a Web browser. The reconstructed depth and normals allow interacting with the scene through geometry-aware and lighting effects.
The team has developed a novel system to construct seamless two-layer 3D photographs from sequences of casually acquired photographs. Their work builds on a strong foundation in sparse and dense multi-view stereo (MVS) algorithms, with enhanced results due to their novel near envelope cost volume prior. Their parallax-tolerant stitching algorithm further removes many outlier depth artifacts. It produces front and back surface panoramas with well-reconstructed depth edges, because it starts from depth maps whose edges are aligned to the reference image color edges. Finally, their fusion algorithm merges the front and back panoramas into a single two-layer 3D photo.
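To see why reconstructed depth matters for motion parallax, consider how far a scene point appears to shift in the image when the viewer moves sideways. The sketch below uses the standard pinhole camera relation (shift = focal length × sideways translation / depth); it is not taken from the paper, and the function name and the numeric values are assumptions chosen for illustration.

```python
def parallax_shift_px(focal_px, camera_shift_m, depth_m):
    """Horizontal image shift, in pixels, of a point at depth_m when a
    pinhole camera with focal length focal_px (in pixels) translates
    sideways by camera_shift_m. Hypothetical helper for illustration."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * camera_shift_m / depth_m


# Assumed values: 1000 px focal length, 5 cm of head motion,
# a front-layer surface at 1 m and a back-layer surface at 5 m.
front_shift = parallax_shift_px(1000.0, 0.05, 1.0)
back_shift = parallax_shift_px(1000.0, 0.05, 5.0)
print(front_shift, back_shift)
```

Nearby surfaces shift more than distant ones, so moving the viewpoint uncovers regions behind foreground objects. This is one way to understand why a second, back layer is useful: it supplies content for the areas that become visible.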
One of the team’s primary design goals is to make the 3D photo capture process easy for inexperienced users: the capture should be handheld, should not take too long, and should work with an existing, low-cost camera. These requirements influenced many of the algorithmic design decisions further down the pipeline. This research is a step towards the team’s goal of letting everyone easily capture the world around them in enough fidelity to re-experience it.
At Facebook, we want to deliver a range of experiences that can help people share their important moments and communicate with others using visual media. These two papers are parts of different puzzles. One enables new ways of capturing and sharing images of you, and the other new ways of capturing, sharing and experiencing your world.
Our Computational Photography Team explores everything that can happen between light entering your device and light leaving your device as images, video or interactive experiences. We are also working on short-form video and augmented reality experiences, as well as more traditional uses of computational photography that simply make your images and video look better. The power of the camera and other sensors, combined with computation on mobile devices, will allow us to create even more surprising ways to communicate and have fun exploring the world around us.
Oculus researchers and their collaborators will also present their paper Fast Gaze-Contingent Optimal Decompositions for Multifocal Displays.