January 17, 2019

POPL 2019 Most Influential Paper Award for research that led to Facebook Infer

By: Meta Research

We’re excited to congratulate Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn (Facebook) and Hongseok Yang (KAIST) on receiving the ACM SIGPLAN Most Influential POPL Paper Award for their paper “Compositional shape analysis by means of bi-abduction,” which they presented at POPL 2009. The techniques described in this paper were utilized in the tool Infer, which was open-sourced in 2015 and is also used at other major tech companies.

Cristiano, Dino and Peter all joined the Facebook London engineering team in 2013 with the acquisition of Monoidics, a startup they founded. We recently caught up with the three of them to learn more about the research paper that won the Most Influential Paper Award at POPL 2019.

Q: What was the research about?

It was about writing a computer program that can show (prove) that another program does not have memory safety errors — like crashes from accessing null or undefined pointers, and memory leaks. This kind of program can also find bugs when the attempt to show their absence fails. This work is part of research areas known as program verification and automatic program analysis. At the time, reasoning accurately about memory was one of the main open problems in the field, holding back the application of reasoning techniques to practical industrial code.

Q: What led you to do the initial research?

Partial success, followed by daring to dream what might be possible if we went far beyond what was possible at the time.

It started when we (the four authors, Cristiano, Dino, Peter, Hongseok) met in a cafe in the Shoreditch area of East London to talk about research ideas. We had published a paper in the CAV ’08 conference which proved memory safety properties for C programs up to 10k lines of code, and found some bugs in Microsoft device driver that had been confirmed and fixed by the Windows kernel team. This was absolutely at the leading edge of the research literature. But one of us, Cristiano, was very dissatisfied. “These results aren’t good enough,” he said, because “10k is too small.” Cristiano and Dino were dreaming about using our techniques to start a company, and Cristiano saw that we needed to go much further than the academic leading edge at the time if we were to tackle the kinds of large codebases one finds in the major companies that might be clients.

“What is the main problem blocking application to one million lines of code?” Cristiano then asked. It was an outlandish question, because the leading research had seldom reached even one thousand lines for this sort of problem, let alone a million. Spurred by Cristiano, Peter shot back: “We need a new, compositional version of our techniques.” Peter outlined the basic scheme for an analysis where little bits of proof would be done independently and then stitched together, and he gave an argument for how it should scale if we could do it…but he did not explain how the scheme could actually be realized; it might have just been a fanciful idea.

We all, the four authors, set about working on converting this so-far-fanciful scheme — automatic, compositional memory safety proving (shape analysis, in the jargon) — into something real. Hongseok made a valuable step, based on mining information from failed proofs to automatically discover preconditions for individual functions. Then Dino made a breakthrough, in discovering a logical concept (that we eventually called bi-abduction) to do the stitching that Peter had asked for. Dino wrote a prototype reasoning tool that had encouraging initial results, and it was clear we were onto something. We all worked on the theory that would become the POPL paper, and Cristiano led the further implementation effort including a new species of automatic theorem prover, a “bi-abductive prover.” The new reasoning technique soon powered its way to over a million lines of code in an experiment, we had a nice celebration, and then set about writing the POPL ’09 paper to tell the world about it.

Q: What happened when the paper was originally published, and how was it received in the community then?

The paper got a good reception at POPL, but an even better reception when we presented demos. Cristiano has a slogan — seeing is believing — that we kept in mind when thinking how to present our ideas.

Dino worked out a demo where we would analyze a program of a few hundred thousand lines of code (often, OpenSSL), and this would take 20 minutes or so, so that the analysis was finished during a talk we would be giving. Then, we would re-run the analysis and confirm that the bug was fixed, all right on the spot in front of the audience. The re-run would take in a few seconds rather than tens of minutes, because we only needed to re-analyze the changed part of the program: the algorithm was incremental. This illustrated very concretely how the reasoning technique might be used during program development, not only in in a long-running batch mode. This often caused excitement in the audience, because the potential of the technique was clear to see. Incidentally, it is this incrementality that led to subsequent impact at Facebook (as we describe in the next question).

Although the majority of the reaction was enthusiastic, a small number of insiders in the program analysis research community reacted less positively. Although we went from a thousand to a million lines of code, we received complaints that the million-line-analysis did not do the exact same thing as the thousand-line-variant. We had been up front and admitted as much — our new algorithm sometimes lost precision — but some people thought the job of the compositional technique should be to match the slow-running one in everything except speed. In hindsight, moving this sort of proving to big codebases was entering a new and unfamiliar arena, and perhaps not everybody appreciated how many opportunities scaling to big code could offer.

Q: How has the work been built upon?

Shortly after the POPL paper, Cristiano and Dino founded a startup company, Monoidics, to commercialize the techniques, and they convinced Peter to join them. After hardening the techniques and trying them out with some customers, Monoidics was acquired in 2013, and a team of seven including Cristiano, Dino and Peter joined the Facebook London engineering office. Their tool, Infer, was deployed inside Facebook in early 2014, and since then has discovered over 100,000 issues which have been fixed by Facebook’s developers before code reaches production.

In the development of Infer the techniques from the POPL paper were generalized to other reasoning techniques, but all of the checks in Infer share the compositional analysis architecture and incremental algorithmic basis of the original Infer. Infer participates as a bot during Facebook’s code review process, where it automatically comments on code modifications that are submitted for check-in to our code bases. Although the code bases are large — in the tens of millions of lines of code — Infer runs quickly on the code changes in an incremental fashion, reminiscent of the early Infer demos described previously.

Infer is run on the codebases for the WhatsApp, Messenger, Instagram and Facebook mobile apps. Thus the techniques in the POPL paper have directly affected apps used by billions of people on a regular basis. Infer was open-sourced in 2015 and is also used at other companies, including Amazon, Mozilla and Spotify.

Q: Were there any surprises along the way?

We knew that speed, scale and incrementality were important, but just how important took us by surprise.

When we landed in the company we learned that Facebook’s codebases were being modified at what seemed like a breathtaking rate, to the tune of hundreds of thousands of times per week. Infer would have to run on all these code changes. This was far beyond anything we had imagined at the time of the POPL paper, and even far beyond anything we had ever heard about for a deployment of logical reasoning. It turned out that Infer’s incrementality meant that it was up to the task: thousands of analysis runs per day would each perform the little bits of independent proof made possible by Infer’s compositional technique.

Scale was one thing, but an even bigger surprise was how important incrementality — dealing quickly with code changes — was for addressing the human side of the engineering problem we faced. We first tried a “batch” deployment of Infer, where it was run overnight and produced bug lists that developers might act on. Issues were presented to developers outside their workflow and the fix rate — the rate at which they fixed the issues discovered — was near 0%. Then, when we switched on Infer in an incremental-online mode, as a bot during code review, the fix rate shot up to 70%. We were stunned. Although the POPL paper had provided the techniques to enable this form of analysis, we had no idea at the time how powerful the effect of its fast incremental analysis of code changes would be.

In retrospect, we were fortunate that Facebook perceived the importance of the scale and incrementality in the POPL ’09 techniques. We now recall that when we talked with Facebook about potentially joining, it was laser focused on the limitations of batch deployment (the typical model presumed in academia at the time) and could see the potential of our techniques, even if we did not fully appreciate it at the time.

Q: What is your current focus?

We’re working on a variety of problems of importance to the company, including problems related to concurrent programming, to program performance, and to UI programming. Stay tuned!