Facebook and Microsoft are today introducing Open Neural Network Exchange (ONNX) format, a standard for representing deep learning models that enables models to be transferred between frameworks. ONNX is the first step toward an open ecosystem where AI developers can easily move between state-of-the-art tools and choose the combination that is best for them.
When developing deep learning models, engineers and researchers have many AI frameworks to choose from. At the outset of a project, developers have to evaluate features and commit to a framework. Often, the features chosen during research and experimentation are different from those needed for shipping to production. Many organizations lack a good way to bridge the gap between these two modes of operation, and have resorted to a range of creative workarounds, such as requiring that researchers work in the production system or translating models by hand.
We developed ONNX together with Microsoft to bridge this gap and to empower AI developers to choose the framework that fits the current stage of their project and easily switch between frameworks as the project evolves. Caffe2, PyTorch, and Cognitive Toolkit will all be releasing support for ONNX in September, which will allow models trained in one of these frameworks to be exported to another for inference. We invite the community to join the effort and support ONNX in their ecosystem. Enabling interoperability between different frameworks and streamlining the path from research to production will help increase the speed of innovation in the AI community.
ONNX is an important part of our deep learning approach here at Facebook. In Facebook’s AI teams (FAIR and AML), we are continuously trying to push the frontier of AI and develop better algorithms for learning. When we have a breakthrough, we’d like to make the better technologies available to people as soon as possible in our applications. With ONNX, we are focused on bringing the worlds of AI research and products closer together so that we can innovate and deploy faster.
People experimenting with new models, particularly those in research, want maximum flexibility and expressiveness in writing neural networks, ranging from dynamic neural networks to support for gradients of gradients, all while keeping a bread-and-butter ConvNet performant. Researchers also want to iterate rapidly, which means they need excellent tooling for interactive development and debugging. PyTorch has been built to push the limits of research frameworks, freeing researchers from the constraints of a platform and allowing them to express their ideas more easily than before.
Conversely, product pipelines run training and inference every day on massive amounts of new data, while keeping the model largely constant. Carefully micro-optimizing the code for the product's particular model, with techniques such as quantization and hand-tuned kernels, saves resources. Caffe2 has been built with products, mobile, and extreme performance in mind. The internals of Caffe2 are flexible and highly optimized, so we can ship bigger and better models onto underpowered hardware using every trick in the book.
With ONNX, we can get the best of both worlds. We can now export models for many common neural networks from PyTorch and deploy them on Caffe2. This is the first step in enabling us to rapidly move our latest research developments into production. Over the coming months, we will be enhancing ONNX and releasing improvements to Caffe2 and PyTorch that enable them to interoperate more deeply.
To implement ONNX support, we had to make changes to both PyTorch and Caffe2 and also unify operators between the frameworks. In Caffe2, this process was similar to adding a translator because Caffe2 already had a static graph representation built-in. In PyTorch, neural networks are specified as programs rather than explicit graphs, which posed a bigger challenge. In order to extract a graph from the program, we developed a tracer, which “traces”, i.e. records, the execution of the program as it runs. Tracing the program eliminates complexity and makes it easier to translate into a graph representation.
To see how it works, consider the following piece of code:
```python
x = y * 2
if someComplicatedFunction():
    z = x + y
else:
    z = x * y
```
To directly export this code, ONNX would have to support conditionals and someComplicatedFunction(), in effect becoming a general-purpose programming language. However, in many deep learning models the result of someComplicatedFunction() is always the same during inference. For example, in PyTorch conditionals are often computations on the sizes or dimensions of input tensors. In these cases, a single trace through the code is far simpler and can easily be represented in ONNX:
```python
# someComplicatedFunction() == True
x = y * 2
z = x + y
```
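To make the mechanism concrete, here is a toy tracer in plain Python (entirely hypothetical, not the actual PyTorch implementation): it wraps values so that every arithmetic operation is recorded as it executes, which is how a program with a conditional collapses into a single straight-line trace.

```python
# A toy tracer (hypothetical, for illustration only): every arithmetic
# op on a Traced value is appended to `ops` as the program executes.
ops = []

class Traced:
    def __init__(self, value):
        self.value = value
    def __mul__(self, other):
        ops.append(("mul", other))
        return Traced(self.value * other)
    def __add__(self, other):
        ops.append(("add",))
        return Traced(self.value + other.value)

def someComplicatedFunction():
    # stands in for a check that is constant at inference time,
    # e.g. a computation on input tensor sizes
    return True

y = Traced(3)
x = y * 2                  # recorded as ("mul", 2)
if someComplicatedFunction():
    z = x + y              # recorded as ("add",)
else:
    z = x * y              # never executed, so never recorded

print(ops)  # [('mul', 2), ('add',)]
```

Only the operations that actually ran appear in `ops`; the untaken branch leaves no mark, so the trace is exactly the straight-line program shown above.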
Currently, our tracer works with many common neural networks, but not with some of the more advanced programs in PyTorch, such as those with dynamic flow control. Over time, we will enhance ONNX and the tracer to support these programs, so that developers can leverage the full flexibility of PyTorch with the high-performance, robust deployment capabilities of Caffe2.
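A plain-Python sketch (hypothetical and simplified, not real PyTorch code) of why dynamic flow control resists tracing: when a branch depends on the data itself rather than on properties that are fixed at inference time, different inputs yield different traces, so no single trace can represent the whole program.

```python
# Toy illustration: a branch that depends on the data means the
# recorded trace differs from input to input.
def model(x):
    if x > 0:              # data-dependent control flow
        return x * 2
    return x - 1

def trace(fn, example_input):
    ops = []
    class Rec:
        def __init__(self, v):
            self.v = v
        def __gt__(self, other):
            return self.v > other      # condition is evaluated, not recorded
        def __mul__(self, other):
            ops.append("mul")
            return Rec(self.v * other)
        def __sub__(self, other):
            ops.append("sub")
            return Rec(self.v - other)
    fn(Rec(example_input))
    return ops

print(trace(model, 3.0))   # ['mul']  only the taken branch is captured
print(trace(model, -3.0))  # ['sub']  a different program, as far as the trace knows
```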
We are releasing the newest versions of Caffe2 and PyTorch with ONNX support today. We hope that you're as excited as we are about the new features! They are at an early stage, so we hope you'll check them out and submit your feedback and improvements. We'll continue to evolve ONNX, PyTorch, and Caffe2 to make sure developers have the latest tools for AI, so expect more updates soon!