Characterization of MPC-based Private Inference for Transformer-based Models

International Symposium on Performance Analysis of Systems and Software (ISPASS)

Abstract

In this work, we provide an in-depth characterization study of the performance overhead of running Transformer models with secure multi-party computation (MPC). MPC is a cryptographic framework for protecting both model and input data privacy in the presence of untrusted compute nodes. Our characterization study shows that Transformers introduce several performance challenges for MPC-based private machine learning inference. First, Transformers rely extensively on “softmax” functions. While softmax is relatively cheap in non-private execution, it dominates MPC inference, consuming up to 50% of the total runtime. Further investigation shows that computing the maximum, needed for the numerical stability of softmax, is a key culprit for the increase in latency. Second, MPC relies on approximations of the non-linear functions inside softmax, and their narrow dynamic ranges make it difficult to optimize softmax while maintaining accuracy. Finally, unlike CNNs, Transformer-based NLP models use large embedding tables to convert input words into embedding vectors. Accesses to these embedding tables can disclose the inputs; hence, the embedding access patterns must additionally be obfuscated to guarantee input privacy. One approach to hiding the accessed addresses is to convert an embedding-table lookup into a matrix multiplication. However, this naive approach increases MPC inference runtime significantly. We then apply tensor-train (TT) decomposition, a lossy compression technique for representing embedding tables, and evaluate its performance on embedding lookups. Through detailed experiments, we show the trade-off between the performance improvement and the corresponding impact on model accuracy.
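
To make the numerical-stability point concrete, here is a minimal NumPy sketch (ours, not from the paper) of the standard max-subtracted softmax. The max step is nearly free in plaintext, but under MPC each comparison requires interactive protocol rounds, which is why this step dominates inference time:

```python
import numpy as np

def softmax_stable(x):
    """Numerically stable softmax: subtracting max(x) keeps exp() from
    overflowing for large logits."""
    z = x - np.max(x)   # in plaintext this max is a trivial linear scan;
                        # under MPC it becomes a tree of secure comparisons,
                        # each costing interactive protocol rounds
    e = np.exp(z)       # the exponential is itself approximated under MPC
    return e / np.sum(e)

scores = np.array([10.0, 1000.0, -5.0])
print(softmax_stable(scores))  # naive exp(1000.0) would overflow to inf
```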
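
Similarly, a small sketch of the naive obliviousness transformation described above, assuming a hypothetical vocabulary of 10,000 words and 256-dimensional embeddings. The lookup becomes a one-hot vector multiplied by the full table, so the memory access pattern no longer depends on the secret index, but every row of the table is touched on every lookup:

```python
import numpy as np

V, D = 10_000, 256  # hypothetical vocabulary and embedding sizes
rng = np.random.default_rng(0)
table = rng.standard_normal((V, D))

def oblivious_lookup(index):
    """Lookup as a matmul: the access pattern is independent of `index`
    (in MPC the one-hot vector would be secret-shared), but the cost
    grows to a full V x D multiply-accumulate per token."""
    one_hot = np.zeros(V)
    one_hot[index] = 1.0
    return one_hot @ table  # touches every row of the table

assert np.allclose(oblivious_lookup(42), table[42])
```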
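
Finally, a compact illustration of how tensor-train decomposition can shrink an embedding table: two small TT cores stand in for the full table and reconstruct any row on demand. The factorization sizes and TT-rank below are illustrative assumptions, not the configuration evaluated in the paper:

```python
import numpy as np

# Illustrative factorization: V = v1 * v2 = 10,000 rows,
# D = d1 * d2 = 256 columns, TT-rank r = 8.
v1, v2, d1, d2, r = 100, 100, 16, 16, 8
rng = np.random.default_rng(0)

# Storage falls from V * D = 2,560,000 floats for the full table to
# v1*d1*r + r*v2*d2 = 25,600 floats for the cores (a 100x reduction).
G1 = rng.standard_normal((v1, d1, r))
G2 = rng.standard_normal((r, v2, d2))

def tt_embedding_row(i):
    """Reconstruct row i of the implicit (V, D) embedding table."""
    i1, i2 = divmod(i, v2)  # mixed-radix split of the row index
    # Contract the selected slices of the two cores over the TT rank.
    row = np.einsum('ar,rb->ab', G1[i1], G2[:, i2])  # shape (d1, d2)
    return row.reshape(d1 * d2)

vec = tt_embedding_row(1234)  # a 256-dimensional embedding vector
```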
