Serving distributed inference deep learning models in serverless computing

IEEE Conference on Cloud Computing (CLOUD)

Abstract

Serverless computing (SC) is an attractive win-win paradigm for cloud providers and customers: it gives providers greater flexibility and control over resource utilization, while customers benefit from a pay-per-use cost model and freedom from capacity management. While SC has proven effective for event-triggered web applications, its use for deep learning (DL) applications remains limited because DL workloads are latency-sensitive and SC functions are stateless. In this paper, we focus on two key problems impacting the deployment of distributed inference (DI) models on SC: resource allocation and cold start latency. To address these problems, we propose a hybrid scheduler that identifies an optimal server resource allocation policy. The hybrid scheduler selects a container allocation from candidate allocations produced by a greedy strategy and by a deep reinforcement learning based allocation model.
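
The abstract only summarizes the hybrid scheduler at a high level. As a rough, hypothetical illustration of the idea (not the paper's actual algorithm), the sketch below generates two candidate container placements, one from a greedy best-fit heuristic and one from a stand-in for a learned DRL policy, and keeps whichever candidate scores lower under a toy cost function; every name, heuristic, and cost term here is an assumption for illustration only.

```python
# Hypothetical sketch of a hybrid scheduler: compare a greedy candidate
# allocation against one proposed by a (stubbed) DRL policy and keep the
# cheaper of the two. Names and the cost model are illustrative, not the
# paper's implementation.

from dataclasses import dataclass

@dataclass
class Server:
    name: str
    free_cpu: float      # remaining vCPUs
    free_mem: float      # remaining memory (GB)

@dataclass
class ContainerRequest:
    model_stage: str     # stage of the distributed-inference pipeline
    cpu: float
    mem: float

def greedy_allocate(servers, requests):
    """Best-fit greedy: place each container on the feasible server with the
    least leftover CPU after placement (illustrative heuristic)."""
    placement = {}
    for req in requests:
        feasible = [s for s in servers if s.free_cpu >= req.cpu and s.free_mem >= req.mem]
        if not feasible:
            return None
        target = min(feasible, key=lambda s: s.free_cpu - req.cpu)
        target.free_cpu -= req.cpu
        target.free_mem -= req.mem
        placement[req.model_stage] = target.name
    return placement

def drl_allocate(servers, requests):
    """Stand-in for the learned DRL allocation model: here it simply spreads
    containers round-robin; a real policy would score (state, action) pairs."""
    placement = {}
    for i, req in enumerate(requests):
        target = servers[i % len(servers)]
        if target.free_cpu < req.cpu or target.free_mem < req.mem:
            return None
        target.free_cpu -= req.cpu
        target.free_mem -= req.mem
        placement[req.model_stage] = target.name
    return placement

def fragmentation_cost(servers):
    """Toy cost: total leftover CPU stranded on partially used servers."""
    return sum(s.free_cpu for s in servers if s.free_cpu < 4.0)

def hybrid_schedule(servers, requests):
    """Evaluate both candidate allocations on copies of the cluster state
    and return the placement with the lower cost."""
    candidates = []
    for allocator in (greedy_allocate, drl_allocate):
        trial = [Server(s.name, s.free_cpu, s.free_mem) for s in servers]
        placement = allocator(trial, list(requests))
        if placement is not None:
            candidates.append((fragmentation_cost(trial), placement))
    if not candidates:
        raise RuntimeError("no feasible allocation")
    return min(candidates, key=lambda c: c[0])[1]

if __name__ == "__main__":
    servers = [Server("s1", 8, 32), Server("s2", 4, 16)]
    requests = [ContainerRequest("stage-0", 2, 4), ContainerRequest("stage-1", 2, 4)]
    print(hybrid_schedule(servers, requests))
```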

Featured Publications