Popularity Prediction for Social Media over Arbitrary Time Horizons
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
International Parallel and Distributed Processing Symposium (IPDPS)
Data center costs are among the major enterprise expenses, and any improvement in data center resource utilization corresponds to significant savings in true dollars. We focus on the problem of scheduling jobs in distributed execution environments to improve resource utilization. Cluster schedulers like YARN and Mesos base their scheduling decisions on resource requirements provided by end users. It is hard for end-users to predict the exact amount of resources required for a task/ job, especially since resource utilization can vary significantly over time and across tasks. In practice, users pick highly conservative estimates of peak utilization across all tasks of a job to ensure job completion, leading to resource fragmentation and severe under utilization in production clusters. We present UBIS, a utilization-aware approach to cluster scheduling, to address resource fragmentation and to improve cluster utilization and job throughput. UBIS considers actual usage of running tasks and schedules opportunistic work on under-utilized nodes. It monitors resource usage on these nodes and preempts opportunistic containers when over-subscription becomes untenable. In doing so, UBIS utilizes wasted resources while minimizing adverse effects on regularly scheduled tasks. Our implementation of UBIS on YARN yields improvements of up to 30% in makespan for representative workloads and 25% in individual job durations.
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
Liqi Yan, Qifan Wang, Yiming Cu, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu
Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel