Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

Deep Reinforcement Learning (Deep RL) Workshop at NeurIPS

Abstract

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels, weighted by a function of the recent temporal-difference (TD) errors experienced on each level. We explore the dispersion of returns as an alternative prioritization criterion to address certain issues with TD-error scores.
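The level-sampling mechanism described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-level prioritization follows the rank-based form used in the original PLR work, P(l_i) ∝ (1 / rank(S_i))^(1/β), with the staleness term omitted for brevity. `td_error_score` is a simplified stand-in (mean absolute one-step TD error). The original PLR scores are computed from generalized advantage estimates, which this sketch does not reproduce. `return_dispersion_score` assumes that "dispersion" means the population standard deviation of a level's recent episodic returns. The abstract does not specify the measure, and all function names here are hypothetical.

```python
import statistics
from typing import Dict, List


def td_error_score(td_errors: List[float]) -> float:
    """Simplified TD-based score (hypothetical): mean absolute TD error
    over the most recent episode on a level."""
    return sum(abs(d) for d in td_errors) / len(td_errors)


def return_dispersion_score(recent_returns: List[float]) -> float:
    """ASSUMPTION: dispersion is measured as the population standard
    deviation of the level's most recent episodic returns."""
    if len(recent_returns) < 2:
        return 0.0  # not enough episodes to measure spread
    return statistics.pstdev(recent_returns)


def rank_prioritized_probs(scores: Dict[str, float], beta: float = 0.1) -> Dict[str, float]:
    """Rank-based prioritization: weight = (1 / rank) ** (1 / beta),
    where rank 1 is the highest-scoring level. Weights are normalized
    into a sampling distribution. Ties keep their insertion order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    weights = {
        level: (1.0 / (rank + 1)) ** (1.0 / beta)
        for rank, level in enumerate(ordered)
    }
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


# Usage: a level solved only some of the time has the widest spread of
# returns, so it receives the highest replay probability.
returns = {
    "level_a": [1.0, 1.0, 1.0, 1.0],  # always solved: zero dispersion
    "level_b": [0.0, 1.0, 0.0, 1.0],  # sometimes solved: high dispersion
    "level_c": [0.0, 0.0, 0.0, 0.0],  # never solved: zero dispersion
}
scores = {lvl: return_dispersion_score(r) for lvl, r in returns.items()}
probs = rank_prioritized_probs(scores, beta=0.3)
```

A smaller temperature β sharpens the distribution toward the top-ranked levels, and a larger β flattens it toward uniform sampling. Because ranks rather than raw scores determine the weights, the sampler is insensitive to the absolute scale of whichever score function is plugged in. That makes swapping a TD-based score for a return-dispersion score a drop-in change in this sketch.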
