Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay

Deep Reinforcement Learning (Deep RL) Workshop at NeurIPS

Abstract

Prioritized Level Replay (PLR) has been shown to induce adaptive curricula that improve the sample efficiency and generalization of reinforcement learning policies in environments featuring multiple tasks or levels. PLR selectively samples training levels, weighting each by a function of the temporal-difference (TD) errors recently experienced on that level. We explore the dispersion of returns as an alternative prioritization criterion that addresses some shortcomings of TD-error-based scores.
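As a rough illustration (not the paper's implementation), the sketch below contrasts the two prioritization criteria: a PLR-style score based on recent TD-error magnitudes and an alternative score based on the dispersion (standard deviation) of recent episodic returns per level, with rank-based sampling weights as in PLR. The class name `LevelSampler`, the method names, and the buffer/temperature parameters are hypothetical.

```python
import numpy as np
from collections import defaultdict, deque


class LevelSampler:
    """Hypothetical sketch of level prioritization for PLR-style training.

    Each level keeps a small buffer of recent statistics; its sampling
    probability is derived from a rank-based transform of a per-level score.
    """

    def __init__(self, num_levels, buffer_size=10, beta=0.1):
        self.num_levels = num_levels
        self.beta = beta  # temperature of the rank-based distribution
        self.returns = defaultdict(lambda: deque(maxlen=buffer_size))
        self.td_errors = defaultdict(lambda: deque(maxlen=buffer_size))

    def update(self, level, episodic_return, mean_abs_td_error):
        # Record statistics from the most recent episode played on this level.
        self.returns[level].append(episodic_return)
        self.td_errors[level].append(mean_abs_td_error)

    def score_td_error(self, level):
        # PLR-style score: average magnitude of recent TD errors on the level.
        errs = self.td_errors[level]
        return float(np.mean(errs)) if errs else 0.0

    def score_return_dispersion(self, level):
        # Alternative score: dispersion of recent episodic returns on the level.
        rets = self.returns[level]
        return float(np.std(rets)) if len(rets) > 1 else 0.0

    def sampling_distribution(self, score_fn):
        # Rank-based prioritization: higher score -> lower rank -> higher weight.
        scores = np.array([score_fn(l) for l in range(self.num_levels)])
        ranks = np.empty_like(scores)
        ranks[np.argsort(-scores)] = np.arange(1, self.num_levels + 1)
        weights = (1.0 / ranks) ** (1.0 / self.beta)
        return weights / weights.sum()


# Example usage: sample the next training level under the dispersion criterion.
sampler = LevelSampler(num_levels=4)
sampler.update(level=0, episodic_return=3.0, mean_abs_td_error=0.5)
sampler.update(level=0, episodic_return=7.0, mean_abs_td_error=0.4)
probs = sampler.sampling_distribution(sampler.score_return_dispersion)
next_level = np.random.choice(sampler.num_levels, p=probs)
```

Swapping `score_return_dispersion` for `score_td_error` in the last step recovers a TD-error-based prioritization of the kind the abstract attributes to PLR.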
