The Update Equivalence Framework for Decision-Time Planning

arXiv

Abstract

The process of revising (or constructing) a pol- icy immediately prior to execution—known as decision-time planning—is key to achieving superhuman performance in perfect-information set- tings like chess and Go. A recent line of work has extended decision-time planning to more general imperfect-information settings, leading to super- human performance in poker. However, this line of work involves subgames whose sizes scale exponentially in the number of bits of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative perspective on decision-time planning: the framework of update equivalence. In this framework, decision-time planning algorithms are viewed as implementing updates of synchronous learning algorithms. This enables us to introduce a new family of principled decision-time planning algorithms that do not rely on public information, opening the door to sound and effective decision- time planning in settings with large amounts of non-public information. In experiments, members of this family produced comparable or superior results compared to state-of-the-art approaches in Hanabi and improved performance in 3x3 Abrupt Dark Hex and Phantom Tic-Tac-Toe.

Featured Publications