Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

European Workshop on Reinforcement Learning (EWRL)

Abstract

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, NPG and Q-NPG with log-linear policies can be written as approximate versions of the policy mirror descent (PMD) method. By extending a recent analysis of PMD in the tabular setting, we obtain linear convergence rates and š’Ŗ(1/ε²) sample complexities for both NPG and Q-NPG with log-linear policy parametrization, using a simple, non-adaptive geometrically increasing step size and without resorting to entropy or other strongly convex regularization. As a byproduct, we obtain sublinear convergence rates for both NPG and Q-NPG with arbitrarily large constant step sizes.
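To make the update concrete, the following is a minimal sketch (not the authors' code) of one Q-NPG step for a log-linear policy π_θ(a|s) ∝ exp(φ(s,a)ᵀθ), viewed as approximate PMD: regress sampled Q-values onto the features (compatible function approximation) and take a step in the resulting direction with a geometrically increasing step size. The names `q_npg_step`, `phi`, `eta0`, and the growth factor are illustrative assumptions, not constants from the paper, and the feature map, Q-value estimates, and sampling distribution must be supplied by the user.

```python
import numpy as np

def log_linear_policy(theta, phi, states, actions):
    """Return pi_theta(a|s) for all (s, a), with phi(s, a) a d-dimensional feature vector."""
    logits = np.array([[phi(s, a) @ theta for a in actions] for s in states])
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def q_npg_step(theta, eta, sampled_features, sampled_q_values):
    """
    One Q-NPG update: solve the compatible function approximation problem
        w ~ argmin_w E[(Q(s, a) - w^T phi(s, a))^2]
    by least squares over sampled state-action pairs, then update
        theta <- theta + eta * w.
    """
    w, *_ = np.linalg.lstsq(sampled_features, sampled_q_values, rcond=None)
    return theta + eta * w

def step_size_schedule(eta0=1.0, growth=2.0):
    """Non-adaptive, geometrically increasing step sizes eta_t = eta0 * growth**t.
    The initial value and growth factor here are placeholder choices."""
    t = 0
    while True:
        yield eta0 * growth ** t
        t += 1
```

A usage loop would alternate estimating Q-values under the current policy, calling `q_npg_step` with the next step size from `step_size_schedule`, and re-sampling state-action pairs from the updated policy.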
