Stochastic optimization with decision-dependent distributions

Mathematical Programming Journal

Abstract

Stochastic optimization problems often involve data distributions that change in reaction to the decision variables. This is the case, for example, when members of the population respond to a deployed classifier by manipulating their features to improve their likelihood of being positively labeled. Recent works on performative prediction have identified an intriguing solution concept for such problems: find the decision that is optimal with respect to the static distribution that the decision itself induces. Continuing this line of work, we show that, in the strongly convex setting, typical stochastic algorithms, originally designed for static problems, can be applied directly to find such equilibria with little loss in efficiency. The reason is simple to explain: the main consequence of the distributional shift is that it corrupts algorithms with a bias that decays linearly with the distance to the solution. Using this perspective, we obtain convergence guarantees for popular algorithms, such as stochastic gradient, clipped gradient, prox-point, and dual averaging methods, along with their accelerated and proximal variants. In realistic applications, deploying a decision rule is often much more expensive than sampling. We show how to modify these algorithms so that they maintain their sample efficiency while performing only logarithmically many deployments.
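The two algorithmic ideas mentioned in the abstract can be illustrated with a short sketch: a stochastic gradient loop that samples from the distribution induced by the current iterate, and a lazy variant that redeploys only at geometrically spaced times so that roughly log(T) deployments occur over T steps. This is a minimal illustrative sketch, not the paper's implementation; the callables `sample_batch`, `grad_loss`, and `step` are hypothetical stand-ins for a decision-dependent sampler, a loss gradient, and a step-size schedule.

```python
import numpy as np

def sgd_decision_dependent(x0, sample_batch, grad_loss, step, T):
    """Greedy deployment: every iterate is deployed, and each sample is
    drawn from the distribution D(x_t) induced by the current iterate.
    `sample_batch(x)` and `grad_loss(x, z)` are hypothetical stand-ins."""
    x = np.array(x0, dtype=float)
    for t in range(T):
        z = sample_batch(x)                 # z ~ D(x): decision-dependent data
        x = x - step(t) * grad_loss(x, z)   # standard stochastic gradient step
    return x

def sgd_lazy_deployment(x0, sample_batch, grad_loss, step, T):
    """Lazy deployment: keep sampling from the distribution induced by the
    last *deployed* decision and redeploy only at geometrically spaced
    times, so roughly log2(T) deployments are made over T steps."""
    x = np.array(x0, dtype=float)
    deployed = x.copy()                     # decision the population currently sees
    next_deploy = 1
    for t in range(T):
        if t >= next_deploy:                # logarithmically many redeployments
            deployed = x.copy()
            next_deploy *= 2
        z = sample_batch(deployed)          # z ~ D(deployed), not D(x)
        x = x - step(t) * grad_loss(x, z)
    return x
```

Under the strongly convex assumptions described above, the lazy variant trades additional sampling from a stale distribution for far fewer deployments while keeping the same overall sample efficiency.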
