Sketched Newton-Raphson

"Beyond First-Order Methods in Machine Learning Systems" workshop at ICML

Abstract

We propose a new globally convergent stochastic second-order method. Our starting point is the development of a new Sketched Newton-Raphson (SNR) method for solving large-scale nonlinear equations of the form F(x) = 0 with F : R^d → R^d. We then show how to design several stochastic second-order optimization methods by rewriting the optimization problem of interest as a system of nonlinear equations and applying SNR. For instance, by applying SNR to find a stationary point of a generalized linear model (GLM), we derive completely new and scalable stochastic second-order methods. We show that the resulting method is highly competitive with state-of-the-art variance-reduced methods. Using a variable-splitting trick, we also show that the Stochastic Newton method (SNM) is a special case of SNR, and we use this connection to establish the first global convergence theory of SNM. By showing that SNR can be interpreted as a variant of stochastic gradient descent (SGD), we are able to leverage proof techniques from SGD and establish a global convergence theory and rates of convergence for SNR. As a special case, our theory also provides a new global convergence theory for the original Newton-Raphson method under strictly weaker assumptions than those commonly used for global convergence. There are many ways to rewrite an optimization problem as a system of nonlinear equations, and each rewrite leads to a distinct method when SNR is applied. As such, we believe that SNR and its global convergence theory will open the way to designing and analysing a host of new stochastic second-order methods.
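The abstract does not spell out the update rule, but the idea of sketching a Newton-Raphson step can be illustrated as follows: at each iteration, instead of solving the full Newton system DF(x) Δ = -F(x), draw a random sketching matrix S (with few columns) and solve only the much smaller sketched system S^T DF(x) Δ = -S^T F(x) in a least-norm sense. The sketch below is an illustrative assumption, not the paper's implementation: the function names snr, F, J, the Gaussian sketches, and the toy system are all hypothetical choices made for the demo.

```python
import numpy as np

def snr(F, J, x0, tau=1, gamma=1.0, iters=2000, seed=0):
    """Illustrative sketched Newton-Raphson loop (a sketch-and-project
    step on the linearization DF(x) dx = -F(x)); hypothetical demo code."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for _ in range(iters):
        S = rng.standard_normal((d, tau))   # random Gaussian sketch, d x tau
        Jx = J(x)                           # Jacobian DF(x), d x d
        A = S.T @ Jx                        # sketched Jacobian, tau x d
        G = A @ A.T                         # small tau x tau Gram matrix
        if np.linalg.norm(G) < 1e-12:
            continue                        # skip degenerate sketches
        # x <- x - gamma * DF^T S (S^T DF DF^T S)^+ S^T F(x)
        x = x - gamma * (A.T @ (np.linalg.pinv(G) @ (S.T @ F(x))))
    return x

# Toy system (hypothetical): x0^2 + x1^2 = 4 and x0 = x1,
# with a root at (sqrt(2), sqrt(2)).
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])

x = snr(F, J, x0=[2.0, 1.0], tau=1)
print(np.linalg.norm(F(x)))  # residual, typically far below the initial value
```

Note that when the sketch dimension tau equals d and S is invertible, the sketch cancels out and the update reduces to the classical (damped) Newton-Raphson step, which is consistent with the abstract's claim that the theory covers the original Newton-Raphson method as a special case.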
