Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations



We show that the gradient descent algorithm provides an implicit regularization effect in the learning of over-parameterized matrix factorization models and one-hidden-layer neural networks with quadratic activations.

Concretely, we show that given Õ(dr2) random linear measurements of a rank r positive semidefinite matrix X*, we can recover X* by parameterizing it by UU with U ∈ Rd×d and minimizing the squared loss, even if rd. We prove that starting from a small initialization, gradient descent recovers X* in Õ(√r) iterations approximately. The results solve the conjecture of Gunasekar et al. [16] under the restricted isometry property.

The technique can be applied to analyzing neural networks with one-hidden-layer quadratic activations with some technical modifications.

Featured Publications