Learning One-hidden-layer Neural Networks with Landscape Design

International Conference on Learning Representations (ICLR)

Abstract

We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ R^d is from Gaussian distribution and the label y = a^Tσ(Bx) + ξ, where a is a nonnegative vector in R m with m ≤ d, B ∈ R^m×d is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function G(·) whose landscape is guaranteed to have the following properties:

All local minima of G are also global minima.
All global minima of G correspond to the ground truth parameters.
The value and gradient of G can be estimated using samples.

With these properties, stochastic gradient descent on G provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity results and validate the results by simulations.

Featured Publications

All Publications

Research