Popularity Prediction for Social Media over Arbitrary Time Horizons
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
International Conference on Learning Representations (ICLR)
We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ Rd is from Gaussian distribution and the label y = aTσ(Bx) + ξ, where a is a nonnegative vector in R m with m ≤ d, B ∈ Rm×d is a full-rank weight matrix, and ξ is a noise vector. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function G(·) whose landscape is guaranteed to have the following properties:
With these properties, stochastic gradient descent on G provably converges to the global minimum and learn the ground-truth parameters. We also prove finite sample complexity results and validate the results by simulations.
Daniel Haimovich, Dima Karamshuk, Thomas Leeper, Evgeniy Riabenko, Milan Vojnovic
Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai, Michael Gschwind, Anurag Gupta, Myle Ott, Anastasia Melnikov, Salvatore Candido, David Brooks, Geeta Chauhan, Benjamin Lee, Hsien-Hsin S. Lee, Bugra Akyildiz, Max Balandat, Joe Spisak, Ravi Jain, Mike Rabbat, Kim Hazelwood
Liqi Yan, Qifan Wang, Yiming Cu, Fuli Feng, Xiaojun Quan, Xiangyu Zhang, Dongfang Liu