Poster
Hybrid Stochastic-Deterministic Minibatch Proximal Gradient: Less-Than-Single-Pass Optimization with Nearly Optimal Generalization
Pan Zhou · Xiao-Tong Yuan
Tue Jul 14 06:00 PM – 06:45 PM & Wed Jul 15 04:00 AM – 04:45 AM (PDT)
Stochastic variance-reduced gradient (SVRG) algorithms have been shown to work favorably in solving large-scale learning problems. Despite this remarkable success, the stochastic gradient complexity of SVRG-type algorithms usually scales linearly with data size and thus could still be expensive for huge data. To address this deficiency, we propose a hybrid stochastic-deterministic minibatch proximal gradient (HSDMPG) algorithm for strongly-convex problems that enjoys provably improved data-size-independent complexity guarantees. More precisely, for a quadratic loss $F(\theta)$ of $n$ components, we prove that HSDMPG can attain an $\epsilon$-optimization-error $\mathbb{E}[F(\theta)-F(\theta^*)] \leq \epsilon$ within $\mathcal{O}\Big(\frac{\kappa^{1.5}}{\epsilon^{0.25}} \log^{1.5}\big(\frac{1}{\epsilon}\big) \wedge \Big(\kappa \sqrt{n} \log^2\big(\frac{1}{\epsilon}\big) + \frac{\kappa^3}{n^{1.5}\epsilon} \Big)\Big)$ stochastic gradient evaluations, where $\kappa$ is the condition number. For generic strongly convex loss functions, we prove a nearly identical complexity bound, though at the cost of slightly increased logarithmic factors. For large-scale learning problems, our complexity bounds are superior to those of the prior state-of-the-art SVRG algorithms, with or without dependence on data size. In particular, in the case of $\epsilon=\mathcal{O}\big(1/\sqrt{n}\big)$, which is of the order of the intrinsic excess error bound of a learning model and thus sufficient for generalization, the stochastic gradient complexity bounds of HSDMPG for quadratic and generic loss functions are $\mathcal{O}(n^{0.875}\log^{1.5}(n))$ and $\mathcal{O}(n^{0.875}\log^{2.25}(n))$, respectively, which, to the best of our knowledge, achieve for the first time optimal generalization in less than a single pass over the data. Extensive numerical results demonstrate the computational advantages of our algorithm over prior ones.
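For readers unfamiliar with the minibatch proximal gradient framework the abstract builds on, the sketch below shows a generic minibatch proximal gradient step on a strongly convex composite objective. It is only an illustrative baseline under assumed choices (least-squares loss, $\ell_1$ regularizer, fixed step size); it is not the HSDMPG algorithm or its hybrid stochastic-deterministic gradient estimator.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def minibatch_prox_grad(A, b, lam=0.1, eta=0.01, batch=32, iters=500, seed=0):
    """Plain minibatch proximal gradient on
        F(w) = (1/2n) ||A w - b||^2 + lam * ||w||_1.
    Illustrative only -- NOT the HSDMPG method from the paper."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(iters):
        # Stochastic gradient of the smooth part on a random minibatch.
        idx = rng.choice(n, size=min(batch, n), replace=False)
        grad = A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)
        # Proximal (soft-thresholding) step handles the nonsmooth term.
        w = soft_threshold(w - eta * grad, eta * lam)
    return w
```

HSDMPG's complexity gains come from mixing deterministic and stochastic gradient information rather than from the plain minibatch update shown here; the sketch only fixes the notion of "stochastic gradient evaluations" that the bounds count.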
Author Information
Pan Zhou (Salesforce)
Xiao-Tong Yuan (Nanjing University of Information Science & Technology)
More from the Same Authors

2021 Poster: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong 
2021 Spotlight: How Important is the Train-Validation Split in Meta-Learning? »
Yu Bai · Minshuo Chen · Pan Zhou · Tuo Zhao · Jason Lee · Sham Kakade · Huan Wang · Caiming Xiong