
"Second-order optimizers converge with fewer steps than first-order optimizers. So why are second-order optimizers not being used for deep learning? Because each step of their learning is computationally expensive. In this talk, I will present a few techniques to efficiently use second-order optimization ideas for deep learning. This optimizer, WhiteGrad, converges faster than SGD and Adam. It uses a form of whitening that involves matrix multiplication. We propose a decomposition technique that makes whitening fast. In my talk, I will present the intuition and algorithm behind this optimizer. I will also show some promising results on learning convolutional networks and vision transformers. I will finally discuss a range of new ideas that become possible with this optimizer.
"
Qatar Computing Research Institute
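
The abstract does not spell out WhiteGrad's algorithm, so the sketch below is only a minimal illustration of the general idea it alludes to: whitening the gradient with a matrix preconditioner, here the inverse square root of a running gradient covariance estimate. The class name WhitenedSGD, the hyperparameters, and the eigendecomposition-based whitening are illustrative assumptions, not the decomposition technique proposed in the talk.

```python
# Minimal sketch of gradient whitening as a preconditioner (illustrative only;
# not the WhiteGrad algorithm from the talk). All names and hyperparameters
# are placeholders chosen for this example.
import numpy as np

class WhitenedSGD:
    def __init__(self, dim, lr=1e-2, beta=0.99, eps=1e-5):
        self.lr = lr
        self.beta = beta          # decay rate for the running covariance estimate
        self.eps = eps            # damping term for numerical stability
        self.cov = np.eye(dim)    # running estimate of E[g g^T]

    def step(self, params, grad):
        # Update the running estimate of the gradient covariance.
        self.cov = self.beta * self.cov + (1 - self.beta) * np.outer(grad, grad)
        # Whitening matrix (C + eps*I)^(-1/2), computed via eigendecomposition.
        w, V = np.linalg.eigh(self.cov + self.eps * np.eye(len(grad)))
        whiten = V @ np.diag(w ** -0.5) @ V.T
        # Precondition the gradient and take a step.
        return params - self.lr * (whiten @ grad)
```

The whitening step rescales the gradient so that directions with very different curvature or variance are treated more uniformly, which is the usual intuition for why such preconditioning converges in fewer steps; the eigendecomposition above is the expensive matrix operation, and the decomposition technique described in the talk is presumably what makes this kind of whitening fast enough for deep learning.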