Beyond Adam: Meet Yogi, the Optimizer That Tames Noisy Gradients
Most deep learning practitioners reach for Adam by default. But when training on tasks with noisy or sparse gradients (like GANs, reinforcement learning, or large-scale language models), Adam can struggle with sudden large gradient updates that destabilize training.
Enter Yogi (You Only Gradient Once).
Developed by researchers at Google and Stanford, Yogi modifies Adam's adaptive learning rate mechanism to make it more robust to noisy gradients.
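The change is easiest to see next to Adam. Both optimizers track an exponential moving average m of the gradient and a second-moment estimate v, but where Adam lets v chase the squared gradient in proportion to the gap between them, Yogi moves v by a bounded, sign-controlled step. Below is a minimal NumPy sketch of one Yogi update, following the rule from the Yogi paper (Zaheer et al., 2018); the function name, the hyperparameter defaults, and the omission of bias correction are simplifications made here for illustration, not any official API.

import numpy as np

def yogi_step(param, grad, m, v, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-3):
    """One Yogi update for a single parameter array (bias correction omitted)."""
    g2 = grad ** 2
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA, same as Adam
    # Adam would compute: v = beta2 * v + (1 - beta2) * g2, closing the gap
    # (v - g2) in proportion to its size, which can shrink v quickly (and
    # spike the effective step size) when gradients suddenly become small.
    # Yogi instead nudges v toward g2 by a bounded amount, (1 - beta2) * g2,
    # so the effective learning rate changes gradually in both directions.
    v = v - (1 - beta2) * np.sign(v - g2) * g2
    param = param - lr * m / (np.sqrt(v) + eps)
    return param, m, v

# Toy usage: one step on a three-parameter "model".
w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
grad = np.array([0.1, -2.0, 0.01])
w, m, v = yogi_step(w, grad, m, v)

Note the larger eps: the Yogi paper recommends a value around 1e-3 (versus Adam's usual 1e-8) together with a learning rate near 1e-2, and it initializes v from an estimate over the first minibatches rather than from zeros, which this sketch skips for simplicity.

Try it on your next unstable training run. You might be surprised.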