Published on:
8 May 2024
Primary Category:
Machine Learning
Paper Authors:
Matt Schoenbauer,
Daniele Moro,
Lukasz Lew,
Andrew Howard
Many proposed gradient estimators are equivalent to the straight-through estimator
Equivalence holds after adjusting learning rate and weight initialization
The result holds for both SGD and Adam optimizers
Demonstrated empirically on both small CNNs and large ResNets
These findings suggest that common concerns about 'gradient error' in estimator design are unfounded
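As background for the highlights above, a minimal sketch of the straight-through estimator (STE) itself: in the forward pass a non-differentiable quantizer (here simply rounding) is applied, while in the backward pass the quantizer is treated as the identity so gradients pass through unchanged. This is an illustrative toy, not the paper's exact formulation.

```python
def quantize_forward(x):
    # Forward pass: hard, non-differentiable rounding to the nearest level.
    return round(x)

def quantize_backward_ste(grad_out):
    # Backward pass (straight-through estimator): pretend the quantizer
    # was the identity function and pass the incoming gradient through.
    return grad_out

# Manual chain rule through the quantizer for loss L(q) = q * 2.0:
y = quantize_forward(0.7)            # forward value: 1
dL_dq = 2.0                          # gradient arriving at the quantizer
dL_dx = quantize_backward_ste(dL_dq) # STE passes it straight through: 2.0
```

The "complex" estimators discussed in the paper replace the identity backward pass with more elaborate surrogate derivatives; the paper's claim is that this added complexity does not change training behavior in a meaningful way.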
Quantized neural network training equivalence
This paper proves that many proposed complex gradient estimators for quantized neural networks are equivalent to simpler estimators like the straight-through estimator. After adjustments to the learning rate and weight initialization, models using complex estimators train almost identically to those using the straight-through estimator.
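The intuition behind the learning-rate adjustment can be seen in a toy example (made up here for illustration, not the paper's actual setup): if a "complex" estimator's surrogate gradient is just a constant multiple c of the STE gradient, then SGD with the complex estimator at learning rate lr traces exactly the same weight trajectory as SGD with the STE at learning rate lr * c. The loss, target, and scale c below are all hypothetical.

```python
def ste_grad(w, target=3.0):
    # Surrogate gradient of (round(w) - target)**2 under the STE:
    # differentiate as if round() were the identity.
    return 2.0 * (round(w) - target)

c = 0.5                 # hypothetical scale of the "complex" estimator
lr = 0.1
w_complex = w_ste = 0.9

for _ in range(50):
    w_complex -= lr * (c * ste_grad(w_complex))  # complex estimator, lr
    w_ste     -= (lr * c) * ste_grad(w_ste)      # STE, rescaled lr

# The two weight trajectories are bit-for-bit identical at every step.
```

The paper's actual estimators are not simple constant rescalings, which is why a matching adjustment to the weight initialization is also needed, but the mechanism of the equivalence is the same: the extra complexity can be absorbed into hyperparameters.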
Related Papers:
Learning non-linear functions in two-layer neural networks with a single gradient step
Permutation symmetries enable linear connectivity in Bayesian neural networks
Gradient descent with large learning rates finds flat minima for simple neural networks
Training neural networks efficiently with implicit gradients
Balancing weights in neural networks through gradient noise
Optimizing neural network hardware accelerators