Published on:
8 May 2024
Primary Category:
Machine Learning
Paper Authors:
Matt Schoenbauer,
Daniele Moro,
Lukasz Lew,
Andrew Howard
Many proposed gradient estimators are equivalent to the straight-through estimator
Equivalence holds after adjusting learning rate and weight initialization
The result holds for both SGD and Adam optimizers
Demonstrated empirically on both small CNNs and large ResNets
These findings suggest that common concerns about 'gradient error' in estimator design are unfounded
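As background for the highlights above, a minimal sketch of the straight-through estimator (STE) itself: in the forward pass a non-differentiable quantizer (here simply rounding) is applied, while in the backward pass the quantizer is treated as the identity so gradients pass through unchanged. This is an illustrative toy, not the paper's exact formulation.

```python
def quantize_forward(x):
    # Forward pass: hard, non-differentiable rounding to the nearest level.
    return round(x)

def quantize_backward_ste(grad_out):
    # Backward pass (straight-through estimator): pretend the quantizer
    # was the identity function and pass the incoming gradient through.
    return grad_out

# Manual chain rule through the quantizer for loss L(q) = q * 2.0:
y = quantize_forward(0.7)            # forward value: 1
dL_dq = 2.0                          # gradient arriving at the quantizer
dL_dx = quantize_backward_ste(dL_dq) # STE passes it straight through: 2.0
```

The "complex" estimators discussed in the paper replace the identity backward pass with more elaborate surrogate derivatives; the paper's claim is that this added complexity does not change training behavior in a meaningful way.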
Quantized neural network training equivalence
This paper proves that many proposed complex gradient estimators for quantized neural networks are equivalent to simpler estimators like the straight-through estimator. After adjustments to the learning rate and weight initialization, models using complex estimators train almost identically to those using the straight-through estimator.
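The intuition behind the learning-rate adjustment can be seen in a toy example (made up here for illustration, not the paper's actual setup): if a "complex" estimator's surrogate gradient is just a constant multiple c of the STE gradient, then SGD with the complex estimator at learning rate lr traces exactly the same weight trajectory as SGD with the STE at learning rate lr * c. The loss, target, and scale c below are all hypothetical.

```python
def ste_grad(w, target=3.0):
    # Surrogate gradient of (round(w) - target)**2 under the STE:
    # differentiate as if round() were the identity.
    return 2.0 * (round(w) - target)

c = 0.5                 # hypothetical scale of the "complex" estimator
lr = 0.1
w_complex = w_ste = 0.9

for _ in range(50):
    w_complex -= lr * (c * ste_grad(w_complex))  # complex estimator, lr
    w_ste     -= (lr * c) * ste_grad(w_ste)      # STE, rescaled lr

# The two weight trajectories are bit-for-bit identical at every step.
```

The paper's actual estimators are not simple constant rescalings, which is why a matching adjustment to the weight initialization is also needed, but the mechanism of the equivalence is the same: the extra complexity can be absorbed into hyperparameters.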
Related Papers:
Learning non-linear functions in two-layer neural networks with a single gradient step
Permutation symmetries enable linear connectivity in Bayesian neural networks
Gradient descent with large learning rates finds flat minima for simple neural networks
Training neural networks efficiently with implicit gradients
Balancing weights in neural networks through gradient noise
Optimizing neural network hardware accelerators