
Efficient attention computation for transformers

Published on: 8 May 2024

Primary Category: Machine Learning

Paper Authors: Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin


Key Details

Proposes a conv basis that decomposes any attention matrix into a sum of convolution matrices

Shows that this decomposition enables efficient computation via the fast Fourier transform (FFT)

Achieves near-linear-time attention inference without changing model parameters

Also accelerates the forward pass and backward gradient computation in attention training

May enable transformers to handle much longer input contexts

AI-generated summary

This paper develops a convolution-based method for efficiently approximating attention in transformers, reducing the quadratic cost in sequence length to nearly linear. It shows that any attention matrix can be decomposed into a sum of convolution matrices, which allows the attention computation to be carried out with the fast Fourier transform (FFT) without changing model parameters.
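
To make the mechanism concrete, below is a minimal numpy sketch of the core primitive: multiplying a sum of lower-triangular (causal) convolution matrices by a value matrix via FFT in O(k·n·log n·d) time instead of the O(n²·d) dense product. The names (`conv_basis_attention`, `kernels`) are illustrative assumptions; the paper's actual procedure for recovering the conv basis from an attention matrix is not reproduced here.

```python
import numpy as np

def conv_basis_attention(kernels, V):
    """Compute (sum of k lower-triangular convolution matrices) @ V via FFT.

    kernels : (k, n) array; kernels[i] is the first column of the i-th
              convolution (Toeplitz) matrix in the basis.
    V       : (n, d) value matrix.

    Cost is O(k * n * log n * d) rather than the O(n^2 * d) dense product.
    Illustrative sketch only -- not the paper's basis-recovery algorithm.
    """
    n, d = V.shape
    # Pad to a power of two >= 2n - 1 so circular convolution reproduces
    # the linear (causal) convolution on the first n output entries.
    m = 1 << (2 * n - 1).bit_length()
    V_hat = np.fft.rfft(V, m, axis=0)            # FFT each column of V once
    out = np.zeros((n, d))
    for kernel in kernels:
        k_hat = np.fft.rfft(kernel, m)[:, None]  # FFT of the basis kernel
        out += np.fft.irfft(k_hat * V_hat, m, axis=0)[:n]
    return out

# Sanity check against the explicit dense computation.
rng = np.random.default_rng(0)
n, d, k = 8, 4, 3
kernels = rng.standard_normal((k, n))
V = rng.standard_normal((n, d))

ref = np.zeros((n, d))
for c in kernels:
    C = np.zeros((n, n))
    for i in range(n):
        C[i, : i + 1] = c[: i + 1][::-1]         # C[i, j] = c[i - j], j <= i
    ref += C @ V

assert np.allclose(conv_basis_attention(kernels, V), ref)
```

Once an attention matrix is (approximately) expressed in such a basis, each attention output reduces to k FFT-based convolutions, which is the source of the near-linear inference time noted in the key details above.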
