
Efficient attention computation for transformers

Published on: 8 May 2024

Primary Category: Machine Learning

Paper Authors: Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin


Key Details

Proposes a conv basis that decomposes any attention matrix into a sum of convolution matrices

Shows that this decomposition enables efficient computation via the fast Fourier transform (FFT)

Achieves near-linear-time attention inference without changing model parameters

Also accelerates the forward pass and backward gradient computation in attention training

May enable transformers to handle much longer input contexts

AI-generated summary

This paper develops a convolution-based method for efficiently approximating attention in transformers, reducing the quadratic cost in sequence length to nearly linear. It shows that any attention matrix can be decomposed into a sum of convolution matrices, which allows the attention computation to be carried out with the fast Fourier transform (FFT) without changing model parameters.
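
To make the mechanism concrete, below is a minimal numpy sketch of the core primitive: multiplying a sum of lower-triangular (causal) convolution matrices by a value matrix via FFT in O(k·n·log n·d) time instead of the O(n²·d) dense product. The names (`conv_basis_attention`, `kernels`) are illustrative assumptions; the paper's actual procedure for recovering the conv basis from an attention matrix is not reproduced here.

```python
import numpy as np

def conv_basis_attention(kernels, V):
    """Compute (sum of k lower-triangular convolution matrices) @ V via FFT.

    kernels : (k, n) array; kernels[i] is the first column of the i-th
              convolution (Toeplitz) matrix in the basis.
    V       : (n, d) value matrix.

    Cost is O(k * n * log n * d) rather than the O(n^2 * d) dense product.
    Illustrative sketch only -- not the paper's basis-recovery algorithm.
    """
    n, d = V.shape
    # Pad to a power of two >= 2n - 1 so circular convolution reproduces
    # the linear (causal) convolution on the first n output entries.
    m = 1 << (2 * n - 1).bit_length()
    V_hat = np.fft.rfft(V, m, axis=0)            # FFT each column of V once
    out = np.zeros((n, d))
    for kernel in kernels:
        k_hat = np.fft.rfft(kernel, m)[:, None]  # FFT of the basis kernel
        out += np.fft.irfft(k_hat * V_hat, m, axis=0)[:n]
    return out

# Sanity check against the explicit dense computation.
rng = np.random.default_rng(0)
n, d, k = 8, 4, 3
kernels = rng.standard_normal((k, n))
V = rng.standard_normal((n, d))

ref = np.zeros((n, d))
for c in kernels:
    C = np.zeros((n, n))
    for i in range(n):
        C[i, : i + 1] = c[: i + 1][::-1]         # C[i, j] = c[i - j], j <= i
    ref += C @ V

assert np.allclose(conv_basis_attention(kernels, V), ref)
```

Once an attention matrix is (approximately) expressed in such a basis, each attention output reduces to k FFT-based convolutions, which is the source of the near-linear inference time noted in the key details above.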
