Limits of Transformers on Algorithm Learning

Published on: 8 February 2024

Primary Category: Machine Learning

Paper Authors: Jonathan Thomm, Aleksandar Terzic, Geethan Karunaratne, Giacomo Camposampiero, Bernhard Schölkopf, Abbas Rahimi


Key Details

Introduces two new discrete algorithm tasks requiring composition of sub-tasks

Finds very limited compositional learning in state-of-the-art Transformers

Shows sample efficiency worse than re-learning all sub-tasks

Presents a theorem showing gradient descent can be exponentially sample-inefficient on such tasks

AI generated summary

This paper analyzes the ability of Transformer language models to learn discrete algorithms that require composing several sub-tasks. The authors introduce two new compositional tasks and evaluate both LLaMA models trained from scratch and GPT-4 and Gemini via prompting. They find very limited compositional capabilities: learning a composition of sub-tasks is less sample-efficient than re-learning all of the sub-tasks individually. A theorem further shows that gradient descent on feedforward models can be exponentially inefficient at learning such algorithmic tasks.
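To make the notion of "composition of sub-tasks" concrete, here is a hypothetical toy example (not one of the paper's actual benchmarks): a composite sequence task whose correct output requires chaining two simpler operations. A model that has already mastered each sub-task would, ideally, need only a few samples to learn the composition; the paper reports that Transformers fall far short of this ideal.

```python
# Hypothetical toy compositional task, for illustration only.
# Sub-task A: reverse a token sequence.
# Sub-task B: shift each token index by one (mod vocab size).
# The composite task applies A, then B.

def sub_task_reverse(tokens):
    """Sub-task A: reverse the sequence."""
    return tokens[::-1]

def sub_task_shift(tokens, vocab_size=26):
    """Sub-task B: increment each token index, wrapping at vocab_size."""
    return [(t + 1) % vocab_size for t in tokens]

def composite_task(tokens):
    """The compositional task: sub-task B applied to the output of sub-task A."""
    return sub_task_shift(sub_task_reverse(tokens))

print(composite_task([0, 1, 2]))  # reverse -> [2, 1, 0], then shift -> [3, 2, 1]
```

Sample efficiency here would be measured by how many examples of the composite mapping a model needs before generalizing, compared with learning each sub-task from scratch.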
