Llemma: An Open Language Model For Mathematics

16 October 2023

Computation and Language

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

Llemma is pretrained on Proof-Pile-2, a new 55B-token dataset for math

It improves on Code Llama, the model it initializes from

Llemma exceeds other available models on math benchmarks

It can solve problems using Python code and theorem provers

The models, data, and code are publicly released

This paper introduces Llemma, a large language model specialized for mathematical reasoning. It is obtained by continuing the pretraining of Code Llama on Proof-Pile-2, a mixture of scientific text, web pages about math, and mathematical code. Llemma outperforms other available models on benchmarks for mathematical problem solving. It can also use tools such as Python and theorem provers to solve problems.
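To make the idea of solving a problem with Python concrete, here is a minimal sketch of one way such tool use could be wired up: the model writes a short program, and an external harness runs it and reads off the result. Everything in it is an illustrative assumption rather than the paper's evaluation code: the example problem, the hard-coded `generated_program` string standing in for model output, the helper name `run_generated_program`, and the convention that the program stores its result in a variable called `answer`.

```python
# Hypothetical model output for the problem
# "What is the sum of the positive divisors of 28?"
# In a real setup this string would come from the language model.
generated_program = """
n = 28
answer = sum(d for d in range(1, n + 1) if n % d == 0)
"""


def run_generated_program(source: str):
    """Run model-written code in a fresh namespace and return its `answer`.

    Assumption: the program assigns its final result to `answer`.
    This is an illustration only; untrusted code should be sandboxed.
    """
    namespace = {}
    exec(source, namespace)
    return namespace.get("answer")


result = run_generated_program(generated_program)
print(result)
```

The harness then compares `result` against the reference answer, so the arithmetic is delegated to the interpreter instead of being carried out token by token by the model.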

