Paper Title:

Llemma: An Open Language Model For Mathematics

Published on:

16 October 2023

Primary Category:

Computation and Language

Paper Authors:

Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck

• Llemma is pretrained on Proof-Pile-2, a new 55B token dataset for math

• It improves on Code Llama, the model it initializes from

• Llemma exceeds other available models on math benchmarks

• It can solve problems using Python code and theorem provers

• The models, data, and code are publicly released

Language model for math

This paper introduces Llemma, a large language model specialized for mathematical reasoning. It is obtained by continuing the pretraining of Code Llama on Proof-Pile-2, a mixture of scientific text, web pages about math, and mathematical code. Llemma outperforms other openly available base models on benchmarks for mathematical problem solving. It can also use tools such as a Python interpreter and formal theorem provers.
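To make the Python tool-use idea concrete, here is a minimal sketch of how a program written by a model might be executed to obtain a final answer. It is an illustration under stated assumptions, not the paper's evaluation harness: the helper `run_generated_program`, the convention that the program stores its result in a variable called `answer`, and the hard-coded `generated` string (standing in for model output) are all hypothetical.

```python
import math


def run_generated_program(program: str, result_name: str = "answer"):
    """Execute model-written Python and return the variable named result_name.

    Hypothetical helper for illustration; the 'answer' convention is an assumption.
    """
    # Expose only the math module explicitly; Python adds builtins on its own.
    namespace = {"math": math}
    # NOTE: exec() is not a sandbox. Never run untrusted code this way.
    exec(program, namespace)
    if result_name not in namespace:
        raise ValueError(f"program did not define {result_name!r}")
    return namespace[result_name]


# Stand-in for model output on the question:
# "What is the sum of the first 100 positive integers?"
generated = """
n = 100
answer = n * (n + 1) // 2
"""

result = run_generated_program(generated)
print(result)  # prints 5050
```

The point of the design is that the arithmetic is delegated to the interpreter, so the model only has to write a correct program rather than carry out the computation token by token.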

Enhancing math reasoning in language models with code execution

Evaluating LLMs' Reasoning Abilities

The Remarkable Rise of Open Source Large Language Models: Introducing LLaMA

Enabling LLMs to solve math problems with code

Progressive LLaMA with block expansion

Language models with natural language and programming capabilities
