The Remarkable Rise of Open Source Large Language Models: Introducing LLaMA

Published on: 27 February 2023

Primary Category: Computation and Language

Paper Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

Key Details

LLaMA models range from 7B to 65B parameters, trained on up to 1.4T tokens

LLaMA-13B outperforms GPT-3 on most benchmarks despite being 10x smaller

LLaMA-65B is competitive with state-of-the-art models like PaLM-540B and Chinchilla-70B

LLaMA is trained exclusively on publicly available data, unlike models such as GPT-3, PaLM, and Chinchilla, which rely on data that is not publicly available or is undocumented

Smaller LLaMA models can match larger models given sufficient training data (see the compute sketch after this list)

The paper examines model toxicity, bias, and carbon footprint, and releases the models openly to the research community
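
To make the scaling claim concrete, here is a rough back-of-the-envelope sketch. It is not from the paper; it uses the common approximations of roughly 6·N·D FLOPs for training and roughly 2·N FLOPs per generated token for inference (N parameters, D training tokens), together with the rounded training-set sizes (about 1.0T tokens for LLaMA-13B, about 300B for GPT-3), to illustrate why a smaller model trained on more data is much cheaper to serve.

```python
# Rough compute sketch (not from the paper): compares training and
# per-token inference FLOPs for a small model trained on many tokens
# versus a large model trained on fewer tokens, using the common
# approximations C_train ~= 6 * N * D and C_infer ~= 2 * N per token.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs: ~6 * N * D."""
    return 6.0 * n_params * n_tokens

def infer_flops_per_token(n_params: float) -> float:
    """Approximate inference compute per generated token: ~2 * N FLOPs."""
    return 2.0 * n_params

# Parameter counts and (rounded) training-token counts.
models = {
    "LLaMA-13B": (13e9, 1.0e12),    # ~1.0T tokens, as reported in the paper
    "GPT-3 175B": (175e9, 0.3e12),  # ~300B tokens, from the GPT-3 paper
}

for name, (n_params, n_tokens) in models.items():
    print(f"{name}: ~{train_flops(n_params, n_tokens):.1e} training FLOPs, "
          f"~{infer_flops_per_token(n_params):.1e} FLOPs per generated token")

# Rough output:
#   LLaMA-13B:  ~7.8e+22 training FLOPs, ~2.6e+10 FLOPs per token
#   GPT-3 175B: ~3.2e+23 training FLOPs, ~3.5e+11 FLOPs per token
```

Under these approximations, serving the 13B model costs roughly 13x fewer FLOPs per generated token, which is the inference-budget argument for training smaller models on more data.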

AI-generated summary

This paper introduces LLaMA, a collection of foundation language models ranging from 7B to 65B parameters, trained on trillions of tokens of publicly available data. Despite using only public data, the models are competitive with the best proprietary systems: LLaMA-13B outperforms the 175B-parameter GPT-3 on most benchmarks, and LLaMA-65B is competitive with PaLM-540B and Chinchilla-70B. A key finding is that smaller models trained on more data can match larger models while being cheaper to run at inference. The paper also examines model toxicity, bias, and carbon footprint, and releases the models to the research community.
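
As a rough illustration of the carbon-footprint estimate mentioned above, the sketch below applies the style of accounting the paper describes: energy is estimated from GPU-hours, per-GPU power draw, and a data-center PUE, then converted to tonnes of CO2-equivalent using a grid carbon-intensity factor. The inputs follow the paper's description (2048 A100-80GB GPUs, roughly 21 days for the 65B model, ~400W per GPU, PUE of 1.1, 0.385 kgCO2eq/kWh); treat the output as an order-of-magnitude estimate rather than an exact reproduction of the reported numbers.

```python
# Rough carbon-footprint sketch for LLaMA-65B training, following the
# accounting described in the paper: energy from GPU-hours, per-GPU
# power draw, and data-center PUE, converted to tCO2eq with a grid
# carbon-intensity factor. All figures are approximate.

NUM_GPUS = 2048       # A100-80GB GPUs used for training (per the paper)
TRAIN_DAYS = 21       # ~21 days to train the 65B model on 1.4T tokens
GPU_POWER_KW = 0.4    # ~400 W per A100-80GB
PUE = 1.1             # power usage effectiveness assumed in the paper
CO2_KG_PER_KWH = 0.385  # US national average used in the paper

gpu_hours = NUM_GPUS * TRAIN_DAYS * 24
energy_mwh = gpu_hours * GPU_POWER_KW * PUE / 1000.0  # kWh -> MWh
tco2eq = energy_mwh * CO2_KG_PER_KWH                  # MWh * kg/kWh == tonnes

print(f"GPU-hours: {gpu_hours:,.0f}")       # ~1.03M GPU-hours
print(f"Energy:    {energy_mwh:,.0f} MWh")  # ~450 MWh
print(f"Emissions: {tco2eq:,.0f} tCO2eq")   # ~175 tCO2eq
```

The result lands in the same range as the paper's reported estimate for its largest model.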
