
Training language models with specialized skills

Published on: 12 March 2024

Primary Category: Computation and Language

Paper Authors: Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li


Key Details

Starts with a base language model, then branches copies of it to be trained as specialized experts

Experts are trained in parallel on domain-specific datasets, such as math and code

The trained experts are then merged into a single mixture-of-experts model

A router learns to send each token to the most relevant expert at each layer

Outperforms alternative training approaches in both accuracy and efficiency
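The routing step above can be sketched with a toy mixture-of-experts layer. This is a minimal illustration, not the paper's implementation: the class and parameter names are assumptions, each "expert" is a plain linear map standing in for the feed-forward block of one domain-trained branch, and the router weights are random rather than learned.

```python
import math
import random

random.seed(0)

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class ToyMoELayer:
    """Illustrative mixture-of-experts feed-forward layer (names are assumptions).

    Each expert stands in for the feed-forward weights of one domain-trained
    model copy (e.g. math, code, world knowledge); the router scores every
    expert per token and keeps only the top-k.
    """
    def __init__(self, dim, n_experts, top_k=1):
        self.top_k = top_k
        # One weight matrix per expert, as if taken from each trained branch.
        self.experts = [
            [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_experts)
        ]
        # Router projection: one score per expert for each token.
        self.router = [
            [random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_experts)
        ]

    def forward(self, token):
        # Score each expert for this token, then keep the top-k experts.
        scores = [sum(w * x for w, x in zip(row, token)) for row in self.router]
        gates = softmax(scores)
        top = sorted(range(len(gates)), key=lambda i: -gates[i])[: self.top_k]
        # Output is the gate-weighted sum of the chosen experts' outputs.
        norm = sum(gates[i] for i in top)
        out = [0.0] * len(token)
        for i in top:
            y = [sum(w * x for w, x in zip(row, token)) for row in self.experts[i]]
            out = [o + (gates[i] / norm) * yi for o, yi in zip(out, y)]
        return out, top

layer = ToyMoELayer(dim=4, n_experts=3, top_k=2)
out, chosen = layer.forward([1.0, -0.5, 0.3, 0.2])
print(chosen)
```

Only the selected experts' weights are applied per token, which is what gives the merged model its efficiency: compute per token stays close to one dense model while capacity grows with the number of branches.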

AI generated summary

This paper proposes a method to improve large language models' capabilities in specialized domains such as math, code, and world knowledge. It trains copies of a base model in parallel on different domain datasets, then combines them into a single mixture-of-experts model whose router sends each token to the most relevant expert. This improves both training efficiency and downstream performance.
