Factual language model alignment

Published on: 2 May 2024

Primary Category: Computation and Language

Paper Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

Key Details

Standard alignment methods may encourage language models to hallucinate more

Training on data the model is unfamiliar with introduces unknown facts, which encourages the model to fabricate claims

Reward functions that prefer very detailed responses also increase false claims

The proposed approach elicits knowledge from the model itself so that training data stays within what the model already knows

It uses separate factuality and instruction-following rewards to balance the tradeoff (see the sketch after this list)
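
A minimal sketch of the "elicit knowledge from the model itself" idea, under the assumption that fine-tuning pairs are built from the model's own generations rather than human-written answers. The function name `build_self_sourced_sft_data` and the `generate_fn` callable are hypothetical placeholders for whatever decoding API you use; this is not the paper's released code.

```python
from typing import Callable, Iterable

def build_self_sourced_sft_data(
    instructions: Iterable[str],
    generate_fn: Callable[[str], str],
) -> list[dict]:
    """Pair each instruction with the model's own response for fine-tuning."""
    dataset = []
    for instruction in instructions:
        # The response comes from the model itself, so fine-tuning on it
        # does not push the model toward facts it never knew.
        response = generate_fn(instruction)
        dataset.append({"prompt": instruction, "response": response})
    return dataset

if __name__ == "__main__":
    # Stand-in generator used only to show the expected input/output shape.
    demo = build_self_sourced_sft_data(
        ["Name the largest planet in the solar system."],
        generate_fn=lambda prompt: "Jupiter is the largest planet in the solar system.",
    )
    print(demo)
```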

AI-generated summary

This paper studies how to align language models to follow instructions while reducing false claims. It finds that standard alignment methods can increase hallucination, both by training models on unfamiliar data and by rewarding very detailed responses. The authors propose making alignment more factual by eliciting knowledge from the model itself and by using separate rewards for factuality and instruction following.
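
To illustrate the "separate rewards" idea, here is a small sketch that keeps the factuality and instruction-following signals apart when constructing preference pairs, instead of collapsing them into one scalar. The names `fact_reward`, `instruct_reward`, and `build_preference_pairs` are illustrative assumptions, not the paper's API; the point is only that a long but inaccurate answer cannot win the factuality pair just by being detailed.

```python
from typing import Callable, Sequence

def build_preference_pairs(
    prompt: str,
    candidates: Sequence[str],
    fact_reward: Callable[[str, str], float],
    instruct_reward: Callable[[str, str], float],
) -> list[dict]:
    """Build one chosen/rejected pair per reward signal, kept separate."""
    pairs = []
    for name, reward in (("factuality", fact_reward),
                         ("instruction_following", instruct_reward)):
        # Rank candidates under this reward only, so one signal cannot
        # override the other when picking the preferred response.
        ranked = sorted(candidates, key=lambda resp: reward(prompt, resp))
        pairs.append({
            "reward": name,
            "prompt": prompt,
            "chosen": ranked[-1],   # best candidate under this reward
            "rejected": ranked[0],  # worst candidate under this reward
        })
    return pairs
```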
