Published on:
2 May 2024
Primary Category:
Computation and Language
Paper Authors:
Sheng-Chieh Lin,
Luyu Gao,
Barlas Oguz,
Wenhan Xiong,
Jimmy Lin,
Wen-tau Yih,
Xilun Chen
Standard alignment methods may encourage language models to hallucinate more
Training on data the model is unfamiliar with introduces new knowledge, which encourages the model to fabricate claims
Reward functions that prefer very detailed responses also increase false claims
The proposed approach elicits knowledge from the model itself to reduce unfamiliar information
It uses separate rewards for factuality and instruction following to balance the trade-off
Factual language model alignment
This paper studies how to align language models to follow instructions while reducing false claims. It finds that standard alignment methods can increase hallucination by training models on unfamiliar data or by rewarding very detailed responses. The authors propose methods to make alignment more factual by eliciting knowledge from the model itself and by using separate rewards for factuality and instruction following.
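To make the reward-separation idea concrete, here is a minimal sketch under stated assumptions, not the paper's implementation: a toy factuality score and a toy instruction-following (detail) score are combined with arbitrary weights into a single reward. The functions score_factuality and score_instruction_following, the known-fact set, and the weights are all hypothetical stand-ins for learned reward models.

# Minimal sketch (illustrative only): combining separate factuality and
# instruction-following rewards into one preference signal.
# The scoring functions and weights are assumptions, not the paper's code.

def score_factuality(response: str, known_facts: set) -> float:
    """Toy proxy: fraction of sentences in the response found in a known-fact set."""
    claims = [s.strip() for s in response.split(".") if s.strip()]
    if not claims:
        return 0.0
    return sum(claim in known_facts for claim in claims) / len(claims)


def score_instruction_following(response: str, target_words: int = 20) -> float:
    """Toy proxy: reward longer, more detailed answers, capped at 1.0."""
    return min(len(response.split()) / target_words, 1.0)


def combined_reward(response: str, known_facts: set,
                    w_fact: float = 0.7, w_inst: float = 0.3) -> float:
    """Weighted sum of the two separate rewards (weights chosen arbitrarily)."""
    return (w_fact * score_factuality(response, known_facts)
            + w_inst * score_instruction_following(response))


facts = {"Paris is the capital of France"}
detailed = ("Paris is the capital of France. "
            "Paris was founded in 300 BC by settlers from Egypt.")  # second claim fabricated
short = "Paris is the capital of France."

# A detail-only reward prefers the partly fabricated answer,
# while the combined reward prefers the shorter, factual one.
print(score_instruction_following(detailed), score_instruction_following(short))  # 0.8 0.3
print(combined_reward(detailed, facts), combined_reward(short, facts))            # 0.59 0.79

The example illustrates the trade-off the authors highlight: a reward that only favors detail pushes the model toward longer, partly fabricated answers, whereas weighing factuality separately shifts the preference back toward accurate responses.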