Evaluating and Reducing Hallucinations in Vision-Language Models

Published on: 8 May 2024

Primary Category: Computer Vision and Pattern Recognition

Paper Authors: Prannay Kaul, Zhizhong Li, Hao Yang, Yonatan Dukler, Ashwin Swaminathan, C. J. Taylor, Stefano Soatto

Key Details

Proposes the THRONE benchmark to evaluate Type I hallucinations in open-ended LVLM responses

Uses public language models to accurately identify hallucinations without hand-crafted rules

Shows that reducing Type II hallucinations may not reduce Type I hallucinations (see the prompt sketch after this list)

Demonstrates limitations of existing Type I hallucination benchmarks

Introduces a simple and effective data augmentation method to reduce both Type I and Type II hallucinations
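
To make the Type I / Type II distinction above concrete, the short sketch below contrasts the two evaluation regimes. The prompt strings are invented for illustration and are not taken from the paper.

```python
# Type II evaluation: hallucination is probed with a closed-form question
# about a specific object; the model only has to answer yes or no.
type_ii_prompt = "Is there a cat in the image? Answer yes or no."

# Type I evaluation: hallucination is measured in a free-form response,
# where the model itself chooses which objects to assert.
type_i_prompt = "Describe the image in detail."

# THRONE targets the Type I setting: good behavior on closed-form probes
# does not guarantee fewer hallucinated objects in open-ended descriptions.
print(type_ii_prompt)
print(type_i_prompt)
```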

AI-generated summary

The paper proposes THRONE, a new benchmark for evaluating 'Type I' hallucinations (those in open-ended responses) in large vision-language models (LVLMs). It uses public language models to identify hallucinations and introduces metrics to quantify them. The paper demonstrates that reducing 'Type II' hallucinations (those in responses to specific questions) does not necessarily reduce Type I hallucinations, and that existing benchmarks for evaluating Type I hallucinations are limited. Finally, a simple data augmentation method is introduced that reduces both Type I and Type II hallucinations in LVLMs.
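
As a rough illustration of how object-level hallucination metrics can be computed from free-form responses, here is a minimal sketch. The naive string-matching "judge" and all function names are assumptions made for this example; THRONE instead uses public language models to decide which object classes a response actually asserts, and its exact metric definitions differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    response: str          # free-form LVLM output for one image
    gt_objects: set[str]   # ground-truth object classes in the image


def mentioned_objects(response: str, vocabulary: set[str]) -> set[str]:
    """Naive stand-in for an LM judge: a class counts as 'asserted' if its
    name appears verbatim in the response. A real evaluator needs an LM to
    handle synonyms, plurals, negation, and hedged mentions."""
    text = response.lower()
    return {cls for cls in vocabulary if cls in text}


def hallucination_scores(examples: list[Example], vocabulary: set[str]):
    """Corpus-level precision (fraction of asserted objects that are real)
    and recall (fraction of real objects that are asserted)."""
    tp = fp = fn = 0
    for ex in examples:
        asserted = mentioned_objects(ex.response, vocabulary)
        tp += len(asserted & ex.gt_objects)
        fp += len(asserted - ex.gt_objects)  # hallucinated objects
        fn += len(ex.gt_objects - asserted)  # missed objects
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # One way to fold both into a single score: F-beta with beta < 1
    # favors precision, i.e. penalizes hallucinations more than omissions.
    beta2 = 0.5 ** 2
    f_half = ((1 + beta2) * precision * recall /
              (beta2 * precision + recall)) if precision + recall else 0.0
    return precision, recall, f_half


examples = [
    Example("A dog sits on a sofa next to a lamp.", {"dog", "sofa"}),
    Example("Two people ride bicycles past a bus.", {"person", "bicycle"}),
]
vocab = {"dog", "sofa", "lamp", "person", "bicycle", "bus", "cat"}
print(hallucination_scores(examples, vocab))
```

The second example also shows why an LM judge matters: the naive matcher misses "person" behind the word "people", a mention a language model would recognize.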
