Published on:
8 May 2024
Primary Category:
Computer Vision and Pattern Recognition
Paper Authors:
Prannay Kaul,
Zhizhong Li,
Hao Yang,
Yonatan Dukler,
Ashwin Swaminathan,
C. J. Taylor,
Stefano Soatto
Proposes THRONE benchmark to evaluate Type I hallucinations in open-ended LVLM responses
Uses language models to accurately identify hallucinations without hand-crafted rules
Shows that reducing Type II hallucinations may not reduce Type I hallucinations
Demonstrates limitations of existing Type I hallucination benchmarks
Introduces simple and effective data augmentation to reduce both Type I and Type II hallucinations
Evaluating and Reducing Hallucinations in Vision-Language Models
The paper proposes THRONE, a new benchmark for evaluating 'Type I' hallucinations (those arising in open-ended responses) in large vision-language models (LVLMs). It uses language models to identify hallucinations automatically and introduces metrics to quantify them. The paper demonstrates that reducing 'Type II' hallucinations (those arising in responses to specific questions) does not necessarily reduce Type I hallucinations, and that existing benchmarks for evaluating Type I hallucinations are limited. Finally, it introduces a simple data augmentation method that reduces both Type I and Type II hallucinations in LVLMs.
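The evaluation idea described above can be sketched in miniature: once a language model has extracted the object mentions from an open-ended response, they are compared against the image's ground-truth object set, and any mention outside that set counts as a Type I hallucination. This is a minimal illustrative sketch, not THRONE's exact metric definitions; the function name and metric choices are assumptions for illustration.

```python
def hallucination_metrics(predicted_objects, ground_truth_objects):
    """Compare object mentions (e.g. extracted from a free-form caption
    by a language-model judge) against the image's ground-truth objects.

    Illustrative metrics only, not THRONE's exact definitions:
    - precision: fraction of mentioned objects that truly appear
    - recall: fraction of true objects that were mentioned
    - hallucinated: mentions absent from the ground truth (Type I)
    """
    pred = set(predicted_objects)
    truth = set(ground_truth_objects)
    matched = pred & truth
    precision = len(matched) / len(pred) if pred else 1.0
    recall = len(matched) / len(truth) if truth else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "hallucinated": sorted(pred - truth),
    }

# Example: the response mentions a cat that is not in the image.
result = hallucination_metrics(
    predicted_objects=["dog", "frisbee", "cat"],
    ground_truth_objects=["dog", "frisbee", "person"],
)
```

Here `result["hallucinated"]` is `["cat"]`, a Type I hallucination, while the missed `"person"` lowers recall rather than counting as a hallucination.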
Hallucinations in Large Language Models
Hallucination in Multimodal Language Models
Using retrieval to detect hallucinations in AI answers
Benchmarking vision-language models on visual illusion and knowledge hallucination
Detecting LLM Hallucinations via Internal States
Detecting hallucinations in language models