Published on:
8 May 2024
Primary Category:
Computer Vision and Pattern Recognition
Paper Authors:
Prannay Kaul,
Zhizhong Li,
Hao Yang,
Yonatan Dukler,
Ashwin Swaminathan,
C. J. Taylor,
Stefano Soatto
Proposes THRONE benchmark to evaluate Type I hallucinations in open-ended LVLM responses
Uses language models to accurately identify hallucinations without hand-crafted rules
Shows that reducing Type II hallucinations may not reduce Type I hallucinations
Demonstrates limitations of existing Type I hallucination benchmarks
Introduces simple and effective data augmentation to reduce both Type I and Type II hallucinations
Evaluating and Reducing Hallucinations in Vision-Language Models
The paper proposes THRONE, a new benchmark for evaluating 'Type I' hallucinations (those arising in open-ended responses) in large vision-language models (LVLMs). It uses language models to identify hallucinations automatically and introduces metrics to quantify them. The paper demonstrates that reducing 'Type II' hallucinations (those arising in responses to specific questions) does not necessarily reduce Type I hallucinations, and that existing benchmarks for evaluating Type I hallucinations are limited. Finally, it introduces a simple data augmentation method that reduces both Type I and Type II hallucinations in LVLMs.
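The evaluation idea described above can be sketched in miniature: once a language model has extracted the object mentions from an open-ended response, they are compared against the image's ground-truth object set, and any mention outside that set counts as a Type I hallucination. This is a minimal illustrative sketch, not THRONE's exact metric definitions; the function name and metric choices are assumptions for illustration.

```python
def hallucination_metrics(predicted_objects, ground_truth_objects):
    """Compare object mentions (e.g. extracted from a free-form caption
    by a language-model judge) against the image's ground-truth objects.

    Illustrative metrics only, not THRONE's exact definitions:
    - precision: fraction of mentioned objects that truly appear
    - recall: fraction of true objects that were mentioned
    - hallucinated: mentions absent from the ground truth (Type I)
    """
    pred = set(predicted_objects)
    truth = set(ground_truth_objects)
    matched = pred & truth
    precision = len(matched) / len(pred) if pred else 1.0
    recall = len(matched) / len(truth) if truth else 1.0
    return {
        "precision": precision,
        "recall": recall,
        "hallucinated": sorted(pred - truth),
    }

# Example: the response mentions a cat that is not in the image.
result = hallucination_metrics(
    predicted_objects=["dog", "frisbee", "cat"],
    ground_truth_objects=["dog", "frisbee", "person"],
)
```

Here `result["hallucinated"]` is `["cat"]`, a Type I hallucination, while the missed `"person"` lowers recall rather than counting as a hallucination.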
Hallucinations in Large Language Models
Hallucination in Multimodal Language Models
Using retrieval to detect hallucinations in AI answers
Benchmarking vision-language models on visual illusion and knowledge hallucination
Detecting LLM Hallucinations via Internal States
Detecting hallucinations in language models