LLM-guided visual search

Published on:

21 December 2023

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Penghao Wu, Saining Xie


Proposes LLM-guided visual search for MLLMs

Algorithm uses LLMs' world knowledge to guide search

Enables precise visual grounding in high-res images

Integrates search into an MLLM meta-architecture

New benchmark tests MLLM visual search ability

This paper proposes integrating a visual search capability into multimodal large language models (MLLMs) so that they can locate and focus on key visual details in complex, high-resolution images. It introduces an LLM-guided search algorithm that uses the LLM's world knowledge to search images efficiently. Combined with an MLLM, this search mechanism enables precise visual grounding and reasoning.
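The general idea of a guided search over a high-resolution image can be illustrated with a small sketch. This is not the paper's algorithm: it is a generic best-first search over recursively subdivided image regions, and every name in it (`guided_search`, `locate`, `guidance`, the toy stand-ins and the 1024x1024 image size) is a hypothetical placeholder. In a real system, the `guidance` score would presumably come from the LLM's knowledge of where the target is likely to be, and `locate` from the MLLM's visual grounding on a cropped region.

```python
import heapq
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_quadrants(box: Box) -> List[Box]:
    """Split a box into four sub-boxes of roughly equal size."""
    left, top, right, bottom = box
    mid_x, mid_y = (left + right) // 2, (top + bottom) // 2
    return [
        (left, top, mid_x, mid_y),
        (mid_x, top, right, mid_y),
        (left, mid_y, mid_x, bottom),
        (mid_x, mid_y, right, bottom),
    ]


def guided_search(
    image_box: Box,
    locate: Callable[[Box], Optional[Box]],   # hypothetical grounding step
    guidance: Callable[[Box], float],         # hypothetical LLM-derived score
    min_size: int = 64,
) -> Tuple[Optional[Box], int]:
    """Best-first search; returns (target box or None, regions inspected)."""
    counter = 0  # tie-breaker so equal scores are explored in insertion order
    frontier = [(-guidance(image_box), counter, image_box)]
    inspected = 0
    while frontier:
        _, _, box = heapq.heappop(frontier)  # highest guidance score first
        inspected += 1
        found = locate(box)
        if found is not None:
            return found, inspected
        left, top, right, bottom = box
        if right - left <= min_size or bottom - top <= min_size:
            continue  # region too small to subdivide further
        for sub in split_into_quadrants(box):
            counter += 1
            heapq.heappush(frontier, (-guidance(sub), counter, sub))
    return None, inspected


# Toy stand-ins (assumptions for illustration only).
TARGET: Box = (900, 700, 930, 730)


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def toy_locate(box: Box) -> Optional[Box]:
    # Pretend the target is only resolvable once the crop is small enough.
    if box[2] - box[0] <= 256 and contains(box, TARGET):
        return TARGET
    return None


def toy_guidance(box: Box) -> float:
    # Pretend the guidance signal knows which region holds the target.
    return 1.0 if contains(box, TARGET) else 0.0


full_image: Box = (0, 0, 1024, 1024)
guided_box, guided_steps = guided_search(full_image, toy_locate, toy_guidance)
blind_box, blind_steps = guided_search(full_image, toy_locate, lambda b: 0.0)
print(guided_box, guided_steps, blind_steps)
```

With the informative toy score, the search reaches the target after inspecting far fewer regions than with a constant score, which degrades to breadth-first exploration. The priority queue is the design choice that matters here: it lets any scoring signal, however it is obtained, reorder which regions are examined first.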

