
LLM-guided visual search

Published on:

21 December 2023

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Penghao Wu, Saining Xie


Key Details

Proposes LLM-guided visual search for MLLMs

Algorithm uses LLMs' world knowledge to guide search

Enables precise visual grounding in high-res images

Integrates search into an MLLM meta-architecture

New benchmark tests MLLM visual search ability

AI generated summary

This paper proposes integrating a visual search capability into multimodal large language models (MLLMs) so that they can locate and focus on key visual details in complex, high-resolution images. It introduces an LLM-guided search algorithm that uses the LLM's world knowledge to search images efficiently. Integrated into an MLLM, the search supports precise visual grounding and reasoning.
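The summary does not spell out the search procedure, so the following is only a minimal sketch of one way a guided search over a high-resolution image could work, not the authors' algorithm. It assumes a best-first search over image quadrants, where a hypothetical `score_region` function stands in for the LLM's guess about where the target is likely to be and a hypothetical `contains_target` function stands in for a detector that only succeeds once a region is zoomed in enough. All names, the quadrant split, and the toy scoring rule are illustrative assumptions.

```python
import heapq
import math
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def split_into_quadrants(box: Box) -> List[Box]:
    """Split a region into four equal sub-regions (an assumed strategy)."""
    left, top, right, bottom = box
    mid_x = (left + right) // 2
    mid_y = (top + bottom) // 2
    return [
        (left, top, mid_x, mid_y),
        (mid_x, top, right, mid_y),
        (left, mid_y, mid_x, bottom),
        (mid_x, mid_y, right, bottom),
    ]


def guided_search(
    image_box: Box,
    score_region: Callable[[Box], float],    # hypothetical stand-in for LLM guidance
    contains_target: Callable[[Box], bool],  # hypothetical stand-in for a detector
    min_size: int = 64,
) -> Optional[Box]:
    """Best-first search: always expand the most promising region next."""
    counter = 0  # tie-breaker so the heap never compares boxes directly
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-score_region(image_box), counter, image_box)]
    while frontier:
        _, _, box = heapq.heappop(frontier)
        if contains_target(box):
            return box
        left, top, right, bottom = box
        if right - left <= min_size or bottom - top <= min_size:
            continue  # too small to split further; try the next candidate
        for child in split_into_quadrants(box):
            counter += 1
            heapq.heappush(frontier, (-score_region(child), counter, child))
    return None


# Toy usage: the target is a point that is only "detectable" in small regions,
# and the guidance is an imperfect hint about where it might be.
target = (900, 100)
hint = (880, 120)


def score_region(box: Box) -> float:
    left, top, right, bottom = box
    center_x, center_y = (left + right) / 2, (top + bottom) / 2
    return -math.hypot(center_x - hint[0], center_y - hint[1])


def contains_target(box: Box) -> bool:
    left, top, right, bottom = box
    zoomed_in = (right - left) <= 128
    return zoomed_in and left <= target[0] < right and top <= target[1] < bottom


found = guided_search((0, 0, 1024, 1024), score_region, contains_target)
```

Because the hint is slightly off, the search first explores a neighbouring region before backing up to the one that holds the target. The priority queue is what allows this: less promising regions stay on the frontier rather than being discarded.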
