
Language models as optimizers for vision-language models

Published on: 12 September 2023

Primary Category: Computation and Language

Paper Authors: Shihong Liu, Samuel Yu, Zhiqiu Lin, Deepak Pathak, Deva Ramanan


Key Details

Proposes using LLMs like ChatGPT as black-box optimizers for VLMs like CLIP

Employs conversational feedback between the LLM and human to guide prompt search

Method finds interpretable prompts resembling those crafted by humans

Approach beats prior methods like CoOp and manual prompting in the low-shot setting

Discovered prompts transfer better across CLIP architectures than alternatives

AI generated summary

This paper proposes using large language models as black-box optimizers to find effective text prompts for adapting vision-language models to new tasks. The approach uses conversational feedback to guide an iterative search, outperforming prior methods.
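The iterative search described above can be sketched as a simple loop: an LLM proposes prompt rewrites given score feedback, a black-box scorer (few-shot CLIP accuracy in the paper) evaluates them, and the best candidate is carried forward. The sketch below is a minimal, hypothetical illustration: `llm_propose` and `score_prompt` are toy stand-ins (not the paper's actual implementation) for the ChatGPT call and the CLIP evaluation, respectively.

```python
def score_prompt(prompt: str) -> float:
    """Toy stand-in for the black-box scorer (few-shot CLIP accuracy).
    Here: fraction of target words the prompt covers."""
    target = {"a", "photo", "of", "the", "class"}
    return len(target & set(prompt.split())) / len(target)

def llm_propose(prompt: str, score: float) -> list[str]:
    """Toy stand-in for an LLM that rewrites the prompt given its
    current score as conversational feedback."""
    modifiers = ["a", "photo", "of", "the", "class"]
    return [f"{m} {prompt}" for m in modifiers]

def optimize(seed: str, iters: int = 5) -> tuple[str, float]:
    """Hill-climbing search: keep the best-scoring prompt seen so far."""
    best, best_score = seed, score_prompt(seed)
    for _ in range(iters):
        for cand in llm_propose(best, best_score):
            s = score_prompt(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score
```

Because the scorer is treated as a black box, no gradients through CLIP are needed, which is what lets a conversational LLM drive the search.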

