Self-supervised image retrieval with open instructions

Published on:

28 March 2024

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang


Key Details

Image pairs on the same web pages have diverse relations

Text instructions make implicit relations explicit

Trained on 36.7M triplets mined from the web

Outperforms prior work with 50x fewer parameters

Succeeds on complex search intents

AI generated summary

This paper introduces MagicLens, a series of self-supervised image retrieval models that follow open-ended text instructions to find relevant images. The key insight is that image pairs that naturally co-occur on the same web page carry diverse implicit relations beyond visual similarity. Large language models make those relations explicit as text instructions, which yields 36.7M (query image, instruction, target image) training triplets mined from the web. In experiments, MagicLens matches or exceeds the prior state of the art on multiple benchmarks, on some of them with 50x fewer parameters. Additional analysis finds that it also handles complex search intents that prior methods miss.
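As a rough illustration of instruction-conditioned retrieval, the toy sketch below shows how a text instruction can change which image is retrieved for the same query image. It is not the MagicLens architecture or training code: the `Triplet` record, the `fuse` function (a plain element-wise sum), the two-dimensional vectors, and the file names are all hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical record for one training example mined from a web page."""
    query_image: str
    instruction: str
    target_image: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse(image_vec, text_vec):
    """Assumed stand-in for a multimodal encoder: element-wise sum."""
    return [i + t for i, t in zip(image_vec, text_vec)]


def retrieve(query_vec, index, k=1):
    """Return the names of the k indexed images most similar to query_vec."""
    ranked = sorted(index.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up embeddings for three candidate images.
INDEX = {
    "charger.jpg": [0.0, 1.0],
    "camera_side.jpg": [1.0, 0.1],
    "cat.jpg": [-1.0, 0.0],
}

EXAMPLE = Triplet("camera_front.jpg", "find the charger for this camera",
                  "charger.jpg")

QUERY_IMAGE_VEC = [1.0, 0.0]    # made-up embedding of camera_front.jpg
INSTRUCTION_VEC = [-0.8, 1.0]   # made-up embedding of the instruction

# Image-only search favours the visually similar image.
IMAGE_ONLY = retrieve(QUERY_IMAGE_VEC, INDEX)
# Adding the instruction shifts the query toward the related target.
WITH_INSTRUCTION = retrieve(fuse(QUERY_IMAGE_VEC, INSTRUCTION_VEC), INDEX)
```

With these made-up vectors, the image-only query ranks `camera_side.jpg` first, while the fused query ranks `charger.jpg` first, matching `EXAMPLE.target_image`. A real system would replace the hand-written vectors and `fuse` with learned encoders.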

