Self-supervised image retrieval with open instructions

Paper Title: MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

Published on: 28 March 2024

Primary Category: Computer Vision and Pattern Recognition

Paper Authors: Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

Key Details

• Image pairs on the same web pages have diverse relations
• Text instructions make implicit relations explicit
• Trained on 36.7M triplets mined from the web
• Outperforms prior work with 50x fewer parameters
• Succeeds on complex search intents

Topics: image retrieval, language models, self-supervised learning, text instructions, web mining

AI-generated summary

This paper introduces MagicLens, a series of self-supervised image retrieval models that follow open-ended text instructions to find relevant images. The key insight is that image pairs that naturally co-occur on the same web page carry diverse implicit relations that go beyond visual similarity. The authors use large language models to state those relations explicitly as text instructions, which yields 36.7M training triplets mined from the web. In the reported experiments, MagicLens matches or exceeds the prior state of the art on multiple benchmarks, in some cases with 50x fewer parameters. Additional analysis finds that it handles complex search intents that prior methods miss.
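The data construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the `Triplet` class, `mine_triplets`, and the `describe_relation` callback (a stand-in for the language-model step) are hypothetical names, and pairing every two images on a page is a simplifying assumption. The summary does not say how the models are trained on the triplets, so the `info_nce` function below is an assumed, generic contrastive objective shown only to make the idea concrete.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Triplet:
    query_image: str   # image given to the retriever
    instruction: str   # text that states the relation explicitly
    target_image: str  # image that satisfies the instruction


def mine_triplets(pages, describe_relation):
    """Build triplets from images that co-occur on the same page.

    `pages` maps a page URL to a list of image ids. `describe_relation`
    is a hypothetical stand-in for the language-model step that turns
    an implicit relation into an instruction.
    """
    triplets = []
    for images in pages.values():
        # Assumption: every ordered pair on a page is a candidate.
        for query, target in permutations(images, 2):
            instruction = describe_relation(query, target)
            triplets.append(Triplet(query, instruction, target))
    return triplets


def info_nce(query_vecs, target_vecs, temperature=0.07):
    """Assumed contrastive loss: row i of `query_vecs` should score
    row i of `target_vecs` higher than any other target in the batch.
    Vectors are plain lists of floats and must be non-zero."""
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    queries = [normalize(v) for v in query_vecs]
    targets = [normalize(v) for v in target_vecs]
    total = 0.0
    for i, query in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(query, target)) / temperature
            for target in targets
        ]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(queries)
```

In this sketch a query vector would stand for the combined embedding of a query image and its instruction. How MagicLens actually encodes and fuses the two is not covered by this summary.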