
Enhancing detection with synthetic images

Published on:

8 February 2024

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma


Key Details

InstaGen adds an instance-level grounding module to a diffusion model so it can synthesize images labeled with object bounding boxes

Supervised pre-training aligns text and visual features on base categories

Self-training extends alignment to novel categories

As a data synthesizer, InstaGen boosts open-vocabulary (+4.5 AP) and data-sparse (+1.2 to +5.2 AP) detection

It outperforms state-of-the-art CLIP-based methods
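The self-training stage listed above can be pictured as a pseudo-labeling step: after pre-training on base categories, predictions for novel categories on synthetic images are filtered, and only confident ones are kept as training targets. The sketch below is a toy illustration under stated assumptions. The names `Detection` and `select_pseudo_labels`, the (x1, y1, x2, y2) box convention, and the 0.8 score threshold are all hypothetical, and the summary does not describe how InstaGen actually produces or filters its pseudo-labels.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    category: str
    box: Box
    score: float


def select_pseudo_labels(
    detections: List[Detection], novel: Set[str], threshold: float = 0.8
) -> List[Detection]:
    """Keep confident detections of novel categories as pseudo ground truth.

    The threshold value is an assumption for illustration only.
    """
    return [d for d in detections if d.category in novel and d.score >= threshold]


# Toy predictions on one synthetic image.
preds = [
    Detection("zebra", (10, 20, 110, 140), 0.93),
    Detection("zebra", (200, 40, 260, 90), 0.41),    # too uncertain, dropped
    Detection("person", (50, 50, 90, 150), 0.97),    # base category, already supervised
]
labels = select_pseudo_labels(preds, novel={"zebra"})
print([(d.category, d.box) for d in labels])
```

Filtering by a score threshold trades recall for label precision: a higher threshold yields fewer but cleaner pseudo-boxes for the next training round.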

AI-generated summary


This paper introduces InstaGen, a framework that generates synthetic images with object bounding boxes for arbitrary categories. An instance-level grounding module is integrated into a diffusion model; it aligns the text embeddings of category names with the model's visual features and infers bounding-box coordinates. After supervised pre-training on base categories and self-training on novel categories, InstaGen serves as a data synthesizer for training object detectors. Experiments show that detectors trained with its synthetic data outperform state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to +5.2 AP) detection.
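The alignment idea in the summary, matching a category name's text embedding against per-instance visual features, can be illustrated with cosine similarity. This is a minimal sketch, not the paper's grounding module. The vectors are hand-picked stand-ins for real text and diffusion features, the boxes are given rather than regressed, and `ground_category` with its 0.5 threshold is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x1, y1, x2, y2) convention


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ground_category(
    text_emb: Sequence[float],
    instance_feats: List[Sequence[float]],
    instance_boxes: List[Box],
    threshold: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Return (box, similarity) for instances aligned with the category embedding.

    In a real model the boxes would come from a regression head; here they
    are supplied directly to keep the example self-contained.
    """
    results = []
    for feat, box in zip(instance_feats, instance_boxes):
        sim = cosine(text_emb, feat)
        if sim >= threshold:
            results.append((box, round(sim, 3)))
    return results


text_emb = [1.0, 0.0, 0.0]  # stand-in embedding for the category name "zebra"
instance_feats = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7]]
instance_boxes = [(10, 20, 110, 140), (200, 40, 260, 90), (30, 30, 80, 80)]
print(ground_category(text_emb, instance_feats, instance_boxes))
```

Because the category enters only through its text embedding, the same scoring works for any category name, which is what lets a grounding module of this kind extend beyond the base categories once features are aligned.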
