Text and image guided image editing

28 March 2024

Computer Vision and Pattern Recognition

Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, Jingfeng Zhang


Employs noise concatenation for precise region editing

Uses decoupled cross-attention for multi-modal guidance

Introduces RefineNet to supplement subject details

Constructs training data from images using CV models

Excels at identity and text consistency
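To make the noise-concatenation highlight concrete, here is a minimal sketch of how a denoiser input can be assembled for region editing. The summary does not give the exact layout, so the channel order, the 4-channel latent size, and the helper name `build_denoiser_input` are all assumptions chosen for illustration, not the paper's implementation.

```python
import numpy as np


def build_denoiser_input(noisy_latent, mask, masked_image_latent):
    """Stack noise, mask, and masked-image latent along the channel axis.

    Hypothetical helper. Shapes are (C, H, W), (1, H, W), and (C, H, W).
    """
    if mask.shape[0] != 1:
        raise ValueError("mask must have exactly one channel")
    if not (noisy_latent.shape[1:] == mask.shape[1:] == masked_image_latent.shape[1:]):
        raise ValueError("all inputs must share the same spatial size")
    return np.concatenate([noisy_latent, mask, masked_image_latent], axis=0)


rng = np.random.default_rng(0)

# Assumed sizes: a 4-channel latent on an 8x8 grid.
noisy_latent = rng.standard_normal((4, 8, 8))
image_latent = rng.standard_normal((4, 8, 8))

# mask == 1 marks the region to edit; the rest of the image is kept.
mask = np.zeros((1, 8, 8))
mask[:, 2:6, 2:6] = 1.0

# Erase the editable region so the model only sees the surrounding context there.
masked_image_latent = image_latent * (1.0 - mask)

denoiser_input = build_denoiser_input(noisy_latent, mask, masked_image_latent)
print(denoiser_input.shape)  # (9, 8, 8)
```

Passing the mask and the masked image alongside the noise tells the denoiser which pixels it may change and what the untouched context looks like, which is what allows an edit to stay confined to the masked region.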

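This paper presents LAR-Gen, an approach for seamlessly editing masked regions of an image under joint guidance from a text prompt and a reference image. It uses a coarse-to-fine pipeline, with RefineNet supplementing subject details, so that the result preserves the identity of the reference subject and stays consistent with the text.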
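As an illustration of how text and image guidance can be combined, the sketch below follows a common formulation of decoupled cross-attention: the same queries attend to the text tokens and to the image tokens through separate key/value projections, and the two results are summed. Whether LAR-Gen uses exactly this form is an assumption; the weight names, dimensions, and the `image_scale` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Scaled dot-product attention for a single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def decoupled_cross_attention(x, text_ctx, image_ctx, w, image_scale=1.0):
    """Attend to text and image tokens separately, then sum the outputs."""
    q = x @ w["q"]
    text_out = attention(q, text_ctx @ w["k_text"], text_ctx @ w["v_text"])
    image_out = attention(q, image_ctx @ w["k_image"], image_ctx @ w["v_image"])
    return text_out + image_scale * image_out


rng = np.random.default_rng(0)
d_model, d_ctx = 32, 24  # assumed feature sizes

weights = {
    "q": rng.standard_normal((d_model, d_model)) * 0.1,
    "k_text": rng.standard_normal((d_ctx, d_model)) * 0.1,
    "v_text": rng.standard_normal((d_ctx, d_model)) * 0.1,
    "k_image": rng.standard_normal((d_ctx, d_model)) * 0.1,
    "v_image": rng.standard_normal((d_ctx, d_model)) * 0.1,
}

latent_tokens = rng.standard_normal((16, d_model))  # flattened spatial features
text_tokens = rng.standard_normal((7, d_ctx))       # text prompt embeddings
image_tokens = rng.standard_normal((4, d_ctx))      # reference image embeddings

out = decoupled_cross_attention(latent_tokens, text_tokens, image_tokens, weights)
text_only = decoupled_cross_attention(
    latent_tokens, text_tokens, image_tokens, weights, image_scale=0.0
)
print(out.shape)  # (16, 32)
```

Keeping the two branches separate means the strength of the reference image can be tuned on its own: with `image_scale=0.0` the layer reduces to ordinary text cross-attention.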

