Text-to-image generation across languages

Published on:

19 August 2023

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Fulong Ye, Guang Liu, Xinya Wu, Ledell Wu


Key Details

The paper introduces AltDiffusion, a new multilingual text-to-image model supporting 18 languages

A multilingual text encoder was trained and incorporated into a pretrained English diffusion model

A two-stage training approach aligned the text encoder and image model

New datasets MG-18 and MC-18 were introduced to evaluate general image quality and culture-specific concepts, respectively

AltDiffusion outperformed other multilingual models and translation-based Stable Diffusion

AI generated summary

This paper introduces AltDiffusion, a multilingual text-to-image model that generates images from text prompts in 18 languages. The authors trained a multilingual text encoder and incorporated it into a pretrained English-only diffusion model. A two-stage training approach aligns the text encoder with the image generation model: the first stage focuses on concept alignment using a large multilingual dataset, and the second fine-tunes for image quality. For evaluation, they introduced two new datasets: MG-18 for general quality and MC-18 for culture-specific concepts. In the reported experiments, AltDiffusion outperformed other multilingual models on metrics such as FID and CLIP score, and beat translation-based Stable Diffusion at understanding cultural concepts. The authors present these results as state-of-the-art multilingual text-to-image generation.
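The summary mentions CLIP score as one of the evaluation metrics. As a rough illustration only, a CLIP-style score is commonly computed as the cosine similarity between a text embedding and an image embedding. The sketch below assumes the embeddings have already been produced by some encoder; the function names and the toy vectors are hypothetical and are not taken from the paper, whose exact scoring setup may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def clip_style_score(text_embedding, image_embedding):
    """Hypothetical helper: text-image agreement as cosine similarity."""
    return cosine_similarity(text_embedding, image_embedding)


def mean_score(pairs):
    """Average the score over (text_embedding, image_embedding) pairs."""
    scores = [clip_style_score(t, i) for t, i in pairs]
    return sum(scores) / len(scores)


# Toy embeddings, made up for illustration (not real encoder outputs).
prompt_vec = [1.0, 0.0, 1.0]
matching_image_vec = [2.0, 0.0, 2.0]   # same direction as the prompt
unrelated_image_vec = [0.0, 3.0, 0.0]  # orthogonal to the prompt
```

In a multilingual evaluation, one could average such scores separately per language to compare how well generated images match prompts across languages; the embeddings themselves would have to come from a multilingual text encoder and an image encoder sharing one embedding space.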
