
Modular speech translation

Published on: 5 October 2023

Primary Category: Computation and Language

Paper Authors: Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot


Key Details

Uses modular encoders and decoders for speech and text

Enables zero-shot cross-modal speech translation

Trains modules to fit a shared embedding space

Shows gains from multilingual training

Outperforms supervised approaches on some languages

AI-generated summary

Modular speech translation

This paper shows that independently trained speech and text modules can be combined to enable competitive zero-shot cross-modal speech translation. The key ideas are: 1) using a shared fixed-size sentence embedding space, 2) training encoders and decoders separately against that space, and 3) enabling cross-lingual transfer via multilingual training. The resulting zero-shot combinations even outperform supervised approaches on some languages.
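The modular idea above can be illustrated with a minimal numpy sketch. This is not the paper's architecture: the "text embeddings", `fake_audio` features, least-squares speech encoder, and nearest-neighbor "decoder" below are all toy stand-ins. The point it demonstrates is the composition property: a speech-encoder module fitted alone against a frozen shared embedding space can be paired zero-shot with a decoder that has only ever seen text-side embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM, SPEECH_DIM = 8, 16  # toy sizes for the shared space and "audio" features

# Frozen text encoder (hypothetical stand-in): maps each sentence to a
# fixed-size embedding in the shared space. In the paper this role is
# played by a trained multilingual sentence encoder.
sentences = ["hello", "goodbye", "thanks", "please"]
text_emb = {s: rng.standard_normal(EMB_DIM) for s in sentences}

# Toy "acoustic" features: a fixed mixing of the transcript embedding
# plus a little noise, standing in for a real speech signal.
mix = rng.standard_normal((EMB_DIM, SPEECH_DIM))
def fake_audio(s):
    return text_emb[s] @ mix + 0.01 * rng.standard_normal(SPEECH_DIM)

# Train the speech-encoder module ALONE to fit the shared space:
# minimize ||audio @ W - text_emb|| over paired data (least squares).
X = np.stack([fake_audio(s) for s in sentences])   # (N, SPEECH_DIM)
Y = np.stack([text_emb[s] for s in sentences])     # (N, EMB_DIM)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)          # (SPEECH_DIM, EMB_DIM)

def speech_encoder(audio):
    return audio @ W

# Zero-shot composition: a retrieval "decoder" built purely from text
# embeddings accepts speech-encoder outputs with no joint training.
def decode(emb):
    return min(sentences, key=lambda s: np.linalg.norm(text_emb[s] - emb))

print(decode(speech_encoder(fake_audio("thanks"))))
```

Because the speech encoder is trained only to land near the frozen text embeddings, any decoder defined over that space works unchanged, which is exactly what makes the zero-shot cross-modal pairing possible.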
