Modality-aware representation learning for text-to-code generation

Published on: 8 February 2024

Primary Category: Computation and Language

Paper Authors: Fenia Christopoulou, Guchun Zhang, Gerasimos Lampouras

Key Details

Proposes separating embeddings for NL and code tokens during pre-training

Introduces modality-relative training objectives tailored to text-to-code data

Evaluates on two models and two datasets, showing consistent improvements

Measures gains with pass@k and a newly introduced incremental pass@k metric (a sketch of the standard pass@k estimator follows this list)
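
For context, pass@k estimates the probability that at least one of k sampled completions for a problem passes its unit tests. The minimal Python sketch below implements the commonly used unbiased estimator over n generated samples, c of which are correct; the paper's incremental pass@k variant is not reproduced here, since its exact definition is only given in the paper.

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased estimator of pass@k: probability that at least one of k
    # samples drawn from n candidates (c of which are correct) is correct.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 37 of them correct, estimate pass@10
print(pass_at_k(200, 37, 10))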

AI generated summary

This paper investigates separating the embedding spaces of natural language and code tokens during the pre-training of text-to-code models. The hypothesis is that, because code tokens such as 'while' have precise, language-defined semantics, they may not benefit from embeddings transferred from their natural-language usage. The authors experiment with modality-relative training objectives and embedding spaces on two models, consistently observing improvements in text-to-code generation quality on two datasets.
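
To make the core idea concrete, the sketch below routes each token to one of two embedding tables according to a per-token modality flag. This is an illustrative assumption of how such a separation could be wired up in PyTorch, not the authors' implementation; all class, parameter, and variable names are hypothetical.

import torch
import torch.nn as nn

class ModalityAwareEmbedding(nn.Module):
    # Two embedding tables over the same vocabulary: one consulted for
    # natural-language tokens, one for code tokens.
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.nl_embed = nn.Embedding(vocab_size, dim)
        self.code_embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor, is_code: torch.Tensor) -> torch.Tensor:
        # is_code: boolean mask, True where a token belongs to the code modality
        nl = self.nl_embed(token_ids)
        code = self.code_embed(token_ids)
        return torch.where(is_code.unsqueeze(-1), code, nl)

# Example: a short prompt of natural-language tokens followed by code tokens
embed = ModalityAwareEmbedding(vocab_size=50000, dim=512)
ids = torch.randint(0, 50000, (1, 8))
mask = torch.tensor([[False, False, False, True, True, True, True, True]])
out = embed(ids, mask)  # shape: (1, 8, 512)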
