Generalizing morpheme glossing models for endangered languages

Published on: 5 November 2023

Computation and Language

Michael Ginn, Alexis Palmer


Models tested on texts of unseen genres to evaluate generalization ability

Weight decay optimization improved out-of-distribution performance

Output denoising helped handle unknown morphemes

Iterative pseudo-labeling further adapted models to new domains

This paper investigates strategies for improving how accurately neural models predict grammatical gloss labels for morphemes in texts of an endangered language. To test generalization, the models are evaluated on texts from genres not seen during training. Applying weight decay, output denoising, and iterative pseudo-labeling yields a 2% improvement in performance on out-of-distribution test data.
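To make the idea of iterative pseudo-labeling concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes a toy count-based glosser in place of a neural model. The function names (`train`, `predict`, `pseudo_label`), the confidence threshold, and the number of rounds are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Fit a toy glosser: count how often each morpheme receives each gloss."""
    counts = defaultdict(Counter)
    for morpheme, gloss in pairs:
        counts[morpheme][gloss] += 1
    return counts


def predict(model, morpheme):
    """Return (gloss, confidence); unknown morphemes get (None, 0.0)."""
    if morpheme not in model:
        return None, 0.0
    gloss, n = model[morpheme].most_common(1)[0]
    return gloss, n / sum(model[morpheme].values())


def pseudo_label(labeled, unlabeled, rounds=3, threshold=0.8):
    """Repeatedly add confident predictions on unlabeled data to the training set.

    `rounds` and `threshold` are illustrative defaults, not values from the paper.
    """
    train_set = list(labeled)
    remaining = list(unlabeled)
    model = train(train_set)
    for _ in range(rounds):
        confident, still_unlabeled = [], []
        for morpheme in remaining:
            gloss, conf = predict(model, morpheme)
            if conf >= threshold:
                confident.append((morpheme, gloss))  # accept as a pseudo-label
            else:
                still_unlabeled.append(morpheme)     # retry in a later round
        if not confident:
            break  # nothing new to learn from; stop early
        train_set.extend(confident)
        remaining = still_unlabeled
        model = train(train_set)  # retrain on gold plus pseudo-labeled data
    return model, remaining
```

In a realistic setting the unlabeled items would be sentences from the new genre and the model would be a neural glosser that can generalize to unseen morphemes. The lookup model here cannot do that, so it only demonstrates the control flow: predict, keep confident outputs, retrain, repeat.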

