Routing Language Models to Specialized Experts

8 February 2024

Machine Learning

Mohammed Muqeeth,

Haokun Liu,

Yufan Liu,

Colin Raffel


Proposes PHATGOOSE, a method for routing tokens to experts

Experts are modules produced by parameter-efficient fine-tuning

Routing is based on learned gates for each module

Outperforms past routing methods

Sometimes matches multitask training performance

This paper explores improving zero-shot generalization by routing tokens within a language model to different specialized expert modules at each layer. The authors' method, PHATGOOSE, trains routing gates for each expert module; these gates determine which tokens should use that module. In the reported experiments, PHATGOOSE outperforms prior routing methods and sometimes matches the performance of multitask training.
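To make the idea of per-token, per-layer routing concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not say how a gate scores a token or how expert outputs are combined, so the dot-product scoring, the top-k selection, and the softmax weighting below are all assumptions chosen for illustration. The names `route_token` and `apply_layer` are hypothetical, and the "experts" are toy functions standing in for parameter-efficient modules.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, gates, top_k=2):
    """Return (expert_index, weight) pairs for one token's hidden vector.

    Assumption: each expert has one gate vector, and a token's affinity
    for an expert is the dot product between the token and that gate.
    """
    scores = [dot(token, g) for g in gates]
    # Keep the top_k highest-scoring experts (an assumed selection rule).
    top = sorted(range(len(gates)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])
    return list(zip(top, weights))


def apply_layer(token, gates, experts, top_k=2):
    """Add the weighted outputs of the selected experts to the token."""
    out = list(token)
    for idx, w in route_token(token, gates, top_k):
        delta = experts[idx](token)  # toy stand-in for an expert module
        out = [o + w * d for o, d in zip(out, delta)]
    return out


# Toy setup: three experts at one layer, hidden size 2 (all values made up).
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
experts = [
    lambda x: [0.1 * v for v in x],
    lambda x: [-0.1 * v for v in x],
    lambda x: [0.0 for _ in x],
]

token = [2.0, 0.5]
routing = route_token(token, gates, top_k=2)
output = apply_layer(token, gates, experts, top_k=2)
print(routing)
print(output)
```

In this toy setup the token aligns most strongly with the first gate, so the first expert receives the largest weight. In a full model the same routing step would presumably run independently for every token at every layer that has expert modules, which is what lets different tokens in one input use different experts.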

