8 February 2024
Proposes PHATGOOSE, a method for routing tokens to specialized experts
Experts are modules produced by parameter-efficient fine-tuning
Routing uses a learned gate trained for each expert module
Outperforms past routing methods
Sometimes matches multitask training performance
Routing Language Models to Specialized Experts
This paper explores improving zero-shot generalization by routing each token within a language model to different specialized expert modules at each layer. The proposed method, PHATGOOSE (Post-Hoc Adaptive Tokenwise Gating Over an Ocean of Specialized Experts), trains a gate for each expert module, after the experts themselves have been trained, that determines which tokens should use that module; at inference, these gates route tokens among the full pool of experts. Experiments find that PHATGOOSE outperforms past routing methods and sometimes matches the performance of explicit multitask training.
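As a rough illustration of the general idea, the sketch below shows tokenwise top-k routing over per-expert gate vectors. It is a minimal sketch, not the paper's code: the names `route_tokens`, `gate_vectors`, `expert_deltas`, and `top_k` are illustrative assumptions, and cosine-similarity scoring stands in for the paper's exact gating computation.

```python
# Minimal sketch of tokenwise top-k routing over per-expert gate
# vectors, in the spirit of PHATGOOSE. All names here are
# illustrative assumptions, not taken from the paper's code.
import torch
import torch.nn.functional as F

def route_tokens(hidden, gate_vectors, expert_deltas, top_k=2):
    """hidden:        (tokens, d)          token activations entering a layer
    gate_vectors:  (experts, d)          one learned gate per expert module
    expert_deltas: (experts, tokens, d)  each expert's output for every token
    Returns the routed output of shape (tokens, d)."""
    # Score each token against each expert's gate (cosine similarity
    # used here as a stand-in for the paper's gating computation).
    scores = F.normalize(hidden, dim=-1) @ F.normalize(gate_vectors, dim=-1).T
    # Keep only the top-k experts per token and renormalize their weights.
    weights, idx = scores.topk(top_k, dim=-1)   # both (tokens, k)
    weights = weights.softmax(dim=-1)
    # Gather the chosen experts' outputs and mix them per token.
    token_ids = torch.arange(hidden.shape[0]).unsqueeze(-1)  # (tokens, 1)
    chosen = expert_deltas[idx, token_ids]                   # (tokens, k, d)
    return (weights.unsqueeze(-1) * chosen).sum(dim=1)

# Toy usage: 16 tokens, hidden size 64, 4 experts.
hidden = torch.randn(16, 64)
gates = torch.randn(4, 64)
deltas = torch.randn(4, 16, 64)
out = route_tokens(hidden, gates, deltas)  # (16, 64)
```

For clarity this sketch precomputes every expert's output for every token; a practical implementation would select the top-k experts first and run only those, which is the point of sparse routing.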