
Self-emerging token labeling for vision transformers

Published on:

8 January 2024

Primary Category:

Computer Vision and Pattern Recognition

Paper Authors:

Bingyin Zhao,

Zhiding Yu,

Shiyi Lan,

Yutao Cheng,

Anima Anandkumar,

Yingjie Lao,

Jose M. Alvarez


Key Details

Proposes a self-emerging token labeling framework built around a vision transformer token labeler

The token labeler is trained to produce semantic labels for patch tokens

Student models are trained on both the self-emerging token labels and the original image labels

Achieves state-of-the-art accuracy and robustness on ImageNet

Also improves robustness on downstream tasks

AI generated summary


This paper proposes a self-emerging token labeling framework to improve the pre-training of vision transformers. The framework has two stages: first, a vision transformer token labeler is trained to generate semantic token labels; then, a student model is trained using both the original image labels and the self-emerging token labels. The best model achieves state-of-the-art accuracy on ImageNet benchmarks and robustness against out-of-distribution data, significantly outperforming prior counterparts, and the gains in robustness carry over to downstream tasks.
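The second stage described above can be sketched as a combined training objective: a standard image-level cross-entropy on the class label, plus an auxiliary term that pushes the student's per-patch predictions toward the soft token labels produced by the trained token labeler. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation; the function names and the weighting hyperparameter `lam` are assumptions for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, target_probs):
    # Mean cross-entropy between predicted logits and soft target distributions.
    logp = np.log(softmax(logits) + 1e-12)
    return float(-(target_probs * logp).sum(axis=-1).mean())

def stl_student_loss(cls_logits, cls_label, patch_logits, token_labels, lam=0.5):
    """Combined student loss: image-level CE plus a token-labeling term.

    cls_logits:   (C,)    student class logits
    cls_label:    int     ground-truth image class
    patch_logits: (N, C)  student per-patch logits
    token_labels: (N, C)  soft labels from the trained token labeler
    lam:          weight on the token-label term (hypothetical hyperparameter)
    """
    onehot = np.zeros_like(cls_logits)
    onehot[cls_label] = 1.0
    image_loss = cross_entropy(cls_logits[None, :], onehot[None, :])
    token_loss = cross_entropy(patch_logits, token_labels)
    return image_loss + lam * token_loss
```

Setting `lam=0` recovers plain supervised training; the token-label term is what injects the labeler's patch-level semantics into the student.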
