Published on:
30 November 2023
Primary Category:
Computer Vision and Pattern Recognition
Paper Authors:
Yau Shing Jonathan Cheung,
Xi Chen,
Lihe Yang,
Hengshuang Zhao
Attention features from self-supervised vision transformers have strong foreground/background differences
Cluster attention features at dataset, category, and image levels
Ensure consistency between clustering levels to extract high-quality pseudo-masks
Refine masks and perform class assignment using vision transformer outputs
Achieves state-of-the-art segmentation performance at low computational cost
Lightweight clustering for semantic segmentation
This paper proposes a lightweight clustering framework for semantic segmentation without labels. It uses attention features from self-supervised vision transformers, which differ strongly between foreground and background. The features are clustered at the dataset, category, and image levels, and consistency across these levels is used to extract high-quality binary pseudo-masks that separate foreground from background. The masks are then refined, and classes are assigned using the vision transformer outputs. The paper reports state-of-the-art performance on PASCAL VOC and COCO at low computational cost.
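The multi-level consistency idea can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the clustering algorithm, the foreground rule, or the consistency criterion, so everything below is an assumption made for illustration. The sketch uses plain 2-means, treats the cluster with the larger mean feature norm as foreground, compares only the dataset and image levels (the category level is omitted for brevity), and keeps an image's mask when the two levels agree above a hypothetical IoU threshold `iou_thresh`. Features are taken as an array of shape (images, patches, dim).

```python
import numpy as np


def two_means(x, iters=20):
    """Plain 2-means over the rows of x with a deterministic initialisation."""
    # Assumed init: the point farthest from the mean, then the point farthest from it.
    c0 = x[np.argmax(np.linalg.norm(x - x.mean(axis=0), axis=1))]
    c1 = x[np.argmax(np.linalg.norm(x - c0, axis=1))]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels


def foreground_mask(x, labels):
    """Assumed rule: the cluster with the larger mean feature norm is foreground."""
    if labels.min() == labels.max():
        # Only one cluster was found, so there is no foreground/background split.
        return np.zeros(len(x), dtype=bool)
    norms = np.linalg.norm(x, axis=1)
    fg = int(norms[labels == 1].mean() > norms[labels == 0].mean())
    return labels == fg


def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def consistent_masks(features, iou_thresh=0.8):
    """Keep image-level masks that agree with the dataset-level clustering.

    features: array of shape (n_images, n_patches, dim).
    Returns a dict mapping image index to a boolean patch mask.
    """
    n, p, d = features.shape
    flat = features.reshape(n * p, d)
    dataset_mask = foreground_mask(flat, two_means(flat)).reshape(n, p)
    kept = {}
    for i in range(n):
        image_mask = foreground_mask(features[i], two_means(features[i]))
        if iou(image_mask, dataset_mask[i]) >= iou_thresh:
            kept[i] = image_mask
    return kept
```

In this sketch, an image whose own clustering disagrees with the dataset-wide clustering is simply dropped, so only masks supported at both levels survive as pseudo-labels. Mask refinement and class assignment, which the paper performs afterwards, are not shown.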
Self-supervised object discovery in videos
Self-supervised video object segmentation via attention
Weak Supervision for Semantic Segmentation in Driving Scenes
Vision Transformer for Semantic Image Compression
Learning semi-supervised classification across varied tasks
Efficient attention model for scene parsing