Introduction
- Two-stage approach
- Method
- Generate class-agnostic mask proposals.
- Leverage pre-trained CLIP to perform open-vocabulary classification.
- Assumption
- The model can generate class-agnostic mask proposals.
- Pre-trained CLIP can transfer its classification performance to masked image proposals.
- Examination
- Use ground-truth masks as region proposals.
- Feed the masked images to a pre-trained CLIP for classification.
- This yields an mIoU of 20.1% on the ADE20K-150 dataset.
- Use MaskFormer (a mask proposal generator trained on COCO) as a region proposal generator.
- Select the region proposals with the highest overlap with the ground-truth masks.
- Assign the corresponding object label to each selected region.
- This model reaches an mIoU of 66.5%, despite using generated proposals rather than ground-truth masks.
- Method
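
The second examination step above (pick the proposal that best overlaps each ground-truth mask, then give it that mask's label) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' code: the helper names `mask_iou` and `assign_oracle_labels`, the boolean-mask representation, the class names, and the toy 4x4 data are all assumptions made for the example.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def assign_oracle_labels(proposals, gt_masks, gt_labels):
    """For each ground-truth mask, pick the best-overlapping proposal.

    Hypothetical helper: returns one (proposal_index, label, iou) tuple
    per ground-truth mask.
    """
    assignments = []
    for gt_mask, label in zip(gt_masks, gt_labels):
        ious = [mask_iou(p, gt_mask) for p in proposals]
        best = int(np.argmax(ious))
        assignments.append((best, label, ious[best]))
    return assignments


# Toy 4x4 image with two ground-truth regions: left half and right half.
gt_left = np.zeros((4, 4), dtype=bool)
gt_left[:, :2] = True
gt_right = ~gt_left

# Three imperfect proposals standing in for a mask generator's output.
p0 = np.zeros((4, 4), dtype=bool)
p0[:, :1] = True   # covers only half of the left region
p1 = np.zeros((4, 4), dtype=bool)
p1[:, :2] = True
p1[0, 2] = True    # the left region plus one extra pixel
p2 = np.zeros((4, 4), dtype=bool)
p2[:, 2:] = True   # exactly the right region

result = assign_oracle_labels(
    [p0, p1, p2], [gt_left, gt_right], ["wall", "floor"]
)
for index, label, iou in result:
    print(f"proposal {index} -> {label} (IoU {iou:.3f})")
```

In this toy case the left region is matched to proposal 1 and the right region to proposal 2. Because the labels come from the annotations rather than from CLIP, a setup like this isolates the quality of the mask proposals from the quality of the classifier.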
Vocabulary
- ground-truth masks: the manually annotated masks or pixel-level labels that define the correct segmentation of objects in an image. Each pixel in a ground-truth mask is assigned the class label of the object or region it belongs to.
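
Since the mIoU figures in these notes are computed against ground-truth masks, a small sketch of the metric may help. It assumes the prediction and the ground truth are integer label maps of the same shape; the function name `mean_iou` and the toy arrays are illustrative. Benchmark implementations typically accumulate intersections and unions over the whole dataset before averaging, so this per-image version is a simplification.

```python
import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over the classes present in the prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class absent from both maps, so skip it
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


# Toy 2x4 label maps with two classes (0 and 1).
gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])

score = mean_iou(pred, gt, num_classes=2)
print(f"mIoU = {score:.3f}")
```

Here class 0 has IoU 4/5 and class 1 has IoU 3/4, so the mean is 0.775.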