Obsidian/Paper/Open-Vocabulary Semantic Segmentation.md at f1bf108986da54030d8fd50952ea7ffffce0a016 - Obsidian - Rain&Bus

RainBus f1bf108986 vault backup: 2023-12-19 20:16:21

Introduction

Two-stage approach
- Method
  1. Generate class-agnostic mask proposals.
  2. Leverage pre-trained CLIP to perform open-vocabulary classification.
- Assumption
  1. The model can generate class-agnostic mask proposals.
  2. Pre-trained CLIP can transfer its classification performance to masked image proposals.
- Examination
  1. Use ground-truth masks as region proposals.
  2. Feed the masked images to a pre-trained CLIP for classification.
  3. This setup reaches an mIoU of 20.1% on the ADE20K-150 dataset.
  4. Use MaskFormer (a mask proposal generator trained on COCO) as the region proposal generator.
  5. Select the region proposals with the highest overlap with the ground-truth masks.
  6. Assign the ground-truth object label to each selected region.
  7. This model reaches an mIoU of 66.5% (despite)
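
The second half of the examination (steps 4 to 6) and the mIoU metric can be sketched in NumPy. This is only an illustration, not the paper's code: the function names `mask_iou`, `oracle_assign`, and `mean_iou` are hypothetical, the proposals are assumed to be boolean arrays of the same size as the image, and matching each ground-truth mask to its single best proposal is an assumption about how "highest overlap" is applied.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def oracle_assign(proposals, gt_masks, gt_labels, ignore_label=255):
    """For each ground-truth mask, pick the proposal with the highest IoU
    and paint that proposal with the ground-truth label (assumed matching rule)."""
    h, w = gt_masks[0].shape
    pred = np.full((h, w), ignore_label, dtype=np.int64)
    for gt, label in zip(gt_masks, gt_labels):
        ious = [mask_iou(p, gt) for p in proposals]
        best = proposals[int(np.argmax(ious))]
        pred[best] = label  # boolean-mask assignment
    return pred

def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0
```

Because the labels come straight from the ground truth, a score computed this way isolates the quality of the mask proposals from the quality of the classifier, which is what makes it comparable with the first experiment.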

Vocabularies

ground-truth masks: manually annotated masks, or pixel-level labels, that define the correct segmentation of objects in an image. Each pixel in a ground-truth mask is assigned the class label of the object or region it belongs to.
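
As a small illustration of this definition, a ground-truth annotation is often stored as one integer label map per image, from which a binary mask per class can be derived. The helper name `label_map_to_masks` and the ignore value 255 are assumptions made for this sketch, not something the note specifies.

```python
import numpy as np

def label_map_to_masks(label_map, ignore_label=255):
    """Split an integer label map into one boolean mask per class present."""
    labels = [int(c) for c in np.unique(label_map) if c != ignore_label]
    masks = [label_map == c for c in labels]
    return masks, labels

# Toy 2x3 label map: class 0, class 1, and one unlabeled (ignored) pixel.
label_map = np.array([[0, 0, 1],
                      [0, 255, 1]])
masks, labels = label_map_to_masks(label_map)
```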