Open-Vocabulary Semantic Segmentation


Introduction

  • Two-stage approach
    • Method
      1. Generate class-agnostic mask proposals.
      2. Use a pre-trained CLIP model to classify each proposal against an open vocabulary of class names.
    • Assumption
      1. The model can generate class-agnostic mask proposals.
      2. Pre-trained CLIP can transfer its classification performance to masked image proposals.
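    • Examination (of assumption 2)
      1. Use ground-truth masks as region proposals, so that proposal quality is not a factor.
      2. Feed the masked images to a pre-trained CLIP model for classification.
      3. The result is only 20.1% mIoU on ADE20K-150, suggesting that assumption 2 is the weak point: pre-trained CLIP does not classify masked images well.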
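The two-stage method can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_encode_image` and `toy_encode_text` are hypothetical stand-ins for CLIP's image and text encoders (mean color vs. fixed color prototypes), the prompt template `"a photo of a {c}."` and zero-filling of masked-out pixels are assumed choices, and the two hand-made masks stand in for stage 1 (or for the ground-truth masks used in the examination).

```python
import numpy as np


def mask_image(image, mask, fill=0.0):
    """Keep pixels inside a binary proposal mask; set the background to `fill` (assumed 0)."""
    return np.where(mask[..., None], image, fill)


def classify_proposals(image, masks, class_names, encode_image, encode_text):
    """Stage 2: give each class-agnostic mask the class whose prompt embedding is closest."""
    # One prompt per class (assumed template), L2-normalized so dot products are cosines.
    text_emb = np.stack(
        [np.asarray(encode_text(f"a photo of a {c}."), dtype=float) for c in class_names]
    )
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    labels = []
    for mask in masks:
        img_emb = np.asarray(encode_image(mask_image(image, mask)), dtype=float)
        img_emb = img_emb / (np.linalg.norm(img_emb) + 1e-8)
        scores = text_emb @ img_emb  # cosine similarity to every class prompt
        labels.append(class_names[int(np.argmax(scores))])
    return labels


# Hypothetical stand-ins for CLIP's encoders (NOT the real CLIP API):
# the image "embedding" is the mean RGB color, and each prompt maps to a color prototype.
PROTOTYPES = {
    "a photo of a sky.": [0.0, 0.0, 1.0],
    "a photo of a grass.": [0.0, 1.0, 0.0],
}


def toy_encode_image(img):
    return img.reshape(-1, 3).mean(axis=0)


def toy_encode_text(prompt):
    return PROTOTYPES[prompt]


image = np.zeros((4, 4, 3))
image[:2] = [0.1, 0.2, 0.9]  # top half: bluish "sky"
image[2:] = [0.1, 0.8, 0.2]  # bottom half: greenish "grass"

# Stage 1 stand-in: two hand-made class-agnostic masks (e.g. ground-truth masks).
top = np.zeros((4, 4), dtype=bool)
top[:2] = True
masks = [top, ~top]

labels = classify_proposals(image, masks, ["sky", "grass"], toy_encode_image, toy_encode_text)
print(labels)  # ['sky', 'grass']
```

With real CLIP, the weak link noted in the examination sits in `encode_image(mask_image(...))`: the masked input looks unlike the natural images CLIP was trained on, so its embedding can land near the wrong prompt even when the mask itself is perfect.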
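mIoU, the metric behind the 20.1% figure, averages the per-class intersection-over-union between predicted and ground-truth label maps. A minimal sketch for a single image pair; the function name `mean_iou`, the `ignore_index=255` convention for unlabeled pixels, and the toy arrays are assumptions. Benchmark code typically accumulates the confusion matrix over the whole dataset before dividing, rather than averaging per-image scores.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean intersection-over-union between two integer label maps of the same shape."""
    valid = gt != ignore_index  # assumed convention: 255 marks unlabeled pixels
    g, p = gt[valid], pred[valid]
    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    conf = np.bincount(g * num_classes + p, minlength=num_classes**2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both maps
    return float(np.mean(intersection[present] / union[present]))


gt = np.array([[0, 0, 1, 1],
               [0, 0, 1, 1]])
pred = np.array([[0, 0, 0, 1],
                 [0, 0, 1, 1]])
miou = mean_iou(pred, gt, num_classes=2)
print(round(miou, 3))  # 0.775  (class 0: 4/5, class 1: 3/4)
```

For ADE20K-150, `num_classes` would be 150.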

Vocabularies

  1. ground-truth masks: the manually annotated, pixel-level labels that define the correct segmentation of an image. Each pixel is assigned the class label of the object or region it belongs to.
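A ground-truth label map can be split into the per-class binary masks that serve as region proposals in the examination above. A minimal sketch; `label_map_to_masks` is a hypothetical helper, and treating 255 as an unlabeled value is an assumed convention.

```python
import numpy as np


def label_map_to_masks(label_map, ignore_index=255):
    """Split a ground-truth label map into one binary mask per class present (hypothetical helper)."""
    masks = {}
    for cls in np.unique(label_map):
        if cls == ignore_index:  # assumed convention: 255 = unlabeled, not a class
            continue
        masks[int(cls)] = label_map == cls
    return masks


gt = np.array([[0, 0, 2],
               [0, 2, 2],
               [255, 2, 2]])
masks = label_map_to_masks(gt)
print(sorted(masks))   # [0, 2]
print(masks[2].sum())  # 5
```

Dropping the class IDs (`list(masks.values())`) yields class-agnostic proposals. One mask per class merges separate instances of the same class, which matches the semantic (not instance) segmentation setting.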