vault backup: 2025-02-21 15:27:04

2025-02-21 15:27:04 +08:00
parent a3bcf83819
commit 63f81386a9
32 changed files with 3033 additions and 16036 deletions


@@ -1,4 +0,0 @@
# TODO
- How to train an encoder against the frozen encoder of the opposite modality?
- How to generate a high-level prompt?
- Why add a category-agnostic vector as a global prompt?


@@ -1,21 +0,0 @@
## Introduction
- Two-stage approach
- Method
1. Generate class-agnostic mask proposals.
2. Leverage pre-trained CLIP to perform open-vocabulary classification.
- Assumption
1. The model can generate class-agnostic mask proposals.
2. Pre-trained CLIP can transfer its classification performance to masked image proposals.
- Examination
1. Use ground-truth masks as region proposals.
2. Feed the masked images to a pre-trained CLIP for classification (a sketch of this step follows the conclusion below).
3. This yields an mIoU of 20.1% on the ADE20K-150 dataset.
4. Use MaskFormer (a mask proposal generator trained on COCO) as the region proposal generator.
5. Select the region proposal with the highest overlap with each ground-truth mask (also sketched below).
6. Assign the ground-truth object label to that region.
7. This setup reaches an mIoU of 66.5%, despite the imperfect region proposals.
- Conclusion
Pre-trained CLIP does not perform well on masked images. We hypothesize that this is because CLIP was trained on natural images, which are neither cropped nor corrupted by segmentation masks.
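
A minimal sketch of step 2, classifying one masked region with pre-trained CLIP. It assumes the OpenAI `clip` package; the class list, image path, and proposal mask are hypothetical placeholders, not the paper's actual pipeline.

```python
import numpy as np
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["dog", "cat", "tree"]           # hypothetical open vocabulary
image = np.array(Image.open("image.jpg"))       # H x W x 3, placeholder path
mask = np.load("proposal_mask.npy")             # H x W bool, a class-agnostic proposal

# Zero out everything outside the proposal, then preprocess for CLIP.
masked = (image * mask[..., None]).astype(np.uint8)
masked_input = preprocess(Image.fromarray(masked)).unsqueeze(0).to(device)

# Encode text prompts ("a photo of a {class}") and the masked image.
text_tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    image_feat = model.encode_image(masked_input)
    text_feat = model.encode_text(text_tokens)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

pred = class_names[probs.argmax().item()]       # label assigned to this region
```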
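And a small sketch of the oracle assignment in steps 5-6: each ground-truth mask is matched to the proposal it overlaps most, and that proposal inherits the ground-truth label. `proposals`, `gt_masks`, and `gt_labels` are placeholder arrays, not the paper's code.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union > 0 else 0.0

def oracle_assign(proposals, gt_masks, gt_labels):
    """For each ground-truth mask, pick the best-overlapping proposal and
    give it the ground-truth label. Returns {proposal index: label}."""
    assigned = {}
    for gt_mask, label in zip(gt_masks, gt_labels):
        best = max(range(len(proposals)), key=lambda i: iou(proposals[i], gt_mask))
        assigned[best] = label
    return assigned
```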
## Vocabularies
1. ground-truth masks: the manually annotated pixel-level labels that define the correct segmentation of objects in an image. Each pixel in the mask is assigned the class label of the object or region it belongs to.


@@ -0,0 +1,5 @@
---
CreateAt: 2025-02-21
ModelName: CoAPT
Repo: https://github.com/LeeGun4488/CoAPT
---