diff --git a/Paper/Open-Vocabulary Semantic Segmentation.md b/Paper/Open-Vocabulary Semantic Segmentation.md
index 322fd20..75d6367 100644
--- a/Paper/Open-Vocabulary Semantic Segmentation.md
+++ b/Paper/Open-Vocabulary Semantic Segmentation.md
@@ -13,8 +13,9 @@
 4. Use MaskFormer(a mask proposal generator trained on COCO) as an region proposal generator.
 5. Select the region proposals with highest overlap with ground-truth masks.
 6. Assign the object label to this region.
-7. This model reach mIoU of 66.5%(despite)
-
+7. This model reaches an mIoU of 66.5%, despite imperfect region proposals.
+- Conclusion
+Pre-trained CLIP does not perform well on masked images. We hypothesize that this is because CLIP was trained on natural images, which are not cropped or noised by segmentation masks.
 
 ## Vocabularies
 1. ground-truth masks: refer to the manually annotated masks or pixel-level labels that are used to define the correct segmentation of objects in an image. Each pixel in the ground-truth mask is assigned a specific class label corresponding to the object or region it belongs to.
\ No newline at end of file
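Steps 4 to 6 amount to an oracle experiment. Class-agnostic mask proposals are matched to the ground-truth masks, and each best-matching proposal receives the ground-truth label. This way the remaining error reflects proposal quality rather than classification. The Python sketch below illustrates that matching and the resulting mIoU on toy data. It is a minimal sketch under stated assumptions, not the paper's code:

- masks are represented as sets of `(row, col)` pixels;
- "overlap" is taken to mean IoU;
- MaskFormer is not run, so the proposals are hand-made stand-ins;
- the names `iou`, `oracle_label_proposals`, `render`, `mean_iou`, and `IGNORE` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Pixel = Tuple[int, int]          # (row, col)
Mask = FrozenSet[Pixel]          # a binary mask as the set of its "on" pixels
LabelMap = List[List[int]]       # dense per-pixel class ids
IGNORE = -1                      # hypothetical "no label" value


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def oracle_label_proposals(proposals: List[Mask], gt_masks: Dict[int, Mask]) -> Dict[int, int]:
    """Match each ground-truth mask to its highest-IoU proposal and give that
    proposal the ground-truth class id. Returns {proposal_index: class_id}.
    Assumption: if two classes pick the same proposal, the later one wins."""
    assignment: Dict[int, int] = {}
    for class_id, gt in gt_masks.items():
        best = max(range(len(proposals)), key=lambda i: iou(proposals[i], gt))
        assignment[best] = class_id
    return assignment


def render(assignment: Dict[int, int], proposals: List[Mask], height: int, width: int) -> LabelMap:
    """Paint labelled proposals into a label map; untouched pixels stay IGNORE.
    If proposals overlap, whichever is painted last wins."""
    label_map = [[IGNORE] * width for _ in range(height)]
    for idx, class_id in assignment.items():
        for r, c in proposals[idx]:
            label_map[r][c] = class_id
    return label_map


def mean_iou(pred: LabelMap, gt: LabelMap, num_classes: int) -> float:
    """Per-class pixel IoU, averaged over classes that occur in pred or gt.
    Pixels whose ground truth is IGNORE are skipped."""
    ious = []
    for k in range(num_classes):
        inter = union = 0
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                if g == IGNORE:
                    continue
                inter += int(p == k and g == k)
                union += int(p == k or g == k)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 4x4 example: class 0 fills the left half, class 1 the right half.
H = W = 4
left = frozenset((r, c) for r in range(H) for c in range(2))
right = frozenset((r, c) for r in range(H) for c in range(2, W))
proposals = [
    frozenset((r, c) for r in range(3) for c in range(2)),  # imperfect: misses the bottom row
    right,                                                  # perfect match for class 1
    frozenset({(3, 0)}),                                    # small spurious blob
]
gt_masks = {0: left, 1: right}
gt_map = [[0 if c < 2 else 1 for c in range(W)] for _ in range(H)]

assignment = oracle_label_proposals(proposals, gt_masks)
pred_map = render(assignment, proposals, H, W)
print(assignment)                                  # {0: 0, 1: 1}
print(mean_iou(pred_map, gt_map, num_classes=2))   # 0.875
```

The left-half proposal misses one row, so class 0 scores an IoU of 0.75. Class 1 scores 1.0, giving an mIoU of 0.875 even though every label is correct. This toy number only illustrates the mechanism and says nothing about the 66.5% reported in the note.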