diff --git a/Paper/Open-Vocabulary Semantic Segmentation.md b/Paper/Open-Vocabulary Semantic Segmentation.md
index 322fd20..75d6367 100644
--- a/Paper/Open-Vocabulary Semantic Segmentation.md	
+++ b/Paper/Open-Vocabulary Semantic Segmentation.md	
@@ -13,8 +13,9 @@
 		4. Use MaskFormer(a mask proposal generator trained on COCO) as an region proposal generator.
 		5. Select the region proposals with highest overlap with ground-truth masks.
 		6. Assign the object label to this region.
-		7. This model reach mIoU of 66.5%(despite)
-
+		7. This model reaches an mIoU of 66.5%, despite imperfect region proposals.
+	- Conclusion
+	  Pre-trained CLIP does not perform well on masked images. We hypothesize that this is because CLIP was trained on natural images, which are not cropped or corrupted by segmentation masks.
 
 ## Vocabularies
 1. ground-truth masks: refer to the manually annotated masks or pixel-level labels that are used to define the correct segmentation of objects in an image. Each pixel in the ground-truth mask is assigned a specific class label corresponding to the object or region it belongs to.
\ No newline at end of file