Release of PromptSRC with pretrained models.
22
LICENSE
Normal file
@@ -0,0 +1,22 @@
MIT License
Copyright (c) 2023 Muhammad Uzair Khattak
Copyright (c) 2022 Muhammad Uzair Khattak
Copyright (c) 2021 Kaiyang Zhou

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
158
README.md
Normal file
@@ -0,0 +1,158 @@
# Self-regulating Prompts: Foundational Model Adaptation without Forgetting

> [**Self-regulating Prompts: Foundational Model Adaptation without Forgetting**]()<br>
> [Muhammad Uzair Khattak*](https://muzairkhattak.github.io/), [Syed Talal Wasim*](https://talalwasim.github.io), [Muzammal Naseer](https://scholar.google.com/citations?user=tM9xKA8AAAAJ&hl=en&oi=ao), [Salman Khan](https://salman-h-khan.github.io/), [Ming-Hsuan Yang](http://faculty.ucmerced.edu/mhyang/), [Fahad Shahbaz Khan](https://scholar.google.es/citations?user=zvaeYnUAAAAJ&hl=en)

*Joint first authors

[]()
[![Website](https://img.shields.io/badge/Project-Website-87CEEB)](https://muzairkhattak.github.io/PromptSRC/)
[![slides](https://img.shields.io/badge/Presentation-Slides-B762C1)](https://drive.google.com/file/d/1d14q8hhAl6qGsiPYpNIVfShMCulVJSUa/view?usp=sharing)

Official implementation of the paper "[Self-regulating Prompts: Foundational Model Adaptation without Forgetting](https://arxiv.org/abs/2307.06948)".

<hr />

# :rocket: News
* **(July 12, 2023)**
  * Pre-trained models and evaluation code for reproducing the official PromptSRC benchmark results are released.
  * Training code for [PromptSRC](configs/trainers/PromptSRC) is released.
  * This repository also supports the [MaPLe (CVPR'23)](configs/trainers/MaPLe),
    [CoOp (IJCV'22)](configs/trainers/CoOp), and [Co-CoOp (CVPR'22)](configs/trainers/CoCoOp)
    architectures.

<hr />

## Highlights

![main figure](docs/main_figure.png)
> <p align="justify"> <b><span style="color: blue;">Left</span></b>:
> Existing prompt learning approaches for foundational Vision-Language models like CLIP rely on task-specific objectives that restrict
> prompt learning to learn a feature space suitable only for downstream tasks and
> consequently lose the generalized knowledge of CLIP (shown in <span style="color: purple;">purple</span>).
> Our self-regulating framework explicitly guides the training trajectory of prompts
> towards the closest point between two optimal solution manifolds (solid line) to
> learn task-specific representations while also retaining generalized CLIP knowledge
> (shown in <span style="color: green;">green</span>). <b><span style="color: blue;">Middle</span></b>: Averaged
> across 11 image recognition datasets, PromptSRC surpasses existing methods on the
> base-to-novel generalization setting. <b><span style="color: blue;">Right</span></b>: We evaluate
> our approach on four diverse image recognition benchmarks for CLIP and show
> consistent gains over previous state-of-the-art approaches. </p>

> **<p align="justify"> Abstract:** *Prompt learning has emerged as an efficient alternative
> for fine-tuning foundational models, such as CLIP, for various downstream tasks.
> Conventionally trained using the task-specific objective, i.e., cross-entropy loss,
> prompts tend to overfit downstream data distributions and find it challenging to capture
> task-agnostic general features from the frozen CLIP. This leads to the loss of the model's
> original generalization capability. To address this issue, our work introduces a
> self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating
> Constraints). PromptSRC guides the prompts to optimize for both task-specific and task-agnostic
> general representations using a three-pronged approach by: (a) regulating prompted
> representations via mutual agreement maximization with the frozen model, (b) regulating
> with self-ensemble of prompts over the training trajectory to encode their complementary
> strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance
> with the visual branch. To the best of our knowledge, this is the first regularization
> framework for prompt learning that avoids overfitting by jointly attending to pre-trained
> model features, the training trajectory during prompting, and the textual diversity.
> PromptSRC explicitly steers the prompts to learn a representation space that maximizes
> performance on downstream tasks without compromising CLIP generalization. We perform
> experiments on 4 benchmarks where PromptSRC performs favorably well compared
> to the existing methods. Our code and pre-trained models are publicly available.* </p>

## Regularization Framework for Prompt Learning

We propose PromptSRC (Prompting with Self-regulating Constraints), which steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP's generalization.

**Key components of PromptSRC:**
1) **Mutual agreement maximization:** PromptSRC explicitly guides the prompts to jointly acquire both <i>task-specific knowledge</i> and <i>task-agnostic generalized knowledge</i> by maximizing the mutual agreement between the prompted features and the features of the frozen VL model.
2) **Gaussian weighted prompt aggregation:** We propose a weighted self-ensembling strategy for prompts over the training trajectory that captures their complementary features and enhances their generalization abilities (a short sketch follows this list).
3) **Textual diversity:** PromptSRC regulates prompts with textual diversity to mitigate the sample diversity imbalance relative to the visual branch during training.
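
To make components (1) and (2) concrete, the sketch below shows one way a mutual-agreement term and a Gaussian-weighted prompt ensemble can be computed in PyTorch. It is a minimal illustration, not the repository's implementation: the function names, the L1-plus-KL form of the agreement loss, and the Gaussian mean/width fractions are assumptions; the settings actually used are defined under [configs/trainers/PromptSRC](configs/trainers/PromptSRC/).

```python
import copy
import math

import torch.nn.functional as F


def mutual_agreement_loss(prompted_feats, frozen_feats, prompted_logits, frozen_logits, T=1.0):
    """Pull prompted features and logits towards those of the frozen CLIP (component 1)."""
    feat_term = F.l1_loss(prompted_feats, frozen_feats.detach())
    logit_term = F.kl_div(F.log_softmax(prompted_logits / T, dim=-1),
                          F.softmax(frozen_logits.detach() / T, dim=-1),
                          reduction="batchmean")
    return feat_term + logit_term


def gaussian_weight(epoch, total_epochs, mean_frac=0.6, std_frac=0.15):
    """Unnormalized Gaussian weight for the prompt snapshot saved at `epoch` (assumed fractions)."""
    mu, sigma = mean_frac * total_epochs, std_frac * total_epochs
    return math.exp(-((epoch - mu) ** 2) / (2 * sigma ** 2))


def aggregate_prompts(prompt_snapshots, total_epochs):
    """Gaussian-weighted average of per-epoch prompt state dicts (component 2)."""
    weights = [gaussian_weight(e, total_epochs) for e in range(len(prompt_snapshots))]
    total = sum(weights)
    ensemble = copy.deepcopy(prompt_snapshots[0])
    for name in ensemble:
        ensemble[name] = sum(w * s[name] for w, s in zip(weights, prompt_snapshots)) / total
    return ensemble
```

The aggregated (self-ensembled) prompts, rather than the final-epoch prompts, are what PromptSRC keeps for evaluation.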

## :ballot_box_with_check: Supported Methods

| Method                    | Paper                                         | Configs                             | Training Scripts                |
|---------------------------|:----------------------------------------------|:-----------------------------------:|:-------------------------------:|
| PromptSRC                 | [arXiv]()                                     | [link](configs/trainers/PromptSRC/) | [link](scripts/promptsrc)       |
| Independent V-L Prompting | -                                             | [link](configs/trainers/IVLP/)      | [link](scripts/independent-vlp) |
| MaPLe                     | [CVPR 2023](https://arxiv.org/abs/2210.03117) | [link](configs/trainers/MaPLe)      | [link](scripts/maple)           |
| CoOp                      | [IJCV 2022](https://arxiv.org/abs/2109.01134) | [link](configs/trainers/CoOp)       | [link](scripts/coop)            |
| Co-CoOp                   | [CVPR 2022](https://arxiv.org/abs/2203.05557) | [link](configs/trainers/CoCoOp)     | [link](scripts/cocoop)          |

<hr />

## Results
Results reported below show accuracy on base and novel classes across 11 recognition datasets, averaged over 3 seeds.
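Here HM denotes the harmonic mean of the base and novel accuracies, HM = (2 × Base × Novel) / (Base + Novel); it rewards methods that perform well on both splits rather than trading one off against the other.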

### Effectiveness of PromptSRC in comparison with the baseline Independent V-L Prompting
PromptSRC effectively maximizes supervised task performance (base classes) without compromising CLIP's original generalization to new, unseen classes (novel classes).

| Name                      | Base Acc. | Novel Acc. |    HM     | Epochs |
|---------------------------|:---------:|:----------:|:---------:|:------:|
| CLIP                      |   69.34   |   74.22    |   71.70   |   -    |
| Independent V-L Prompting |   84.21   |   71.79    |   77.51   |   20   |
| PromptSRC (ours)          | **84.26** | **76.10**  | **79.97** |   20   |

### PromptSRC in comparison with existing state-of-the-art

| Name                                       | Base Acc. | Novel Acc. |    HM     | Epochs |
|--------------------------------------------|:---------:|:----------:|:---------:|:------:|
| [CLIP](https://arxiv.org/abs/2103.00020)   |   69.34   |   74.22    |   71.70   |   -    |
| [CoOp](https://arxiv.org/abs/2109.01134)   |   82.69   |   63.22    |   71.66   |  200   |
| [CoCoOp](https://arxiv.org/abs/2203.05557) |   80.47   |   71.69    |   75.83   |   10   |
| [ProDA](https://arxiv.org/abs/2205.03340)  |   81.56   |   72.30    |   76.65   |  100   |
| [MaPLe](https://arxiv.org/abs/2210.03117)  |   82.28   |   75.14    |   78.55   |   5    |
| [PromptSRC (ours)]()                       | **84.26** | **76.10**  | **79.97** |   20   |

## Installation
For installation and other package requirements, please follow the instructions detailed in [INSTALL.md](docs/INSTALL.md).

## Data Preparation
Please follow the instructions at [DATASETS.md](docs/DATASETS.md) to prepare all datasets.

## Model Zoo

### Vision-Language prompting methods
| Name (configs)                                                                         | Model checkpoints |
|----------------------------------------------------------------------------------------|:-----------------:|
| [Independent V-L Prompting](configs/trainers/IVLP/vit_b16_c2_ep20_batch4_4+4ctx.yaml)  | [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EuIwh-yMh_JBqB2Y_o8Jl14BPDKDRHC0JBPE1BugIeZiSQ?e=AJ8MhY) |
| [PromptSRC](configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml)             | [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EqFXPs2Zl9pKp39w3SqlR7QBDACTv-AgCXH6_cGflrUFwg?e=l33EBA) |

## Evaluation
Please refer to [EVAL.md](docs/EVAL.md) for detailed instructions on using the evaluation scripts and reproducing the official results with our pre-trained models.

## Training
Please refer to [TRAIN.md](docs/TRAIN.md) for detailed instructions on training PromptSRC and the IVLP baseline from scratch.

<hr />

## Citation
If you find our work, this repository, or the pretrained models useful, please consider giving a star :star: and a citation.
```bibtex
@article{khattak2023PromptSRC,
    title={Self-regulating Prompts: Foundational Model Adaptation without Forgetting},
    author={Khattak, Muhammad Uzair and Wasim, Syed Talal and Naseer, Muzammal and Khan, Salman and Yang, Ming-Hsuan and Khan, Fahad Shahbaz},
    journal={arXiv:},
    year={2023}
}
```

## Contact
If you have any questions, please create an issue on this repository or contact us at uzair.khattak@mbzuai.ac.ae or syed.wasim@mbzuai.ac.ae.


## Acknowledgements

Our code is based on the [MaPLe](https://github.com/muzairkhattak/multimodal-prompt-learning) repository, along with the [Co-CoOp and CoOp](https://github.com/KaiyangZhou/CoOp) repository. We thank the authors for releasing their code. If you use our model and code, please also consider citing these works.
1
clip/__init__.py
Normal file
@@ -0,0 +1 @@
from .clip import *
BIN
clip/__pycache__/__init__.cpython-37.pyc
Normal file
Binary file not shown.
BIN
clip/__pycache__/clip.cpython-37.pyc
Normal file
Binary file not shown.
BIN
clip/__pycache__/model.cpython-37.pyc
Normal file
Binary file not shown.
BIN
clip/__pycache__/simple_tokenizer.cpython-37.pyc
Normal file
Binary file not shown.
BIN
clip/bpe_simple_vocab_16e6.txt.gz
Normal file
Binary file not shown.
221
clip/clip.py
Normal file
@@ -0,0 +1,221 @@
|
|||||||
|
import hashlib
|
||||||
|
import os
|
||||||
|
import urllib
|
||||||
|
import warnings
|
||||||
|
from typing import Union, List
|
||||||
|
|
||||||
|
import torch
|
||||||
|
from PIL import Image
|
||||||
|
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
|
||||||
|
from tqdm import tqdm
|
||||||
|
|
||||||
|
from .model import build_model
|
||||||
|
from .simple_tokenizer import SimpleTokenizer as _Tokenizer
|
||||||
|
|
||||||
|
try:
|
||||||
|
from torchvision.transforms import InterpolationMode
|
||||||
|
BICUBIC = InterpolationMode.BICUBIC
|
||||||
|
except ImportError:
|
||||||
|
BICUBIC = Image.BICUBIC
|
||||||
|
|
||||||
|
|
||||||
|
if torch.__version__.split(".") < ["1", "7", "1"]:
|
||||||
|
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
|
||||||
|
|
||||||
|
|
||||||
|
__all__ = ["available_models", "load", "tokenize"]
|
||||||
|
_tokenizer = _Tokenizer()
|
||||||
|
|
||||||
|
_MODELS = {
|
||||||
|
"RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
|
||||||
|
"RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
|
||||||
|
"RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
|
||||||
|
"RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
|
||||||
|
"ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
|
||||||
|
"ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")):
|
||||||
|
os.makedirs(root, exist_ok=True)
|
||||||
|
filename = os.path.basename(url)
|
||||||
|
|
||||||
|
expected_sha256 = url.split("/")[-2]
|
||||||
|
download_target = os.path.join(root, filename)
|
||||||
|
|
||||||
|
if os.path.exists(download_target) and not os.path.isfile(download_target):
|
||||||
|
raise RuntimeError(f"{download_target} exists and is not a regular file")
|
||||||
|
|
||||||
|
if os.path.isfile(download_target):
|
||||||
|
if hashlib.sha256(open(download_target, "rb").read()).hexdigest() == expected_sha256:
|
||||||
|
return download_target
|
||||||
|
else:
|
||||||
|
warnings.warn(f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file")
|
||||||
|
|
||||||
|
with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
|
||||||
|
with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True) as loop:
|
||||||
|
while True:
|
||||||
|
buffer = source.read(8192)
|
||||||
|
if not buffer:
|
||||||
|
break
|
||||||
|
|
||||||
|
output.write(buffer)
|
||||||
|
loop.update(len(buffer))
|
||||||
|
|
||||||
|
if hashlib.sha256(open(download_target, "rb").read()).hexdigest() != expected_sha256:
|
||||||
|
        raise RuntimeError("Model has been downloaded but the SHA256 checksum does not match")
|
||||||
|
|
||||||
|
return download_target
|
||||||
|
|
||||||
|
|
||||||
|
def _transform(n_px):
|
||||||
|
return Compose([
|
||||||
|
Resize(n_px, interpolation=BICUBIC),
|
||||||
|
CenterCrop(n_px),
|
||||||
|
lambda image: image.convert("RGB"),
|
||||||
|
ToTensor(),
|
||||||
|
Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
|
||||||
|
])
|
||||||
|
|
||||||
|
|
||||||
|
def available_models() -> List[str]:
|
||||||
|
"""Returns the names of available CLIP models"""
|
||||||
|
return list(_MODELS.keys())
|
||||||
|
|
||||||
|
|
||||||
|
def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit=False):
|
||||||
|
"""Load a CLIP model
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
name : str
|
||||||
|
A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
|
||||||
|
|
||||||
|
device : Union[str, torch.device]
|
||||||
|
The device to put the loaded model
|
||||||
|
|
||||||
|
jit : bool
|
||||||
|
Whether to load the optimized JIT model or more hackable non-JIT model (default).
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
model : torch.nn.Module
|
||||||
|
The CLIP model
|
||||||
|
|
||||||
|
preprocess : Callable[[PIL.Image], torch.Tensor]
|
||||||
|
A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
|
||||||
|
"""
|
||||||
|
if name in _MODELS:
|
||||||
|
model_path = _download(_MODELS[name])
|
||||||
|
elif os.path.isfile(name):
|
||||||
|
model_path = name
|
||||||
|
else:
|
||||||
|
raise RuntimeError(f"Model {name} not found; available models = {available_models()}")
|
||||||
|
|
||||||
|
try:
|
||||||
|
# loading JIT archive
|
||||||
|
model = torch.jit.load(model_path, map_location=device if jit else "cpu").eval()
|
||||||
|
state_dict = None
|
||||||
|
except RuntimeError:
|
||||||
|
# loading saved state dict
|
||||||
|
if jit:
|
||||||
|
warnings.warn(f"File {model_path} is not a JIT archive. Loading as a state dict instead")
|
||||||
|
jit = False
|
||||||
|
state_dict = torch.load(model_path, map_location="cpu")
|
||||||
|
|
||||||
|
if not jit:
|
||||||
|
model = build_model(state_dict or model.state_dict()).to(device)
|
||||||
|
if str(device) == "cpu":
|
||||||
|
model.float()
|
||||||
|
return model, _transform(model.visual.input_resolution)
|
||||||
|
|
||||||
|
# patch the device names
|
||||||
|
device_holder = torch.jit.trace(lambda: torch.ones([]).to(torch.device(device)), example_inputs=[])
|
||||||
|
device_node = [n for n in device_holder.graph.findAllNodes("prim::Constant") if "Device" in repr(n)][-1]
|
||||||
|
|
||||||
|
def patch_device(module):
|
||||||
|
try:
|
||||||
|
graphs = [module.graph] if hasattr(module, "graph") else []
|
||||||
|
except RuntimeError:
|
||||||
|
graphs = []
|
||||||
|
|
||||||
|
if hasattr(module, "forward1"):
|
||||||
|
graphs.append(module.forward1.graph)
|
||||||
|
|
||||||
|
for graph in graphs:
|
||||||
|
for node in graph.findAllNodes("prim::Constant"):
|
||||||
|
if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
|
||||||
|
node.copyAttributes(device_node)
|
||||||
|
|
||||||
|
model.apply(patch_device)
|
||||||
|
patch_device(model.encode_image)
|
||||||
|
patch_device(model.encode_text)
|
||||||
|
|
||||||
|
# patch dtype to float32 on CPU
|
||||||
|
if str(device) == "cpu":
|
||||||
|
float_holder = torch.jit.trace(lambda: torch.ones([]).float(), example_inputs=[])
|
||||||
|
float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
|
||||||
|
float_node = float_input.node()
|
||||||
|
|
||||||
|
def patch_float(module):
|
||||||
|
try:
|
||||||
|
graphs = [module.graph] if hasattr(module, "graph") else []
|
||||||
|
except RuntimeError:
|
||||||
|
graphs = []
|
||||||
|
|
||||||
|
if hasattr(module, "forward1"):
|
||||||
|
graphs.append(module.forward1.graph)
|
||||||
|
|
||||||
|
for graph in graphs:
|
||||||
|
for node in graph.findAllNodes("aten::to"):
|
||||||
|
inputs = list(node.inputs())
|
||||||
|
for i in [1, 2]: # dtype can be the second or third argument to aten::to()
|
||||||
|
if inputs[i].node()["value"] == 5:
|
||||||
|
inputs[i].node().copyAttributes(float_node)
|
||||||
|
|
||||||
|
model.apply(patch_float)
|
||||||
|
patch_float(model.encode_image)
|
||||||
|
patch_float(model.encode_text)
|
||||||
|
|
||||||
|
model.float()
|
||||||
|
|
||||||
|
return model, _transform(model.input_resolution.item())
|
||||||
|
|
||||||
|
|
||||||
|
def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> torch.LongTensor:
|
||||||
|
"""
|
||||||
|
Returns the tokenized representation of given input string(s)
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
texts : Union[str, List[str]]
|
||||||
|
An input string or a list of input strings to tokenize
|
||||||
|
|
||||||
|
context_length : int
|
||||||
|
The context length to use; all CLIP models use 77 as the context length
|
||||||
|
|
||||||
|
truncate: bool
|
||||||
|
Whether to truncate the text in case its encoding is longer than the context length
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]
|
||||||
|
"""
|
||||||
|
if isinstance(texts, str):
|
||||||
|
texts = [texts]
|
||||||
|
|
||||||
|
sot_token = _tokenizer.encoder["<|startoftext|>"]
|
||||||
|
eot_token = _tokenizer.encoder["<|endoftext|>"]
|
||||||
|
all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
|
||||||
|
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
|
||||||
|
|
||||||
|
for i, tokens in enumerate(all_tokens):
|
||||||
|
if len(tokens) > context_length:
|
||||||
|
if truncate:
|
||||||
|
tokens = tokens[:context_length]
|
||||||
|
tokens[-1] = eot_token
|
||||||
|
else:
|
||||||
|
raise RuntimeError(f"Input {texts[i]} is too long for context length {context_length}")
|
||||||
|
result[i, :len(tokens)] = torch.tensor(tokens)
|
||||||
|
|
||||||
|
return result
|
||||||
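For reference, a minimal usage sketch of the `tokenize` API documented above (illustrative only; it assumes the repository root is on the Python path so the local `clip` package is importable):

```python
import torch

from clip import tokenize  # local package added in this commit

# Two prompts -> a [2, 77] LongTensor of BPE token ids, zero-padded after <|endoftext|>.
tokens = tokenize(["a photo of a cat", "a photo of a dog"])
assert tokens.shape == (2, 77) and tokens.dtype == torch.long
```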
699
clip/model.py
Normal file
@@ -0,0 +1,699 @@
|
|||||||
|
from collections import OrderedDict
|
||||||
|
from typing import Tuple, Union
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import torch
|
||||||
|
import torch.nn.functional as F
|
||||||
|
from torch import nn
|
||||||
|
|
||||||
|
|
||||||
|
class Bottleneck(nn.Module):
|
||||||
|
expansion = 4
|
||||||
|
|
||||||
|
def __init__(self, inplanes, planes, stride=1):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
# all conv layers have stride 1. an avgpool is performed after the second convolution when stride > 1
|
||||||
|
self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False)
|
||||||
|
self.bn1 = nn.BatchNorm2d(planes)
|
||||||
|
|
||||||
|
self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
|
||||||
|
self.bn2 = nn.BatchNorm2d(planes)
|
||||||
|
|
||||||
|
self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
|
||||||
|
|
||||||
|
self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False)
|
||||||
|
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
|
||||||
|
|
||||||
|
self.relu = nn.ReLU(inplace=True)
|
||||||
|
self.downsample = None
|
||||||
|
self.stride = stride
|
||||||
|
|
||||||
|
if stride > 1 or inplanes != planes * Bottleneck.expansion:
|
||||||
|
# downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1
|
||||||
|
self.downsample = nn.Sequential(OrderedDict([
|
||||||
|
("-1", nn.AvgPool2d(stride)),
|
||||||
|
("0", nn.Conv2d(inplanes, planes * self.expansion, 1, stride=1, bias=False)),
|
||||||
|
("1", nn.BatchNorm2d(planes * self.expansion))
|
||||||
|
]))
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
identity = x
|
||||||
|
|
||||||
|
out = self.relu(self.bn1(self.conv1(x)))
|
||||||
|
out = self.relu(self.bn2(self.conv2(out)))
|
||||||
|
out = self.avgpool(out)
|
||||||
|
out = self.bn3(self.conv3(out))
|
||||||
|
|
||||||
|
if self.downsample is not None:
|
||||||
|
identity = self.downsample(x)
|
||||||
|
|
||||||
|
out += identity
|
||||||
|
out = self.relu(out)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
class AttentionPool2d(nn.Module):
|
||||||
|
def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None):
|
||||||
|
super().__init__()
|
||||||
|
self.positional_embedding = nn.Parameter(torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5)
|
||||||
|
self.k_proj = nn.Linear(embed_dim, embed_dim)
|
||||||
|
self.q_proj = nn.Linear(embed_dim, embed_dim)
|
||||||
|
self.v_proj = nn.Linear(embed_dim, embed_dim)
|
||||||
|
self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
|
||||||
|
self.num_heads = num_heads
|
||||||
|
|
||||||
|
def forward(self, x):
|
||||||
|
x = x.reshape(x.shape[0], x.shape[1], x.shape[2] * x.shape[3]).permute(2, 0, 1) # NCHW -> (HW)NC
|
||||||
|
x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC
|
||||||
|
x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
|
||||||
|
x, _ = F.multi_head_attention_forward(
|
||||||
|
query=x, key=x, value=x,
|
||||||
|
embed_dim_to_check=x.shape[-1],
|
||||||
|
num_heads=self.num_heads,
|
||||||
|
q_proj_weight=self.q_proj.weight,
|
||||||
|
k_proj_weight=self.k_proj.weight,
|
||||||
|
v_proj_weight=self.v_proj.weight,
|
||||||
|
in_proj_weight=None,
|
||||||
|
in_proj_bias=torch.cat([self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]),
|
||||||
|
bias_k=None,
|
||||||
|
bias_v=None,
|
||||||
|
add_zero_attn=False,
|
||||||
|
dropout_p=0,
|
||||||
|
out_proj_weight=self.c_proj.weight,
|
||||||
|
out_proj_bias=self.c_proj.bias,
|
||||||
|
use_separate_proj_weight=True,
|
||||||
|
training=self.training,
|
||||||
|
need_weights=False
|
||||||
|
)
|
||||||
|
|
||||||
|
return x[0]
|
||||||
|
|
||||||
|
|
||||||
|
class ModifiedResNet(nn.Module):
|
||||||
|
"""
|
||||||
|
A ResNet class that is similar to torchvision's but contains the following changes:
|
||||||
|
- There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
|
||||||
|
- Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
|
||||||
|
- The final pooling layer is a QKV attention instead of an average pool
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, layers, output_dim, heads, input_resolution=224, width=64):
|
||||||
|
super().__init__()
|
||||||
|
self.output_dim = output_dim
|
||||||
|
self.input_resolution = input_resolution
|
||||||
|
|
||||||
|
# the 3-layer stem
|
||||||
|
self.conv1 = nn.Conv2d(3, width // 2, kernel_size=3, stride=2, padding=1, bias=False)
|
||||||
|
self.bn1 = nn.BatchNorm2d(width // 2)
|
||||||
|
self.conv2 = nn.Conv2d(width // 2, width // 2, kernel_size=3, padding=1, bias=False)
|
||||||
|
self.bn2 = nn.BatchNorm2d(width // 2)
|
||||||
|
self.conv3 = nn.Conv2d(width // 2, width, kernel_size=3, padding=1, bias=False)
|
||||||
|
self.bn3 = nn.BatchNorm2d(width)
|
||||||
|
self.avgpool = nn.AvgPool2d(2)
|
||||||
|
self.relu = nn.ReLU(inplace=True)
|
||||||
|
|
||||||
|
# residual layers
|
||||||
|
self._inplanes = width # this is a *mutable* variable used during construction
|
||||||
|
self.layer1 = self._make_layer(width, layers[0])
|
||||||
|
self.layer2 = self._make_layer(width * 2, layers[1], stride=2)
|
||||||
|
self.layer3 = self._make_layer(width * 4, layers[2], stride=2)
|
||||||
|
self.layer4 = self._make_layer(width * 8, layers[3], stride=2)
|
||||||
|
|
||||||
|
embed_dim = width * 32 # the ResNet feature dimension
|
||||||
|
self.attnpool = AttentionPool2d(input_resolution // 32, embed_dim, heads, output_dim)
|
||||||
|
|
||||||
|
def _make_layer(self, planes, blocks, stride=1):
|
||||||
|
layers = [Bottleneck(self._inplanes, planes, stride)]
|
||||||
|
|
||||||
|
self._inplanes = planes * Bottleneck.expansion
|
||||||
|
for _ in range(1, blocks):
|
||||||
|
layers.append(Bottleneck(self._inplanes, planes))
|
||||||
|
|
||||||
|
return nn.Sequential(*layers)
|
||||||
|
|
||||||
|
def forward(self, x):
|
||||||
|
def stem(x):
|
||||||
|
for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2), (self.conv3, self.bn3)]:
|
||||||
|
x = self.relu(bn(conv(x)))
|
||||||
|
x = self.avgpool(x)
|
||||||
|
return x
|
||||||
|
|
||||||
|
x = x.type(self.conv1.weight.dtype)
|
||||||
|
x = stem(x)
|
||||||
|
x = self.layer1(x)
|
||||||
|
x = self.layer2(x)
|
||||||
|
x = self.layer3(x)
|
||||||
|
x = self.layer4(x)
|
||||||
|
x = self.attnpool(x)
|
||||||
|
|
||||||
|
return x
|
||||||
|
|
||||||
|
|
||||||
|
class LayerNorm(nn.LayerNorm):
|
||||||
|
"""Subclass torch's LayerNorm to handle fp16."""
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
orig_type = x.dtype
|
||||||
|
ret = super().forward(x.type(torch.float32))
|
||||||
|
return ret.type(orig_type)
|
||||||
|
|
||||||
|
|
||||||
|
class QuickGELU(nn.Module):
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
return x * torch.sigmoid(1.702 * x)
|
||||||
|
|
||||||
|
|
||||||
|
class ResidualAttentionBlock(nn.Module):
|
||||||
|
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
self.attn = nn.MultiheadAttention(d_model, n_head)
|
||||||
|
self.ln_1 = LayerNorm(d_model)
|
||||||
|
self.mlp = nn.Sequential(OrderedDict([
|
||||||
|
("c_fc", nn.Linear(d_model, d_model * 4)),
|
||||||
|
("gelu", QuickGELU()),
|
||||||
|
("c_proj", nn.Linear(d_model * 4, d_model))
|
||||||
|
]))
|
||||||
|
self.ln_2 = LayerNorm(d_model)
|
||||||
|
self.attn_mask = attn_mask
|
||||||
|
|
||||||
|
def attention(self, x: torch.Tensor):
|
||||||
|
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
|
||||||
|
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
x = x + self.attention(self.ln_1(x))
|
||||||
|
x = x + self.mlp(self.ln_2(x))
|
||||||
|
return x
|
||||||
|
|
||||||
|
|
||||||
|
class ResidualAttentionBlock_IVLP(nn.Module):
|
||||||
|
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None, add_prompt=False,
|
||||||
|
text_layer=False, i=0, design_details=None):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
self.attn = nn.MultiheadAttention(d_model, n_head)
|
||||||
|
self.ln_1 = LayerNorm(d_model)
|
||||||
|
self.mlp = nn.Sequential(OrderedDict([
|
||||||
|
("c_fc", nn.Linear(d_model, d_model * 4)),
|
||||||
|
("gelu", QuickGELU()),
|
||||||
|
("c_proj", nn.Linear(d_model * 4, d_model))
|
||||||
|
]))
|
||||||
|
self.ln_2 = LayerNorm(d_model)
|
||||||
|
# Only add learnable tokens if flag is set True
|
||||||
|
        # For the first layer (i == 0) we should not add the learnable parameters here,
        # as they have already been taken care of at the very start, for both the text
        # and the visual branch.
|
||||||
|
self.text_layer = text_layer
|
||||||
|
self.attn_mask = attn_mask
|
||||||
|
if i != 0:
|
||||||
|
self.add_prompt = add_prompt
|
||||||
|
if self.add_prompt:
|
||||||
|
if self.text_layer:
|
||||||
|
self.n_ctx_text = design_details["language_ctx"] # hyperparameter
|
||||||
|
ctx_vectors = torch.empty(self.n_ctx_text, d_model)
|
||||||
|
else:
|
||||||
|
self.n_ctx_visual = design_details["vision_ctx"] # hyperparameter
|
||||||
|
ctx_vectors = torch.empty(self.n_ctx_visual, d_model)
|
||||||
|
# Code snippet for per layer visual prompts
|
||||||
|
nn.init.normal_(ctx_vectors, std=0.02)
|
||||||
|
self.VPT_shallow = nn.Parameter(ctx_vectors)
|
||||||
|
else:
|
||||||
|
self.add_prompt = False
|
||||||
|
|
||||||
|
def attention(self, x: torch.Tensor):
|
||||||
|
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
|
||||||
|
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
# Will need to append the learnable tokens for this layer here
|
||||||
|
# Check if flag was set for this layer or not
|
||||||
|
if self.add_prompt:
|
||||||
|
# Also see if this is textual transformer layer or not
|
||||||
|
if not self.text_layer:
|
||||||
|
# Remove the outputs produced by learnable tokens of previous layer
|
||||||
|
prefix = x[0:x.shape[0] - self.n_ctx_visual, :, :]
|
||||||
|
# Create/configure learnable tokens of this layer
|
||||||
|
visual_context = self.VPT_shallow.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
|
||||||
|
# Add the learnable tokens of this layer with the input, by replacing the previous
|
||||||
|
# layer learnable tokens
|
||||||
|
x = torch.cat([prefix, visual_context], dim=0)
|
||||||
|
else:
|
||||||
|
# Appending the learnable tokens in different way
|
||||||
|
# x -> [77, NCLS, DIM]
|
||||||
|
# First remove the learnable tokens from previous layer
|
||||||
|
prefix = x[:1, :, :]
|
||||||
|
suffix = x[1 + self.n_ctx_text:, :, :]
|
||||||
|
# Create/configure learnable tokens of this layer
|
||||||
|
textual_context = self.VPT_shallow.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
|
||||||
|
# Add the learnable tokens of this layer with the input, replaced by previous
|
||||||
|
# layer learnable tokens
|
||||||
|
x = torch.cat([prefix, textual_context, suffix], dim=0)
|
||||||
|
|
||||||
|
x = x + self.attention(self.ln_1(x))
|
||||||
|
x = x + self.mlp(self.ln_2(x))
|
||||||
|
return x
|
||||||
|
|
||||||
|
|
||||||
|
class ResidualAttentionBlock_MaPLe(nn.Module):
|
||||||
|
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None, design_details=None,
|
||||||
|
text_layer=False, i=0):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
self.attn = nn.MultiheadAttention(d_model, n_head)
|
||||||
|
self.ln_1 = LayerNorm(d_model)
|
||||||
|
self.mlp = nn.Sequential(OrderedDict([
|
||||||
|
("c_fc", nn.Linear(d_model, d_model * 4)),
|
||||||
|
("gelu", QuickGELU()),
|
||||||
|
("c_proj", nn.Linear(d_model * 4, d_model))
|
||||||
|
]))
|
||||||
|
self.ln_2 = LayerNorm(d_model)
|
||||||
|
# For the first iteration i, we do not need to add the learnable parameters here
|
||||||
|
# as it will be added in the beginning, for both text and the vision branch
|
||||||
|
self.text_layer = text_layer
|
||||||
|
self.attn_mask = attn_mask
|
||||||
|
# This must be consistent with the config file prompt
|
||||||
|
self.compound_prompt_nctx = design_details['maple_length']
|
||||||
|
if i == 0:
|
||||||
|
self.first_layer = True
|
||||||
|
else:
|
||||||
|
self.first_layer = False
|
||||||
|
|
||||||
|
def attention(self, x: torch.Tensor):
|
||||||
|
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
|
||||||
|
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
|
||||||
|
|
||||||
|
def forward(self, inputs):
|
||||||
|
# For the first layer, we do not need to add any duplicate, as it is already added
|
||||||
|
# as the shallow version
|
||||||
|
x = inputs[0]
|
||||||
|
compound_prompts_deeper = inputs[1]
|
||||||
|
counter = inputs[2]
|
||||||
|
if not self.first_layer:
|
||||||
|
if len(compound_prompts_deeper) > 0:
|
||||||
|
# This means that deeper compound prompts are turned on
|
||||||
|
# Here it behaves differently for text and visual side
|
||||||
|
# Forward function is same for both
|
||||||
|
|
||||||
|
if not self.text_layer:
|
||||||
|
# First check if the ith layer needs compound prompts or not
|
||||||
|
if not (counter > len(compound_prompts_deeper) - 1):
|
||||||
|
# Remove the outputs produced by learnable tokens of previous layer
|
||||||
|
prefix = x[0:x.shape[0] - self.compound_prompt_nctx, :, :]
|
||||||
|
# Create/configure learnable tokens of this layer
|
||||||
|
visual_context = compound_prompts_deeper[counter] # extract the correct index
|
||||||
|
visual_context = visual_context.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
|
||||||
|
# Add the learnable tokens of this layer with the input, by replacing previous
|
||||||
|
# layer learnable tokens
|
||||||
|
x = torch.cat([prefix, visual_context], dim=0)
|
||||||
|
|
||||||
|
# Once done, update the counter, so that the next time, it does not use same learnable tokens
|
||||||
|
counter += 1
|
||||||
|
else:
|
||||||
|
# First check if the ith layer needs compound prompts or not
|
||||||
|
if not (counter > len(compound_prompts_deeper) - 1):
|
||||||
|
# Appending the learnable tokens in different way
|
||||||
|
# x -> [77, NCLS, DIM]
|
||||||
|
# First remove the learnable tokens from previous layer
|
||||||
|
prefix = x[:1, :, :]
|
||||||
|
suffix = x[1 + self.compound_prompt_nctx:, :, :]
|
||||||
|
# Create/configure learnable tokens of this layer
|
||||||
|
textual_context = compound_prompts_deeper[counter]
|
||||||
|
textual_context = textual_context.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
|
||||||
|
# Add the learnable tokens of this layer with the input, replaced by previous
|
||||||
|
# layer learnable tokens
|
||||||
|
x = torch.cat([prefix, textual_context, suffix], dim=0)
|
||||||
|
# Once done, update the counter, so that the next time, it does not use same learnable tokens
|
||||||
|
counter += 1
|
||||||
|
x = x + self.attention(self.ln_1(x))
|
||||||
|
x = x + self.mlp(self.ln_2(x))
|
||||||
|
return [x, compound_prompts_deeper, counter] # return again as a list, so that nn.seq can work
|
||||||
|
|
||||||
|
|
||||||
|
class Transformer(nn.Module):
|
||||||
|
def __init__(self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None, prompts_needed=0,
|
||||||
|
text_layer=False, design_details=None):
|
||||||
|
super().__init__()
|
||||||
|
self.width = width
|
||||||
|
self.layers = layers
|
||||||
|
# Implements respective encoder blocks for a given design choice
|
||||||
|
current_trainer = design_details['trainer']
|
||||||
|
if current_trainer == 'IVLP' or current_trainer == 'VPT':
|
||||||
|
self.resblocks = nn.Sequential(*[ResidualAttentionBlock_IVLP(width, heads, attn_mask, True,
|
||||||
|
text_layer, i,
|
||||||
|
design_details) if prompts_needed > i
|
||||||
|
else ResidualAttentionBlock_IVLP(width, heads, attn_mask, False,
|
||||||
|
text_layer, i, design_details)
|
||||||
|
for i in range(layers)])
|
||||||
|
elif current_trainer == 'MaPLe':
|
||||||
|
self.resblocks = nn.Sequential(
|
||||||
|
*[ResidualAttentionBlock_MaPLe(width, heads, attn_mask, design_details, text_layer, i)
|
||||||
|
for i in range(layers)])
|
||||||
|
else:
|
||||||
|
# Corresponds to default CoOp or CoCoOp
|
||||||
|
assert current_trainer == 'CoOp' or current_trainer == 'CoCoOp'
|
||||||
|
self.resblocks = nn.Sequential(*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)])
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
return self.resblocks(x)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
class VisionTransformer(nn.Module):
|
||||||
|
def __init__(self, input_resolution: int, patch_size: int, width: int, layers: int, heads: int,
|
||||||
|
output_dim: int, design_details):
|
||||||
|
super().__init__()
|
||||||
|
self.input_resolution = input_resolution
|
||||||
|
self.output_dim = output_dim
|
||||||
|
self.conv1 = nn.Conv2d(in_channels=3, out_channels=width, kernel_size=patch_size, stride=patch_size, bias=False)
|
||||||
|
if design_details["vision_depth"] == 0:
|
||||||
|
self.VPT_shallow = False
|
||||||
|
else:
|
||||||
|
self.VPT_shallow = True
|
||||||
|
if self.VPT_shallow:
|
||||||
|
# Add visual prompt tokens here
|
||||||
|
n_ctx = design_details["vision_ctx"] # hyperparameter
|
||||||
|
ctx_vectors = torch.empty(n_ctx, width)
|
||||||
|
nn.init.normal_(ctx_vectors, std=0.02)
|
||||||
|
self.VPT = nn.Parameter(ctx_vectors)
|
||||||
|
# self.VPT.half()
|
||||||
|
scale = width ** -0.5
|
||||||
|
self.class_embedding = nn.Parameter(scale * torch.randn(width))
|
||||||
|
self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))
|
||||||
|
self.ln_pre = LayerNorm(width)
|
||||||
|
# hyper-parameter if need to add prompt embeddings inside to the input
|
||||||
|
# of transformer block or not:
|
||||||
|
self.prompt_till_layer_visual = design_details["vision_depth"]
|
||||||
|
self.transformer = Transformer(width, layers, heads, prompts_needed=self.prompt_till_layer_visual,
|
||||||
|
design_details=design_details)
|
||||||
|
|
||||||
|
self.ln_post = LayerNorm(width)
|
||||||
|
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor):
|
||||||
|
x = self.conv1(x) # shape = [*, width, grid, grid]
|
||||||
|
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
|
||||||
|
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
|
||||||
|
x = torch.cat(
|
||||||
|
[self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype,
|
||||||
|
device=x.device),
|
||||||
|
x], dim=1) # shape = [*, grid ** 2 + 1, width]
|
||||||
|
x = x + self.positional_embedding.to(x.dtype)
|
||||||
|
|
||||||
|
# After positional embeddings, we will attach prompts with the model, remember only those
|
||||||
|
# are trainable parameters here in whole image encoder.
|
||||||
|
if self.VPT_shallow:
|
||||||
|
visual_ctx = self.VPT.expand(x.shape[0], -1, -1).half()
|
||||||
|
x = torch.cat([x, visual_ctx], dim=1)
|
||||||
|
else:
|
||||||
|
assert self.prompt_till_layer_visual == 0
|
||||||
|
|
||||||
|
# Normal code as before
|
||||||
|
x = self.ln_pre(x)
|
||||||
|
|
||||||
|
x = x.permute(1, 0, 2) # NLD -> LND
|
||||||
|
x = self.transformer(x)
|
||||||
|
x = x.permute(1, 0, 2) # LND -> NLD
|
||||||
|
|
||||||
|
x = self.ln_post(x[:, 0, :])
|
||||||
|
|
||||||
|
if self.proj is not None:
|
||||||
|
x = x @ self.proj
|
||||||
|
|
||||||
|
return x
|
||||||
|
|
||||||
|
|
||||||
|
class VisionTransformer_MaPLe(nn.Module):
|
||||||
|
def __init__(self, input_resolution: int, patch_size: int, width: int, layers: int, heads: int, output_dim: int,
|
||||||
|
design_details):
|
||||||
|
super().__init__()
|
||||||
|
self.input_resolution = input_resolution
|
||||||
|
self.output_dim = output_dim
|
||||||
|
self.conv1 = nn.Conv2d(in_channels=3, out_channels=width, kernel_size=patch_size, stride=patch_size, bias=False)
|
||||||
|
self.VPT_shallow = True
|
||||||
|
scale = width ** -0.5
|
||||||
|
self.class_embedding = nn.Parameter(scale * torch.randn(width))
|
||||||
|
self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))
|
||||||
|
self.ln_pre = LayerNorm(width)
|
||||||
|
# hyper-parameter if need to add prompt embeddings inside to the input
|
||||||
|
# of transformer block or not:
|
||||||
|
self.prompt_till_layer_visual = 0
|
||||||
|
self.transformer = Transformer(width, layers, heads, design_details=design_details)
|
||||||
|
|
||||||
|
self.ln_post = LayerNorm(width)
|
||||||
|
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
|
||||||
|
|
||||||
|
def forward(self, x: torch.Tensor, shared_ctx, compound_deeper_prompts):
|
||||||
|
x = self.conv1(x) # shape = [*, width, grid, grid]
|
||||||
|
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
|
||||||
|
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
|
||||||
|
x = torch.cat(
|
||||||
|
[self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device),
|
||||||
|
x], dim=1) # shape = [*, grid ** 2 + 1, width]
|
||||||
|
x = x + self.positional_embedding.to(x.dtype)
|
||||||
|
|
||||||
|
# After positional embeddings, we will attach prompts with the model, remember only those
|
||||||
|
# are trainable parameters here in whole image encoder.
|
||||||
|
if self.VPT_shallow:
|
||||||
|
visual_ctx = shared_ctx.expand(x.shape[0], -1, -1).half()
|
||||||
|
x = torch.cat([x, visual_ctx], dim=1)
|
||||||
|
else:
|
||||||
|
assert self.prompt_till_layer_visual == 0
|
||||||
|
|
||||||
|
# Normal code as before
|
||||||
|
x = self.ln_pre(x)
|
||||||
|
|
||||||
|
x = x.permute(1, 0, 2) # NLD -> LND
|
||||||
|
# Again combine the inputs, so nn.sequential can work
|
||||||
|
outputs = self.transformer([x, compound_deeper_prompts, 0]) # third argument is counter
|
||||||
|
x = outputs[0]
|
||||||
|
x = x.permute(1, 0, 2) # LND -> NLD
|
||||||
|
|
||||||
|
x = self.ln_post(x[:, 0, :])
|
||||||
|
|
||||||
|
if self.proj is not None:
|
||||||
|
x = x @ self.proj
|
||||||
|
|
||||||
|
return x
|
||||||
|
|
||||||
|
|
||||||
|
class CLIP(nn.Module):
|
||||||
|
def __init__(self,
|
||||||
|
embed_dim: int,
|
||||||
|
# vision
|
||||||
|
image_resolution: int,
|
||||||
|
vision_layers: Union[Tuple[int, int, int, int], int],
|
||||||
|
vision_width: int,
|
||||||
|
vision_patch_size: int,
|
||||||
|
# text
|
||||||
|
context_length: int,
|
||||||
|
vocab_size: int,
|
||||||
|
transformer_width: int,
|
||||||
|
transformer_heads: int,
|
||||||
|
transformer_layers: int,
|
||||||
|
design_details
|
||||||
|
):
|
||||||
|
super().__init__()
|
||||||
|
|
||||||
|
self.context_length = context_length
|
||||||
|
trainer = design_details['trainer']
|
||||||
|
|
||||||
|
if isinstance(vision_layers, (tuple, list)):
|
||||||
|
vision_heads = vision_width * 32 // 64
|
||||||
|
self.visual = ModifiedResNet(
|
||||||
|
layers=vision_layers,
|
||||||
|
output_dim=embed_dim,
|
||||||
|
heads=vision_heads,
|
||||||
|
input_resolution=image_resolution,
|
||||||
|
width=vision_width
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
vision_heads = vision_width // 64
|
||||||
|
if trainer == "MaPLe":
|
||||||
|
self.visual = VisionTransformer_MaPLe(
|
||||||
|
input_resolution=image_resolution,
|
||||||
|
patch_size=vision_patch_size,
|
||||||
|
width=vision_width,
|
||||||
|
layers=vision_layers,
|
||||||
|
heads=vision_heads,
|
||||||
|
output_dim=embed_dim,
|
||||||
|
design_details=design_details
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
self.visual = VisionTransformer(
|
||||||
|
input_resolution=image_resolution,
|
||||||
|
patch_size=vision_patch_size,
|
||||||
|
width=vision_width,
|
||||||
|
layers=vision_layers,
|
||||||
|
heads=vision_heads,
|
||||||
|
output_dim=embed_dim,
|
||||||
|
design_details=design_details
|
||||||
|
)
|
||||||
|
# hyper-parameter if need to add prompt embeddings inside to the input
|
||||||
|
# of transformer block or not:
|
||||||
|
prompt_till_layer_text = design_details['language_depth']
|
||||||
|
self.transformer = Transformer(
|
||||||
|
width=transformer_width,
|
||||||
|
layers=transformer_layers,
|
||||||
|
heads=transformer_heads,
|
||||||
|
attn_mask=self.build_attention_mask(),
|
||||||
|
prompts_needed=prompt_till_layer_text,
|
||||||
|
text_layer=True,
|
||||||
|
design_details=design_details
|
||||||
|
)
|
||||||
|
|
||||||
|
self.vocab_size = vocab_size
|
||||||
|
self.token_embedding = nn.Embedding(vocab_size, transformer_width)
|
||||||
|
self.positional_embedding = nn.Parameter(torch.empty(self.context_length, transformer_width))
|
||||||
|
self.ln_final = LayerNorm(transformer_width)
|
||||||
|
|
||||||
|
self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))
|
||||||
|
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
|
||||||
|
|
||||||
|
self.initialize_parameters()
|
||||||
|
|
||||||
|
def initialize_parameters(self):
|
||||||
|
nn.init.normal_(self.token_embedding.weight, std=0.02)
|
||||||
|
nn.init.normal_(self.positional_embedding, std=0.01)
|
||||||
|
|
||||||
|
if isinstance(self.visual, ModifiedResNet):
|
||||||
|
if self.visual.attnpool is not None:
|
||||||
|
std = self.visual.attnpool.c_proj.in_features ** -0.5
|
||||||
|
nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std)
|
||||||
|
nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std)
|
||||||
|
nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std)
|
||||||
|
nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std)
|
||||||
|
|
||||||
|
for resnet_block in [self.visual.layer1, self.visual.layer2, self.visual.layer3, self.visual.layer4]:
|
||||||
|
for name, param in resnet_block.named_parameters():
|
||||||
|
if name.endswith("bn3.weight"):
|
||||||
|
nn.init.zeros_(param)
|
||||||
|
|
||||||
|
proj_std = (self.transformer.width ** -0.5) * ((2 * self.transformer.layers) ** -0.5)
|
||||||
|
attn_std = self.transformer.width ** -0.5
|
||||||
|
fc_std = (2 * self.transformer.width) ** -0.5
|
||||||
|
for block in self.transformer.resblocks:
|
||||||
|
nn.init.normal_(block.attn.in_proj_weight, std=attn_std)
|
||||||
|
nn.init.normal_(block.attn.out_proj.weight, std=proj_std)
|
||||||
|
nn.init.normal_(block.mlp.c_fc.weight, std=fc_std)
|
||||||
|
nn.init.normal_(block.mlp.c_proj.weight, std=proj_std)
|
||||||
|
|
||||||
|
if self.text_projection is not None:
|
||||||
|
nn.init.normal_(self.text_projection, std=self.transformer.width ** -0.5)
|
||||||
|
|
||||||
|
def build_attention_mask(self):
|
||||||
|
# lazily create causal attention mask, with full attention between the vision tokens
|
||||||
|
# pytorch uses additive attention mask; fill with -inf
|
||||||
|
mask = torch.empty(self.context_length, self.context_length)
|
||||||
|
mask.fill_(float("-inf"))
|
||||||
|
mask.triu_(1) # zero out the lower diagonal
|
||||||
|
return mask
|
||||||
|
|
||||||
|
@property
|
||||||
|
def dtype(self):
|
||||||
|
return self.visual.conv1.weight.dtype
|
||||||
|
|
||||||
|
def encode_image(self, image):
|
||||||
|
return self.visual(image.type(self.dtype))
|
||||||
|
|
||||||
|
def encode_text(self, text):
|
||||||
|
x = self.token_embedding(text).type(self.dtype) # [batch_size, n_ctx, d_model]
|
||||||
|
|
||||||
|
x = x + self.positional_embedding.type(self.dtype)
|
||||||
|
x = x.permute(1, 0, 2) # NLD -> LND
|
||||||
|
x = self.transformer(x)
|
||||||
|
x = x.permute(1, 0, 2) # LND -> NLD
|
||||||
|
x = self.ln_final(x).type(self.dtype)
|
||||||
|
|
||||||
|
# x.shape = [batch_size, n_ctx, transformer.width]
|
||||||
|
# take features from the eot embedding (eot_token is the highest number in each sequence)
|
||||||
|
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
|
||||||
|
|
||||||
|
return x
|
||||||
|
|
||||||
|
def forward(self, image, text):
|
||||||
|
image_features = self.encode_image(image)
|
||||||
|
text_features = self.encode_text(text)
|
||||||
|
|
||||||
|
# normalized features
|
||||||
|
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
|
||||||
|
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
|
||||||
|
|
||||||
|
# cosine similarity as logits
|
||||||
|
logit_scale = self.logit_scale.exp()
|
||||||
|
logits_per_image = logit_scale * image_features @ text_features.t()
|
||||||
|
logits_per_text = logit_scale * text_features @ image_features.t()
|
||||||
|
|
||||||
|
# shape = [global_batch_size, global_batch_size]
|
||||||
|
return logits_per_image, logits_per_text
|
||||||
|
|
||||||
|
|
||||||
|
def convert_weights(model: nn.Module):
|
||||||
|
"""Convert applicable model parameters to fp16"""
|
||||||
|
|
||||||
|
def _convert_weights_to_fp16(l):
|
||||||
|
if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
|
||||||
|
l.weight.data = l.weight.data.half()
|
||||||
|
if l.bias is not None:
|
||||||
|
l.bias.data = l.bias.data.half()
|
||||||
|
|
||||||
|
if isinstance(l, nn.MultiheadAttention):
|
||||||
|
for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
|
||||||
|
tensor = getattr(l, attr)
|
||||||
|
if tensor is not None:
|
||||||
|
tensor.data = tensor.data.half()
|
||||||
|
|
||||||
|
for name in ["text_projection", "proj"]:
|
||||||
|
if hasattr(l, name):
|
||||||
|
attr = getattr(l, name)
|
||||||
|
if attr is not None:
|
||||||
|
attr.data = attr.data.half()
|
||||||
|
|
||||||
|
model.apply(_convert_weights_to_fp16)
|
||||||
|
|
||||||
|
|
||||||
|
def build_model(state_dict: dict, design_details):
|
||||||
|
vit = "visual.proj" in state_dict
|
||||||
|
|
||||||
|
if vit:
|
||||||
|
vision_width = state_dict["visual.conv1.weight"].shape[0]
|
||||||
|
vision_layers = len(
|
||||||
|
[k for k in state_dict.keys() if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")])
|
||||||
|
vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
|
||||||
|
grid_size = round((state_dict["visual.positional_embedding"].shape[0] - 1) ** 0.5)
|
||||||
|
image_resolution = vision_patch_size * grid_size
|
||||||
|
else:
|
||||||
|
counts: list = [len(set(k.split(".")[2] for k in state_dict if k.startswith(f"visual.layer{b}"))) for b in
|
||||||
|
[1, 2, 3, 4]]
|
||||||
|
vision_layers = tuple(counts)
|
||||||
|
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
|
||||||
|
output_width = round((state_dict["visual.attnpool.positional_embedding"].shape[0] - 1) ** 0.5)
|
||||||
|
vision_patch_size = None
|
||||||
|
assert output_width ** 2 + 1 == state_dict["visual.attnpool.positional_embedding"].shape[0]
|
||||||
|
image_resolution = output_width * 32
|
||||||
|
|
||||||
|
embed_dim = state_dict["text_projection"].shape[1]
|
||||||
|
context_length = state_dict["positional_embedding"].shape[0]
|
||||||
|
vocab_size = state_dict["token_embedding.weight"].shape[0]
|
||||||
|
transformer_width = state_dict["ln_final.weight"].shape[0]
|
||||||
|
transformer_heads = transformer_width // 64
|
||||||
|
transformer_layers = len(set(k.split(".")[2] for k in state_dict if k.startswith(f"transformer.resblocks")))
|
||||||
|
|
||||||
|
model = CLIP(
|
||||||
|
embed_dim,
|
||||||
|
image_resolution, vision_layers, vision_width, vision_patch_size,
|
||||||
|
context_length, vocab_size, transformer_width, transformer_heads, transformer_layers, design_details
|
||||||
|
)
|
||||||
|
|
||||||
|
for key in ["input_resolution", "context_length", "vocab_size"]:
|
||||||
|
if key in state_dict:
|
||||||
|
del state_dict[key]
|
||||||
|
|
||||||
|
convert_weights(model)
|
||||||
|
try:
|
||||||
|
model.load_state_dict(state_dict)
|
||||||
|
    except RuntimeError:
|
||||||
|
missing_keys, _ = model.load_state_dict(state_dict, strict=False)
|
||||||
|
print('Weights not found for some missing keys: ', missing_keys)
|
||||||
|
return model.eval()
|
||||||
132
clip/simple_tokenizer.py
Normal file
@@ -0,0 +1,132 @@
|
|||||||
|
import gzip
|
||||||
|
import html
|
||||||
|
import os
|
||||||
|
from functools import lru_cache
|
||||||
|
|
||||||
|
import ftfy
|
||||||
|
import regex as re
|
||||||
|
|
||||||
|
|
||||||
|
@lru_cache()
|
||||||
|
def default_bpe():
|
||||||
|
return os.path.join(os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz")
|
||||||
|
|
||||||
|
|
||||||
|
@lru_cache()
|
||||||
|
def bytes_to_unicode():
|
||||||
|
"""
|
||||||
|
Returns list of utf-8 byte and a corresponding list of unicode strings.
|
||||||
|
The reversible bpe codes work on unicode strings.
|
||||||
|
This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
|
||||||
|
When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
|
||||||
|
This is a signficant percentage of your normal, say, 32K bpe vocab.
|
||||||
|
To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
|
||||||
|
And avoids mapping to whitespace/control characters the bpe code barfs on.
|
||||||
|
"""
|
||||||
|
bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
|
||||||
|
cs = bs[:]
|
||||||
|
n = 0
|
||||||
|
for b in range(2**8):
|
||||||
|
if b not in bs:
|
||||||
|
bs.append(b)
|
||||||
|
cs.append(2**8+n)
|
||||||
|
n += 1
|
||||||
|
cs = [chr(n) for n in cs]
|
||||||
|
return dict(zip(bs, cs))
|
||||||
|
|
||||||
|
|
||||||
|
def get_pairs(word):
|
||||||
|
"""Return set of symbol pairs in a word.
|
||||||
|
Word is represented as tuple of symbols (symbols being variable-length strings).
|
||||||
|
"""
|
||||||
|
pairs = set()
|
||||||
|
prev_char = word[0]
|
||||||
|
for char in word[1:]:
|
||||||
|
pairs.add((prev_char, char))
|
||||||
|
prev_char = char
|
||||||
|
return pairs
|
||||||
|
|
||||||
|
|
||||||
|
def basic_clean(text):
|
||||||
|
text = ftfy.fix_text(text)
|
||||||
|
text = html.unescape(html.unescape(text))
|
||||||
|
return text.strip()
|
||||||
|
|
||||||
|
|
||||||
|
def whitespace_clean(text):
|
||||||
|
text = re.sub(r'\s+', ' ', text)
|
||||||
|
text = text.strip()
|
||||||
|
return text
|
||||||
|
|
||||||
|
|
||||||
|
class SimpleTokenizer(object):
|
||||||
|
def __init__(self, bpe_path: str = default_bpe()):
|
||||||
|
self.byte_encoder = bytes_to_unicode()
|
||||||
|
self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
|
||||||
|
merges = gzip.open(bpe_path).read().decode("utf-8").split('\n')
|
||||||
|
merges = merges[1:49152-256-2+1]
|
||||||
|
merges = [tuple(merge.split()) for merge in merges]
|
||||||
|
vocab = list(bytes_to_unicode().values())
|
||||||
|
vocab = vocab + [v+'</w>' for v in vocab]
|
||||||
|
for merge in merges:
|
||||||
|
vocab.append(''.join(merge))
|
||||||
|
vocab.extend(['<|startoftext|>', '<|endoftext|>'])
|
||||||
|
self.encoder = dict(zip(vocab, range(len(vocab))))
|
||||||
|
self.decoder = {v: k for k, v in self.encoder.items()}
|
||||||
|
self.bpe_ranks = dict(zip(merges, range(len(merges))))
|
||||||
|
self.cache = {'<|startoftext|>': '<|startoftext|>', '<|endoftext|>': '<|endoftext|>'}
|
||||||
|
self.pat = re.compile(r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", re.IGNORECASE)
|
||||||
|
|
||||||
|
def bpe(self, token):
|
||||||
|
if token in self.cache:
|
||||||
|
return self.cache[token]
|
||||||
|
word = tuple(token[:-1]) + ( token[-1] + '</w>',)
|
||||||
|
pairs = get_pairs(word)
|
||||||
|
|
||||||
|
if not pairs:
|
||||||
|
return token+'</w>'
|
||||||
|
|
||||||
|
while True:
|
||||||
|
bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))
|
||||||
|
if bigram not in self.bpe_ranks:
|
||||||
|
break
|
||||||
|
first, second = bigram
|
||||||
|
new_word = []
|
||||||
|
i = 0
|
||||||
|
while i < len(word):
|
||||||
|
try:
|
||||||
|
j = word.index(first, i)
|
||||||
|
new_word.extend(word[i:j])
|
||||||
|
i = j
|
||||||
|
except:
|
||||||
|
new_word.extend(word[i:])
|
||||||
|
break
|
||||||
|
|
||||||
|
if word[i] == first and i < len(word)-1 and word[i+1] == second:
|
||||||
|
new_word.append(first+second)
|
||||||
|
i += 2
|
||||||
|
else:
|
||||||
|
new_word.append(word[i])
|
||||||
|
i += 1
|
||||||
|
new_word = tuple(new_word)
|
||||||
|
word = new_word
|
||||||
|
if len(word) == 1:
|
||||||
|
break
|
||||||
|
else:
|
||||||
|
pairs = get_pairs(word)
|
||||||
|
word = ' '.join(word)
|
||||||
|
self.cache[token] = word
|
||||||
|
return word
|
||||||
|
|
||||||
|
def encode(self, text):
|
||||||
|
bpe_tokens = []
|
||||||
|
text = whitespace_clean(basic_clean(text)).lower()
|
||||||
|
for token in re.findall(self.pat, text):
|
||||||
|
token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
|
||||||
|
bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
|
||||||
|
return bpe_tokens
|
||||||
|
|
||||||
|
def decode(self, tokens):
|
||||||
|
text = ''.join([self.decoder[token] for token in tokens])
|
||||||
|
text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors="replace").replace('</w>', ' ')
|
||||||
|
return text
|
||||||
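For reference, a minimal usage sketch (not part of the commit) that round-trips a caption through the tokenizer above; the example string is arbitrary, and encode() does not add the start/end-of-text tokens.

from clip.simple_tokenizer import SimpleTokenizer

tokenizer = SimpleTokenizer()               # loads bpe_simple_vocab_16e6.txt.gz from the clip/ package directory
ids = tokenizer.encode("a photo of a cat")  # list of BPE token ids
print(ids)
print(tokenizer.decode(ids).strip())        # "a photo of a cat"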
49409
clip_words.csv
Normal file
File diff suppressed because it is too large
2
configs/datasets/caltech101.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "Caltech101"
2
configs/datasets/dtd.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "DescribableTextures"
2
configs/datasets/eurosat.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "EuroSAT"
2
configs/datasets/fgvc_aircraft.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "FGVCAircraft"
2
configs/datasets/food101.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "Food101"
2
configs/datasets/imagenet.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "ImageNet"
2
configs/datasets/imagenet_a.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "ImageNetA"
2
configs/datasets/imagenet_r.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "ImageNetR"
2
configs/datasets/imagenet_sketch.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "ImageNetSketch"
2
configs/datasets/imagenetv2.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "ImageNetV2"
2
configs/datasets/oxford_flowers.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "OxfordFlowers"
2
configs/datasets/oxford_pets.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "OxfordPets"
2
configs/datasets/stanford_cars.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "StanfordCars"
2
configs/datasets/sun397.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "SUN397"
2
configs/datasets/ucf101.yaml
Normal file
@@ -0,0 +1,2 @@
DATASET:
  NAME: "UCF101"
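These dataset yamls only set DATASET.NAME; at run time they are merged on top of a trainer config through the Dassl/yacs config system. A minimal sketch, assuming the standard Dassl entry point (trainer-specific keys such as TRAINER.PROMPTSRC must be registered by the training script, e.g. in an extend_cfg step, before a trainer yaml can be merged):

from dassl.config import get_cfg_default

cfg = get_cfg_default()
cfg.merge_from_file("configs/datasets/caltech101.yaml")  # overrides DATASET.NAME only
cfg.DATASET.ROOT = "/path/to/data"                       # hypothetical data root
print(cfg.DATASET.NAME)                                  # "Caltech101"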
35
configs/trainers/CoCoOp/vit_b16_c16_ep10_batch1.yaml
Normal file
35
configs/trainers/CoCoOp/vit_b16_c16_ep10_batch1.yaml
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 1
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 10
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 20
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COCOOP:
|
||||||
|
N_CTX: 16
|
||||||
|
CTX_INIT: ""
|
||||||
|
PREC: "fp16"
|
||||||
35
configs/trainers/CoCoOp/vit_b16_c4_ep10_batch1.yaml
Normal file
35
configs/trainers/CoCoOp/vit_b16_c4_ep10_batch1.yaml
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 1
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 10
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 20
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COCOOP:
|
||||||
|
N_CTX: 4
|
||||||
|
CTX_INIT: ""
|
||||||
|
PREC: "fp16"
|
||||||
35
configs/trainers/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1.yaml
Normal file
35
configs/trainers/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1.yaml
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 1
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 10
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 20
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COCOOP:
|
||||||
|
N_CTX: 4
|
||||||
|
CTX_INIT: "a photo of a"
|
||||||
|
PREC: "fp16"
|
||||||
35
configs/trainers/CoCoOp/vit_b16_c8_ep10_batch1.yaml
Normal file
35
configs/trainers/CoCoOp/vit_b16_c8_ep10_batch1.yaml
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 1
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 10
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 20
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COCOOP:
|
||||||
|
N_CTX: 8
|
||||||
|
CTX_INIT: ""
|
||||||
|
PREC: "fp16"
|
||||||
29
configs/trainers/CoOp/rn101.yaml
Normal file
29
configs/trainers/CoOp/rn101.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 200
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN101"
|
||||||
29
configs/trainers/CoOp/rn101_ep50.yaml
Normal file
29
configs/trainers/CoOp/rn101_ep50.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 50
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN101"
|
||||||
29
configs/trainers/CoOp/rn50.yaml
Normal file
29
configs/trainers/CoOp/rn50.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 200
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
33
configs/trainers/CoOp/rn50_ctxv1.yaml
Normal file
33
configs/trainers/CoOp/rn50_ctxv1.yaml
Normal file
@@ -0,0 +1,33 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 200
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COOP:
|
||||||
|
CTX_INIT: "a photo of a"
|
||||||
29
configs/trainers/CoOp/rn50_ep100.yaml
Normal file
29
configs/trainers/CoOp/rn50_ep100.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 100
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
29
configs/trainers/CoOp/rn50_ep50.yaml
Normal file
29
configs/trainers/CoOp/rn50_ep50.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 50
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
33
configs/trainers/CoOp/rn50_ep50_ctxv1.yaml
Normal file
33
configs/trainers/CoOp/rn50_ep50_ctxv1.yaml
Normal file
@@ -0,0 +1,33 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 50
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
|
|
||||||
|
TRAINER:
|
||||||
|
COOP:
|
||||||
|
CTX_INIT: "a photo of a"
|
||||||
17
configs/trainers/CoOp/rn50_val.yaml
Normal file
17
configs/trainers/CoOp/rn50_val.yaml
Normal file
@@ -0,0 +1,17 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 200
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 200
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "RN50"
|
||||||
29
configs/trainers/CoOp/vit_b16.yaml
Normal file
29
configs/trainers/CoOp/vit_b16.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 200
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
29
configs/trainers/CoOp/vit_b16_ep100.yaml
Normal file
29
configs/trainers/CoOp/vit_b16_ep100.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 100
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
29
configs/trainers/CoOp/vit_b16_ep50.yaml
Normal file
29
configs/trainers/CoOp/vit_b16_ep50.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 50
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/16"
|
||||||
29
configs/trainers/CoOp/vit_b32.yaml
Normal file
29
configs/trainers/CoOp/vit_b32.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 200
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/32"
|
||||||
29
configs/trainers/CoOp/vit_b32_ep50.yaml
Normal file
29
configs/trainers/CoOp/vit_b32_ep50.yaml
Normal file
@@ -0,0 +1,29 @@
|
|||||||
|
DATALOADER:
|
||||||
|
TRAIN_X:
|
||||||
|
BATCH_SIZE: 32
|
||||||
|
TEST:
|
||||||
|
BATCH_SIZE: 100
|
||||||
|
NUM_WORKERS: 8
|
||||||
|
|
||||||
|
INPUT:
|
||||||
|
SIZE: (224, 224)
|
||||||
|
INTERPOLATION: "bicubic"
|
||||||
|
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
|
||||||
|
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
|
||||||
|
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
|
||||||
|
|
||||||
|
OPTIM:
|
||||||
|
NAME: "sgd"
|
||||||
|
LR: 0.002
|
||||||
|
MAX_EPOCH: 50
|
||||||
|
LR_SCHEDULER: "cosine"
|
||||||
|
WARMUP_EPOCH: 1
|
||||||
|
WARMUP_TYPE: "constant"
|
||||||
|
WARMUP_CONS_LR: 1e-5
|
||||||
|
|
||||||
|
TRAIN:
|
||||||
|
PRINT_FREQ: 5
|
||||||
|
|
||||||
|
MODEL:
|
||||||
|
BACKBONE:
|
||||||
|
NAME: "ViT-B/32"
|
||||||
39
configs/trainers/IVLP/vit_b16_c2_ep20_batch4_4+4ctx.yaml
Normal file
@@ -0,0 +1,39 @@
# Independent Vision Language Prompting
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0025
  MAX_EPOCH: 20
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  IVLP:
    N_CTX_VISION: 4
    N_CTX_TEXT: 4
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH_VISION: 9
    PROMPT_DEPTH_TEXT: 9
36
configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx.yaml
Normal file
@@ -0,0 +1,36 @@
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0035
  MAX_EPOCH: 2
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  MAPLE:
    N_CTX: 2
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH: 9
@@ -0,0 +1,36 @@
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0026
  MAX_EPOCH: 2
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  MAPLE:
    N_CTX: 2
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH: 3
@@ -0,0 +1,43 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0025
  MAX_EPOCH: 20
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  PROMPTSRC:
    N_CTX_VISION: 4
    N_CTX_TEXT: 4
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH_VISION: 9
    PROMPT_DEPTH_TEXT: 9
    TEXT_LOSS_WEIGHT: 25
    IMAGE_LOSS_WEIGHT: 10
    GPA_MEAN: 15
    GPA_STD: 1
@@ -0,0 +1,43 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0025
  MAX_EPOCH: 20
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  PROMPTSRC:
    N_CTX_VISION: 4
    N_CTX_TEXT: 4
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH_VISION: 3
    PROMPT_DEPTH_TEXT: 3
    TEXT_LOSS_WEIGHT: 25
    IMAGE_LOSS_WEIGHT: 10
    GPA_MEAN: 6
    GPA_STD: 10
@@ -0,0 +1,47 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
  TRAIN_X:
    BATCH_SIZE: 4
  TEST:
    BATCH_SIZE: 100
  NUM_WORKERS: 8

INPUT:
  SIZE: (224, 224)
  INTERPOLATION: "bicubic"
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
  TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]

OPTIM:
  NAME: "sgd"
  LR: 0.0025
  MAX_EPOCH: 50
  LR_SCHEDULER: "cosine"
  WARMUP_EPOCH: 1
  WARMUP_TYPE: "constant"
  WARMUP_CONS_LR: 1e-5

TRAIN:
  PRINT_FREQ: 20

MODEL:
  BACKBONE:
    NAME: "ViT-B/16"

TRAINER:
  PROMPTSRC:
    N_CTX_VISION: 4
    N_CTX_TEXT: 4
    CTX_INIT: "a photo of a"
    PREC: "fp16"
    PROMPT_DEPTH_VISION: 9
    PROMPT_DEPTH_TEXT: 9
    TEXT_LOSS_WEIGHT: 25
    IMAGE_LOSS_WEIGHT: 10
    # Use the below configuration for: ImageNet, Caltech101, OxfordPets, Food101, UCF101 and SUN397
    GPA_MEAN: 30
    GPA_STD: 30
    # Use the below configuration for: StanfordCars, Flowers102, FGVCAircraft, DTD and EuroSAT
    # GPA_MEAN: 45
    # GPA_STD: 5
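GPA_MEAN and GPA_STD parameterize the Gaussian used to weight the prompts collected across training epochs before they are aggregated (the paper's Gaussian-weighted prompt aggregation). A minimal sketch of how such epoch weights could be computed from these two values; the actual aggregation code in the trainer may differ in detail:

import math

def gaussian_epoch_weights(max_epoch, gpa_mean, gpa_std):
    # One weight per (1-indexed) epoch, normalized to sum to 1.
    raw = [math.exp(-((t - gpa_mean) ** 2) / (2 * gpa_std ** 2)) for t in range(1, max_epoch + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# For the few-shot config above (MAX_EPOCH: 50, GPA_MEAN: 30, GPA_STD: 30):
weights = gaussian_epoch_weights(50, 30, 30)
# aggregated_prompt = sum(w * p for w, p in zip(weights, per_epoch_prompts))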
0
datasets/__init__.py
Normal file
0
datasets/__init__.py
Normal file
BIN
datasets/__pycache__/__init__.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/__init__.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/caltech101.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/caltech101.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/dtd.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/dtd.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/eurosat.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/eurosat.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/fgvc_aircraft.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/fgvc_aircraft.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/food101.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/food101.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/imagenet.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/imagenet.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/imagenet_a.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/imagenet_a.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/imagenet_r.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/imagenet_r.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/imagenet_sketch.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/imagenet_sketch.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/imagenetv2.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/imagenetv2.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/oxford_flowers.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/oxford_flowers.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/oxford_pets.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/oxford_pets.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/stanford_cars.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/stanford_cars.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/sun397.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/sun397.cpython-37.pyc
Normal file
Binary file not shown.
BIN
datasets/__pycache__/ucf101.cpython-37.pyc
Normal file
BIN
datasets/__pycache__/ucf101.cpython-37.pyc
Normal file
Binary file not shown.
59
datasets/caltech101.py
Normal file
59
datasets/caltech101.py
Normal file
@@ -0,0 +1,59 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
from .dtd import DescribableTextures as DTD
|
||||||
|
|
||||||
|
IGNORED = ["BACKGROUND_Google", "Faces_easy"]
|
||||||
|
NEW_CNAMES = {
|
||||||
|
"airplanes": "airplane",
|
||||||
|
"Faces": "face",
|
||||||
|
"Leopards": "leopard",
|
||||||
|
"Motorbikes": "motorbike",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class Caltech101(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "caltech-101"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "101_ObjectCategories")
|
||||||
|
self.split_path = os.path.join(self.dataset_dir, "split_zhou_Caltech101.json")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
if os.path.exists(self.split_path):
|
||||||
|
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
|
||||||
|
else:
|
||||||
|
train, val, test = DTD.read_and_split_data(self.image_dir, ignored=IGNORED, new_cnames=NEW_CNAMES)
|
||||||
|
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train, val = data["train"], data["val"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
|
||||||
|
data = {"train": train, "val": val}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=val, test=test)
|
||||||
95
datasets/dtd.py
Normal file
95
datasets/dtd.py
Normal file
@@ -0,0 +1,95 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
import random
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import listdir_nohidden, mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class DescribableTextures(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "dtd"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "images")
|
||||||
|
self.split_path = os.path.join(self.dataset_dir, "split_zhou_DescribableTextures.json")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
if os.path.exists(self.split_path):
|
||||||
|
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
|
||||||
|
else:
|
||||||
|
train, val, test = self.read_and_split_data(self.image_dir)
|
||||||
|
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train, val = data["train"], data["val"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
|
||||||
|
data = {"train": train, "val": val}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=val, test=test)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def read_and_split_data(image_dir, p_trn=0.5, p_val=0.2, ignored=[], new_cnames=None):
|
||||||
|
# The data are supposed to be organized into the following structure
|
||||||
|
# =============
|
||||||
|
# images/
|
||||||
|
# dog/
|
||||||
|
# cat/
|
||||||
|
# horse/
|
||||||
|
# =============
|
||||||
|
categories = listdir_nohidden(image_dir)
|
||||||
|
categories = [c for c in categories if c not in ignored]
|
||||||
|
categories.sort()
|
||||||
|
|
||||||
|
p_tst = 1 - p_trn - p_val
|
||||||
|
print(f"Splitting into {p_trn:.0%} train, {p_val:.0%} val, and {p_tst:.0%} test")
|
||||||
|
|
||||||
|
def _collate(ims, y, c):
|
||||||
|
items = []
|
||||||
|
for im in ims:
|
||||||
|
item = Datum(impath=im, label=y, classname=c) # is already 0-based
|
||||||
|
items.append(item)
|
||||||
|
return items
|
||||||
|
|
||||||
|
train, val, test = [], [], []
|
||||||
|
for label, category in enumerate(categories):
|
||||||
|
category_dir = os.path.join(image_dir, category)
|
||||||
|
images = listdir_nohidden(category_dir)
|
||||||
|
images = [os.path.join(category_dir, im) for im in images]
|
||||||
|
random.shuffle(images)
|
||||||
|
n_total = len(images)
|
||||||
|
n_train = round(n_total * p_trn)
|
||||||
|
n_val = round(n_total * p_val)
|
||||||
|
n_test = n_total - n_train - n_val
|
||||||
|
assert n_train > 0 and n_val > 0 and n_test > 0
|
||||||
|
|
||||||
|
if new_cnames is not None and category in new_cnames:
|
||||||
|
category = new_cnames[category]
|
||||||
|
|
||||||
|
train.extend(_collate(images[:n_train], label, category))
|
||||||
|
val.extend(_collate(images[n_train : n_train + n_val], label, category))
|
||||||
|
test.extend(_collate(images[n_train + n_val :], label, category))
|
||||||
|
|
||||||
|
return train, val, test
|
||||||
73
datasets/eurosat.py
Normal file
73
datasets/eurosat.py
Normal file
@@ -0,0 +1,73 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
from .dtd import DescribableTextures as DTD
|
||||||
|
|
||||||
|
NEW_CNAMES = {
|
||||||
|
"AnnualCrop": "Annual Crop Land",
|
||||||
|
"Forest": "Forest",
|
||||||
|
"HerbaceousVegetation": "Herbaceous Vegetation Land",
|
||||||
|
"Highway": "Highway or Road",
|
||||||
|
"Industrial": "Industrial Buildings",
|
||||||
|
"Pasture": "Pasture Land",
|
||||||
|
"PermanentCrop": "Permanent Crop Land",
|
||||||
|
"Residential": "Residential Buildings",
|
||||||
|
"River": "River",
|
||||||
|
"SeaLake": "Sea or Lake",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class EuroSAT(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "eurosat"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "2750")
|
||||||
|
self.split_path = os.path.join(self.dataset_dir, "split_zhou_EuroSAT.json")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
if os.path.exists(self.split_path):
|
||||||
|
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
|
||||||
|
else:
|
||||||
|
train, val, test = DTD.read_and_split_data(self.image_dir, new_cnames=NEW_CNAMES)
|
||||||
|
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train, val = data["train"], data["val"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
|
||||||
|
data = {"train": train, "val": val}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=val, test=test)
|
||||||
|
|
||||||
|
    def update_classname(self, dataset_old):
        dataset_new = []
        for item_old in dataset_old:
            cname_old = item_old.classname
            cname_new = NEW_CNAMES[cname_old]  # NEW_CNAMES is the renaming dict defined above; NEW_CLASSNAMES does not exist
            item_new = Datum(impath=item_old.impath, label=item_old.label, classname=cname_new)
            dataset_new.append(item_new)
        return dataset_new
|
||||||
71
datasets/fgvc_aircraft.py
Normal file
71
datasets/fgvc_aircraft.py
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class FGVCAircraft(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "fgvc_aircraft"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "images")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
classnames = []
|
||||||
|
with open(os.path.join(self.dataset_dir, "variants.txt"), "r") as f:
|
||||||
|
lines = f.readlines()
|
||||||
|
for line in lines:
|
||||||
|
classnames.append(line.strip())
|
||||||
|
cname2lab = {c: i for i, c in enumerate(classnames)}
|
||||||
|
|
||||||
|
train = self.read_data(cname2lab, "images_variant_train.txt")
|
||||||
|
val = self.read_data(cname2lab, "images_variant_val.txt")
|
||||||
|
test = self.read_data(cname2lab, "images_variant_test.txt")
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train, val = data["train"], data["val"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
|
||||||
|
data = {"train": train, "val": val}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=val, test=test)
|
||||||
|
|
||||||
|
def read_data(self, cname2lab, split_file):
|
||||||
|
filepath = os.path.join(self.dataset_dir, split_file)
|
||||||
|
items = []
|
||||||
|
|
||||||
|
with open(filepath, "r") as f:
|
||||||
|
lines = f.readlines()
|
||||||
|
for line in lines:
|
||||||
|
line = line.strip().split(" ")
|
||||||
|
imname = line[0] + ".jpg"
|
||||||
|
classname = " ".join(line[1:])
|
||||||
|
impath = os.path.join(self.image_dir, imname)
|
||||||
|
label = cname2lab[classname]
|
||||||
|
item = Datum(impath=impath, label=label, classname=classname)
|
||||||
|
items.append(item)
|
||||||
|
|
||||||
|
return items
|
||||||
51
datasets/food101.py
Normal file
51
datasets/food101.py
Normal file
@@ -0,0 +1,51 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
from .dtd import DescribableTextures as DTD
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class Food101(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "food-101"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "images")
|
||||||
|
self.split_path = os.path.join(self.dataset_dir, "split_zhou_Food101.json")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
if os.path.exists(self.split_path):
|
||||||
|
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
|
||||||
|
else:
|
||||||
|
train, val, test = DTD.read_and_split_data(self.image_dir)
|
||||||
|
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train, val = data["train"], data["val"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
|
||||||
|
data = {"train": train, "val": val}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=val, test=test)
|
||||||
91
datasets/imagenet.py
Normal file
91
datasets/imagenet.py
Normal file
@@ -0,0 +1,91 @@
|
|||||||
|
import os
|
||||||
|
import pickle
|
||||||
|
from collections import OrderedDict
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import listdir_nohidden, mkdir_if_missing
|
||||||
|
|
||||||
|
from .oxford_pets import OxfordPets
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class ImageNet(DatasetBase):
|
||||||
|
|
||||||
|
dataset_dir = "imagenet"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "images")
|
||||||
|
self.preprocessed = os.path.join(self.dataset_dir, "preprocessed.pkl")
|
||||||
|
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
|
||||||
|
mkdir_if_missing(self.split_fewshot_dir)
|
||||||
|
|
||||||
|
if os.path.exists(self.preprocessed):
|
||||||
|
with open(self.preprocessed, "rb") as f:
|
||||||
|
preprocessed = pickle.load(f)
|
||||||
|
train = preprocessed["train"]
|
||||||
|
test = preprocessed["test"]
|
||||||
|
else:
|
||||||
|
text_file = os.path.join(self.dataset_dir, "classnames.txt")
|
||||||
|
classnames = self.read_classnames(text_file)
|
||||||
|
train = self.read_data(classnames, "train")
|
||||||
|
# Follow standard practice to perform evaluation on the val set
|
||||||
|
# Also used as the val set (so evaluate the last-step model)
|
||||||
|
test = self.read_data(classnames, "val")
|
||||||
|
|
||||||
|
preprocessed = {"train": train, "test": test}
|
||||||
|
with open(self.preprocessed, "wb") as f:
|
||||||
|
pickle.dump(preprocessed, f, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
num_shots = cfg.DATASET.NUM_SHOTS
|
||||||
|
if num_shots >= 1:
|
||||||
|
seed = cfg.SEED
|
||||||
|
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
|
||||||
|
|
||||||
|
if os.path.exists(preprocessed):
|
||||||
|
print(f"Loading preprocessed few-shot data from {preprocessed}")
|
||||||
|
with open(preprocessed, "rb") as file:
|
||||||
|
data = pickle.load(file)
|
||||||
|
train = data["train"]
|
||||||
|
else:
|
||||||
|
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
|
||||||
|
data = {"train": train}
|
||||||
|
print(f"Saving preprocessed few-shot data to {preprocessed}")
|
||||||
|
with open(preprocessed, "wb") as file:
|
||||||
|
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
|
||||||
|
|
||||||
|
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
|
||||||
|
train, test = OxfordPets.subsample_classes(train, test, subsample=subsample)
|
||||||
|
|
||||||
|
super().__init__(train_x=train, val=test, test=test)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def read_classnames(text_file):
|
||||||
|
"""Return a dictionary containing
|
||||||
|
key-value pairs of <folder name>: <class name>.
|
||||||
|
"""
|
||||||
|
classnames = OrderedDict()
|
||||||
|
with open(text_file, "r") as f:
|
||||||
|
lines = f.readlines()
|
||||||
|
for line in lines:
|
||||||
|
line = line.strip().split(" ")
|
||||||
|
folder = line[0]
|
||||||
|
classname = " ".join(line[1:])
|
||||||
|
classnames[folder] = classname
|
||||||
|
return classnames
|
||||||
|
|
||||||
|
def read_data(self, classnames, split_dir):
|
||||||
|
split_dir = os.path.join(self.image_dir, split_dir)
|
||||||
|
folders = sorted(f.name for f in os.scandir(split_dir) if f.is_dir())
|
||||||
|
items = []
|
||||||
|
|
||||||
|
for label, folder in enumerate(folders):
|
||||||
|
imnames = listdir_nohidden(os.path.join(split_dir, folder))
|
||||||
|
classname = classnames[folder]
|
||||||
|
for imname in imnames:
|
||||||
|
impath = os.path.join(split_dir, folder, imname)
|
||||||
|
item = Datum(impath=impath, label=label, classname=classname)
|
||||||
|
items.append(item)
|
||||||
|
|
||||||
|
return items
|
||||||
46
datasets/imagenet_a.py
Normal file
46
datasets/imagenet_a.py
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
import os
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import listdir_nohidden
|
||||||
|
|
||||||
|
from .imagenet import ImageNet
|
||||||
|
|
||||||
|
TO_BE_IGNORED = ["README.txt"]
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class ImageNetA(DatasetBase):
|
||||||
|
"""ImageNet-A(dversarial).
|
||||||
|
|
||||||
|
This dataset is used for testing only.
|
||||||
|
"""
|
||||||
|
|
||||||
|
dataset_dir = "imagenet-adversarial"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "imagenet-a")
|
||||||
|
|
||||||
|
text_file = os.path.join(self.dataset_dir, "classnames.txt")
|
||||||
|
classnames = ImageNet.read_classnames(text_file)
|
||||||
|
|
||||||
|
data = self.read_data(classnames)
|
||||||
|
|
||||||
|
super().__init__(train_x=data, test=data)
|
||||||
|
|
||||||
|
def read_data(self, classnames):
|
||||||
|
image_dir = self.image_dir
|
||||||
|
folders = listdir_nohidden(image_dir, sort=True)
|
||||||
|
folders = [f for f in folders if f not in TO_BE_IGNORED]
|
||||||
|
items = []
|
||||||
|
|
||||||
|
for label, folder in enumerate(folders):
|
||||||
|
imnames = listdir_nohidden(os.path.join(image_dir, folder))
|
||||||
|
classname = classnames[folder]
|
||||||
|
for imname in imnames:
|
||||||
|
impath = os.path.join(image_dir, folder, imname)
|
||||||
|
item = Datum(impath=impath, label=label, classname=classname)
|
||||||
|
items.append(item)
|
||||||
|
|
||||||
|
return items
|
||||||
46
datasets/imagenet_r.py
Normal file
46
datasets/imagenet_r.py
Normal file
@@ -0,0 +1,46 @@
|
|||||||
|
import os
|
||||||
|
|
||||||
|
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
|
||||||
|
from dassl.utils import listdir_nohidden
|
||||||
|
|
||||||
|
from .imagenet import ImageNet
|
||||||
|
|
||||||
|
TO_BE_IGNORED = ["README.txt"]
|
||||||
|
|
||||||
|
|
||||||
|
@DATASET_REGISTRY.register()
|
||||||
|
class ImageNetR(DatasetBase):
|
||||||
|
"""ImageNet-R(endition).
|
||||||
|
|
||||||
|
This dataset is used for testing only.
|
||||||
|
"""
|
||||||
|
|
||||||
|
dataset_dir = "imagenet-rendition"
|
||||||
|
|
||||||
|
def __init__(self, cfg):
|
||||||
|
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
|
||||||
|
self.dataset_dir = os.path.join(root, self.dataset_dir)
|
||||||
|
self.image_dir = os.path.join(self.dataset_dir, "imagenet-r")
|
||||||
|
|
||||||
|
text_file = os.path.join(self.dataset_dir, "classnames.txt")
|
||||||
|
classnames = ImageNet.read_classnames(text_file)
|
||||||
|
|
||||||
|
data = self.read_data(classnames)
|
||||||
|
|
||||||
|
super().__init__(train_x=data, test=data)
|
||||||
|
|
||||||
|
def read_data(self, classnames):
|
||||||
|
image_dir = self.image_dir
|
||||||
|
folders = listdir_nohidden(image_dir, sort=True)
|
||||||
|
folders = [f for f in folders if f not in TO_BE_IGNORED]
|
||||||
|
items = []
|
||||||
|
|
||||||
|
for label, folder in enumerate(folders):
|
||||||
|
imnames = listdir_nohidden(os.path.join(image_dir, folder))
|
||||||
|
classname = classnames[folder]
|
||||||
|
for imname in imnames:
|
||||||
|
impath = os.path.join(image_dir, folder, imname)
|
||||||
|
item = Datum(impath=impath, label=label, classname=classname)
|
||||||
|
items.append(item)
|
||||||
|
|
||||||
|
return items
|
||||||
43
datasets/imagenet_sketch.py
Normal file
43
datasets/imagenet_sketch.py
Normal file
@@ -0,0 +1,43 @@
import os

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden

from .imagenet import ImageNet


@DATASET_REGISTRY.register()
class ImageNetSketch(DatasetBase):
    """ImageNet-Sketch.

    This dataset is used for testing only.
    """

    dataset_dir = "imagenet-sketch"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.image_dir = os.path.join(self.dataset_dir, "images")

        text_file = os.path.join(self.dataset_dir, "classnames.txt")
        classnames = ImageNet.read_classnames(text_file)

        data = self.read_data(classnames)

        super().__init__(train_x=data, test=data)

    def read_data(self, classnames):
        image_dir = self.image_dir
        folders = listdir_nohidden(image_dir, sort=True)
        items = []

        for label, folder in enumerate(folders):
            imnames = listdir_nohidden(os.path.join(image_dir, folder))
            classname = classnames[folder]
            for imname in imnames:
                impath = os.path.join(image_dir, folder, imname)
                item = Datum(impath=impath, label=label, classname=classname)
                items.append(item)

        return items
46
datasets/imagenetv2.py
Normal file
@@ -0,0 +1,46 @@
import os

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden

from .imagenet import ImageNet


@DATASET_REGISTRY.register()
class ImageNetV2(DatasetBase):
    """ImageNetV2.

    This dataset is used for testing only.
    """

    dataset_dir = "imagenetv2"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        image_dir = "imagenetv2-matched-frequency-format-val"
        self.image_dir = os.path.join(self.dataset_dir, image_dir)

        text_file = os.path.join(self.dataset_dir, "classnames.txt")
        classnames = ImageNet.read_classnames(text_file)

        data = self.read_data(classnames)

        super().__init__(train_x=data, test=data)

    def read_data(self, classnames):
        image_dir = self.image_dir
        folders = list(classnames.keys())
        items = []

        for label in range(1000):
            class_dir = os.path.join(image_dir, str(label))
            imnames = listdir_nohidden(class_dir)
            folder = folders[label]
            classname = classnames[folder]
            for imname in imnames:
                impath = os.path.join(class_dir, imname)
                item = Datum(impath=impath, label=label, classname=classname)
                items.append(item)

        return items
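The folder-to-class mapping in `ImageNetV2.read_data` works because ImageNetV2 stores images in folders named `0` through `999`, in the same order as the entries of `classnames.txt`. The standalone sketch below replays that lookup; the two-entry `classnames` dict is only a toy excerpt and does not come from the repository.

```python
# Standalone sketch (not part of the repository) of the lookup in
# ImageNetV2.read_data: folder "i" on disk holds images of the i-th class in
# classnames.txt, so the integer folder name is the label and the class name
# is found by indexing into the ordered class-name mapping.
classnames = {"n01440764": "tench", "n01443537": "goldfish"}  # toy two-class excerpt
folders = list(classnames.keys())

label = 1                         # images would live under "<image_dir>/1/"
folder = folders[label]           # "n01443537"
print(label, classnames[folder])  # 1 goldfish
```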
89
datasets/oxford_flowers.py
Normal file
@@ -0,0 +1,89 @@
import os
import pickle
import random
from scipy.io import loadmat
from collections import defaultdict

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import read_json, mkdir_if_missing

from .oxford_pets import OxfordPets


@DATASET_REGISTRY.register()
class OxfordFlowers(DatasetBase):

    dataset_dir = "oxford_flowers"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.image_dir = os.path.join(self.dataset_dir, "jpg")
        self.label_file = os.path.join(self.dataset_dir, "imagelabels.mat")
        self.lab2cname_file = os.path.join(self.dataset_dir, "cat_to_name.json")
        self.split_path = os.path.join(self.dataset_dir, "split_zhou_OxfordFlowers.json")
        self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
        mkdir_if_missing(self.split_fewshot_dir)

        if os.path.exists(self.split_path):
            train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
        else:
            train, val, test = self.read_data()
            OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)

        num_shots = cfg.DATASET.NUM_SHOTS
        if num_shots >= 1:
            seed = cfg.SEED
            preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")

            if os.path.exists(preprocessed):
                print(f"Loading preprocessed few-shot data from {preprocessed}")
                with open(preprocessed, "rb") as file:
                    data = pickle.load(file)
                    train, val = data["train"], data["val"]
            else:
                train = self.generate_fewshot_dataset(train, num_shots=num_shots)
                val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
                data = {"train": train, "val": val}
                print(f"Saving preprocessed few-shot data to {preprocessed}")
                with open(preprocessed, "wb") as file:
                    pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)

        subsample = cfg.DATASET.SUBSAMPLE_CLASSES
        train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)

        super().__init__(train_x=train, val=val, test=test)

    def read_data(self):
        tracker = defaultdict(list)
        label_file = loadmat(self.label_file)["labels"][0]
        for i, label in enumerate(label_file):
            imname = f"image_{str(i + 1).zfill(5)}.jpg"
            impath = os.path.join(self.image_dir, imname)
            label = int(label)
            tracker[label].append(impath)

        print("Splitting data into 50% train, 20% val, and 30% test")

        def _collate(ims, y, c):
            items = []
            for im in ims:
                item = Datum(impath=im, label=y - 1, classname=c)  # convert to 0-based label
                items.append(item)
            return items

        lab2cname = read_json(self.lab2cname_file)
        train, val, test = [], [], []
        for label, impaths in tracker.items():
            random.shuffle(impaths)
            n_total = len(impaths)
            n_train = round(n_total * 0.5)
            n_val = round(n_total * 0.2)
            n_test = n_total - n_train - n_val
            assert n_train > 0 and n_val > 0 and n_test > 0
            cname = lab2cname[str(label)]
            train.extend(_collate(impaths[:n_train], label, cname))
            val.extend(_collate(impaths[n_train : n_train + n_val], label, cname))
            test.extend(_collate(impaths[n_train + n_val :], label, cname))

        return train, val, test
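As a quick illustration of the per-class split computed in `read_data` above, the sketch below shows the arithmetic for a hypothetical class with 40 images; the counts follow directly from the `round()` expressions in the loop, with the test share being whatever remains.

```python
# Standalone sketch of the 50/20/30 per-class split used in read_data above.
n_total = 40                         # hypothetical number of images in one class
n_train = round(n_total * 0.5)       # 20 images for training
n_val = round(n_total * 0.2)         # 8 images for validation
n_test = n_total - n_train - n_val   # 12 images for testing (the remainder)
print(n_train, n_val, n_test)        # 20 8 12
```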
186
datasets/oxford_pets.py
Normal file
@@ -0,0 +1,186 @@
import os
import pickle
import math
import random
from collections import defaultdict

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import read_json, write_json, mkdir_if_missing


@DATASET_REGISTRY.register()
class OxfordPets(DatasetBase):

    dataset_dir = "oxford_pets"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.image_dir = os.path.join(self.dataset_dir, "images")
        self.anno_dir = os.path.join(self.dataset_dir, "annotations")
        self.split_path = os.path.join(self.dataset_dir, "split_zhou_OxfordPets.json")
        self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
        mkdir_if_missing(self.split_fewshot_dir)

        if os.path.exists(self.split_path):
            train, val, test = self.read_split(self.split_path, self.image_dir)
        else:
            trainval = self.read_data(split_file="trainval.txt")
            test = self.read_data(split_file="test.txt")
            train, val = self.split_trainval(trainval)
            self.save_split(train, val, test, self.split_path, self.image_dir)

        num_shots = cfg.DATASET.NUM_SHOTS
        if num_shots >= 1:
            seed = cfg.SEED
            preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")

            if os.path.exists(preprocessed):
                print(f"Loading preprocessed few-shot data from {preprocessed}")
                with open(preprocessed, "rb") as file:
                    data = pickle.load(file)
                    train, val = data["train"], data["val"]
            else:
                train = self.generate_fewshot_dataset(train, num_shots=num_shots)
                val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
                data = {"train": train, "val": val}
                print(f"Saving preprocessed few-shot data to {preprocessed}")
                with open(preprocessed, "wb") as file:
                    pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)

        subsample = cfg.DATASET.SUBSAMPLE_CLASSES
        train, val, test = self.subsample_classes(train, val, test, subsample=subsample)

        super().__init__(train_x=train, val=val, test=test)

    def read_data(self, split_file):
        filepath = os.path.join(self.anno_dir, split_file)
        items = []

        with open(filepath, "r") as f:
            lines = f.readlines()
            for line in lines:
                line = line.strip()
                imname, label, species, _ = line.split(" ")
                breed = imname.split("_")[:-1]
                breed = "_".join(breed)
                breed = breed.lower()
                imname += ".jpg"
                impath = os.path.join(self.image_dir, imname)
                label = int(label) - 1  # convert to 0-based index
                item = Datum(impath=impath, label=label, classname=breed)
                items.append(item)

        return items

    @staticmethod
    def split_trainval(trainval, p_val=0.2):
        p_trn = 1 - p_val
        print(f"Splitting trainval into {p_trn:.0%} train and {p_val:.0%} val")
        tracker = defaultdict(list)
        for idx, item in enumerate(trainval):
            label = item.label
            tracker[label].append(idx)

        train, val = [], []
        for label, idxs in tracker.items():
            n_val = round(len(idxs) * p_val)
            assert n_val > 0
            random.shuffle(idxs)
            for n, idx in enumerate(idxs):
                item = trainval[idx]
                if n < n_val:
                    val.append(item)
                else:
                    train.append(item)

        return train, val

    @staticmethod
    def save_split(train, val, test, filepath, path_prefix):
        def _extract(items):
            out = []
            for item in items:
                impath = item.impath
                label = item.label
                classname = item.classname
                impath = impath.replace(path_prefix, "")
                if impath.startswith("/"):
                    impath = impath[1:]
                out.append((impath, label, classname))
            return out

        train = _extract(train)
        val = _extract(val)
        test = _extract(test)

        split = {"train": train, "val": val, "test": test}

        write_json(split, filepath)
        print(f"Saved split to {filepath}")

    @staticmethod
    def read_split(filepath, path_prefix):
        def _convert(items):
            out = []
            for impath, label, classname in items:
                impath = os.path.join(path_prefix, impath)
                item = Datum(impath=impath, label=int(label), classname=classname)
                out.append(item)
            return out

        print(f"Reading split from {filepath}")
        split = read_json(filepath)
        train = _convert(split["train"])
        val = _convert(split["val"])
        test = _convert(split["test"])

        return train, val, test

    @staticmethod
    def subsample_classes(*args, subsample="all"):
        """Divide classes into two groups. The first group
        represents base classes while the second group represents
        new classes.

        Args:
            args: a list of datasets, e.g. train, val and test.
            subsample (str): what classes to subsample.
        """
        assert subsample in ["all", "base", "new"]

        if subsample == "all":
            return args

        dataset = args[0]
        labels = set()
        for item in dataset:
            labels.add(item.label)
        labels = list(labels)
        labels.sort()
        n = len(labels)
        # Divide classes into two halves
        m = math.ceil(n / 2)

        print(f"SUBSAMPLE {subsample.upper()} CLASSES!")
        if subsample == "base":
            selected = labels[:m]  # take the first half
        else:
            selected = labels[m:]  # take the second half
        relabeler = {y: y_new for y_new, y in enumerate(selected)}

        output = []
        for dataset in args:
            dataset_new = []
            for item in dataset:
                if item.label not in selected:
                    continue
                item_new = Datum(
                    impath=item.impath,
                    label=relabeler[item.label],
                    classname=item.classname
                )
                dataset_new.append(item_new)
            output.append(dataset_new)

        return output
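To make the base/new behaviour of `subsample_classes` concrete, here is a standalone sketch of its core label-splitting logic (it does not import the repository or dassl): with an odd number of classes the extra class goes to the base half, and the new half is relabelled from 0.

```python
import math

# Standalone sketch of the label split inside OxfordPets.subsample_classes:
# sorted labels are cut at ceil(n / 2); the first half become the "base"
# classes, the second half the "new" classes, relabelled to start from 0.
labels = [0, 1, 2, 3, 4]            # toy label set with 5 classes
m = math.ceil(len(labels) / 2)      # split point = 3

base = labels[:m]                   # [0, 1, 2]
new = labels[m:]                    # [3, 4]
relabel_new = {y: y_new for y_new, y in enumerate(new)}

print(base)         # [0, 1, 2]
print(relabel_new)  # {3: 0, 4: 1}
```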
75
datasets/stanford_cars.py
Normal file
@@ -0,0 +1,75 @@
import os
import pickle
from scipy.io import loadmat

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing

from .oxford_pets import OxfordPets


@DATASET_REGISTRY.register()
class StanfordCars(DatasetBase):

    dataset_dir = "stanford_cars"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.split_path = os.path.join(self.dataset_dir, "split_zhou_StanfordCars.json")
        self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
        mkdir_if_missing(self.split_fewshot_dir)

        if os.path.exists(self.split_path):
            train, val, test = OxfordPets.read_split(self.split_path, self.dataset_dir)
        else:
            trainval_file = os.path.join(self.dataset_dir, "devkit", "cars_train_annos.mat")
            test_file = os.path.join(self.dataset_dir, "cars_test_annos_withlabels.mat")
            meta_file = os.path.join(self.dataset_dir, "devkit", "cars_meta.mat")
            trainval = self.read_data("cars_train", trainval_file, meta_file)
            test = self.read_data("cars_test", test_file, meta_file)
            train, val = OxfordPets.split_trainval(trainval)
            OxfordPets.save_split(train, val, test, self.split_path, self.dataset_dir)

        num_shots = cfg.DATASET.NUM_SHOTS
        if num_shots >= 1:
            seed = cfg.SEED
            preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")

            if os.path.exists(preprocessed):
                print(f"Loading preprocessed few-shot data from {preprocessed}")
                with open(preprocessed, "rb") as file:
                    data = pickle.load(file)
                    train, val = data["train"], data["val"]
            else:
                train = self.generate_fewshot_dataset(train, num_shots=num_shots)
                val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
                data = {"train": train, "val": val}
                print(f"Saving preprocessed few-shot data to {preprocessed}")
                with open(preprocessed, "wb") as file:
                    pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)

        subsample = cfg.DATASET.SUBSAMPLE_CLASSES
        train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)

        super().__init__(train_x=train, val=val, test=test)

    def read_data(self, image_dir, anno_file, meta_file):
        anno_file = loadmat(anno_file)["annotations"][0]
        meta_file = loadmat(meta_file)["class_names"][0]
        items = []

        for i in range(len(anno_file)):
            imname = anno_file[i]["fname"][0]
            impath = os.path.join(self.dataset_dir, image_dir, imname)
            label = anno_file[i]["class"][0, 0]
            label = int(label) - 1  # convert to 0-based index
            classname = meta_file[label][0]
            names = classname.split(" ")
            year = names.pop(-1)
            names.insert(0, year)
            classname = " ".join(names)
            item = Datum(impath=impath, label=label, classname=classname)
            items.append(item)

        return items
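The class-name handling in `StanfordCars.read_data` moves the trailing model year to the front of the name. The sketch below replays that snippet on an illustrative string in the `cars_meta.mat` naming style; the specific car name is only an example.

```python
# Standalone sketch of the class-name normalisation in StanfordCars.read_data:
# the trailing model year is moved to the front of the name.
classname = "Audi TTS Coupe 2012"   # illustrative cars_meta.mat-style name
names = classname.split(" ")
year = names.pop(-1)                # "2012"
names.insert(0, year)
print(" ".join(names))              # "2012 Audi TTS Coupe"
```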
80
datasets/sun397.py
Normal file
@@ -0,0 +1,80 @@
import os
import pickle

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing

from .oxford_pets import OxfordPets


@DATASET_REGISTRY.register()
class SUN397(DatasetBase):

    dataset_dir = "sun397"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.image_dir = os.path.join(self.dataset_dir, "SUN397")
        self.split_path = os.path.join(self.dataset_dir, "split_zhou_SUN397.json")
        self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
        mkdir_if_missing(self.split_fewshot_dir)

        if os.path.exists(self.split_path):
            train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
        else:
            classnames = []
            with open(os.path.join(self.dataset_dir, "ClassName.txt"), "r") as f:
                lines = f.readlines()
                for line in lines:
                    line = line.strip()[1:]  # remove the leading /
                    classnames.append(line)
            cname2lab = {c: i for i, c in enumerate(classnames)}
            trainval = self.read_data(cname2lab, "Training_01.txt")
            test = self.read_data(cname2lab, "Testing_01.txt")
            train, val = OxfordPets.split_trainval(trainval)
            OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)

        num_shots = cfg.DATASET.NUM_SHOTS
        if num_shots >= 1:
            seed = cfg.SEED
            preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")

            if os.path.exists(preprocessed):
                print(f"Loading preprocessed few-shot data from {preprocessed}")
                with open(preprocessed, "rb") as file:
                    data = pickle.load(file)
                    train, val = data["train"], data["val"]
            else:
                train = self.generate_fewshot_dataset(train, num_shots=num_shots)
                val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
                data = {"train": train, "val": val}
                print(f"Saving preprocessed few-shot data to {preprocessed}")
                with open(preprocessed, "wb") as file:
                    pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)

        subsample = cfg.DATASET.SUBSAMPLE_CLASSES
        train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)

        super().__init__(train_x=train, val=val, test=test)

    def read_data(self, cname2lab, text_file):
        text_file = os.path.join(self.dataset_dir, text_file)
        items = []

        with open(text_file, "r") as f:
            lines = f.readlines()
            for line in lines:
                imname = line.strip()[1:]  # remove the leading /
                classname = os.path.dirname(imname)
                label = cname2lab[classname]
                impath = os.path.join(self.image_dir, imname)

                names = classname.split("/")[1:]  # drop the leading letter directory
                names = names[::-1]  # put words like indoor/outdoor first
                classname = " ".join(names)

                item = Datum(impath=impath, label=label, classname=classname)
                items.append(item)

        return items
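The class names read from the SUN397 split files are paths such as `/a/apse/indoor/...`; `read_data` drops the leading letter directory and reverses the remaining components so qualifiers like `indoor` come first. Below is a standalone sketch with an illustrative (made-up) file name.

```python
import os

# Standalone sketch of the class-name handling in SUN397.read_data, using an
# illustrative path: drop the leading letter directory, then reverse the
# remaining components so qualifiers like "indoor" come first.
line = "/a/apse/indoor/sun_example.jpg"  # made-up file name in the split-file format
imname = line.strip()[1:]                # "a/apse/indoor/sun_example.jpg"
classname = os.path.dirname(imname)      # "a/apse/indoor"
names = classname.split("/")[1:]         # ["apse", "indoor"]
names = names[::-1]                      # ["indoor", "apse"]
print(" ".join(names))                   # "indoor apse"
```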
84
datasets/ucf101.py
Normal file
@@ -0,0 +1,84 @@
import os
import pickle
import re

from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing

from .oxford_pets import OxfordPets


@DATASET_REGISTRY.register()
class UCF101(DatasetBase):

    dataset_dir = "ucf101"

    def __init__(self, cfg):
        root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
        self.dataset_dir = os.path.join(root, self.dataset_dir)
        self.image_dir = os.path.join(self.dataset_dir, "UCF-101-midframes")
        self.split_path = os.path.join(self.dataset_dir, "split_zhou_UCF101.json")
        self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
        mkdir_if_missing(self.split_fewshot_dir)

        if os.path.exists(self.split_path):
            train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
        else:
            cname2lab = {}
            filepath = os.path.join(self.dataset_dir, "ucfTrainTestlist/classInd.txt")
            with open(filepath, "r") as f:
                lines = f.readlines()
                for line in lines:
                    label, classname = line.strip().split(" ")
                    label = int(label) - 1  # convert to 0-based index
                    cname2lab[classname] = label

            trainval = self.read_data(cname2lab, "ucfTrainTestlist/trainlist01.txt")
            test = self.read_data(cname2lab, "ucfTrainTestlist/testlist01.txt")
            train, val = OxfordPets.split_trainval(trainval)
            OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)

        num_shots = cfg.DATASET.NUM_SHOTS
        if num_shots >= 1:
            seed = cfg.SEED
            preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")

            if os.path.exists(preprocessed):
                print(f"Loading preprocessed few-shot data from {preprocessed}")
                with open(preprocessed, "rb") as file:
                    data = pickle.load(file)
                    train, val = data["train"], data["val"]
            else:
                train = self.generate_fewshot_dataset(train, num_shots=num_shots)
                val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
                data = {"train": train, "val": val}
                print(f"Saving preprocessed few-shot data to {preprocessed}")
                with open(preprocessed, "wb") as file:
                    pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)

        subsample = cfg.DATASET.SUBSAMPLE_CLASSES
        train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)

        super().__init__(train_x=train, val=val, test=test)

    def read_data(self, cname2lab, text_file):
        text_file = os.path.join(self.dataset_dir, text_file)
        items = []

        with open(text_file, "r") as f:
            lines = f.readlines()
            for line in lines:
                line = line.strip().split(" ")[0]  # trainlist: filename, label
                action, filename = line.split("/")
                label = cname2lab[action]

                elements = re.findall("[A-Z][^A-Z]*", action)
                renamed_action = "_".join(elements)

                filename = filename.replace(".avi", ".jpg")
                impath = os.path.join(self.image_dir, renamed_action, filename)

                item = Datum(impath=impath, label=label, classname=renamed_action)
                items.append(item)

        return items
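The regular expression in `UCF101.read_data` splits the CamelCase action names from `classInd.txt` into words before joining them with underscores, matching how the folders under `UCF-101-midframes` are named. A standalone sketch on a UCF101 action name:

```python
import re

# Standalone sketch of the action-name conversion in UCF101.read_data:
# CamelCase action names are split at capital letters and joined with "_".
action = "ApplyEyeMakeup"                      # a UCF101 action name
elements = re.findall("[A-Z][^A-Z]*", action)  # ["Apply", "Eye", "Makeup"]
print("_".join(elements))                      # "Apply_Eye_Makeup"
```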
99
docs/Co-CoOp.md
Normal file
@@ -0,0 +1,99 @@
# Conditional Prompt Learning for Vision-Language Models (Co-CoOp, CVPR'22)

[](https://arxiv.org/abs/2203.05557)

We provide the scripts in [scripts/cocoop](../scripts/cocoop) to reproduce Co-CoOp results (CVPR'22).

Make sure to configure the dataset paths in environment variable `DATA` and run the commands from the main directory `PromptSRC/`.

## Generalization From Base to New Classes

This corresponds to the experiments in Section 4.1, i.e., Table 1.

You will need both `scripts/cocoop/base2new_train.sh` and `scripts/cocoop/base2new_test.sh`. The former trains a model on base classes while the latter evaluates the trained model on new classes. Both scripts have two input arguments, i.e., `DATASET` and `SEED`.

`DATASET` takes as input a dataset name, like `imagenet` or `caltech101`. The valid names are the file names in `CoOp/configs/datasets/`.

Below we provide an example of how to evaluate the model on ImageNet.

```bash
# seed=1
bash scripts/cocoop/base2new_train.sh imagenet 1
bash scripts/cocoop/base2new_test.sh imagenet 1

# seed=2
bash scripts/cocoop/base2new_train.sh imagenet 2
bash scripts/cocoop/base2new_test.sh imagenet 2

# seed=3
bash scripts/cocoop/base2new_train.sh imagenet 3
bash scripts/cocoop/base2new_test.sh imagenet 3
```

When the evaluation is done, you can use `parse_test_res.py` to automatically calculate the average results. For instance, after you finish the evaluation (including `base2new_train.sh` and `base2new_test.sh`) on ImageNet using the aforementioned commands, you would get

```
output
|–– base2new/
|   |–– test_new/
|   |   |–– imagenet/
|   |   |   |–– shots_16/
|   |   |   |   |–– CoCoOp/
|   |   |   |   |   |–– vit_b16_c4_ep10_batch1_ctxv1/
|   |   |   |   |   |   |–– seed1/
|   |   |   |   |   |   |–– seed2/
|   |   |   |   |   |   |–– seed3/
|   |–– train_base/
|   |   |–– imagenet/
|   |   |   |–– shots_16/
|   |   |   |   |–– CoCoOp/
|   |   |   |   |   |–– vit_b16_c4_ep10_batch1_ctxv1/
|   |   |   |   |   |   |–– seed1/
|   |   |   |   |   |   |–– seed2/
|   |   |   |   |   |   |–– seed3/
```

Then, to get the average performance on the base classes, run

```bash
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1
```

To get the average performance on the new classes, run

```bash
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1 --test-log
```

## Cross-Dataset Transfer

This corresponds to the experiments in Section 4.2, i.e., Table 2.

The relevant scripts are `scripts/cocoop/xd_train.sh` and `scripts/cocoop/xd_test.sh` where the `DATASET` variable is set to the default, namely `imagenet`. To train the model, run

```bash
# seed=1
bash scripts/cocoop/xd_train.sh 1

# seed=2
bash scripts/cocoop/xd_train.sh 2

# seed=3
bash scripts/cocoop/xd_train.sh 3
```

Then, you evaluate the model on other datasets, e.g.,

```bash
for SEED in 1 2 3
do
    bash scripts/cocoop/xd_test.sh caltech101 ${SEED}
    bash scripts/cocoop/xd_test.sh oxford_pets ${SEED}
    bash scripts/cocoop/xd_test.sh stanford_cars ${SEED}
done
```

## Domain Generalization

This corresponds to the experiments in Section 4.3, i.e., Table 3.

The steps are similar to those discussed in "Cross-Dataset Transfer" except you evaluate the model on the variants of ImageNet, i.e., `imagenetv2`, `imagenet_sketch`, `imagenet_a` and `imagenet_r`.
99
docs/CoOp.md
Normal file
@@ -0,0 +1,99 @@
# Conditional Prompt Learning for Vision-Language Models (Co-CoOp, CVPR'22)

[](https://arxiv.org/abs/2203.05557)

We provide the scripts in [scripts/cocoop](../scripts/cocoop) to reproduce Co-CoOp results (CVPR'22).

Make sure to configure the dataset paths in environment variable `DATA` and run the commands from the main directory `PromptSRC/`.

## Generalization From Base to New Classes

This corresponds to the experiments in Section 4.1, i.e., Table 1.

You will need both `scripts/cocoop/base2new_train.sh` and `scripts/cocoop/base2new_test.sh`. The former trains a model on base classes while the latter evaluates the trained model on new classes. Both scripts have two input arguments, i.e., `DATASET` and `SEED`.

`DATASET` takes as input a dataset name, like `imagenet` or `caltech101`. The valid names are the file names in `CoOp/configs/datasets/`.

Below we provide an example of how to evaluate the model on ImageNet.

```bash
# seed=1
bash scripts/cocoop/base2new_train.sh imagenet 1
bash scripts/cocoop/base2new_test.sh imagenet 1

# seed=2
bash scripts/cocoop/base2new_train.sh imagenet 2
bash scripts/cocoop/base2new_test.sh imagenet 2

# seed=3
bash scripts/cocoop/base2new_train.sh imagenet 3
bash scripts/cocoop/base2new_test.sh imagenet 3
```

When the evaluation is done, you can use `parse_test_res.py` to automatically calculate the average results. For instance, after you finish the evaluation (including `base2new_train.sh` and `base2new_test.sh`) on ImageNet using the aforementioned commands, you would get

```
output
|–– base2new/
|   |–– test_new/
|   |   |–– imagenet/
|   |   |   |–– shots_16/
|   |   |   |   |–– CoCoOp/
|   |   |   |   |   |–– vit_b16_c4_ep10_batch1_ctxv1/
|   |   |   |   |   |   |–– seed1/
|   |   |   |   |   |   |–– seed2/
|   |   |   |   |   |   |–– seed3/
|   |–– train_base/
|   |   |–– imagenet/
|   |   |   |–– shots_16/
|   |   |   |   |–– CoCoOp/
|   |   |   |   |   |–– vit_b16_c4_ep10_batch1_ctxv1/
|   |   |   |   |   |   |–– seed1/
|   |   |   |   |   |   |–– seed2/
|   |   |   |   |   |   |–– seed3/
```

Then, to get the average performance on the base classes, run

```bash
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1
```

To get the average performance on the new classes, run

```bash
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1 --test-log
```

## Cross-Dataset Transfer

This corresponds to the experiments in Section 4.2, i.e., Table 2.

The relevant scripts are `scripts/cocoop/xd_train.sh` and `scripts/cocoop/xd_test.sh` where the `DATASET` variable is set to the default, namely `imagenet`. To train the model, run

```bash
# seed=1
bash scripts/cocoop/xd_train.sh 1

# seed=2
bash scripts/cocoop/xd_train.sh 2

# seed=3
bash scripts/cocoop/xd_train.sh 3
```

Then, you evaluate the model on other datasets, e.g.,

```bash
for SEED in 1 2 3
do
    bash scripts/cocoop/xd_test.sh caltech101 ${SEED}
    bash scripts/cocoop/xd_test.sh oxford_pets ${SEED}
    bash scripts/cocoop/xd_test.sh stanford_cars ${SEED}
done
```

## Domain Generalization

This corresponds to the experiments in Section 4.3, i.e., Table 3.

The steps are similar to those discussed in "Cross-Dataset Transfer" except you evaluate the model on the variants of ImageNet, i.e., `imagenetv2`, `imagenet_sketch`, `imagenet_a` and `imagenet_r`.
233
docs/DATASETS.md
Normal file
@@ -0,0 +1,233 @@
# How to install datasets

### Acknowledgement: This readme file for installing datasets has been borrowed directly from [MaPLe's](https://github.com/muzairkhattak/multimodal-prompt-learning) official repository.

We recommend putting all datasets under the same folder (say `$DATA`) to ease management, and following the instructions below to organize the datasets so that the source code does not need to be modified. The file structure should look like:

```
$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– stanford_cars/
```

If you have some datasets already installed somewhere else, you can create symbolic links in `$DATA/dataset_name` that point to the original data to avoid duplicate downloads.

Datasets list:
- [ImageNet](#imagenet)
- [Caltech101](#caltech101)
- [OxfordPets](#oxfordpets)
- [StanfordCars](#stanfordcars)
- [Flowers102](#flowers102)
- [Food101](#food101)
- [FGVCAircraft](#fgvcaircraft)
- [SUN397](#sun397)
- [DTD](#dtd)
- [EuroSAT](#eurosat)
- [UCF101](#ucf101)
- [ImageNetV2](#imagenetv2)
- [ImageNet-Sketch](#imagenet-sketch)
- [ImageNet-A](#imagenet-a)
- [ImageNet-R](#imagenet-r)

The instructions to prepare each dataset are detailed below. To ensure reproducibility and fair comparison for future work, we provide fixed train/val/test splits for all datasets except ImageNet, where the validation set is used as the test set. The fixed splits are either from the original datasets (if available) or created by us.

### ImageNet
- Create a folder named `imagenet/` under `$DATA`.
- Create `images/` under `imagenet/`.
- Download the dataset from the [official website](https://image-net.org/index.php) and extract the training and validation sets to `$DATA/imagenet/images`. The directory structure should look like
```
imagenet/
|–– images/
|   |–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
|   |–– val/
```
- If you have downloaded the ImageNet dataset before, you can create symbolic links to map the training and validation sets to `$DATA/imagenet/images`.
- Download `classnames.txt` to `$DATA/imagenet/` from this [link](https://drive.google.com/file/d/1-61f_ol79pViBFDG_IDlUQSwoLcn2XXF/view?usp=sharing). The class names are copied from [CLIP](https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb).

### Caltech101
- Create a folder named `caltech-101/` under `$DATA`.
- Download `101_ObjectCategories.tar.gz` from http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz and extract the file under `$DATA/caltech-101`.
- Download `split_zhou_Caltech101.json` from this [link](https://drive.google.com/file/d/1hyarUivQE36mY6jSomru6Fjd-JzwcCzN/view?usp=sharing) and put it under `$DATA/caltech-101`.

The directory structure should look like
```
caltech-101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json
```

### OxfordPets
- Create a folder named `oxford_pets/` under `$DATA`.
- Download the images from https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz.
- Download the annotations from https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz.
- Download `split_zhou_OxfordPets.json` from this [link](https://drive.google.com/file/d/1501r8Ber4nNKvmlFVQZ8SeUHTcdTTEqs/view?usp=sharing).

The directory structure should look like
```
oxford_pets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json
```

### StanfordCars
- Create a folder named `stanford_cars/` under `$DATA`.
- Download the train images http://ai.stanford.edu/~jkrause/car196/cars_train.tgz.
- Download the test images http://ai.stanford.edu/~jkrause/car196/cars_test.tgz.
- Download the train labels https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
- Download the test labels http://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat.
- Download `split_zhou_StanfordCars.json` from this [link](https://drive.google.com/file/d/1ObCFbaAgVu0I-k_Au-gIUcefirdAuizT/view?usp=sharing).

The directory structure should look like
```
stanford_cars/
|–– cars_test\
|–– cars_test_annos_withlabels.mat
|–– cars_train\
|–– devkit\
|–– split_zhou_StanfordCars.json
```

### Flowers102
- Create a folder named `oxford_flowers/` under `$DATA`.
- Download the images and labels from https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz and https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat respectively.
- Download `cat_to_name.json` from [here](https://drive.google.com/file/d/1AkcxCXeK_RCGCEC_GvmWxjcjaNhu-at0/view?usp=sharing).
- Download `split_zhou_OxfordFlowers.json` from [here](https://drive.google.com/file/d/1Pp0sRXzZFZq15zVOzKjKBu4A9i01nozT/view?usp=sharing).

The directory structure should look like
```
oxford_flowers/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json
```

### Food101
- Download the dataset from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract the file `food-101.tar.gz` under `$DATA`, resulting in a folder named `$DATA/food-101/`.
- Download `split_zhou_Food101.json` from [here](https://drive.google.com/file/d/1QK0tGi096I0Ba6kggatX1ee6dJFIcEJl/view?usp=sharing).

The directory structure should look like
```
food-101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json
```

### FGVCAircraft
- Download the data from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz.
- Extract `fgvc-aircraft-2013b.tar.gz` and keep only `data/`.
- Move `data/` to `$DATA` and rename the folder to `fgvc_aircraft/`.

The directory structure should look like
```
fgvc_aircraft/
|–– images/
|–– ... # a bunch of .txt files
```

### SUN397
- Create a folder named `sun397/` under `$DATA`.
- Download the images http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under `$DATA/sun397/`.
- Download `split_zhou_SUN397.json` from this [link](https://drive.google.com/file/d/1y2RD81BYuiyvebdN-JymPfyWYcd8_MUq/view?usp=sharing).

The directory structure should look like
```
sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files
```

### DTD
- Download the dataset from https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz and extract it to `$DATA`. This should lead to `$DATA/dtd/`.
- Download `split_zhou_DescribableTextures.json` from this [link](https://drive.google.com/file/d/1u3_QfB467jqHgNXC00UIzbLZRQCg2S7x/view?usp=sharing).

The directory structure should look like
```
dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json
```

### EuroSAT
- Create a folder named `eurosat/` under `$DATA`.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to `$DATA/eurosat/`.
- Download `split_zhou_EuroSAT.json` from [here](https://drive.google.com/file/d/1Ip7yaCWFi0eaOFUGga0lUdVi_DDQth1o/view?usp=sharing).

The directory structure should look like
```
eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json
```

### UCF101
- Create a folder named `ucf101/` under `$DATA`.
- Download the zip file `UCF-101-midframes.zip` from [here](https://drive.google.com/file/d/10Jqome3vtUA2keJkNanAiFpgbyC9Hc2O/view?usp=sharing) and extract it to `$DATA/ucf101/`. This zip file contains the extracted middle video frames.
- Download `split_zhou_UCF101.json` from this [link](https://drive.google.com/file/d/1I0S0q91hJfsV9Gf4xDIjgDq4AqBNJb1y/view?usp=sharing).

The directory structure should look like
```
ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
```

### ImageNetV2
- Create a folder named `imagenetv2/` under `$DATA`.
- Go to this GitHub repo https://github.com/modestyachts/ImageNetV2.
- Download the matched-frequency dataset from https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz and extract it to `$DATA/imagenetv2/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenetv2/`.

The directory structure should look like
```
imagenetv2/
|–– imagenetv2-matched-frequency-format-val/
|–– classnames.txt
```

### ImageNet-Sketch
- Download the dataset from https://github.com/HaohanWang/ImageNet-Sketch.
- Extract the dataset to `$DATA/imagenet-sketch`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-sketch/`.

The directory structure should look like
```
imagenet-sketch/
|–– images/ # contains 1,000 folders whose names have the format of n*
|–– classnames.txt
```

### ImageNet-A
- Create a folder named `imagenet-adversarial/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/natural-adv-examples and extract it to `$DATA/imagenet-adversarial/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-adversarial/`.

The directory structure should look like
```
imagenet-adversarial/
|–– imagenet-a/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
```

### ImageNet-R
- Create a folder named `imagenet-rendition/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/imagenet-r and extract it to `$DATA/imagenet-rendition/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-rendition/`.

The directory structure should look like
```
imagenet-rendition/
|–– imagenet-r/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
```
149
docs/EVAL.md
Normal file
@@ -0,0 +1,149 @@
# Evaluating and Reproducing PromptSRC Results

We provide bash scripts in the [scripts/](../scripts) directory for evaluating PromptSRC and the independent V-L prompting baseline using the provided pre-trained model checkpoints.

Make sure to update the `DATA` variable with the dataset path in the script file and run the commands from the main directory `PromptSRC/`.
Below we provide evaluation instructions for the PromptSRC pre-trained models. The same instructions apply to reproducing results for the *independent V-L prompting* baseline and MaPLe.

## PromptSRC

#### (1) Base-to-Novel class generalization setting
The base-to-novel PromptSRC configuration is provided in the config file at `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml`. No hyper-parameters or other settings should be changed in the config file during evaluation of pre-trained models.

We show an example of reproducing results for imagenet. Follow the instructions below to reproduce results using our pre-trained model weights:
* Download the zipped folder containing base-to-novel generalization pre-trained weights for a single dataset from this [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/Em_3tkSj6T9AmhVjmzKTL3gBYNehhvfJl8ke2pU3U0nabA?e=9ecjQA). After unzipping, the directory should look like this:

```
imagenet
|–– base/
|   |–– seed1/
|   |–– seed2/
|   |–– seed3/
```

Now use the evaluation script `scripts/promptsrc/reproduce_base2novel_setting.sh` and run the commands below to calculate the results over 3 seeds:
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]

# evaluate on base and novel classes for SEED1
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 1 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED2
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 2 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED3
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 3 /path/to/imagenet/weights/folder
```

This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:

```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/test_base/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
# prints averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
```
The same steps can be repeated for other individual datasets by providing the respective dataset name and checkpoint path.


#### (2) Cross-dataset and domain generalization setting
In the cross-dataset and domain generalization settings, we first train PromptSRC on ImageNet-1k in a few-shot manner with 16 shots for all 3 seeds and then evaluate the trained model directly on the cross-dataset and out-of-distribution datasets.

We provide the instructions below to reproduce cross-dataset and domain generalization results using our pre-trained imagenet model weights for PromptSRC:
* Download the zipped folder containing pre-trained weights for imagenet from this [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/Ekr9qF0cSaVDr0X6OlP2JAEBG1xjlTMjHNLc28g1SjwW-w?e=AA5ABi). After unzipping, the directory should look like this:

```
imagenet
|–– seed1/
|–– seed2/
|–– seed3/
```

Now use the evaluation script `scripts/promptsrc/reproduce_xd.sh` and run the commands below to calculate the results for the food101 dataset over 3 seeds:
```bash
# Other possible dataset values for cross-dataset evaluation include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# Possible dataset values for the domain generalization benchmark include [imagenetv2, imagenet_sketch, imagenet_a, imagenet_r]

# evaluate on the given dataset for SEED1
bash scripts/promptsrc/reproduce_xd.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on the given dataset for SEED2
bash scripts/promptsrc/reproduce_xd.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on the given dataset for SEED3
bash scripts/promptsrc/reproduce_xd.sh food101 3 /path/to/imagenet/weights/folder
```

This should evaluate and save the log files in the `output/` directory. To obtain the results averaged over 3 seeds, run:

```bash
# prints averaged results for the food101 dataset
python parse_test_res.py output/evaluation/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets_16shots/food101 --test-log
```

The same steps can be repeated for other individual datasets by providing the respective dataset name and checkpoint path.


#### (3) Few-shot setting
In this setting, PromptSRC is trained on all classes of individual datasets with different few-shot splits (K = 1, 2, 4, 8, 16). The PromptSRC config for the few-shot setting is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot.yaml`.
Follow the instructions below to reproduce PromptSRC few-shot setting results using our pre-trained models:

Now use the evaluation script `scripts/promptsrc/reproduce_few_shot.sh` and run the commands below to calculate the results for the food101 dataset over 3 seeds:
```bash
# reproduce_few_shot.sh calculates results for all 3 seeds for a given K
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]

# evaluate on the given dataset for K=1 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on the given dataset for K=2 shots
bash scripts/promptsrc/reproduce_few_shot.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on the given dataset for K=4 shots
bash scripts/promptsrc/reproduce_few_shot.sh food101 4 /path/to/imagenet/weights/folder
# evaluate on the given dataset for K=8 shots
bash scripts/promptsrc/reproduce_few_shot.sh food101 8 /path/to/imagenet/weights/folder
# evaluate on the given dataset for K=16 shots
bash scripts/promptsrc/reproduce_few_shot.sh food101 16 /path/to/imagenet/weights/folder
```

This should evaluate and save the log files in the `output/` directory. To obtain the results averaged over 3 seeds for all shots, run:

```bash
# prints averaged results for the food101 dataset for K=1
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_1shots/food101 --test-log
# prints averaged results for the food101 dataset for K=2
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_2shots/food101 --test-log
# prints averaged results for the food101 dataset for K=4
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_4shots/food101 --test-log
# prints averaged results for the food101 dataset for K=8
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_8shots/food101 --test-log
# prints averaged results for the food101 dataset for K=16
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_16shots/food101 --test-log
```

The same steps can be repeated for other individual datasets by providing the respective dataset name and checkpoint path.

<br>

## Training and Evaluating the independent V-L prompting baseline results

For the IVLP baseline method, we provide its corresponding default configs and evaluation scripts as follows.

```
configs
|–– datasets/
|–– trainers/
|   |–– CoCoOp/
|   |–– CoOp/
|   |–– MaPLe/
|   |–– IVLP/
|   |–– PromptSRC/
```

```
scripts
|–– cocoop/
|–– coop/
|–– maple/
|–– independent-vlp/
|–– promptsrc/
```

Please use the corresponding config and script files and follow the same instructions as provided for PromptSRC in order to evaluate and reproduce the results of the IVLP baseline approach. The pre-trained weights for the IVLP baseline are provided [at this link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EuIwh-yMh_JBqB2Y_o8Jl14BPDKDRHC0JBPE1BugIeZiSQ?e=oJnJwy).
This repository also supports using official [CoOp](CoOp.md) and [Co-CoOp](Co-CoOp.md) configs and models.
48
docs/INSTALL.md
Normal file
@@ -0,0 +1,48 @@
# Installation

### Acknowledgement: This readme file for installation is modified from [MaPLe's](https://github.com/muzairkhattak/multimodal-prompt-learning) official repository.

This codebase is tested on Ubuntu 20.04.2 LTS with Python 3.8. Follow the steps below to create the environment and install dependencies.

* Set up the conda environment (recommended).
```bash
# Create a conda environment
conda create -y -n promptsrc python=3.8

# Activate the environment
conda activate promptsrc

# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/ if you need a different CUDA version
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```

* Install the dassl library.
```bash
# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation

# Clone this repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/

# Install dependencies
pip install -r requirements.txt

# Install this library (no need to re-build if the source code is modified)
python setup.py develop
cd ..
```

* Clone the PromptSRC code repository and install its requirements.
```bash
# Clone PromptSRC code base
git clone https://github.com/muzairkhattak/PromptSRC.git

cd PromptSRC/

# Install requirements
pip install -r requirements.txt

# Update setuptools package
pip install setuptools==59.5.0
```
211
docs/MaPLe.md
Normal file
@@ -0,0 +1,211 @@
|
|||||||
|
# Training and Evaluation
|
||||||
|
|
||||||
|
We provide bash scripts in [scripts/](../scripts) for each prompting variant including MaPLe, vision, language and independent V-L prompting.
|
||||||
|
Make sure to configure the dataset paths in environment variable `DATA` and run the commands from the main directory `multimodal-prompt-learning/`.
|
||||||
|
Below, we provide training and evaluation instructions for MaPLe. The same instructions apply to all other variants, including *Vision (VPT), Language and independent V-L prompting*.
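A minimal setup sketch, assuming all datasets live under a single root folder (the path below is a placeholder):

```bash
# Point DATA at the folder that contains all datasets
export DATA=/path/to/datasets

# Run all commands from the main directory
cd multimodal-prompt-learning/
```

Depending on how an individual script reads `DATA`, you may instead need to edit the `DATA=` line inside the script itself.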
|
||||||
|
|
||||||
|
|
||||||
|
### Training time and compute
|
||||||
|
We train MaPLe on each dataset with a batch size of 4 using a **single** NVIDIA A100 GPU.
|
||||||
|
Training MaPLe on ImageNet for 5 epochs takes 1 hour for a single seed, so results for 3 seeds take around 3 hours. The remaining 10 datasets combined take around 4 hours (for all 3 seeds) on a single A100 GPU. To ease reproduction of MaPLe results, we have provided [training logs](https://drive.google.com/drive/folders/1EvuvgR8566bL0T7ucvAL3LFVwuUPMRas?usp=sharing) for all datasets.
|
||||||
|
|
||||||
|
## MaPLe
|
||||||
|
|
||||||
|
#### (1) Base-to-Novel class generalization setting
|
||||||
|
The default training settings are provided in config file at `configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx.yaml`. All hyper-parameters such as prompt length, prompt depth, etc., can be modified using this config file.
|
||||||
|
|
||||||
|
Below, we provide instructions to train MaPLe on imagenet.
|
||||||
|
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
|
||||||
|
|
||||||
|
# seed=1
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/maple/base2new_train_maple.sh imagenet 1
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/maple/base2new_test_maple.sh imagenet 1
|
||||||
|
|
||||||
|
# seed=2
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/maple/base2new_train_maple.sh imagenet 2
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/maple/base2new_test_maple.sh imagenet 2
|
||||||
|
|
||||||
|
# seed=3
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/maple/base2new_train_maple.sh imagenet 3
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/maple/base2new_test_maple.sh imagenet 3
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Averaging results over 3 seeds:
|
||||||
|
Once the above training and evaluation runs are completed, the `output/` directory should have the following structure:
|
||||||
|
|
||||||
|
```
|
||||||
|
output
|
||||||
|
|–– base2new/
|
||||||
|
| |–– test_new/
|
||||||
|
| | |–– imagenet/
|
||||||
|
| | | |–– shots_16/
|
||||||
|
| | | | |–– MaPLe/
|
||||||
|
| | | | | |–– vit_b16_c2_ep5_batch4_2ctx/
|
||||||
|
| | | | | | |–– seed1/
|
||||||
|
| | | | | | |–– seed2/
|
||||||
|
| | | | | | |–– seed3/
|
||||||
|
| |–– train_base/
|
||||||
|
| | |–– imagenet/
|
||||||
|
| | | |–– shots_16/
|
||||||
|
| | | | |–– MaPLe/
|
||||||
|
| | | | | |–– vit_b16_c2_ep5_batch4_2ctx/
|
||||||
|
| | | | | | |–– seed1/
|
||||||
|
| | | | | | |–– seed2/
|
||||||
|
| | | | | | |–– seed3/
|
||||||
|
```
|
||||||
|
|
||||||
|
Now use the script `parse_test_res.py` and run the commands below to calculate the averaged results:
|
||||||
|
```bash
|
||||||
|
# prints averaged results for base classes
|
||||||
|
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
|
||||||
|
# averaged results for novel classes
|
||||||
|
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
|
||||||
|
```
|
||||||
|
|
||||||
|
The above steps can be repeated for other individual datasets.
|
||||||
|
|
||||||
|
#### Reproducing results using pre-trained weights for base-to-novel generalization setting
|
||||||
|
|
||||||
|
We show an example to reproduce results for imagenet. Follow the instructions below to reproduce results using our pre-trained model weights:
|
||||||
|
* Download the zipped folder containing pre-trained weights for a single dataset from this [link](https://drive.google.com/drive/folders/1-tB6BUDBzs9CXTOJ7p5hM4Svq1tL_mGz?usp=sharing). Additionally we also provide the log files for both training and evaluation. After unzipping, the directory should look like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
imagenet
|
||||||
|
|–– base/
|
||||||
|
| |–– seed1/
|
||||||
|
| |–– seed2/
|
||||||
|
| |–– seed3/
|
||||||
|
|–– novel/
|
||||||
|
| |–– seed1/
|
||||||
|
| |–– seed2/
|
||||||
|
| |–– seed3/
|
||||||
|
```
|
||||||
|
|
||||||
|
Now use the evaluation script `scripts/maple/reproduce_maple.sh` and run the commands below to calculate the averaged results:
|
||||||
|
```bash
|
||||||
|
# evaluate on base and novel classes for SEED1
|
||||||
|
bash scripts/maple/reproduce_maple.sh imagenet 1 /path/to/imagenet/weights/folder
|
||||||
|
# evaluate on base and novel classes for SEED2
|
||||||
|
bash scripts/maple/reproduce_maple.sh imagenet 2 /path/to/imagenet/weights/folder
|
||||||
|
# evaluate on base and novel classes for SEED3
|
||||||
|
bash scripts/maple/reproduce_maple.sh imagenet 3 /path/to/imagenet/weights/folder
|
||||||
|
```
|
||||||
|
|
||||||
|
This should evaluate and save the log files in `output/` directory. To obtain the averaged results, run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# prints averaged results for base classes
|
||||||
|
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
|
||||||
|
# averaged results for novel classes
|
||||||
|
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
#### (2) Cross-Dataset Transfer
|
||||||
|
We provide instructions to train MaPLe on ImageNet using all 1000 classes and then evaluate it directly on new downstream datasets.
|
||||||
|
We provide the cross-dataset config for MaPLe: `configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets.yaml`.
|
||||||
|
* First, train MaPLe on ImageNet in a few-shot manner (for all 3 seeds).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# seed=1
|
||||||
|
bash scripts/maple/xd_train_maple.sh imagenet 1
|
||||||
|
# seed=2
|
||||||
|
bash scripts/maple/xd_train_maple.sh imagenet 2
|
||||||
|
# seed=3
|
||||||
|
bash scripts/maple/xd_train_maple.sh imagenet 3
|
||||||
|
```
|
||||||
|
|
||||||
|
* Now evaluate the ImageNet-trained model on downstream datasets.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for SEED in 1 2 3
|
||||||
|
do
|
||||||
|
bash scripts/maple/xd_test_maple.sh caltech101 ${SEED}
|
||||||
|
bash scripts/maple/xd_test_maple.sh oxford_pets ${SEED}
|
||||||
|
bash scripts/maple/xd_test_maple.sh stanford_cars ${SEED}
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
#### (3) Domain Generalization
|
||||||
|
We use the ImageNet-trained MaPLe model for the domain generalization experiments. The steps are similar to the cross-dataset experiments above; however, the model is now evaluated on ImageNet variants.
|
||||||
|
* Evaluate the ImageNet-trained model on variants of ImageNet (domain-shift datasets).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for SEED in 1 2 3
|
||||||
|
do
|
||||||
|
bash scripts/maple/xd_test_maple.sh imagenetv2 ${SEED}
|
||||||
|
bash scripts/maple/xd_test_maple.sh imagenet_sketch ${SEED}
|
||||||
|
bash scripts/maple/xd_test_maple.sh imagenet_a ${SEED}
|
||||||
|
bash scripts/maple/xd_test_maple.sh imagenet_r ${SEED}
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
You can obtain averaged results by using the script `parse_test_res.py` and following steps similar to those described for the base-to-novel generalization experiments.
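For instance, a sketch of the parsing commands, assuming the cross-dataset/domain-generalization evaluations are written under `output/evaluation/` with the same naming convention as the food101 reproduction example later in this document (the paths are assumptions; substitute the directories actually created by `xd_test_maple.sh`):

```bash
# Hypothetical output paths
python parse_test_res.py output/evaluation/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets_16shots/imagenetv2 --test-log
python parse_test_res.py output/evaluation/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets_16shots/imagenet_sketch --test-log
```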
|
||||||
|
<br>
|
||||||
|
|
||||||
|
|
||||||
|
#### Reproducing official results for cross-dataset and domain generalization setting
|
||||||
|
|
||||||
|
We provide the instructions below to reproduce the domain generalization and cross-dataset results using our pre-trained ImageNet model weights for MaPLe:
|
||||||
|
* Download the zipped folder containing pre-trained weights for imagenet from this [link](https://drive.google.com/drive/folders/1bmhvmNZc13WJ5U71qt0t8k91wyuoemVF?usp=sharing). Additionally, we also provide the log files for both training and evaluation. After unzipping, the directory should look like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
imagenet
|
||||||
|
|–– seed1/
|
||||||
|
|–– seed2/
|
||||||
|
|–– seed3/
|
||||||
|
```
|
||||||
|
|
||||||
|
Now use the evaluation script `scripts/maple/reproduce_maple_xd.sh` and run the commands below to calculate the averaged results:
|
||||||
|
```bash
|
||||||
|
# evaluate on given dataset for SEED1
|
||||||
|
bash scripts/maple/reproduce_maple_xd.sh food101 1 /path/to/imagenet/weights/folder
|
||||||
|
# evaluate on given dataset for SEED2
|
||||||
|
bash scripts/maple/reproduce_maple_xd.sh food101 2 /path/to/imagenet/weights/folder
|
||||||
|
# evaluate on given dataset for SEED3
|
||||||
|
bash scripts/maple/reproduce_maple_xd.sh food101 3 /path/to/imagenet/weights/folder
|
||||||
|
```
|
||||||
|
|
||||||
|
This should evaluate and save the log files in `output/` directory. To obtain the averaged results, run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# prints averaged results for food101 dataset
|
||||||
|
python parse_test_res.py output/evaluation/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets_16shots/food101 --test-log
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
#### Training and Evaluating other variants
|
||||||
|
|
||||||
|
For other variants including vision, language and independent V-L prompting techniques, we provide their corresponding configs and scripts as follows.
|
||||||
|
|
||||||
|
```
|
||||||
|
configs
|
||||||
|
|–– datasets/
|
||||||
|
|–– trainers/
|
||||||
|
| |–– CoCoOp/
|
||||||
|
| |–– CoOp/
|
||||||
|
| |–– MaPLe/
|
||||||
|
| |–– IVLP/
|
||||||
|
| |–– VPT/
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
scripts
|
||||||
|
|–– cocoop/
|
||||||
|
|–– coop/
|
||||||
|
|–– language-prompting/
|
||||||
|
|–– maple/
|
||||||
|
|–– independent-vlp/
|
||||||
|
```
|
||||||
|
|
||||||
|
Please use the corresponding config and script files and follow the same instructions as provided for MaPLe in order to train and evaluate the other variants. The same instructions can be followed to reproduce the results of the other variants using the provided pretrained weights.
|
||||||
169
docs/TRAIN.md
Normal file
@@ -0,0 +1,169 @@
|
|||||||
|
# PromptSRC Training
|
||||||
|
|
||||||
|
We provide bash scripts in [scripts/](../scripts) for training PromptSRC and independent V-L prompting baseline.
|
||||||
|
Make sure to update the `DATA` variable with the dataset path in the script file and run the commands from the main directory `PromptSRC/`.
|
||||||
|
Below, we provide training and testing instructions for PromptSRC. The same instructions are applicable to the baseline *independent V-L prompting* approach, MaPLe, CoOp and CoCoOp.
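For reference, the `DATA` variable sits near the top of each bash script; a sketch of the line to edit (the exact contents of each script may differ slightly):

```bash
# Inside e.g. scripts/promptsrc/base2new_train.sh (illustrative)
DATA="/path/to/dataset/folder"   # update this to your dataset root
```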
|
||||||
|
|
||||||
|
### Training time and compute
|
||||||
|
We train PromptSRC on each dataset with a batch size of 4 using a **single** NVIDIA A100 GPU.
|
||||||
|
Training PromptSRC on ImageNet for 20 epochs takes around 6 hours for a single seed, so results for 3 seeds take around 18 hours. The remaining 10 datasets combined take around 8 hours (for all 3 seeds) on a single A100 GPU.
|
||||||
|
|
||||||
|
## PromptSRC
|
||||||
|
|
||||||
|
#### (1) Base-to-Novel class generalization setting
|
||||||
|
The base-to-novel PromptSRC configuration is provided in the config file at `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml`. All hyper-parameters, such as the GPA mean and standard deviation, the SCL loss weight coefficients, the prompt length and the prompt depth, can be modified using this config file.
|
||||||
|
|
||||||
|
Run the commands below to train PromptSRC on ImageNet.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
|
||||||
|
|
||||||
|
# seed=1
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/promptsrc/base2new_train.sh imagenet 1
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/promptsrc/base2new_test.sh imagenet 1
|
||||||
|
|
||||||
|
# seed=2
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/promptsrc/base2new_train.sh imagenet 2
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/promptsrc/base2new_test.sh imagenet 2
|
||||||
|
|
||||||
|
# seed=3
|
||||||
|
# trains and evaluates on base classes
|
||||||
|
bash scripts/promptsrc/base2new_train.sh imagenet 3
|
||||||
|
# evaluates on novel classes
|
||||||
|
bash scripts/promptsrc/base2new_test.sh imagenet 3
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Averaging results over 3 seeds:
|
||||||
|
Once the above training and evaluation runs are completed, the `output/` directory should have the following structure:
|
||||||
|
|
||||||
|
```
|
||||||
|
output
|
||||||
|
|–– base2new/
|
||||||
|
| |–– test_new/
|
||||||
|
| | |–– imagenet/
|
||||||
|
| | | |–– shots_16/
|
||||||
|
| | | | |–– PromptSRC/
|
||||||
|
| | | | | |–– vit_b16_c2_ep20_batch4_4+4ctx/
|
||||||
|
| | | | | | |–– seed1/
|
||||||
|
| | | | | | |–– seed2/
|
||||||
|
| | | | | | |–– seed3/
|
||||||
|
| |–– train_base/
|
||||||
|
| | |–– imagenet/
|
||||||
|
| | | |–– shots_16/
|
||||||
|
| | | | |–– PromptSRC/
|
||||||
|
| | | | | |–– vit_b16_c2_ep20_batch4_4+4ctx/
|
||||||
|
| | | | | | |–– seed1/
|
||||||
|
| | | | | | |–– seed2/
|
||||||
|
| | | | | | |–– seed3/
|
||||||
|
```
|
||||||
|
|
||||||
|
Now use the script `parse_test_res.py` and run the commands below to calculate the averaged results:
|
||||||
|
```bash
|
||||||
|
# prints averaged results for base classes
|
||||||
|
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
|
||||||
|
# averaged results for novel classes
|
||||||
|
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
|
||||||
|
```
|
||||||
|
|
||||||
|
The above steps can be repeated for other individual datasets.
|
||||||
|
|
||||||
|
#### (2) Cross-Dataset Transfer setting
|
||||||
|
We provide instructions to train PromptSRC on ImageNet using all 1000 classes with 16 shots and then evaluate it directly on new downstream datasets.
|
||||||
|
The corresponding cross-dataset config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets.yaml`. All PromptSRC hyper-parameters can be modified in this config file.
|
||||||
|
* First, train PromptSRC on ImageNet in a few-shot manner (for all 3 seeds).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# seed=1
|
||||||
|
bash scripts/promptsrc/xd_train.sh imagenet 1
|
||||||
|
# seed=2
|
||||||
|
bash scripts/promptsrc/xd_train.sh imagenet 2
|
||||||
|
# seed=3
|
||||||
|
bash scripts/promptsrc/xd_train.sh imagenet 3
|
||||||
|
```
|
||||||
|
|
||||||
|
* Now directly evaluate the ImageNet-trained model on the downstream cross-datasets.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Other possible dataset values include [imagenet, food101, dtd, ucf101, oxford_flowers, fgvc_aircraft, sun397, eurosat]
|
||||||
|
|
||||||
|
for SEED in 1 2 3
|
||||||
|
do
|
||||||
|
bash scripts/promptsrc/xd_test.sh caltech101 ${SEED}
|
||||||
|
bash scripts/promptsrc/xd_test.sh oxford_pets ${SEED}
|
||||||
|
bash scripts/promptsrc/xd_test.sh stanford_cars ${SEED}
|
||||||
|
done
|
||||||
|
```
|
||||||
|
You can obtain averaged results by using the script `parse_test_res.py` and following steps similar to those described for the base-to-novel generalization experiments.
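A parsing sketch, assuming the cross-dataset evaluations are written under `output/evaluation/` following the same naming convention as the MaPLe cross-dataset example in `docs/MaPLe.md` (the path is an assumption; substitute the directory actually created by `scripts/promptsrc/xd_test.sh`):

```bash
# Hypothetical output path
python parse_test_res.py output/evaluation/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets_16shots/caltech101 --test-log
```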
|
||||||
|
|
||||||
|
|
||||||
|
#### (3) Domain Generalization setting
|
||||||
|
We use the same ImageNet-trained PromptSRC model for the domain generalization experiments. The steps are similar to the cross-dataset experiments above; however, the trained model is now evaluated on ImageNet variants.
|
||||||
|
The corresponding domain generalization config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets.yaml`.
|
||||||
|
* Evaluate ImageNet model on different variants of ImageNet (datasets with domain shifts).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
for SEED in 1 2 3
|
||||||
|
do
|
||||||
|
bash scripts/promptsrc/xd_test.sh imagenetv2 ${SEED}
|
||||||
|
bash scripts/promptsrc/xd_test.sh imagenet_sketch ${SEED}
|
||||||
|
bash scripts/promptsrc/xd_test.sh imagenet_a ${SEED}
|
||||||
|
bash scripts/promptsrc/xd_test.sh imagenet_r ${SEED}
|
||||||
|
done
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
You can obtain averaged results by using the script `parse_test_res.py` and following steps similar to those described for the base-to-novel generalization experiments.
|
||||||
|
|
||||||
|
#### (4) Few-shot setting
|
||||||
|
In this setting, PromptSRC is trained on all classes of each individual dataset with different few-shot splits (K = 1, 2, 4, 8, 16). The corresponding few-shot setting config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot.yaml`.
|
||||||
|
|
||||||
|
Now use the training script `scripts/promptsrc/few_shot.sh` and run the commands below to calculate the results for imagenet dataset for all shots over 3 seeds:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
|
||||||
|
|
||||||
|
# train and test on given dataset for K=1 shot
|
||||||
|
bash scripts/promptsrc/few_shot.sh imagenet 1
|
||||||
|
# train and test on given dataset for K=2 shot
|
||||||
|
bash scripts/promptsrc/few_shot.sh imagenet 2
|
||||||
|
# train and test on given dataset for K=4 shot
|
||||||
|
bash scripts/promptsrc/few_shot.sh imagenet 4
|
||||||
|
# train and test on given dataset for K=8 shot
|
||||||
|
bash scripts/promptsrc/few_shot.sh imagenet 8
|
||||||
|
# train and test on given dataset for K=16 shot
|
||||||
|
bash scripts/promptsrc/few_shot.sh imagenet 16
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
You can obtain averaged results by using the script `parse_test_res.py` and following steps similar to those described for the base-to-novel generalization experiments.
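For example, assuming the few-shot runs use the same output layout as the food101 few-shot example in the main README, the K=16 ImageNet results can be averaged as follows (repeat for K = 1, 2, 4 and 8 by changing the `_16shots` suffix):

```bash
# prints averaged results for imagenet for K=16 (path follows the README convention)
python parse_test_res.py output/few_shot/imagenet/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_16shots/imagenet --test-log
```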
|
||||||
|
<br>
|
||||||
|
|
||||||
|
|
||||||
|
#### Training and testing independent V-L prompting baseline approach
|
||||||
|
|
||||||
|
For training the independent V-L prompting baseline approach, we provide the corresponding configs and scripts as follows.
|
||||||
|
|
||||||
|
```
|
||||||
|
configs
|
||||||
|
|–– datasets/
|
||||||
|
|–– trainers/
|
||||||
|
| |–– CoCoOp/
|
||||||
|
| |–– CoOp/
|
||||||
|
| |–– IVLP/
|
||||||
|
| |–– PromptSRC/
|
||||||
|
```
|
||||||
|
|
||||||
|
```
|
||||||
|
scripts
|
||||||
|
|–– cocoop/
|
||||||
|
|–– coop/
|
||||||
|
|–– promptsrc/
|
||||||
|
|–– independent-vlp/
|
||||||
|
```
|
||||||
|
|
||||||
|
Please use the corresponding config and script files and follow the same instructions as provided for PromptSRC for training and testing.
|
||||||
|
This repository also supports using official [MaPLe](MaPLe.md), [CoOp](CoOp.md) and [Co-CoOp](Co-CoOp.md) configs and models.
|
||||||
BIN
docs/main_figure.png
Normal file
Binary file not shown. (image, 2.9 MiB)
49409
interpret_prompts/clip_words.csv
Normal file
File diff suppressed because it is too large.
84
interpret_prompts/interpret_prompt.py
Normal file
@@ -0,0 +1,84 @@
|
|||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import argparse
|
||||||
|
import torch
|
||||||
|
|
||||||
|
from clip.simple_tokenizer import SimpleTokenizer
|
||||||
|
from clip import clip
|
||||||
|
|
||||||
|
# "ViT-B/16"
|
||||||
|
# "RN50"
|
||||||
|
def load_clip_to_cpu(backbone_name="ViT-B/16"):
|
||||||
|
url = clip._MODELS[backbone_name]
|
||||||
|
model_path = clip._download(url)
|
||||||
|
|
||||||
|
try:
|
||||||
|
# loading JIT archive
|
||||||
|
model = torch.jit.load(model_path, map_location="cpu").eval()
|
||||||
|
state_dict = None
|
||||||
|
|
||||||
|
except RuntimeError:
|
||||||
|
state_dict = torch.load(model_path, map_location="cpu")
|
||||||
|
|
||||||
|
model = clip.build_model(state_dict or model.state_dict())
|
||||||
|
|
||||||
|
return model
|
||||||
|
|
||||||
|
|
||||||
|
# parser = argparse.ArgumentParser()
|
||||||
|
# parser.add_argument("fpath", type=str, help="Path to the learned prompt")
|
||||||
|
# parser.add_argument("topk", type=int, help="Select top-k similar words")
|
||||||
|
# args = parser.parse_args()
|
||||||
|
|
||||||
|
fpath = "./compound_prompt_weights/train_base/food101/shots_16/cocoop/vit_b16_c4_ep10_batch1_ctxv1/seed1/prompt_learner/model.pth.tar-5"
|
||||||
|
topk = 10
|
||||||
|
|
||||||
|
assert os.path.exists(fpath)
|
||||||
|
|
||||||
|
print(f"Return the top-{topk} matched words")
|
||||||
|
|
||||||
|
tokenizer = SimpleTokenizer()
|
||||||
|
clip_model = load_clip_to_cpu()
|
||||||
|
token_embedding = clip_model.token_embedding.weight
|
||||||
|
print(f"Size of token embedding: {token_embedding.shape}")
|
||||||
|
|
||||||
|
prompt_learner = torch.load(fpath, map_location="cpu")["state_dict"]
|
||||||
|
# Extract the input tokens
|
||||||
|
ctx = prompt_learner["prompt_learner.ctx"]
|
||||||
|
ctx = ctx.float()
|
||||||
|
# Now extract the intermediate tokens
|
||||||
|
intermediate_embeddings = []
|
||||||
|
depth = 9 - 1  # total prompt depth is 9; the first layer's ctx is stored separately above
|
||||||
|
for i in range(depth):
|
||||||
|
# Now extract the prompt embeddings and store it
|
||||||
|
query = 'prompt_learner.compound_prompts_text.' + str(i)
|
||||||
|
temp = prompt_learner[query].float()
|
||||||
|
intermediate_embeddings.append(temp)
|
||||||
|
|
||||||
|
print(f"Size of context: {ctx.shape}")
|
||||||
|
|
||||||
|
# Now repeat this for all layer context embeddings
|
||||||
|
|
||||||
|
all_layer_ctx = [ctx] + intermediate_embeddings
|
||||||
|
|
||||||
|
for idx, single_ctx in enumerate(all_layer_ctx):
|
||||||
|
print("SHOWING RESULTS FOR CTX Vectors of Layer: ", idx + 1)
|
||||||
|
ctx = single_ctx
|
||||||
|
if ctx.dim() == 2:
|
||||||
|
# Generic context
|
||||||
|
distance = torch.cdist(ctx, token_embedding)
|
||||||
|
print(f"Size of distance matrix: {distance.shape}")
|
||||||
|
sorted_idxs = torch.argsort(distance, dim=1)
|
||||||
|
sorted_idxs = sorted_idxs[:, :topk]
|
||||||
|
|
||||||
|
for m, idxs in enumerate(sorted_idxs):
|
||||||
|
words = [tokenizer.decoder[idx.item()] for idx in idxs]
|
||||||
|
dist = [f"{distance[m, idx].item():.4f}" for idx in idxs]
|
||||||
|
print(f"{m+1}: {words} {dist}")
|
||||||
|
|
||||||
|
elif ctx.dim() == 3:
|
||||||
|
# Class-specific context
|
||||||
|
raise NotImplementedError
|
||||||
|
|
||||||
|
print("##############################")
|
||||||
|
print("##############################")
|
||||||
17
lpclip/README.md
Normal file
@@ -0,0 +1,17 @@
|
|||||||
|
# Linear Probe CLIP
|
||||||
|
|
||||||
|
To run linear probe baselines, make sure that your current working directory is `lpclip/`.
|
||||||
|
|
||||||
|
Step 1: Extract Features using the CLIP Image Encoder
|
||||||
|
```bash
|
||||||
|
sh feat_extractor.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Step 2: Train few-shot linear probe
|
||||||
|
```bash
|
||||||
|
sh linear_probe.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
We follow the instructions stated in Appendix A3 (p. 38) of [the original CLIP paper](https://arxiv.org/pdf/2103.00020.pdf), with a careful hyperparameter sweep.
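The sweep itself is driven by `lpclip/linear_probe.py`; a direct invocation sketch using its command-line flags (the dataset name must match the folder written by the feature extractor, and the run/step counts can be adjusted to trade runtime for a tighter estimate):

```bash
python linear_probe.py --dataset OxfordPets --feature_dir clip_feat --num_step 8 --num_run 10
```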
|
||||||
|
|
||||||
|
Note: please pull the latest Dassl (version >= `606a2c6`).
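A minimal update sketch, assuming Dassl.pytorch was installed in develop mode as described in `docs/INSTALL.md` (in that case no re-build is needed after pulling):

```bash
cd Dassl.pytorch/
git pull    # make sure the checkout is at or beyond commit 606a2c6
cd ..
```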
|
||||||
189
lpclip/feat_extractor.py
Normal file
@@ -0,0 +1,189 @@
|
|||||||
|
import os, argparse
|
||||||
|
import numpy as np
|
||||||
|
import torch
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.append(os.path.abspath(".."))
|
||||||
|
|
||||||
|
from datasets.oxford_pets import OxfordPets
|
||||||
|
from datasets.oxford_flowers import OxfordFlowers
|
||||||
|
from datasets.fgvc_aircraft import FGVCAircraft
|
||||||
|
from datasets.dtd import DescribableTextures
|
||||||
|
from datasets.eurosat import EuroSAT
|
||||||
|
from datasets.stanford_cars import StanfordCars
|
||||||
|
from datasets.food101 import Food101
|
||||||
|
from datasets.sun397 import SUN397
|
||||||
|
from datasets.caltech101 import Caltech101
|
||||||
|
from datasets.ucf101 import UCF101
|
||||||
|
from datasets.imagenet import ImageNet
|
||||||
|
from datasets.imagenetv2 import ImageNetV2
|
||||||
|
from datasets.imagenet_sketch import ImageNetSketch
|
||||||
|
from datasets.imagenet_a import ImageNetA
|
||||||
|
from datasets.imagenet_r import ImageNetR
|
||||||
|
|
||||||
|
from dassl.utils import setup_logger, set_random_seed, collect_env_info
|
||||||
|
from dassl.config import get_cfg_default
|
||||||
|
from dassl.data.transforms import build_transform
|
||||||
|
from dassl.data import DatasetWrapper
|
||||||
|
|
||||||
|
import clip
|
||||||
|
|
||||||
|
# import pdb; pdb.set_trace()
|
||||||
|
|
||||||
|
|
||||||
|
def print_args(args, cfg):
|
||||||
|
print("***************")
|
||||||
|
print("** Arguments **")
|
||||||
|
print("***************")
|
||||||
|
optkeys = list(args.__dict__.keys())
|
||||||
|
optkeys.sort()
|
||||||
|
for key in optkeys:
|
||||||
|
print("{}: {}".format(key, args.__dict__[key]))
|
||||||
|
print("************")
|
||||||
|
print("** Config **")
|
||||||
|
print("************")
|
||||||
|
print(cfg)
|
||||||
|
|
||||||
|
|
||||||
|
def reset_cfg(cfg, args):
|
||||||
|
if args.root:
|
||||||
|
cfg.DATASET.ROOT = args.root
|
||||||
|
|
||||||
|
if args.output_dir:
|
||||||
|
cfg.OUTPUT_DIR = args.output_dir
|
||||||
|
|
||||||
|
if args.trainer:
|
||||||
|
cfg.TRAINER.NAME = args.trainer
|
||||||
|
|
||||||
|
if args.backbone:
|
||||||
|
cfg.MODEL.BACKBONE.NAME = args.backbone
|
||||||
|
|
||||||
|
if args.head:
|
||||||
|
cfg.MODEL.HEAD.NAME = args.head
|
||||||
|
|
||||||
|
|
||||||
|
def extend_cfg(cfg):
|
||||||
|
"""
|
||||||
|
Add new config variables.
|
||||||
|
|
||||||
|
E.g.
|
||||||
|
from yacs.config import CfgNode as CN
|
||||||
|
cfg.TRAINER.MY_MODEL = CN()
|
||||||
|
cfg.TRAINER.MY_MODEL.PARAM_A = 1.
|
||||||
|
cfg.TRAINER.MY_MODEL.PARAM_B = 0.5
|
||||||
|
cfg.TRAINER.MY_MODEL.PARAM_C = False
|
||||||
|
"""
|
||||||
|
from yacs.config import CfgNode as CN
|
||||||
|
|
||||||
|
cfg.TRAINER.OURS = CN()
|
||||||
|
cfg.TRAINER.OURS.N_CTX = 10 # number of context vectors
|
||||||
|
cfg.TRAINER.OURS.CSC = False # class-specific context
|
||||||
|
cfg.TRAINER.OURS.CTX_INIT = "" # initialize context vectors with given words
|
||||||
|
cfg.TRAINER.OURS.WEIGHT_U = 0.1 # weight for the unsupervised loss
|
||||||
|
|
||||||
|
|
||||||
|
def setup_cfg(args):
|
||||||
|
cfg = get_cfg_default()
|
||||||
|
extend_cfg(cfg)
|
||||||
|
|
||||||
|
# 1. From the dataset config file
|
||||||
|
if args.dataset_config_file:
|
||||||
|
cfg.merge_from_file(args.dataset_config_file)
|
||||||
|
|
||||||
|
# 2. From the method config file
|
||||||
|
if args.config_file:
|
||||||
|
cfg.merge_from_file(args.config_file)
|
||||||
|
|
||||||
|
# 3. From input arguments
|
||||||
|
reset_cfg(cfg, args)
|
||||||
|
|
||||||
|
cfg.freeze()
|
||||||
|
|
||||||
|
return cfg
|
||||||
|
|
||||||
|
|
||||||
|
def main(args):
|
||||||
|
cfg = setup_cfg(args)
|
||||||
|
if cfg.SEED >= 0:
|
||||||
|
print("Setting fixed seed: {}".format(cfg.SEED))
|
||||||
|
set_random_seed(cfg.SEED)
|
||||||
|
setup_logger(cfg.OUTPUT_DIR)
|
||||||
|
|
||||||
|
if torch.cuda.is_available() and cfg.USE_CUDA:
|
||||||
|
torch.backends.cudnn.benchmark = True
|
||||||
|
|
||||||
|
print_args(args, cfg)
|
||||||
|
print("Collecting env info ...")
|
||||||
|
print("** System info **\n{}\n".format(collect_env_info()))
|
||||||
|
|
||||||
|
######################################
|
||||||
|
# Setup DataLoader
|
||||||
|
######################################
|
||||||
|
dataset = eval(cfg.DATASET.NAME)(cfg)
|
||||||
|
|
||||||
|
if args.split == "train":
|
||||||
|
dataset_input = dataset.train_x
|
||||||
|
elif args.split == "val":
|
||||||
|
dataset_input = dataset.val
|
||||||
|
else:
|
||||||
|
dataset_input = dataset.test
|
||||||
|
|
||||||
|
tfm_train = build_transform(cfg, is_train=False)
|
||||||
|
data_loader = torch.utils.data.DataLoader(
|
||||||
|
DatasetWrapper(cfg, dataset_input, transform=tfm_train, is_train=False),
|
||||||
|
batch_size=cfg.DATALOADER.TRAIN_X.BATCH_SIZE,
|
||||||
|
sampler=None,
|
||||||
|
shuffle=False,
|
||||||
|
num_workers=cfg.DATALOADER.NUM_WORKERS,
|
||||||
|
drop_last=False,
|
||||||
|
pin_memory=(torch.cuda.is_available() and cfg.USE_CUDA),
|
||||||
|
)
|
||||||
|
|
||||||
|
########################################
|
||||||
|
# Setup Network
|
||||||
|
########################################
|
||||||
|
clip_model, _ = clip.load("RN50", "cuda", jit=False)
|
||||||
|
clip_model.eval()
|
||||||
|
###################################################################################################################
|
||||||
|
# Start Feature Extractor
|
||||||
|
feature_list = []
|
||||||
|
label_list = []
|
||||||
|
train_dataiter = iter(data_loader)
|
||||||
|
for train_step in range(1, len(train_dataiter) + 1):
|
||||||
|
batch = next(train_dataiter)
|
||||||
|
data = batch["img"].cuda()
|
||||||
|
feature = clip_model.visual(data)
|
||||||
|
feature = feature.cpu()
|
||||||
|
for idx in range(len(data)):
|
||||||
|
feature_list.append(feature[idx].tolist())
|
||||||
|
label_list.extend(batch["label"].tolist())
|
||||||
|
save_dir = os.path.join(cfg.OUTPUT_DIR, cfg.DATASET.NAME)
|
||||||
|
os.makedirs(save_dir, exist_ok=True)
|
||||||
|
save_filename = f"{args.split}"
|
||||||
|
np.savez(
|
||||||
|
os.path.join(save_dir, save_filename),
|
||||||
|
feature_list=feature_list,
|
||||||
|
label_list=label_list,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument("--root", type=str, default="", help="path to dataset")
|
||||||
|
parser.add_argument("--output-dir", type=str, default="", help="output directory")
|
||||||
|
parser.add_argument("--config-file", type=str, default="", help="path to config file")
|
||||||
|
parser.add_argument(
|
||||||
|
"--dataset-config-file",
|
||||||
|
type=str,
|
||||||
|
default="",
|
||||||
|
help="path to config file for dataset setup",
|
||||||
|
)
|
||||||
|
parser.add_argument("--num-shot", type=int, default=1, help="number of shots")
|
||||||
|
parser.add_argument("--split", type=str, choices=["train", "val", "test"], help="which split")
|
||||||
|
parser.add_argument("--trainer", type=str, default="", help="name of trainer")
|
||||||
|
parser.add_argument("--backbone", type=str, default="", help="name of CNN backbone")
|
||||||
|
parser.add_argument("--head", type=str, default="", help="name of head")
|
||||||
|
parser.add_argument("--seed", type=int, default=-1, help="only positive value enables a fixed seed")
|
||||||
|
parser.add_argument("--eval-only", action="store_true", help="evaluation only")
|
||||||
|
args = parser.parse_args()
|
||||||
|
main(args)
|
||||||
20
lpclip/feat_extractor.sh
Normal file
@@ -0,0 +1,20 @@
|
|||||||
|
# sh feat_extractor.sh
|
||||||
|
DATA=/path/to/datasets
|
||||||
|
OUTPUT='./clip_feat/'
|
||||||
|
SEED=1
|
||||||
|
|
||||||
|
# oxford_pets oxford_flowers fgvc_aircraft dtd eurosat stanford_cars food101 sun397 caltech101 ucf101 imagenet
|
||||||
|
for DATASET in oxford_pets
|
||||||
|
do
|
||||||
|
for SPLIT in train val test
|
||||||
|
do
|
||||||
|
python feat_extractor.py \
|
||||||
|
--split ${SPLIT} \
|
||||||
|
--root ${DATA} \
|
||||||
|
--seed ${SEED} \
|
||||||
|
--dataset-config-file ../configs/datasets/${DATASET}.yaml \
|
||||||
|
--config-file ../configs/trainers/CoOp/rn50_val.yaml \
|
||||||
|
--output-dir ${OUTPUT} \
|
||||||
|
--eval-only
|
||||||
|
done
|
||||||
|
done
|
||||||
129
lpclip/linear_probe.py
Normal file
@@ -0,0 +1,129 @@
|
|||||||
|
import numpy as np
|
||||||
|
import os
|
||||||
|
from sklearn.linear_model import LogisticRegression
|
||||||
|
import argparse
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument("--dataset", type=str, default="", help="path to dataset")
|
||||||
|
parser.add_argument("--num_step", type=int, default=8, help="number of steps")
|
||||||
|
parser.add_argument("--num_run", type=int, default=10, help="number of runs")
|
||||||
|
parser.add_argument("--feature_dir", type=str, default="clip_feat", help="feature dir path")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
dataset = args.dataset
|
||||||
|
dataset_path = os.path.join(f"{args.feature_dir}", dataset)
|
||||||
|
|
||||||
|
train_file = np.load(os.path.join(dataset_path, "train.npz"))
|
||||||
|
train_feature, train_label = train_file["feature_list"], train_file["label_list"]
|
||||||
|
val_file = np.load(os.path.join(dataset_path, "val.npz"))
|
||||||
|
val_feature, val_label = val_file["feature_list"], val_file["label_list"]
|
||||||
|
test_file = np.load(os.path.join(dataset_path, "test.npz"))
|
||||||
|
test_feature, test_label = test_file["feature_list"], test_file["label_list"]
|
||||||
|
|
||||||
|
os.makedirs("report", exist_ok=True)
|
||||||
|
|
||||||
|
val_shot_list = {1: 1, 2: 2, 4: 4, 8: 4, 16: 4}
|
||||||
|
|
||||||
|
for num_shot in [1, 2, 4, 8, 16]:
|
||||||
|
test_acc_step_list = np.zeros([args.num_run, args.num_step])
|
||||||
|
for seed in range(1, args.num_run + 1):
|
||||||
|
np.random.seed(seed)
|
||||||
|
print(f"-- Seed: {seed} --------------------------------------------------------------")
|
||||||
|
# Sampling
|
||||||
|
all_label_list = np.unique(train_label)
|
||||||
|
selected_idx_list = []
|
||||||
|
for label in all_label_list:
|
||||||
|
label_collection = np.where(train_label == label)[0]
|
||||||
|
selected_idx = np.random.choice(label_collection, size=num_shot, replace=False)
|
||||||
|
selected_idx_list.extend(selected_idx)
|
||||||
|
|
||||||
|
fewshot_train_feature = train_feature[selected_idx_list]
|
||||||
|
fewshot_train_label = train_label[selected_idx_list]
|
||||||
|
|
||||||
|
val_num_shot = val_shot_list[num_shot]
|
||||||
|
val_selected_idx_list = []
|
||||||
|
for label in all_label_list:
|
||||||
|
label_collection = np.where(val_label == label)[0]
|
||||||
|
selected_idx = np.random.choice(label_collection, size=val_num_shot, replace=False)
|
||||||
|
val_selected_idx_list.extend(selected_idx)
|
||||||
|
|
||||||
|
fewshot_val_feature = val_feature[val_selected_idx_list]
|
||||||
|
fewshot_val_label = val_label[val_selected_idx_list]
|
||||||
|
|
||||||
|
# search initialization
|
||||||
|
search_list = [1e6, 1e4, 1e2, 1, 1e-2, 1e-4, 1e-6]
|
||||||
|
acc_list = []
|
||||||
|
for c_weight in search_list:
|
||||||
|
clf = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_weight).fit(fewshot_train_feature, fewshot_train_label)
|
||||||
|
pred = clf.predict(fewshot_val_feature)
|
||||||
|
acc_val = sum(pred == fewshot_val_label) / len(fewshot_val_label)
|
||||||
|
acc_list.append(acc_val)
|
||||||
|
|
||||||
|
print(acc_list, flush=True)
|
||||||
|
|
||||||
|
# binary search
|
||||||
|
peak_idx = np.argmax(acc_list)
|
||||||
|
c_peak = search_list[peak_idx]
|
||||||
|
c_left, c_right = 1e-1 * c_peak, 1e1 * c_peak
|
||||||
|
|
||||||
|
# Bisect over the regularization strength C in log10 space: fit a classifier at both
# endpoints, keep the better one, shrink the interval towards it, and record its test accuracy
def binary_search(c_left, c_right, seed, step, test_acc_step_list):
|
||||||
|
clf_left = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_left).fit(fewshot_train_feature, fewshot_train_label)
|
||||||
|
pred_left = clf_left.predict(fewshot_val_feature)
|
||||||
|
acc_left = sum(pred_left == fewshot_val_label) / len(fewshot_val_label)
|
||||||
|
print("Val accuracy (Left): {:.2f}".format(100 * acc_left), flush=True)
|
||||||
|
|
||||||
|
clf_right = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_right).fit(fewshot_train_feature, fewshot_train_label)
|
||||||
|
pred_right = clf_right.predict(fewshot_val_feature)
|
||||||
|
acc_right = sum(pred_right == fewshot_val_label) / len(fewshot_val_label)
|
||||||
|
print("Val accuracy (Right): {:.2f}".format(100 * acc_right), flush=True)
|
||||||
|
|
||||||
|
# find maximum and update ranges
|
||||||
|
if acc_left < acc_right:
|
||||||
|
c_final = c_right
|
||||||
|
clf_final = clf_right
|
||||||
|
# range for the next step
|
||||||
|
c_left = 0.5 * (np.log10(c_right) + np.log10(c_left))
|
||||||
|
c_right = np.log10(c_right)
|
||||||
|
else:
|
||||||
|
c_final = c_left
|
||||||
|
clf_final = clf_left
|
||||||
|
# range for the next step
|
||||||
|
c_right = 0.5 * (np.log10(c_right) + np.log10(c_left))
|
||||||
|
c_left = np.log10(c_left)
|
||||||
|
|
||||||
|
pred = clf_final.predict(test_feature)
|
||||||
|
test_acc = 100 * sum(pred == test_label) / len(pred)
|
||||||
|
print("Test Accuracy: {:.2f}".format(test_acc), flush=True)
|
||||||
|
test_acc_step_list[seed - 1, step] = test_acc
|
||||||
|
|
||||||
|
saveline = "{}, seed {}, {} shot, weight {}, test_acc {:.2f}\n".format(dataset, seed, num_shot, c_final, test_acc)
|
||||||
|
with open(
|
||||||
|
"./report/{}_s{}r{}_details.txt".format(args.feature_dir, args.num_step, args.num_run),
|
||||||
|
"a+",
|
||||||
|
) as writer:
|
||||||
|
writer.write(saveline)
|
||||||
|
return (
|
||||||
|
np.power(10, c_left),
|
||||||
|
np.power(10, c_right),
|
||||||
|
seed,
|
||||||
|
step,
|
||||||
|
test_acc_step_list,
|
||||||
|
)
|
||||||
|
|
||||||
|
for step in range(args.num_step):
|
||||||
|
print(
|
||||||
|
f"{dataset}, {num_shot} Shot, Round {step}: {c_left}/{c_right}",
|
||||||
|
flush=True,
|
||||||
|
)
|
||||||
|
c_left, c_right, seed, step, test_acc_step_list = binary_search(c_left, c_right, seed, step, test_acc_step_list)
|
||||||
|
# save results of last step
|
||||||
|
test_acc_list = test_acc_step_list[:, -1]
|
||||||
|
acc_mean = np.mean(test_acc_list)
|
||||||
|
acc_std = np.std(test_acc_list)
|
||||||
|
save_line = "{}, {} Shot, Test acc stat: {:.2f} ({:.2f})\n".format(dataset, num_shot, acc_mean, acc_std)
|
||||||
|
print(save_line, flush=True)
|
||||||
|
with open(
|
||||||
|
"./report/{}_s{}r{}.txt".format(args.feature_dir, args.num_step, args.num_run),
|
||||||
|
"a+",
|
||||||
|
) as writer:
|
||||||
|
writer.write(save_line)
|
||||||
10
lpclip/linear_probe.sh
Normal file
@@ -0,0 +1,10 @@
|
|||||||
|
feature_dir=clip_feat
|
||||||
|
|
||||||
|
for DATASET in OxfordPets
|
||||||
|
do
|
||||||
|
python linear_probe.py \
|
||||||
|
--dataset ${DATASET} \
|
||||||
|
--feature_dir ${feature_dir} \
|
||||||
|
--num_step 8 \
|
||||||
|
--num_run 3
|
||||||
|
done
|
||||||
174
parse_test_res.py
Normal file
@@ -0,0 +1,174 @@
|
|||||||
|
"""
|
||||||
|
Goal
|
||||||
|
---
|
||||||
|
1. Read test results from log.txt files
|
||||||
|
2. Compute mean and std across different folders (seeds)
|
||||||
|
|
||||||
|
Usage
|
||||||
|
---
|
||||||
|
Assume the output files are saved under output/my_experiment,
|
||||||
|
which contains results of different seeds, e.g.,
|
||||||
|
|
||||||
|
my_experiment/
|
||||||
|
seed1/
|
||||||
|
log.txt
|
||||||
|
seed2/
|
||||||
|
log.txt
|
||||||
|
seed3/
|
||||||
|
log.txt
|
||||||
|
|
||||||
|
Run the following command from the root directory:
|
||||||
|
|
||||||
|
$ python parse_test_res.py output/my_experiment
|
||||||
|
|
||||||
|
Add --ci95 to the argument if you want to get the 95% confidence
|
||||||
|
interval instead of standard deviation:
|
||||||
|
|
||||||
|
$ python parse_test_res.py output/my_experiment --ci95
|
||||||
|
|
||||||
|
If my_experiment/ has the following structure,
|
||||||
|
|
||||||
|
my_experiment/
|
||||||
|
exp-1/
|
||||||
|
seed1/
|
||||||
|
log.txt
|
||||||
|
...
|
||||||
|
seed2/
|
||||||
|
log.txt
|
||||||
|
...
|
||||||
|
seed3/
|
||||||
|
log.txt
|
||||||
|
...
|
||||||
|
exp-2/
|
||||||
|
...
|
||||||
|
exp-3/
|
||||||
|
...
|
||||||
|
|
||||||
|
Run
|
||||||
|
|
||||||
|
$ python parse_test_res.py output/my_experiment --multi-exp
|
||||||
|
"""
|
||||||
|
import re
|
||||||
|
import numpy as np
|
||||||
|
import os.path as osp
|
||||||
|
import argparse
|
||||||
|
from collections import OrderedDict, defaultdict
|
||||||
|
|
||||||
|
from dassl.utils import check_isfile, listdir_nohidden
|
||||||
|
|
||||||
|
|
||||||
|
def compute_ci95(res):
|
||||||
|
return 1.96 * np.std(res) / np.sqrt(len(res))
|
||||||
|
|
||||||
|
|
||||||
|
def parse_function(*metrics, directory="", args=None, end_signal=None):
|
||||||
|
print(f"Parsing files in {directory}")
|
||||||
|
subdirs = listdir_nohidden(directory, sort=True)
|
||||||
|
|
||||||
|
outputs = []
|
||||||
|
|
||||||
|
for subdir in subdirs:
|
||||||
|
fpath = osp.join(directory, subdir, "log.txt")
|
||||||
|
assert check_isfile(fpath)
|
||||||
|
good_to_go = False
|
||||||
|
output = OrderedDict()
|
||||||
|
|
||||||
|
with open(fpath, "r") as f:
|
||||||
|
lines = f.readlines()
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
line = line.strip()
|
||||||
|
|
||||||
|
if line == end_signal:
|
||||||
|
good_to_go = True
|
||||||
|
|
||||||
|
for metric in metrics:
|
||||||
|
match = metric["regex"].search(line)
|
||||||
|
if match and good_to_go:
|
||||||
|
if "file" not in output:
|
||||||
|
output["file"] = fpath
|
||||||
|
num = float(match.group(1))
|
||||||
|
name = metric["name"]
|
||||||
|
output[name] = num
|
||||||
|
|
||||||
|
if output:
|
||||||
|
outputs.append(output)
|
||||||
|
|
||||||
|
assert len(outputs) > 0, f"Nothing found in {directory}"
|
||||||
|
|
||||||
|
metrics_results = defaultdict(list)
|
||||||
|
|
||||||
|
for output in outputs:
|
||||||
|
msg = ""
|
||||||
|
for key, value in output.items():
|
||||||
|
if isinstance(value, float):
|
||||||
|
msg += f"{key}: {value:.2f}%. "
|
||||||
|
else:
|
||||||
|
msg += f"{key}: {value}. "
|
||||||
|
if key != "file":
|
||||||
|
metrics_results[key].append(value)
|
||||||
|
print(msg)
|
||||||
|
|
||||||
|
output_results = OrderedDict()
|
||||||
|
|
||||||
|
print("===")
|
||||||
|
print(f"Summary of directory: {directory}")
|
||||||
|
for key, values in metrics_results.items():
|
||||||
|
avg = np.mean(values)
|
||||||
|
std = compute_ci95(values) if args.ci95 else np.std(values)
|
||||||
|
print(f"* {key}: {avg:.2f}% +- {std:.2f}%")
|
||||||
|
output_results[key] = avg
|
||||||
|
print("===")
|
||||||
|
|
||||||
|
return output_results
|
||||||
|
|
||||||
|
|
||||||
|
def main(args, end_signal):
|
||||||
|
metric = {
|
||||||
|
"name": args.keyword,
|
||||||
|
"regex": re.compile(fr"\* {args.keyword}: ([\.\deE+-]+)%"),
|
||||||
|
}
|
||||||
|
|
||||||
|
if args.multi_exp:
|
||||||
|
final_results = defaultdict(list)
|
||||||
|
|
||||||
|
for directory in listdir_nohidden(args.directory, sort=True):
|
||||||
|
directory = osp.join(args.directory, directory)
|
||||||
|
results = parse_function(
|
||||||
|
metric, directory=directory, args=args, end_signal=end_signal
|
||||||
|
)
|
||||||
|
|
||||||
|
for key, value in results.items():
|
||||||
|
final_results[key].append(value)
|
||||||
|
|
||||||
|
print("Average performance")
|
||||||
|
for key, values in final_results.items():
|
||||||
|
avg = np.mean(values)
|
||||||
|
print(f"* {key}: {avg:.2f}%")
|
||||||
|
|
||||||
|
else:
|
||||||
|
parse_function(
|
||||||
|
metric, directory=args.directory, args=args, end_signal=end_signal
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument("directory", type=str, help="path to directory")
|
||||||
|
parser.add_argument(
|
||||||
|
"--ci95", action="store_true", help=r"compute 95\% confidence interval"
|
||||||
|
)
|
||||||
|
parser.add_argument("--test-log", action="store_true", help="parse test-only logs")
|
||||||
|
parser.add_argument(
|
||||||
|
"--multi-exp", action="store_true", help="parse multiple experiments"
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--keyword", default="accuracy", type=str, help="which keyword to extract"
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
end_signal = "Finished training"
|
||||||
|
if args.test_log:
|
||||||
|
end_signal = "=> result"
|
||||||
|
|
||||||
|
main(args, end_signal)
|
||||||
3
requirements.txt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
ftfy==6.1.1
|
||||||
|
regex
|
||||||
|
tqdm
|
||||||
54
scripts/cocoop/base2new_test.sh
Normal file
@@ -0,0 +1,54 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
#cd ../..
|
||||||
|
|
||||||
|
# custom config
|
||||||
|
DATA="/path/to/dataset/folder"
|
||||||
|
TRAINER=CoCoOp
|
||||||
|
|
||||||
|
DATASET=$1
|
||||||
|
SEED=$2
|
||||||
|
|
||||||
|
CFG=vit_b16_c4_ep10_batch1_ctxv1
|
||||||
|
SHOTS=16
|
||||||
|
LOADEP=10
|
||||||
|
SUB=new
|
||||||
|
|
||||||
|
|
||||||
|
COMMON_DIR=${DATASET}/shots_${SHOTS}/${TRAINER}/${CFG}/seed${SEED}
|
||||||
|
MODEL_DIR=output/base2new/train_base/${COMMON_DIR}
|
||||||
|
DIR=output/base2new/test_${SUB}/${COMMON_DIR}
|
||||||
|
if [ -d "$DIR" ]; then
|
||||||
|
echo "Evaluating model"
|
||||||
|
echo "Results are available in ${DIR}. Resuming..."
|
||||||
|
|
||||||
|
python train.py \
|
||||||
|
--root ${DATA} \
|
||||||
|
--seed ${SEED} \
|
||||||
|
--trainer ${TRAINER} \
|
||||||
|
--dataset-config-file configs/datasets/${DATASET}.yaml \
|
||||||
|
--config-file configs/trainers/${TRAINER}/${CFG}.yaml \
|
||||||
|
--output-dir ${DIR} \
|
||||||
|
--model-dir ${MODEL_DIR} \
|
||||||
|
--load-epoch ${LOADEP} \
|
||||||
|
--eval-only \
|
||||||
|
DATASET.NUM_SHOTS ${SHOTS} \
|
||||||
|
DATASET.SUBSAMPLE_CLASSES ${SUB}
|
||||||
|
|
||||||
|
else
|
||||||
|
echo "Evaluating model"
|
||||||
|
echo "Runing the first phase job and save the output to ${DIR}"
|
||||||
|
|
||||||
|
python train.py \
|
||||||
|
--root ${DATA} \
|
||||||
|
--seed ${SEED} \
|
||||||
|
--trainer ${TRAINER} \
|
||||||
|
--dataset-config-file configs/datasets/${DATASET}.yaml \
|
||||||
|
--config-file configs/trainers/${TRAINER}/${CFG}.yaml \
|
||||||
|
--output-dir ${DIR} \
|
||||||
|
--model-dir ${MODEL_DIR} \
|
||||||
|
--load-epoch ${LOADEP} \
|
||||||
|
--eval-only \
|
||||||
|
DATASET.NUM_SHOTS ${SHOTS} \
|
||||||
|
DATASET.SUBSAMPLE_CLASSES ${SUB}
|
||||||
|
fi
|
||||||
Some files were not shown because too many files have changed in this diff.