Release of PromptSRC with pretrained models.

This commit is contained in:
uzair khattak
2023-07-13 23:43:31 +05:00
commit 8be7dcff6b
132 changed files with 106641 additions and 0 deletions

22
LICENSE Normal file
View File

@@ -0,0 +1,22 @@
MIT License
Copyright (c) 2023 Muhammad Uzair Khattak
Copyright (c) 2022 Muhammad Uzair Khattak
Copyright (c) 2021 Kaiyang Zhou
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

158
README.md Normal file
View File

@@ -0,0 +1,158 @@
# Self-regulating Prompts: Foundational Model Adaptation without Forgetting
> [**Self-regulating Prompts: Foundational Model Adaptation without Forgetting**]()<br>
> [Muhammad Uzair Khattak*](https://muzairkhattak.github.io/), [Syed Talal Wasim*](https://talalwasim.github.io), [Muzammal Naseer](https://scholar.google.com/citations?user=tM9xKA8AAAAJ&hl=en&oi=ao), [Salman Khan](https://salman-h-khan.github.io/), [Ming-Hsuan Yang](http://faculty.ucmerced.edu/mhyang/), [Fahad Shahbaz Khan](https://scholar.google.es/citations?user=zvaeYnUAAAAJ&hl=en)
*Joint first authors
[![paper](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)]()
[![Website](https://img.shields.io/badge/Project-Website-87CEEB)](https://muzairkhattak.github.io/PromptSRC/)
[![slides](https://img.shields.io/badge/Presentation-Slides-B762C1)](https://drive.google.com/file/d/1d14q8hhAl6qGsiPYpNIVfShMCulVJSUa/view?usp=sharing)
Official implementation of the paper "[Self-regulating Prompts: Foundational Model Adaptation without Forgetting](https://arxiv.org/abs/2307.06948)".
<hr />
## :rocket: News
* **(July 12, 2023)**
  * Pre-trained models and evaluation code for reproducing PromptSRC official benchmark results are released.
  * Training code for [PromptSRC](configs/trainers/PromptSRC) is released.
  * This repository also supports [MaPLe (CVPR'23)](configs/trainers/MaPLe),
    [CoOp (IJCV'22)](configs/trainers/CoOp), and [Co-CoOp (CVPR'22)](configs/trainers/CoCoOp)
    architectures.
<hr />
## Highlights
![main figure](docs/main_figure.png)
> <p align="justify"> <b> <span style="color: blue;">Left</span></b>:
> Existing prompt learning approaches for foundational Vision-Language models like CLIP rely on task-specific objectives that restrict
> prompts to learning a feature space suited only to the downstream task, and they
> consequently lose the generalized knowledge of CLIP (shown in <span style="color: purple;">purple</span>).
> Our self-regulating framework explicitly guides the training trajectory of prompts
> towards the closest point between two optimal solution manifolds (solid line) to
> learn task-specific representations while also retaining generalized CLIP knowledge
> (shown in <span style="color: green;">green</span>). <b><span style="color: blue;">Middle</span></b>: Averaged
> across 11 image recognition datasets, PromptSRC surpasses existing methods on the
> base-to-novel generalization setting. <b><span style="color: blue;">Right</span></b>: We evaluate
> our approach on four diverse image recognition benchmarks for CLIP and show
> consistent gains over previous state-of-the-art approaches. </p>
> **<p align="justify"> Abstract:** *Prompt learning has emerged as an efficient alternative
> for fine-tuning foundational models, such as CLIP, for various downstream tasks.
> Conventionally trained using the task-specific objective, i.e., cross-entropy loss,
> prompts tend to overfit downstream data distributions and find it challenging to capture
> task-agnostic general features from the frozen CLIP. This leads to the loss of the model's
> original generalization capability. To address this issue, our work introduces a
> self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating
> Constraints). PromptSRC guides the prompts to optimize for both task-specific and task-agnostic
> general representations using a three-pronged approach by: (a) regulating {prompted}
> representations via mutual agreement maximization with the frozen model, (b) regulating
> with self-ensemble of prompts over the training trajectory to encode their complementary
> strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance
> with the visual branch. To the best of our knowledge, this is the first regularization
> framework for prompt learning that avoids overfitting by jointly attending to pre-trained
> model features, the training trajectory during prompting, and the textual diversity.
> PromptSRC explicitly steers the prompts to learn a representation space that maximizes
> performance on downstream tasks without compromising CLIP generalization. We perform
> experiments on 4 benchmarks where PromptSRC performs favorably well compared
> to the existing methods. Our code and pre-trained models are publicly available.* </p>
## Regularization Framework for Prompt Learning
We propose PromptSRC (Prompting with Self-regulating Constraints) which steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP generalization.
**Key components of PromptSRC:**
1) **Mutual agreement maximization:** PromptSRC explicitly guides the prompts to jointly acquire both <i>task-specific knowledge</i> and <i>task-agnostic generalized knowledge</i> by maximizing the mutual agreement between the prompted features and the features of the frozen VL model (see the sketch after this list).
2) **Gaussian weighted prompt aggregation:** We propose a weighted self-ensembling strategy for prompts over the training trajectory that captures complementary features and enhances their generalization abilities.
3) **Textual diversity:** PromptSRC regulates prompts with textual diversity to mitigate sample diversity imbalance compared to the visual branch during training.
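The mutual-agreement term above can be summarized with a short sketch. The snippet below is only an illustrative re-statement of the regularization losses, assuming the prompted and frozen CLIP features and logits are already computed; the exact implementation lives in the PromptSRC trainer, and the loss-weight names mirror the `TEXT_LOSS_WEIGHT` / `IMAGE_LOSS_WEIGHT` entries in the training configs.

```python
import torch.nn.functional as F

def promptsrc_regularization(prompted_img, frozen_img,
                             prompted_txt, frozen_txt,
                             prompted_logits, frozen_logits,
                             image_loss_weight=10.0, text_loss_weight=25.0):
    """Illustrative sketch of the mutual-agreement constraints (not the official code).

    Inputs are assumed to be pre-computed CLIP tensors: image features [B, D],
    text features [C, D], and classification logits [B, C].
    """
    # (a) feature-level agreement with the frozen model (L1 distance)
    loss_img = F.l1_loss(prompted_img, frozen_img) * image_loss_weight
    loss_txt = F.l1_loss(prompted_txt, frozen_txt) * text_loss_weight
    # logit-level agreement: keep prompted predictions close to frozen CLIP predictions
    loss_kl = F.kl_div(prompted_logits.log_softmax(dim=-1),
                       frozen_logits.softmax(dim=-1),
                       reduction="batchmean")
    return loss_img + loss_txt + loss_kl
```

The Gaussian weighted prompt aggregation of point 2 is parameterized by the `GPA_MEAN` / `GPA_STD` entries of the PromptSRC configs further below; a small sketch of that weighting accompanies the first PromptSRC config file.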
## :ballot_box_with_check: Supported Methods
| Method | Paper | Configs | Training Scripts |
|---------------------------|:----------------------------------------------|:---------------------------------------------------------------:|:-------------------------------:|
| PromptSRC | [arXiv]() | [link](configs/trainers/PromptSRC/) | [link](scripts/promptsrc) |
| Independent V-L Prompting | - | [link](configs/trainers/IVLP/) | [link](scripts/independent-vlp) |
| MaPLe                     | [CVPR 2023](https://arxiv.org/abs/2210.03117) | [link](configs/trainers/MaPLe)                                    | [link](scripts/maple)           |
| CoOp | [IJCV 2022](https://arxiv.org/abs/2109.01134) | [link](configs/trainers/CoOp) | [link](scripts/coop) |
| Co-CoOp | [CVPR 2022](https://arxiv.org/abs/2203.05557) | [link](configs/trainers/CoCoOp) | [link](scripts/cocoop) |
<hr />
## Results
Results reported below show accuracy for base and novel classes across 11 recognition datasets, averaged over 3 seeds.
### Effectiveness of PromptSRC in comparison with baseline Independent V-L Prompting
PromptSRC effectively maximizes supervised task performance (base classes) without compromising CLIP's original generalization to unseen categories (novel classes).
| Name | Base Acc. | Novel Acc. | HM | Epochs |
|---------------------------------------------------------------------------------|:---------:|:----------:|:---------:|:------:|
| CLIP | 69.34 | 74.22 | 71.70 | - |
| Independent V-L Prompting | 84.21 | 71.79 | 77.51 | 20 |
| PromptSRC (ours) | **84.26** | **76.10** | **79.97** | 20 |
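Here, HM denotes the harmonic mean of base and novel accuracy, HM = 2 × Base × Novel / (Base + Novel); e.g., for PromptSRC, 2 × 84.26 × 76.10 / (84.26 + 76.10) ≈ 79.97.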
### PromptSRC in comparison with existing state-of-the-art
| Name | Base Acc. | Novel Acc. | HM | Epochs |
|--------------------------------------------|:---------:|:----------:|:---------:|:------:|
| [CLIP](https://arxiv.org/abs/2103.00020) | 69.34 | 74.22 | 71.70 | - |
| [CoOp](https://arxiv.org/abs/2109.01134) | 82.69 | 63.22 | 71.66 | 200 |
| [CoCoOp](https://arxiv.org/abs/2203.05557) | 80.47 | 71.69 | 75.83 | 10 |
| [ProDA](https://arxiv.org/abs/2205.03340)  |   81.56   |   72.30    |   76.65   |  100   |
| [MaPLe](https://arxiv.org/abs/2210.03117) | 82.28 | 75.14 | 78.55 | 5 |
| [PromptSRC (ours)]() | **84.26** | **76.10** | **79.97** | 20 |
## Installation
For installation and other package requirements, please follow the instructions detailed in [INSTALL.md](docs/INSTALL.md).
## Data Preparation
Please follow the instructions at [DATASETS.md](docs/DATASETS.md) to prepare all datasets.
## Model Zoo
### Vision-Language prompting methods
| Name (configs) | Model checkpoints |
|---------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------:|
| [Independent V-L Prompting](configs/trainers/IVLP/vit_b16_c2_ep20_batch4_4+4ctx.yaml) | [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EuIwh-yMh_JBqB2Y_o8Jl14BPDKDRHC0JBPE1BugIeZiSQ?e=AJ8MhY) |
| [PromptSRC](configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml) | [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EqFXPs2Zl9pKp39w3SqlR7QBDACTv-AgCXH6_cGflrUFwg?e=l33EBA) |
## Evaluation
Please refer to the [EVAL.md](docs/EVAL.md) for detailed instructions on using the evaluation scripts and reproducing the official results using our pre-trained models.
## Training
Please refer to the [TRAIN.md](docs/TRAIN.md) for detailed instructions on training PromptSRC and IVLP baseline from scratch.
<hr />
## Citation
If you find our work, this repository, or the pretrained models useful, please consider giving a star :star: and a citation.
```bibtex
@article{khattak2023PromptSRC,
title={Self-regulating Prompts: Foundational Model Adaptation without Forgetting},
    author={Khattak, Muhammad Uzair and Wasim, Syed Talal and Naseer, Muzammal and Khan, Salman and Yang, Ming-Hsuan and Khan, Fahad Shahbaz},
journal={arXiv:},
year={2023}
}
```
## Contact
If you have any questions, please create an issue on this repository or contact us at uzair.khattak@mbzuai.ac.ae or syed.wasim@mbzuai.ac.ae.
## Acknowledgements
Our code is based on the [MaPLe](https://github.com/muzairkhattak/multimodal-prompt-learning) repository, along with the [Co-CoOp and CoOp](https://github.com/KaiyangZhou/CoOp) repository. We thank the authors for releasing their code. If you use our model and code, please consider citing these works as well.

1
clip/__init__.py Normal file
View File

@@ -0,0 +1 @@
from .clip import *

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

221
clip/clip.py Normal file
View File

@@ -0,0 +1,221 @@
import hashlib
import os
import urllib
import warnings
from typing import Union, List
import torch
from PIL import Image
from torchvision.transforms import Compose, Resize, CenterCrop, ToTensor, Normalize
from tqdm import tqdm
from .model import build_model
from .simple_tokenizer import SimpleTokenizer as _Tokenizer
try:
from torchvision.transforms import InterpolationMode
BICUBIC = InterpolationMode.BICUBIC
except ImportError:
BICUBIC = Image.BICUBIC
_torch_version = tuple(int(p) for p in torch.__version__.split("+")[0].split(".")[:3] if p.isdigit())
if _torch_version < (1, 7, 1):
    warnings.warn("PyTorch version 1.7.1 or higher is recommended")
__all__ = ["available_models", "load", "tokenize"]
_tokenizer = _Tokenizer()
_MODELS = {
"RN50": "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
"RN101": "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
"RN50x4": "https://openaipublic.azureedge.net/clip/models/7e526bd135e493cef0776de27d5f42653e6b4c8bf9e0f653bb11773263205fdd/RN50x4.pt",
"RN50x16": "https://openaipublic.azureedge.net/clip/models/52378b407f34354e150460fe41077663dd5b39c54cd0bfd2b27167a4a06ec9aa/RN50x16.pt",
"ViT-B/32": "https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt",
"ViT-B/16": "https://openaipublic.azureedge.net/clip/models/5806e77cd80f8b59890b7e101eabd078d9fb84e6937f9e85e4ecb61988df416f/ViT-B-16.pt",
}
def _download(url: str, root: str = os.path.expanduser("~/.cache/clip")):
os.makedirs(root, exist_ok=True)
filename = os.path.basename(url)
expected_sha256 = url.split("/")[-2]
download_target = os.path.join(root, filename)
if os.path.exists(download_target) and not os.path.isfile(download_target):
raise RuntimeError(f"{download_target} exists and is not a regular file")
if os.path.isfile(download_target):
if hashlib.sha256(open(download_target, "rb").read()).hexdigest() == expected_sha256:
return download_target
else:
warnings.warn(f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file")
with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True) as loop:
while True:
buffer = source.read(8192)
if not buffer:
break
output.write(buffer)
loop.update(len(buffer))
if hashlib.sha256(open(download_target, "rb").read()).hexdigest() != expected_sha256:
raise RuntimeError(f"Model has been downloaded but the SHA256 checksum does not not match")
return download_target
def _transform(n_px):
return Compose([
Resize(n_px, interpolation=BICUBIC),
CenterCrop(n_px),
lambda image: image.convert("RGB"),
ToTensor(),
Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711)),
])
def available_models() -> List[str]:
"""Returns the names of available CLIP models"""
return list(_MODELS.keys())
def load(name: str, device: Union[str, torch.device] = "cuda" if torch.cuda.is_available() else "cpu", jit=False):
"""Load a CLIP model
Parameters
----------
name : str
A model name listed by `clip.available_models()`, or the path to a model checkpoint containing the state_dict
device : Union[str, torch.device]
The device to put the loaded model
jit : bool
Whether to load the optimized JIT model or more hackable non-JIT model (default).
Returns
-------
model : torch.nn.Module
The CLIP model
preprocess : Callable[[PIL.Image], torch.Tensor]
A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
"""
if name in _MODELS:
model_path = _download(_MODELS[name])
elif os.path.isfile(name):
model_path = name
else:
raise RuntimeError(f"Model {name} not found; available models = {available_models()}")
try:
# loading JIT archive
model = torch.jit.load(model_path, map_location=device if jit else "cpu").eval()
state_dict = None
except RuntimeError:
# loading saved state dict
if jit:
warnings.warn(f"File {model_path} is not a JIT archive. Loading as a state dict instead")
jit = False
state_dict = torch.load(model_path, map_location="cpu")
if not jit:
model = build_model(state_dict or model.state_dict()).to(device)
if str(device) == "cpu":
model.float()
return model, _transform(model.visual.input_resolution)
# patch the device names
device_holder = torch.jit.trace(lambda: torch.ones([]).to(torch.device(device)), example_inputs=[])
device_node = [n for n in device_holder.graph.findAllNodes("prim::Constant") if "Device" in repr(n)][-1]
def patch_device(module):
try:
graphs = [module.graph] if hasattr(module, "graph") else []
except RuntimeError:
graphs = []
if hasattr(module, "forward1"):
graphs.append(module.forward1.graph)
for graph in graphs:
for node in graph.findAllNodes("prim::Constant"):
if "value" in node.attributeNames() and str(node["value"]).startswith("cuda"):
node.copyAttributes(device_node)
model.apply(patch_device)
patch_device(model.encode_image)
patch_device(model.encode_text)
# patch dtype to float32 on CPU
if str(device) == "cpu":
float_holder = torch.jit.trace(lambda: torch.ones([]).float(), example_inputs=[])
float_input = list(float_holder.graph.findNode("aten::to").inputs())[1]
float_node = float_input.node()
def patch_float(module):
try:
graphs = [module.graph] if hasattr(module, "graph") else []
except RuntimeError:
graphs = []
if hasattr(module, "forward1"):
graphs.append(module.forward1.graph)
for graph in graphs:
for node in graph.findAllNodes("aten::to"):
inputs = list(node.inputs())
for i in [1, 2]: # dtype can be the second or third argument to aten::to()
if inputs[i].node()["value"] == 5:
inputs[i].node().copyAttributes(float_node)
model.apply(patch_float)
patch_float(model.encode_image)
patch_float(model.encode_text)
model.float()
return model, _transform(model.input_resolution.item())
def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate: bool = False) -> torch.LongTensor:
"""
Returns the tokenized representation of given input string(s)
Parameters
----------
texts : Union[str, List[str]]
An input string or a list of input strings to tokenize
context_length : int
The context length to use; all CLIP models use 77 as the context length
truncate: bool
Whether to truncate the text in case its encoding is longer than the context length
Returns
-------
A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]
"""
if isinstance(texts, str):
texts = [texts]
sot_token = _tokenizer.encoder["<|startoftext|>"]
eot_token = _tokenizer.encoder["<|endoftext|>"]
all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)
for i, tokens in enumerate(all_tokens):
if len(tokens) > context_length:
if truncate:
tokens = tokens[:context_length]
tokens[-1] = eot_token
else:
raise RuntimeError(f"Input {texts[i]} is too long for context length {context_length}")
result[i, :len(tokens)] = torch.tensor(tokens)
return result
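A minimal usage sketch for `tokenize` (illustrative, assuming the package is importable as `clip` from the repository root via `clip/__init__.py`):

```python
import clip  # the local package in this repository; __init__.py re-exports clip.clip

# Each string becomes one row of length context_length (77), zero-padded and
# wrapped in <|startoftext|> / <|endoftext|> tokens.
tokens = clip.tokenize(["a photo of a dog", "a photo of a cat"])
print(tokens.shape)  # torch.Size([2, 77])
```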

699
clip/model.py Normal file
View File

@@ -0,0 +1,699 @@
from collections import OrderedDict
from typing import Tuple, Union
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1):
super().__init__()
# all conv layers have stride 1. an avgpool is performed after the second convolution when stride > 1
self.conv1 = nn.Conv2d(inplanes, planes, 1, bias=False)
self.bn1 = nn.BatchNorm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes)
self.avgpool = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
self.conv3 = nn.Conv2d(planes, planes * self.expansion, 1, bias=False)
self.bn3 = nn.BatchNorm2d(planes * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = None
self.stride = stride
if stride > 1 or inplanes != planes * Bottleneck.expansion:
# downsampling layer is prepended with an avgpool, and the subsequent convolution has stride 1
self.downsample = nn.Sequential(OrderedDict([
("-1", nn.AvgPool2d(stride)),
("0", nn.Conv2d(inplanes, planes * self.expansion, 1, stride=1, bias=False)),
("1", nn.BatchNorm2d(planes * self.expansion))
]))
def forward(self, x: torch.Tensor):
identity = x
out = self.relu(self.bn1(self.conv1(x)))
out = self.relu(self.bn2(self.conv2(out)))
out = self.avgpool(out)
out = self.bn3(self.conv3(out))
if self.downsample is not None:
identity = self.downsample(x)
out += identity
out = self.relu(out)
return out
class AttentionPool2d(nn.Module):
def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None):
super().__init__()
self.positional_embedding = nn.Parameter(torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5)
self.k_proj = nn.Linear(embed_dim, embed_dim)
self.q_proj = nn.Linear(embed_dim, embed_dim)
self.v_proj = nn.Linear(embed_dim, embed_dim)
self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
self.num_heads = num_heads
def forward(self, x):
x = x.reshape(x.shape[0], x.shape[1], x.shape[2] * x.shape[3]).permute(2, 0, 1) # NCHW -> (HW)NC
x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (HW+1)NC
x = x + self.positional_embedding[:, None, :].to(x.dtype) # (HW+1)NC
x, _ = F.multi_head_attention_forward(
query=x, key=x, value=x,
embed_dim_to_check=x.shape[-1],
num_heads=self.num_heads,
q_proj_weight=self.q_proj.weight,
k_proj_weight=self.k_proj.weight,
v_proj_weight=self.v_proj.weight,
in_proj_weight=None,
in_proj_bias=torch.cat([self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]),
bias_k=None,
bias_v=None,
add_zero_attn=False,
dropout_p=0,
out_proj_weight=self.c_proj.weight,
out_proj_bias=self.c_proj.bias,
use_separate_proj_weight=True,
training=self.training,
need_weights=False
)
return x[0]
class ModifiedResNet(nn.Module):
"""
A ResNet class that is similar to torchvision's but contains the following changes:
- There are now 3 "stem" convolutions as opposed to 1, with an average pool instead of a max pool.
- Performs anti-aliasing strided convolutions, where an avgpool is prepended to convolutions with stride > 1
- The final pooling layer is a QKV attention instead of an average pool
"""
def __init__(self, layers, output_dim, heads, input_resolution=224, width=64):
super().__init__()
self.output_dim = output_dim
self.input_resolution = input_resolution
# the 3-layer stem
self.conv1 = nn.Conv2d(3, width // 2, kernel_size=3, stride=2, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(width // 2)
self.conv2 = nn.Conv2d(width // 2, width // 2, kernel_size=3, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(width // 2)
self.conv3 = nn.Conv2d(width // 2, width, kernel_size=3, padding=1, bias=False)
self.bn3 = nn.BatchNorm2d(width)
self.avgpool = nn.AvgPool2d(2)
self.relu = nn.ReLU(inplace=True)
# residual layers
self._inplanes = width # this is a *mutable* variable used during construction
self.layer1 = self._make_layer(width, layers[0])
self.layer2 = self._make_layer(width * 2, layers[1], stride=2)
self.layer3 = self._make_layer(width * 4, layers[2], stride=2)
self.layer4 = self._make_layer(width * 8, layers[3], stride=2)
embed_dim = width * 32 # the ResNet feature dimension
self.attnpool = AttentionPool2d(input_resolution // 32, embed_dim, heads, output_dim)
def _make_layer(self, planes, blocks, stride=1):
layers = [Bottleneck(self._inplanes, planes, stride)]
self._inplanes = planes * Bottleneck.expansion
for _ in range(1, blocks):
layers.append(Bottleneck(self._inplanes, planes))
return nn.Sequential(*layers)
def forward(self, x):
def stem(x):
for conv, bn in [(self.conv1, self.bn1), (self.conv2, self.bn2), (self.conv3, self.bn3)]:
x = self.relu(bn(conv(x)))
x = self.avgpool(x)
return x
x = x.type(self.conv1.weight.dtype)
x = stem(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.attnpool(x)
return x
class LayerNorm(nn.LayerNorm):
"""Subclass torch's LayerNorm to handle fp16."""
def forward(self, x: torch.Tensor):
orig_type = x.dtype
ret = super().forward(x.type(torch.float32))
return ret.type(orig_type)
class QuickGELU(nn.Module):
def forward(self, x: torch.Tensor):
return x * torch.sigmoid(1.702 * x)
class ResidualAttentionBlock(nn.Module):
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None):
super().__init__()
self.attn = nn.MultiheadAttention(d_model, n_head)
self.ln_1 = LayerNorm(d_model)
self.mlp = nn.Sequential(OrderedDict([
("c_fc", nn.Linear(d_model, d_model * 4)),
("gelu", QuickGELU()),
("c_proj", nn.Linear(d_model * 4, d_model))
]))
self.ln_2 = LayerNorm(d_model)
self.attn_mask = attn_mask
def attention(self, x: torch.Tensor):
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
def forward(self, x: torch.Tensor):
x = x + self.attention(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
return x
class ResidualAttentionBlock_IVLP(nn.Module):
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None, add_prompt=False,
text_layer=False, i=0, design_details=None):
super().__init__()
self.attn = nn.MultiheadAttention(d_model, n_head)
self.ln_1 = LayerNorm(d_model)
self.mlp = nn.Sequential(OrderedDict([
("c_fc", nn.Linear(d_model, d_model * 4)),
("gelu", QuickGELU()),
("c_proj", nn.Linear(d_model * 4, d_model))
]))
self.ln_2 = LayerNorm(d_model)
        # Only add learnable tokens if the flag is set to True.
        # For the first layer (i = 0), we should not add the learnable parameters here,
        # as they have already been taken care of at the very start, for both the text
        # and the visual branch.
self.text_layer = text_layer
self.attn_mask = attn_mask
if i != 0:
self.add_prompt = add_prompt
if self.add_prompt:
if self.text_layer:
self.n_ctx_text = design_details["language_ctx"] # hyperparameter
ctx_vectors = torch.empty(self.n_ctx_text, d_model)
else:
self.n_ctx_visual = design_details["vision_ctx"] # hyperparameter
ctx_vectors = torch.empty(self.n_ctx_visual, d_model)
# Code snippet for per layer visual prompts
nn.init.normal_(ctx_vectors, std=0.02)
self.VPT_shallow = nn.Parameter(ctx_vectors)
else:
self.add_prompt = False
def attention(self, x: torch.Tensor):
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
def forward(self, x: torch.Tensor):
# Will need to append the learnable tokens for this layer here
# Check if flag was set for this layer or not
if self.add_prompt:
# Also see if this is textual transformer layer or not
if not self.text_layer:
# Remove the outputs produced by learnable tokens of previous layer
prefix = x[0:x.shape[0] - self.n_ctx_visual, :, :]
# Create/configure learnable tokens of this layer
visual_context = self.VPT_shallow.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
                # Add the learnable tokens of this layer to the input, replacing the previous
                # layer's learnable tokens
x = torch.cat([prefix, visual_context], dim=0)
else:
# Appending the learnable tokens in different way
# x -> [77, NCLS, DIM]
# First remove the learnable tokens from previous layer
prefix = x[:1, :, :]
suffix = x[1 + self.n_ctx_text:, :, :]
# Create/configure learnable tokens of this layer
textual_context = self.VPT_shallow.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
                # Add the learnable tokens of this layer to the input, replacing the
                # previous layer's learnable tokens
x = torch.cat([prefix, textual_context, suffix], dim=0)
x = x + self.attention(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
return x
class ResidualAttentionBlock_MaPLe(nn.Module):
def __init__(self, d_model: int, n_head: int, attn_mask: torch.Tensor = None, design_details=None,
text_layer=False, i=0):
super().__init__()
self.attn = nn.MultiheadAttention(d_model, n_head)
self.ln_1 = LayerNorm(d_model)
self.mlp = nn.Sequential(OrderedDict([
("c_fc", nn.Linear(d_model, d_model * 4)),
("gelu", QuickGELU()),
("c_proj", nn.Linear(d_model * 4, d_model))
]))
self.ln_2 = LayerNorm(d_model)
        # For the first layer (i = 0), we do not need to add the learnable parameters here,
        # as they are added at the beginning, for both the text and the vision branch
self.text_layer = text_layer
self.attn_mask = attn_mask
# This must be consistent with the config file prompt
self.compound_prompt_nctx = design_details['maple_length']
if i == 0:
self.first_layer = True
else:
self.first_layer = False
def attention(self, x: torch.Tensor):
self.attn_mask = self.attn_mask.to(dtype=x.dtype, device=x.device) if self.attn_mask is not None else None
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
def forward(self, inputs):
# For the first layer, we do not need to add any duplicate, as it is already added
# as the shallow version
x = inputs[0]
compound_prompts_deeper = inputs[1]
counter = inputs[2]
if not self.first_layer:
if len(compound_prompts_deeper) > 0:
# This means that deeper compound prompts are turned on
# Here it behaves differently for text and visual side
# Forward function is same for both
if not self.text_layer:
# First check if the ith layer needs compound prompts or not
if not (counter > len(compound_prompts_deeper) - 1):
# Remove the outputs produced by learnable tokens of previous layer
prefix = x[0:x.shape[0] - self.compound_prompt_nctx, :, :]
# Create/configure learnable tokens of this layer
visual_context = compound_prompts_deeper[counter] # extract the correct index
visual_context = visual_context.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
                        # Add the learnable tokens of this layer to the input, replacing the previous
                        # layer's learnable tokens
x = torch.cat([prefix, visual_context], dim=0)
                        # Once done, update the counter so that, next time, it does not use the same learnable tokens
counter += 1
else:
# First check if the ith layer needs compound prompts or not
if not (counter > len(compound_prompts_deeper) - 1):
# Appending the learnable tokens in different way
# x -> [77, NCLS, DIM]
# First remove the learnable tokens from previous layer
prefix = x[:1, :, :]
suffix = x[1 + self.compound_prompt_nctx:, :, :]
# Create/configure learnable tokens of this layer
textual_context = compound_prompts_deeper[counter]
textual_context = textual_context.expand(x.shape[1], -1, -1).permute(1, 0, 2).half()
                        # Add the learnable tokens of this layer to the input, replacing the
                        # previous layer's learnable tokens
x = torch.cat([prefix, textual_context, suffix], dim=0)
                        # Once done, update the counter so that, next time, it does not use the same learnable tokens
counter += 1
x = x + self.attention(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
return [x, compound_prompts_deeper, counter] # return again as a list, so that nn.seq can work
class Transformer(nn.Module):
def __init__(self, width: int, layers: int, heads: int, attn_mask: torch.Tensor = None, prompts_needed=0,
text_layer=False, design_details=None):
super().__init__()
self.width = width
self.layers = layers
# Implements respective encoder blocks for a given design choice
current_trainer = design_details['trainer']
if current_trainer == 'IVLP' or current_trainer == 'VPT':
self.resblocks = nn.Sequential(*[ResidualAttentionBlock_IVLP(width, heads, attn_mask, True,
text_layer, i,
design_details) if prompts_needed > i
else ResidualAttentionBlock_IVLP(width, heads, attn_mask, False,
text_layer, i, design_details)
for i in range(layers)])
elif current_trainer == 'MaPLe':
self.resblocks = nn.Sequential(
*[ResidualAttentionBlock_MaPLe(width, heads, attn_mask, design_details, text_layer, i)
for i in range(layers)])
else:
# Corresponds to default CoOp or CoCoOp
assert current_trainer == 'CoOp' or current_trainer == 'CoCoOp'
self.resblocks = nn.Sequential(*[ResidualAttentionBlock(width, heads, attn_mask) for _ in range(layers)])
def forward(self, x: torch.Tensor):
return self.resblocks(x)
class VisionTransformer(nn.Module):
def __init__(self, input_resolution: int, patch_size: int, width: int, layers: int, heads: int,
output_dim: int, design_details):
super().__init__()
self.input_resolution = input_resolution
self.output_dim = output_dim
self.conv1 = nn.Conv2d(in_channels=3, out_channels=width, kernel_size=patch_size, stride=patch_size, bias=False)
if design_details["vision_depth"] == 0:
self.VPT_shallow = False
else:
self.VPT_shallow = True
if self.VPT_shallow:
# Add visual prompt tokens here
n_ctx = design_details["vision_ctx"] # hyperparameter
ctx_vectors = torch.empty(n_ctx, width)
nn.init.normal_(ctx_vectors, std=0.02)
self.VPT = nn.Parameter(ctx_vectors)
# self.VPT.half()
scale = width ** -0.5
self.class_embedding = nn.Parameter(scale * torch.randn(width))
self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))
self.ln_pre = LayerNorm(width)
        # hyper-parameter controlling whether prompt embeddings are added to the input
        # of the transformer blocks:
self.prompt_till_layer_visual = design_details["vision_depth"]
self.transformer = Transformer(width, layers, heads, prompts_needed=self.prompt_till_layer_visual,
design_details=design_details)
self.ln_post = LayerNorm(width)
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
def forward(self, x: torch.Tensor):
x = self.conv1(x) # shape = [*, width, grid, grid]
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
x = torch.cat(
[self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype,
device=x.device),
x], dim=1) # shape = [*, grid ** 2 + 1, width]
x = x + self.positional_embedding.to(x.dtype)
        # After the positional embeddings, we attach the prompts to the input; note that these
        # are the only trainable parameters in the whole image encoder.
if self.VPT_shallow:
visual_ctx = self.VPT.expand(x.shape[0], -1, -1).half()
x = torch.cat([x, visual_ctx], dim=1)
else:
assert self.prompt_till_layer_visual == 0
# Normal code as before
x = self.ln_pre(x)
x = x.permute(1, 0, 2) # NLD -> LND
x = self.transformer(x)
x = x.permute(1, 0, 2) # LND -> NLD
x = self.ln_post(x[:, 0, :])
if self.proj is not None:
x = x @ self.proj
return x
class VisionTransformer_MaPLe(nn.Module):
def __init__(self, input_resolution: int, patch_size: int, width: int, layers: int, heads: int, output_dim: int,
design_details):
super().__init__()
self.input_resolution = input_resolution
self.output_dim = output_dim
self.conv1 = nn.Conv2d(in_channels=3, out_channels=width, kernel_size=patch_size, stride=patch_size, bias=False)
self.VPT_shallow = True
scale = width ** -0.5
self.class_embedding = nn.Parameter(scale * torch.randn(width))
self.positional_embedding = nn.Parameter(scale * torch.randn((input_resolution // patch_size) ** 2 + 1, width))
self.ln_pre = LayerNorm(width)
        # hyper-parameter controlling whether prompt embeddings are added to the input
        # of the transformer blocks:
self.prompt_till_layer_visual = 0
self.transformer = Transformer(width, layers, heads, design_details=design_details)
self.ln_post = LayerNorm(width)
self.proj = nn.Parameter(scale * torch.randn(width, output_dim))
def forward(self, x: torch.Tensor, shared_ctx, compound_deeper_prompts):
x = self.conv1(x) # shape = [*, width, grid, grid]
x = x.reshape(x.shape[0], x.shape[1], -1) # shape = [*, width, grid ** 2]
x = x.permute(0, 2, 1) # shape = [*, grid ** 2, width]
x = torch.cat(
[self.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device),
x], dim=1) # shape = [*, grid ** 2 + 1, width]
x = x + self.positional_embedding.to(x.dtype)
        # After the positional embeddings, we attach the prompts to the input; note that these
        # are the only trainable parameters in the whole image encoder.
if self.VPT_shallow:
visual_ctx = shared_ctx.expand(x.shape[0], -1, -1).half()
x = torch.cat([x, visual_ctx], dim=1)
else:
assert self.prompt_till_layer_visual == 0
# Normal code as before
x = self.ln_pre(x)
x = x.permute(1, 0, 2) # NLD -> LND
# Again combine the inputs, so nn.sequential can work
outputs = self.transformer([x, compound_deeper_prompts, 0]) # third argument is counter
x = outputs[0]
x = x.permute(1, 0, 2) # LND -> NLD
x = self.ln_post(x[:, 0, :])
if self.proj is not None:
x = x @ self.proj
return x
class CLIP(nn.Module):
def __init__(self,
embed_dim: int,
# vision
image_resolution: int,
vision_layers: Union[Tuple[int, int, int, int], int],
vision_width: int,
vision_patch_size: int,
# text
context_length: int,
vocab_size: int,
transformer_width: int,
transformer_heads: int,
transformer_layers: int,
design_details
):
super().__init__()
self.context_length = context_length
trainer = design_details['trainer']
if isinstance(vision_layers, (tuple, list)):
vision_heads = vision_width * 32 // 64
self.visual = ModifiedResNet(
layers=vision_layers,
output_dim=embed_dim,
heads=vision_heads,
input_resolution=image_resolution,
width=vision_width
)
else:
vision_heads = vision_width // 64
if trainer == "MaPLe":
self.visual = VisionTransformer_MaPLe(
input_resolution=image_resolution,
patch_size=vision_patch_size,
width=vision_width,
layers=vision_layers,
heads=vision_heads,
output_dim=embed_dim,
design_details=design_details
)
else:
self.visual = VisionTransformer(
input_resolution=image_resolution,
patch_size=vision_patch_size,
width=vision_width,
layers=vision_layers,
heads=vision_heads,
output_dim=embed_dim,
design_details=design_details
)
        # hyper-parameter controlling whether prompt embeddings are added to the input
        # of the transformer blocks:
prompt_till_layer_text = design_details['language_depth']
self.transformer = Transformer(
width=transformer_width,
layers=transformer_layers,
heads=transformer_heads,
attn_mask=self.build_attention_mask(),
prompts_needed=prompt_till_layer_text,
text_layer=True,
design_details=design_details
)
self.vocab_size = vocab_size
self.token_embedding = nn.Embedding(vocab_size, transformer_width)
self.positional_embedding = nn.Parameter(torch.empty(self.context_length, transformer_width))
self.ln_final = LayerNorm(transformer_width)
self.text_projection = nn.Parameter(torch.empty(transformer_width, embed_dim))
self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))
self.initialize_parameters()
def initialize_parameters(self):
nn.init.normal_(self.token_embedding.weight, std=0.02)
nn.init.normal_(self.positional_embedding, std=0.01)
if isinstance(self.visual, ModifiedResNet):
if self.visual.attnpool is not None:
std = self.visual.attnpool.c_proj.in_features ** -0.5
nn.init.normal_(self.visual.attnpool.q_proj.weight, std=std)
nn.init.normal_(self.visual.attnpool.k_proj.weight, std=std)
nn.init.normal_(self.visual.attnpool.v_proj.weight, std=std)
nn.init.normal_(self.visual.attnpool.c_proj.weight, std=std)
for resnet_block in [self.visual.layer1, self.visual.layer2, self.visual.layer3, self.visual.layer4]:
for name, param in resnet_block.named_parameters():
if name.endswith("bn3.weight"):
nn.init.zeros_(param)
proj_std = (self.transformer.width ** -0.5) * ((2 * self.transformer.layers) ** -0.5)
attn_std = self.transformer.width ** -0.5
fc_std = (2 * self.transformer.width) ** -0.5
for block in self.transformer.resblocks:
nn.init.normal_(block.attn.in_proj_weight, std=attn_std)
nn.init.normal_(block.attn.out_proj.weight, std=proj_std)
nn.init.normal_(block.mlp.c_fc.weight, std=fc_std)
nn.init.normal_(block.mlp.c_proj.weight, std=proj_std)
if self.text_projection is not None:
nn.init.normal_(self.text_projection, std=self.transformer.width ** -0.5)
def build_attention_mask(self):
# lazily create causal attention mask, with full attention between the vision tokens
# pytorch uses additive attention mask; fill with -inf
mask = torch.empty(self.context_length, self.context_length)
mask.fill_(float("-inf"))
mask.triu_(1) # zero out the lower diagonal
return mask
@property
def dtype(self):
return self.visual.conv1.weight.dtype
def encode_image(self, image):
return self.visual(image.type(self.dtype))
def encode_text(self, text):
x = self.token_embedding(text).type(self.dtype) # [batch_size, n_ctx, d_model]
x = x + self.positional_embedding.type(self.dtype)
x = x.permute(1, 0, 2) # NLD -> LND
x = self.transformer(x)
x = x.permute(1, 0, 2) # LND -> NLD
x = self.ln_final(x).type(self.dtype)
# x.shape = [batch_size, n_ctx, transformer.width]
# take features from the eot embedding (eot_token is the highest number in each sequence)
x = x[torch.arange(x.shape[0]), text.argmax(dim=-1)] @ self.text_projection
return x
def forward(self, image, text):
image_features = self.encode_image(image)
text_features = self.encode_text(text)
# normalized features
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
# cosine similarity as logits
logit_scale = self.logit_scale.exp()
logits_per_image = logit_scale * image_features @ text_features.t()
logits_per_text = logit_scale * text_features @ image_features.t()
# shape = [global_batch_size, global_batch_size]
return logits_per_image, logits_per_text
def convert_weights(model: nn.Module):
"""Convert applicable model parameters to fp16"""
def _convert_weights_to_fp16(l):
if isinstance(l, (nn.Conv1d, nn.Conv2d, nn.Linear)):
l.weight.data = l.weight.data.half()
if l.bias is not None:
l.bias.data = l.bias.data.half()
if isinstance(l, nn.MultiheadAttention):
for attr in [*[f"{s}_proj_weight" for s in ["in", "q", "k", "v"]], "in_proj_bias", "bias_k", "bias_v"]:
tensor = getattr(l, attr)
if tensor is not None:
tensor.data = tensor.data.half()
for name in ["text_projection", "proj"]:
if hasattr(l, name):
attr = getattr(l, name)
if attr is not None:
attr.data = attr.data.half()
model.apply(_convert_weights_to_fp16)
def build_model(state_dict: dict, design_details):
vit = "visual.proj" in state_dict
if vit:
vision_width = state_dict["visual.conv1.weight"].shape[0]
vision_layers = len(
[k for k in state_dict.keys() if k.startswith("visual.") and k.endswith(".attn.in_proj_weight")])
vision_patch_size = state_dict["visual.conv1.weight"].shape[-1]
grid_size = round((state_dict["visual.positional_embedding"].shape[0] - 1) ** 0.5)
image_resolution = vision_patch_size * grid_size
else:
counts: list = [len(set(k.split(".")[2] for k in state_dict if k.startswith(f"visual.layer{b}"))) for b in
[1, 2, 3, 4]]
vision_layers = tuple(counts)
vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
output_width = round((state_dict["visual.attnpool.positional_embedding"].shape[0] - 1) ** 0.5)
vision_patch_size = None
assert output_width ** 2 + 1 == state_dict["visual.attnpool.positional_embedding"].shape[0]
image_resolution = output_width * 32
embed_dim = state_dict["text_projection"].shape[1]
context_length = state_dict["positional_embedding"].shape[0]
vocab_size = state_dict["token_embedding.weight"].shape[0]
transformer_width = state_dict["ln_final.weight"].shape[0]
transformer_heads = transformer_width // 64
transformer_layers = len(set(k.split(".")[2] for k in state_dict if k.startswith(f"transformer.resblocks")))
model = CLIP(
embed_dim,
image_resolution, vision_layers, vision_width, vision_patch_size,
context_length, vocab_size, transformer_width, transformer_heads, transformer_layers, design_details
)
for key in ["input_resolution", "context_length", "vocab_size"]:
if key in state_dict:
del state_dict[key]
convert_weights(model)
try:
model.load_state_dict(state_dict)
    except RuntimeError:
missing_keys, _ = model.load_state_dict(state_dict, strict=False)
print('Weights not found for some missing keys: ', missing_keys)
return model.eval()
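`build_model` above expects a `design_details` dictionary describing the prompting setup. The keys below are taken from the code in this file; the values are placeholders for illustration only (the trainers assemble the real dictionary from the YAML configs), and the checkpoint path is hypothetical:

```python
import torch
from clip.model import build_model

# Illustrative sketch: keys inferred from the code above, values are placeholders.
design_details = {
    "trainer": "IVLP",      # one of: IVLP / VPT / MaPLe / CoOp / CoCoOp
    "vision_depth": 9,      # number of vision layers that receive learnable prompts
    "vision_ctx": 4,        # learnable vision tokens per prompted layer
    "language_depth": 9,    # number of text layers that receive learnable prompts
    "language_ctx": 4,      # learnable text tokens per prompted layer
    "maple_length": 2,      # only read when trainer == "MaPLe"
}

# "ViT-B-16.pt" is a placeholder path to a downloaded CLIP JIT checkpoint.
state_dict = torch.jit.load("ViT-B-16.pt", map_location="cpu").state_dict()
model = build_model(state_dict, design_details)
```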

132
clip/simple_tokenizer.py Normal file
View File

@@ -0,0 +1,132 @@
import gzip
import html
import os
from functools import lru_cache
import ftfy
import regex as re
@lru_cache()
def default_bpe():
return os.path.join(os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz")
@lru_cache()
def bytes_to_unicode():
"""
    Returns a list of utf-8 bytes and a corresponding list of unicode strings.
The reversible bpe codes work on unicode strings.
This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.
When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.
    This is a significant percentage of your normal, say, 32K bpe vocab.
To avoid that, we want lookup tables between utf-8 bytes and unicode strings.
And avoids mapping to whitespace/control characters the bpe code barfs on.
"""
bs = list(range(ord("!"), ord("~")+1))+list(range(ord("¡"), ord("¬")+1))+list(range(ord("®"), ord("ÿ")+1))
cs = bs[:]
n = 0
for b in range(2**8):
if b not in bs:
bs.append(b)
cs.append(2**8+n)
n += 1
cs = [chr(n) for n in cs]
return dict(zip(bs, cs))
def get_pairs(word):
"""Return set of symbol pairs in a word.
Word is represented as tuple of symbols (symbols being variable-length strings).
"""
pairs = set()
prev_char = word[0]
for char in word[1:]:
pairs.add((prev_char, char))
prev_char = char
return pairs
def basic_clean(text):
text = ftfy.fix_text(text)
text = html.unescape(html.unescape(text))
return text.strip()
def whitespace_clean(text):
text = re.sub(r'\s+', ' ', text)
text = text.strip()
return text
class SimpleTokenizer(object):
def __init__(self, bpe_path: str = default_bpe()):
self.byte_encoder = bytes_to_unicode()
self.byte_decoder = {v: k for k, v in self.byte_encoder.items()}
merges = gzip.open(bpe_path).read().decode("utf-8").split('\n')
merges = merges[1:49152-256-2+1]
merges = [tuple(merge.split()) for merge in merges]
vocab = list(bytes_to_unicode().values())
vocab = vocab + [v+'</w>' for v in vocab]
for merge in merges:
vocab.append(''.join(merge))
vocab.extend(['<|startoftext|>', '<|endoftext|>'])
self.encoder = dict(zip(vocab, range(len(vocab))))
self.decoder = {v: k for k, v in self.encoder.items()}
self.bpe_ranks = dict(zip(merges, range(len(merges))))
self.cache = {'<|startoftext|>': '<|startoftext|>', '<|endoftext|>': '<|endoftext|>'}
self.pat = re.compile(r"""<\|startoftext\|>|<\|endoftext\|>|'s|'t|'re|'ve|'m|'ll|'d|[\p{L}]+|[\p{N}]|[^\s\p{L}\p{N}]+""", re.IGNORECASE)
def bpe(self, token):
if token in self.cache:
return self.cache[token]
word = tuple(token[:-1]) + ( token[-1] + '</w>',)
pairs = get_pairs(word)
if not pairs:
return token+'</w>'
while True:
bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))
if bigram not in self.bpe_ranks:
break
first, second = bigram
new_word = []
i = 0
while i < len(word):
try:
j = word.index(first, i)
new_word.extend(word[i:j])
i = j
                except ValueError:
new_word.extend(word[i:])
break
if word[i] == first and i < len(word)-1 and word[i+1] == second:
new_word.append(first+second)
i += 2
else:
new_word.append(word[i])
i += 1
new_word = tuple(new_word)
word = new_word
if len(word) == 1:
break
else:
pairs = get_pairs(word)
word = ' '.join(word)
self.cache[token] = word
return word
def encode(self, text):
bpe_tokens = []
text = whitespace_clean(basic_clean(text)).lower()
for token in re.findall(self.pat, text):
token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))
bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
return bpe_tokens
def decode(self, tokens):
text = ''.join([self.decoder[token] for token in tokens])
text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors="replace").replace('</w>', ' ')
return text
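A minimal round-trip sketch for the tokenizer above (illustrative; it assumes the default BPE vocab file `bpe_simple_vocab_16e6.txt.gz` ships alongside this module, as `default_bpe()` expects):

```python
from clip.simple_tokenizer import SimpleTokenizer

tok = SimpleTokenizer()
ids = tok.encode("a photo of a dog")
print(ids)              # BPE token ids; encode() does not add <|startoftext|>/<|endoftext|>,
                        # those are added by clip.tokenize()
print(tok.decode(ids))  # "a photo of a dog " (decode re-inserts word-boundary spaces)
```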

49409
clip_words.csv Normal file

File diff suppressed because it is too large

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "Caltech101"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "DescribableTextures"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "EuroSAT"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "FGVCAircraft"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "Food101"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "ImageNet"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "ImageNetA"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "ImageNetR"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "ImageNetSketch"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "ImageNetV2"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "OxfordFlowers"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "OxfordPets"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "StanfordCars"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "SUN397"

View File

@@ -0,0 +1,2 @@
DATASET:
NAME: "UCF101"

View File

@@ -0,0 +1,35 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 1
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 10
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
COCOOP:
N_CTX: 16
CTX_INIT: ""
PREC: "fp16"

View File

@@ -0,0 +1,35 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 1
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 10
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
COCOOP:
N_CTX: 4
CTX_INIT: ""
PREC: "fp16"

View File

@@ -0,0 +1,35 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 1
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 10
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
COCOOP:
N_CTX: 4
CTX_INIT: "a photo of a"
PREC: "fp16"

View File

@@ -0,0 +1,35 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 1
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 10
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
COCOOP:
N_CTX: 8
CTX_INIT: ""
PREC: "fp16"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 200
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN101"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN101"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 200
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN50"

View File

@@ -0,0 +1,33 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 200
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN50"
TRAINER:
COOP:
CTX_INIT: "a photo of a"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 100
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN50"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN50"

View File

@@ -0,0 +1,33 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "RN50"
TRAINER:
COOP:
CTX_INIT: "a photo of a"

View File

@@ -0,0 +1,17 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 200
TEST:
BATCH_SIZE: 200
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
MODEL:
BACKBONE:
NAME: "RN50"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 200
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "ViT-B/16"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 100
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "ViT-B/16"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "ViT-B/16"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 200
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "ViT-B/32"

View File

@@ -0,0 +1,29 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 32
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.002
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 5
MODEL:
BACKBONE:
NAME: "ViT-B/32"

View File

@@ -0,0 +1,39 @@
# Independent Vision Language Prompting
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0025
MAX_EPOCH: 20
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
IVLP:
N_CTX_VISION: 4
N_CTX_TEXT: 4
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH_VISION: 9
PROMPT_DEPTH_TEXT: 9

View File

@@ -0,0 +1,36 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0035
MAX_EPOCH: 2
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
MAPLE:
N_CTX: 2
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH: 9

View File

@@ -0,0 +1,36 @@
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0026
MAX_EPOCH: 2
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
MAPLE:
N_CTX: 2
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH: 3

View File

@@ -0,0 +1,43 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0025
MAX_EPOCH: 20
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
PROMPTSRC:
N_CTX_VISION: 4
N_CTX_TEXT: 4
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH_VISION: 9
PROMPT_DEPTH_TEXT: 9
TEXT_LOSS_WEIGHT: 25
IMAGE_LOSS_WEIGHT: 10
GPA_MEAN: 15
GPA_STD: 1
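`TEXT_LOSS_WEIGHT` and `IMAGE_LOSS_WEIGHT` scale the self-regularization terms that keep the prompted text and image features close to the frozen CLIP features. Below is a minimal sketch of how such weighted consistency losses can be combined; the function and tensor names are illustrative placeholders, not the repo's actual implementation:

```python
import torch.nn.functional as F

def feature_consistency_loss(text_feat, frozen_text_feat,
                             image_feat, frozen_image_feat,
                             text_weight=25.0, image_weight=10.0):
    # L1 consistency between prompted and frozen (pre-trained) CLIP features,
    # weighted per modality as in TEXT_LOSS_WEIGHT / IMAGE_LOSS_WEIGHT above.
    text_loss = F.l1_loss(text_feat, frozen_text_feat, reduction="mean")
    image_loss = F.l1_loss(image_feat, frozen_image_feat, reduction="mean")
    return text_weight * text_loss + image_weight * image_loss
```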

View File

@@ -0,0 +1,43 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0025
MAX_EPOCH: 20
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
PROMPTSRC:
N_CTX_VISION: 4
N_CTX_TEXT: 4
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH_VISION: 3
PROMPT_DEPTH_TEXT: 3
TEXT_LOSS_WEIGHT: 25
IMAGE_LOSS_WEIGHT: 10
GPA_MEAN: 6
GPA_STD: 10

View File

@@ -0,0 +1,47 @@
# PromptSRC: Prompting with Self-regularizing constraints
DATALOADER:
TRAIN_X:
BATCH_SIZE: 4
TEST:
BATCH_SIZE: 100
NUM_WORKERS: 8
INPUT:
SIZE: (224, 224)
INTERPOLATION: "bicubic"
PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD: [0.26862954, 0.26130258, 0.27577711]
TRANSFORMS: ["random_resized_crop", "random_flip", "normalize"]
OPTIM:
NAME: "sgd"
LR: 0.0025
MAX_EPOCH: 50
LR_SCHEDULER: "cosine"
WARMUP_EPOCH: 1
WARMUP_TYPE: "constant"
WARMUP_CONS_LR: 1e-5
TRAIN:
PRINT_FREQ: 20
MODEL:
BACKBONE:
NAME: "ViT-B/16"
TRAINER:
PROMPTSRC:
N_CTX_VISION: 4
N_CTX_TEXT: 4
CTX_INIT: "a photo of a"
PREC: "fp16"
PROMPT_DEPTH_VISION: 9
PROMPT_DEPTH_TEXT: 9
TEXT_LOSS_WEIGHT: 25
IMAGE_LOSS_WEIGHT: 10
# Use the below configuration for: ImageNet, Caltech101, OxfordPets, Food101, UCF101 and SUN397
GPA_MEAN: 30
GPA_STD: 30
# Use the below configuration for: StanfordCars, Flowers102, FGVCAircraft, DTD and EuroSAT
# GPA_MEAN: 45
# GPA_STD: 5
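`GPA_MEAN` and `GPA_STD` parameterize a Gaussian weighting over training epochs used when aggregating the per-epoch prompt checkpoints (Gaussian-weighted prompt aggregation). The sketch below shows one way such weights could be computed; it is an illustration of the idea, not the repo's exact code:

```python
import math

def gaussian_epoch_weights(max_epoch, gpa_mean, gpa_std):
    # One weight per epoch, peaking near gpa_mean and normalized to sum to 1,
    # so the weights can be used to average per-epoch prompt parameters.
    raw = [math.exp(-((epoch - gpa_mean) ** 2) / (2 * gpa_std ** 2))
           for epoch in range(1, max_epoch + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# e.g. for the 50-epoch config above with GPA_MEAN=30 and GPA_STD=30:
weights = gaussian_epoch_weights(50, gpa_mean=30, gpa_std=30)
```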

0
datasets/__init__.py Normal file
View File

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

59
datasets/caltech101.py Normal file
View File

@@ -0,0 +1,59 @@
import os
import pickle
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
from .dtd import DescribableTextures as DTD
IGNORED = ["BACKGROUND_Google", "Faces_easy"]
NEW_CNAMES = {
"airplanes": "airplane",
"Faces": "face",
"Leopards": "leopard",
"Motorbikes": "motorbike",
}
@DATASET_REGISTRY.register()
class Caltech101(DatasetBase):
dataset_dir = "caltech-101"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "101_ObjectCategories")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_Caltech101.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
train, val, test = DTD.read_and_split_data(self.image_dir, ignored=IGNORED, new_cnames=NEW_CNAMES)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)

95
datasets/dtd.py Normal file
View File

@@ -0,0 +1,95 @@
import os
import pickle
import random
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden, mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class DescribableTextures(DatasetBase):
dataset_dir = "dtd"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_DescribableTextures.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
train, val, test = self.read_and_split_data(self.image_dir)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
@staticmethod
def read_and_split_data(image_dir, p_trn=0.5, p_val=0.2, ignored=[], new_cnames=None):
# The data are supposed to be organized into the following structure
# =============
# images/
# dog/
# cat/
# horse/
# =============
categories = listdir_nohidden(image_dir)
categories = [c for c in categories if c not in ignored]
categories.sort()
p_tst = 1 - p_trn - p_val
print(f"Splitting into {p_trn:.0%} train, {p_val:.0%} val, and {p_tst:.0%} test")
def _collate(ims, y, c):
items = []
for im in ims:
item = Datum(impath=im, label=y, classname=c) # is already 0-based
items.append(item)
return items
train, val, test = [], [], []
for label, category in enumerate(categories):
category_dir = os.path.join(image_dir, category)
images = listdir_nohidden(category_dir)
images = [os.path.join(category_dir, im) for im in images]
random.shuffle(images)
n_total = len(images)
n_train = round(n_total * p_trn)
n_val = round(n_total * p_val)
n_test = n_total - n_train - n_val
assert n_train > 0 and n_val > 0 and n_test > 0
if new_cnames is not None and category in new_cnames:
category = new_cnames[category]
train.extend(_collate(images[:n_train], label, category))
val.extend(_collate(images[n_train : n_train + n_val], label, category))
test.extend(_collate(images[n_train + n_val :], label, category))
return train, val, test
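Since `read_and_split_data` is a static method, it can also be exercised on its own; a hypothetical standalone call (the module path and image directory are placeholders) might look like:

```python
# Assumes an images/ directory organized as one sub-folder per category, as in the comment above.
from datasets.dtd import DescribableTextures

train, val, test = DescribableTextures.read_and_split_data("/path/to/dtd/images")
print(f"{len(train)} train / {len(val)} val / {len(test)} test items")
```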

73
datasets/eurosat.py Normal file
View File

@@ -0,0 +1,73 @@
import os
import pickle
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
from .dtd import DescribableTextures as DTD
NEW_CNAMES = {
"AnnualCrop": "Annual Crop Land",
"Forest": "Forest",
"HerbaceousVegetation": "Herbaceous Vegetation Land",
"Highway": "Highway or Road",
"Industrial": "Industrial Buildings",
"Pasture": "Pasture Land",
"PermanentCrop": "Permanent Crop Land",
"Residential": "Residential Buildings",
"River": "River",
"SeaLake": "Sea or Lake",
}
@DATASET_REGISTRY.register()
class EuroSAT(DatasetBase):
dataset_dir = "eurosat"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "2750")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_EuroSAT.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
train, val, test = DTD.read_and_split_data(self.image_dir, new_cnames=NEW_CNAMES)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def update_classname(self, dataset_old):
dataset_new = []
for item_old in dataset_old:
cname_old = item_old.classname
cname_new = NEW_CNAMES[cname_old]
item_new = Datum(impath=item_old.impath, label=item_old.label, classname=cname_new)
dataset_new.append(item_new)
return dataset_new

71
datasets/fgvc_aircraft.py Normal file
View File

@@ -0,0 +1,71 @@
import os
import pickle
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class FGVCAircraft(DatasetBase):
dataset_dir = "fgvc_aircraft"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
classnames = []
with open(os.path.join(self.dataset_dir, "variants.txt"), "r") as f:
lines = f.readlines()
for line in lines:
classnames.append(line.strip())
cname2lab = {c: i for i, c in enumerate(classnames)}
train = self.read_data(cname2lab, "images_variant_train.txt")
val = self.read_data(cname2lab, "images_variant_val.txt")
test = self.read_data(cname2lab, "images_variant_test.txt")
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self, cname2lab, split_file):
filepath = os.path.join(self.dataset_dir, split_file)
items = []
with open(filepath, "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip().split(" ")
imname = line[0] + ".jpg"
classname = " ".join(line[1:])
impath = os.path.join(self.image_dir, imname)
label = cname2lab[classname]
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

51
datasets/food101.py Normal file
View File

@@ -0,0 +1,51 @@
import os
import pickle
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
from .dtd import DescribableTextures as DTD
@DATASET_REGISTRY.register()
class Food101(DatasetBase):
dataset_dir = "food-101"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_Food101.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
train, val, test = DTD.read_and_split_data(self.image_dir)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)

91
datasets/imagenet.py Normal file
View File

@@ -0,0 +1,91 @@
import os
import pickle
from collections import OrderedDict
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden, mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class ImageNet(DatasetBase):
dataset_dir = "imagenet"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
self.preprocessed = os.path.join(self.dataset_dir, "preprocessed.pkl")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.preprocessed):
with open(self.preprocessed, "rb") as f:
preprocessed = pickle.load(f)
train = preprocessed["train"]
test = preprocessed["test"]
else:
text_file = os.path.join(self.dataset_dir, "classnames.txt")
classnames = self.read_classnames(text_file)
train = self.read_data(classnames, "train")
# Follow standard practice to perform evaluation on the val set
# Also used as the val set (so evaluate the last-step model)
test = self.read_data(classnames, "val")
preprocessed = {"train": train, "test": test}
with open(self.preprocessed, "wb") as f:
pickle.dump(preprocessed, f, protocol=pickle.HIGHEST_PROTOCOL)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train = data["train"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
data = {"train": train}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, test = OxfordPets.subsample_classes(train, test, subsample=subsample)
super().__init__(train_x=train, val=test, test=test)
@staticmethod
def read_classnames(text_file):
"""Return a dictionary containing
key-value pairs of <folder name>: <class name>.
"""
classnames = OrderedDict()
with open(text_file, "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip().split(" ")
folder = line[0]
classname = " ".join(line[1:])
classnames[folder] = classname
return classnames
def read_data(self, classnames, split_dir):
split_dir = os.path.join(self.image_dir, split_dir)
folders = sorted(f.name for f in os.scandir(split_dir) if f.is_dir())
items = []
for label, folder in enumerate(folders):
imnames = listdir_nohidden(os.path.join(split_dir, folder))
classname = classnames[folder]
for imname in imnames:
impath = os.path.join(split_dir, folder, imname)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items
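`read_classnames` expects `classnames.txt` to contain one `<folder name> <class name>` pair per line; for illustration, the first entries of the CLIP-provided file look like:

```
n01440764 tench
n01443537 goldfish
```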

46
datasets/imagenet_a.py Normal file
View File

@@ -0,0 +1,46 @@
import os
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden
from .imagenet import ImageNet
TO_BE_IGNORED = ["README.txt"]
@DATASET_REGISTRY.register()
class ImageNetA(DatasetBase):
"""ImageNet-A(dversarial).
This dataset is used for testing only.
"""
dataset_dir = "imagenet-adversarial"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "imagenet-a")
text_file = os.path.join(self.dataset_dir, "classnames.txt")
classnames = ImageNet.read_classnames(text_file)
data = self.read_data(classnames)
super().__init__(train_x=data, test=data)
def read_data(self, classnames):
image_dir = self.image_dir
folders = listdir_nohidden(image_dir, sort=True)
folders = [f for f in folders if f not in TO_BE_IGNORED]
items = []
for label, folder in enumerate(folders):
imnames = listdir_nohidden(os.path.join(image_dir, folder))
classname = classnames[folder]
for imname in imnames:
impath = os.path.join(image_dir, folder, imname)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

46
datasets/imagenet_r.py Normal file
View File

@@ -0,0 +1,46 @@
import os
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden
from .imagenet import ImageNet
TO_BE_IGNORED = ["README.txt"]
@DATASET_REGISTRY.register()
class ImageNetR(DatasetBase):
"""ImageNet-R(endition).
This dataset is used for testing only.
"""
dataset_dir = "imagenet-rendition"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "imagenet-r")
text_file = os.path.join(self.dataset_dir, "classnames.txt")
classnames = ImageNet.read_classnames(text_file)
data = self.read_data(classnames)
super().__init__(train_x=data, test=data)
def read_data(self, classnames):
image_dir = self.image_dir
folders = listdir_nohidden(image_dir, sort=True)
folders = [f for f in folders if f not in TO_BE_IGNORED]
items = []
for label, folder in enumerate(folders):
imnames = listdir_nohidden(os.path.join(image_dir, folder))
classname = classnames[folder]
for imname in imnames:
impath = os.path.join(image_dir, folder, imname)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

43
datasets/imagenet_sketch.py Normal file
View File

@@ -0,0 +1,43 @@
import os
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden
from .imagenet import ImageNet
@DATASET_REGISTRY.register()
class ImageNetSketch(DatasetBase):
"""ImageNet-Sketch.
This dataset is used for testing only.
"""
dataset_dir = "imagenet-sketch"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
text_file = os.path.join(self.dataset_dir, "classnames.txt")
classnames = ImageNet.read_classnames(text_file)
data = self.read_data(classnames)
super().__init__(train_x=data, test=data)
def read_data(self, classnames):
image_dir = self.image_dir
folders = listdir_nohidden(image_dir, sort=True)
items = []
for label, folder in enumerate(folders):
imnames = listdir_nohidden(os.path.join(image_dir, folder))
classname = classnames[folder]
for imname in imnames:
impath = os.path.join(image_dir, folder, imname)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

46
datasets/imagenetv2.py Normal file
View File

@@ -0,0 +1,46 @@
import os
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import listdir_nohidden
from .imagenet import ImageNet
@DATASET_REGISTRY.register()
class ImageNetV2(DatasetBase):
"""ImageNetV2.
This dataset is used for testing only.
"""
dataset_dir = "imagenetv2"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
image_dir = "imagenetv2-matched-frequency-format-val"
self.image_dir = os.path.join(self.dataset_dir, image_dir)
text_file = os.path.join(self.dataset_dir, "classnames.txt")
classnames = ImageNet.read_classnames(text_file)
data = self.read_data(classnames)
super().__init__(train_x=data, test=data)
def read_data(self, classnames):
image_dir = self.image_dir
folders = list(classnames.keys())
items = []
for label in range(1000):
class_dir = os.path.join(image_dir, str(label))
imnames = listdir_nohidden(class_dir)
folder = folders[label]
classname = classnames[folder]
for imname in imnames:
impath = os.path.join(class_dir, imname)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

89
datasets/oxford_flowers.py Normal file
View File

@@ -0,0 +1,89 @@
import os
import pickle
import random
from scipy.io import loadmat
from collections import defaultdict
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import read_json, mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class OxfordFlowers(DatasetBase):
dataset_dir = "oxford_flowers"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "jpg")
self.label_file = os.path.join(self.dataset_dir, "imagelabels.mat")
self.lab2cname_file = os.path.join(self.dataset_dir, "cat_to_name.json")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_OxfordFlowers.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
train, val, test = self.read_data()
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self):
tracker = defaultdict(list)
label_file = loadmat(self.label_file)["labels"][0]
for i, label in enumerate(label_file):
imname = f"image_{str(i + 1).zfill(5)}.jpg"
impath = os.path.join(self.image_dir, imname)
label = int(label)
tracker[label].append(impath)
print("Splitting data into 50% train, 20% val, and 30% test")
def _collate(ims, y, c):
items = []
for im in ims:
item = Datum(impath=im, label=y - 1, classname=c) # convert to 0-based label
items.append(item)
return items
lab2cname = read_json(self.lab2cname_file)
train, val, test = [], [], []
for label, impaths in tracker.items():
random.shuffle(impaths)
n_total = len(impaths)
n_train = round(n_total * 0.5)
n_val = round(n_total * 0.2)
n_test = n_total - n_train - n_val
assert n_train > 0 and n_val > 0 and n_test > 0
cname = lab2cname[str(label)]
train.extend(_collate(impaths[:n_train], label, cname))
val.extend(_collate(impaths[n_train : n_train + n_val], label, cname))
test.extend(_collate(impaths[n_train + n_val :], label, cname))
return train, val, test

186
datasets/oxford_pets.py Normal file
View File

@@ -0,0 +1,186 @@
import os
import pickle
import math
import random
from collections import defaultdict
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import read_json, write_json, mkdir_if_missing
@DATASET_REGISTRY.register()
class OxfordPets(DatasetBase):
dataset_dir = "oxford_pets"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "images")
self.anno_dir = os.path.join(self.dataset_dir, "annotations")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_OxfordPets.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = self.read_split(self.split_path, self.image_dir)
else:
trainval = self.read_data(split_file="trainval.txt")
test = self.read_data(split_file="test.txt")
train, val = self.split_trainval(trainval)
self.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = self.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self, split_file):
filepath = os.path.join(self.anno_dir, split_file)
items = []
with open(filepath, "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip()
imname, label, species, _ = line.split(" ")
breed = imname.split("_")[:-1]
breed = "_".join(breed)
breed = breed.lower()
imname += ".jpg"
impath = os.path.join(self.image_dir, imname)
label = int(label) - 1 # convert to 0-based index
item = Datum(impath=impath, label=label, classname=breed)
items.append(item)
return items
@staticmethod
def split_trainval(trainval, p_val=0.2):
p_trn = 1 - p_val
print(f"Splitting trainval into {p_trn:.0%} train and {p_val:.0%} val")
tracker = defaultdict(list)
for idx, item in enumerate(trainval):
label = item.label
tracker[label].append(idx)
train, val = [], []
for label, idxs in tracker.items():
n_val = round(len(idxs) * p_val)
assert n_val > 0
random.shuffle(idxs)
for n, idx in enumerate(idxs):
item = trainval[idx]
if n < n_val:
val.append(item)
else:
train.append(item)
return train, val
@staticmethod
def save_split(train, val, test, filepath, path_prefix):
def _extract(items):
out = []
for item in items:
impath = item.impath
label = item.label
classname = item.classname
impath = impath.replace(path_prefix, "")
if impath.startswith("/"):
impath = impath[1:]
out.append((impath, label, classname))
return out
train = _extract(train)
val = _extract(val)
test = _extract(test)
split = {"train": train, "val": val, "test": test}
write_json(split, filepath)
print(f"Saved split to {filepath}")
@staticmethod
def read_split(filepath, path_prefix):
def _convert(items):
out = []
for impath, label, classname in items:
impath = os.path.join(path_prefix, impath)
item = Datum(impath=impath, label=int(label), classname=classname)
out.append(item)
return out
print(f"Reading split from {filepath}")
split = read_json(filepath)
train = _convert(split["train"])
val = _convert(split["val"])
test = _convert(split["test"])
return train, val, test
@staticmethod
def subsample_classes(*args, subsample="all"):
"""Divide classes into two groups. The first group
represents base classes while the second group represents
new classes.
Args:
args: a list of datasets, e.g. train, val and test.
subsample (str): what classes to subsample.
"""
assert subsample in ["all", "base", "new"]
if subsample == "all":
return args
dataset = args[0]
labels = set()
for item in dataset:
labels.add(item.label)
labels = list(labels)
labels.sort()
n = len(labels)
# Divide classes into two halves
m = math.ceil(n / 2)
print(f"SUBSAMPLE {subsample.upper()} CLASSES!")
if subsample == "base":
selected = labels[:m] # take the first half
else:
selected = labels[m:] # take the second half
relabeler = {y: y_new for y_new, y in enumerate(selected)}
output = []
for dataset in args:
dataset_new = []
for item in dataset:
if item.label not in selected:
continue
item_new = Datum(
impath=item.impath,
label=relabeler[item.label],
classname=item.classname
)
dataset_new.append(item_new)
output.append(dataset_new)
return output
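For intuition, a hypothetical call on a toy four-class list of `Datum` items behaves as follows: `base` keeps the first `ceil(n/2)` labels and `new` keeps the rest, each relabeled from 0 (a sketch only, assuming `dassl` is installed and the module path is `datasets.oxford_pets`):

```python
from dassl.data.datasets import Datum
from datasets.oxford_pets import OxfordPets

toy = [Datum(impath=f"img_{i}.jpg", label=i, classname=f"class_{i}") for i in range(4)]
(base,) = OxfordPets.subsample_classes(toy, subsample="base")  # keeps original labels {0, 1}
(new,) = OxfordPets.subsample_classes(toy, subsample="new")    # keeps {2, 3}, relabeled to {0, 1}
```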

75
datasets/stanford_cars.py Normal file
View File

@@ -0,0 +1,75 @@
import os
import pickle
from scipy.io import loadmat
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class StanfordCars(DatasetBase):
dataset_dir = "stanford_cars"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.split_path = os.path.join(self.dataset_dir, "split_zhou_StanfordCars.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.dataset_dir)
else:
trainval_file = os.path.join(self.dataset_dir, "devkit", "cars_train_annos.mat")
test_file = os.path.join(self.dataset_dir, "cars_test_annos_withlabels.mat")
meta_file = os.path.join(self.dataset_dir, "devkit", "cars_meta.mat")
trainval = self.read_data("cars_train", trainval_file, meta_file)
test = self.read_data("cars_test", test_file, meta_file)
train, val = OxfordPets.split_trainval(trainval)
OxfordPets.save_split(train, val, test, self.split_path, self.dataset_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self, image_dir, anno_file, meta_file):
anno_file = loadmat(anno_file)["annotations"][0]
meta_file = loadmat(meta_file)["class_names"][0]
items = []
for i in range(len(anno_file)):
imname = anno_file[i]["fname"][0]
impath = os.path.join(self.dataset_dir, image_dir, imname)
label = anno_file[i]["class"][0, 0]
label = int(label) - 1 # convert to 0-based index
classname = meta_file[label][0]
names = classname.split(" ")
year = names.pop(-1)
names.insert(0, year)
classname = " ".join(names)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

80
datasets/sun397.py Normal file
View File

@@ -0,0 +1,80 @@
import os
import pickle
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class SUN397(DatasetBase):
dataset_dir = "sun397"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "SUN397")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_SUN397.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
classnames = []
with open(os.path.join(self.dataset_dir, "ClassName.txt"), "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip()[1:] # remove /
classnames.append(line)
cname2lab = {c: i for i, c in enumerate(classnames)}
trainval = self.read_data(cname2lab, "Training_01.txt")
test = self.read_data(cname2lab, "Testing_01.txt")
train, val = OxfordPets.split_trainval(trainval)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self, cname2lab, text_file):
text_file = os.path.join(self.dataset_dir, text_file)
items = []
with open(text_file, "r") as f:
lines = f.readlines()
for line in lines:
imname = line.strip()[1:] # remove /
classname = os.path.dirname(imname)
label = cname2lab[classname]
impath = os.path.join(self.image_dir, imname)
names = classname.split("/")[1:]  # drop the leading single-letter folder, e.g. "a/abbey" -> ["abbey"]
names = names[::-1] # put words like indoor/outdoor at first
classname = " ".join(names)
item = Datum(impath=impath, label=label, classname=classname)
items.append(item)
return items

84
datasets/ucf101.py Normal file
View File

@@ -0,0 +1,84 @@
import os
import pickle
import re
from dassl.data.datasets import DATASET_REGISTRY, Datum, DatasetBase
from dassl.utils import mkdir_if_missing
from .oxford_pets import OxfordPets
@DATASET_REGISTRY.register()
class UCF101(DatasetBase):
dataset_dir = "ucf101"
def __init__(self, cfg):
root = os.path.abspath(os.path.expanduser(cfg.DATASET.ROOT))
self.dataset_dir = os.path.join(root, self.dataset_dir)
self.image_dir = os.path.join(self.dataset_dir, "UCF-101-midframes")
self.split_path = os.path.join(self.dataset_dir, "split_zhou_UCF101.json")
self.split_fewshot_dir = os.path.join(self.dataset_dir, "split_fewshot")
mkdir_if_missing(self.split_fewshot_dir)
if os.path.exists(self.split_path):
train, val, test = OxfordPets.read_split(self.split_path, self.image_dir)
else:
cname2lab = {}
filepath = os.path.join(self.dataset_dir, "ucfTrainTestlist/classInd.txt")
with open(filepath, "r") as f:
lines = f.readlines()
for line in lines:
label, classname = line.strip().split(" ")
label = int(label) - 1 # convert to 0-based index
cname2lab[classname] = label
trainval = self.read_data(cname2lab, "ucfTrainTestlist/trainlist01.txt")
test = self.read_data(cname2lab, "ucfTrainTestlist/testlist01.txt")
train, val = OxfordPets.split_trainval(trainval)
OxfordPets.save_split(train, val, test, self.split_path, self.image_dir)
num_shots = cfg.DATASET.NUM_SHOTS
if num_shots >= 1:
seed = cfg.SEED
preprocessed = os.path.join(self.split_fewshot_dir, f"shot_{num_shots}-seed_{seed}.pkl")
if os.path.exists(preprocessed):
print(f"Loading preprocessed few-shot data from {preprocessed}")
with open(preprocessed, "rb") as file:
data = pickle.load(file)
train, val = data["train"], data["val"]
else:
train = self.generate_fewshot_dataset(train, num_shots=num_shots)
val = self.generate_fewshot_dataset(val, num_shots=min(num_shots, 4))
data = {"train": train, "val": val}
print(f"Saving preprocessed few-shot data to {preprocessed}")
with open(preprocessed, "wb") as file:
pickle.dump(data, file, protocol=pickle.HIGHEST_PROTOCOL)
subsample = cfg.DATASET.SUBSAMPLE_CLASSES
train, val, test = OxfordPets.subsample_classes(train, val, test, subsample=subsample)
super().__init__(train_x=train, val=val, test=test)
def read_data(self, cname2lab, text_file):
text_file = os.path.join(self.dataset_dir, text_file)
items = []
with open(text_file, "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip().split(" ")[0] # trainlist: filename, label
action, filename = line.split("/")
label = cname2lab[action]
elements = re.findall("[A-Z][^A-Z]*", action)
renamed_action = "_".join(elements)
filename = filename.replace(".avi", ".jpg")
impath = os.path.join(self.image_dir, renamed_action, filename)
item = Datum(impath=impath, label=label, classname=renamed_action)
items.append(item)
return items

99
docs/Co-CoOp.md Normal file
View File

@@ -0,0 +1,99 @@
# Conditional Prompt Learning for Vision-Language Models (Co-CoOp, CVPR'22)
[![paper](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2203.05557)
We provide the scripts in [scripts/cocoop](../scripts/cocoop) to reproduce Co-CoOp results (CVPR'22).
Make sure to configure the dataset paths in environment variable `DATA` and run the commands from the main directory `PromptSRC/`.
## Generalization From Base to New Classes
This corresponds to the experiments in Section 4.1, i.e., Table 1.
You will need both `scripts/cocoop/base2new_train.sh` and `scripts/cocoop/base2new_test.sh`. The former trains a model on the base classes while the latter evaluates the trained model on the new classes. Both scripts take two input arguments, i.e., `DATASET` and `SEED`.
`DATASET` takes as input a dataset name, like `imagenet` or `caltech101`. The valid names are the file names in `configs/datasets/`.
Below we provide an example of how to train and evaluate the model on ImageNet.
```bash
# seed=1
bash scripts/cocoop/base2new_train.sh imagenet 1
bash scripts/cocoop/base2new_test.sh imagenet 1
# seed=2
bash scripts/cocoop/base2new_train.sh imagenet 2
bash scripts/cocoop/base2new_test.sh imagenet 2
# seed=3
bash scripts/cocoop/base2new_train.sh imagenet 3
bash scripts/cocoop/base2new_test.sh imagenet 3
```
When the evaluation is done, you can use `parse_test_res.py` to automatically calculate the average results. For instance, after you finish the evaluation (including `base2new_train.sh` and `base2new_test.sh`) on ImageNet using the aforementioned commands, you would get
```
output
| base2new/
| | test_new/
| | | imagenet/
| | | | shots_16/
| | | | | CoCoOp/
| | | | | | vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
| | train_base/
| | | imagenet/
| | | | shots_16/
| | | | | CoCoOp/
| | | | | | vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
```
Then, to get the average performance on the base classes, run
```bash
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1
```
To get the average performance on the new classes, run
```bash
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1 --test-log
```
## Cross-Dataset Transfer
This corresponds to the experiments in Section 4.2, i.e., Table 2.
The relevant scripts are `scripts/cocoop/xd_train.sh` and `scripts/cocoop/xd_test.sh` where the `DATASET` variable is set to the default, namely `imagenet`. To train the model, run
```bash
# seed=1
bash scripts/cocoop/xd_train.sh 1
# seed=2
bash scripts/cocoop/xd_train.sh 2
# seed=3
bash scripts/cocoop/xd_train.sh 3
```
Then, you evaluate the model on other datasets, e.g.,
```bash
for SEED in 1 2 3
do
bash scripts/cocoop/xd_test.sh caltech101 ${SEED}
bash scripts/cocoop/xd_test.sh oxford_pets ${SEED}
bash scripts/cocoop/xd_test.sh stanford_cars ${SEED}
done
```
## Domain Generalization
This corresponds to the experiments in Section 4.3, i.e., Table 3.
The steps are similar to those discussed in "Cross-Dataset Transfer" except you evaluate the model on the variants of ImageNet, i.e., `imagenetv2`, `imagenet_sketch`, `imagenet_a` and `imagenet_r`.
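Concretely, following the same `xd_test.sh` pattern:
```bash
for SEED in 1 2 3
do
    bash scripts/cocoop/xd_test.sh imagenetv2 ${SEED}
    bash scripts/cocoop/xd_test.sh imagenet_sketch ${SEED}
    bash scripts/cocoop/xd_test.sh imagenet_a ${SEED}
    bash scripts/cocoop/xd_test.sh imagenet_r ${SEED}
done
```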

99
docs/CoOp.md Normal file
View File

@@ -0,0 +1,99 @@
# Conditional Prompt Learning for Vision-Language Models (Co-CoOp, CVPR'22)
[![paper](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2203.05557)
We provide the scripts in [scripts/cocoop](../scripts/cocoop) to reproduce Co-CoOp results (CVPR'22).
Make sure to configure the dataset paths in environment variable `DATA` and run the commands from the main directory `PromptSRC/`.
## Generalization From Base to New Classes
This corresponds to the experiments in Section 4.1, i.e., Table 1.
You will need both `scripts/cocoop/base2new_train.sh` and `scripts/cocoop/base2new_test.sh`. The former trains a model on the base classes while the latter evaluates the trained model on the new classes. Both scripts take two input arguments, i.e., `DATASET` and `SEED`.
`DATASET` takes as input a dataset name, like `imagenet` or `caltech101`. The valid names are the file names in `configs/datasets/`.
Below we provide an example of how to train and evaluate the model on ImageNet.
```bash
# seed=1
bash scripts/cocoop/base2new_train.sh imagenet 1
bash scripts/cocoop/base2new_test.sh imagenet 1
# seed=2
bash scripts/cocoop/base2new_train.sh imagenet 2
bash scripts/cocoop/base2new_test.sh imagenet 2
# seed=3
bash scripts/cocoop/base2new_train.sh imagenet 3
bash scripts/cocoop/base2new_test.sh imagenet 3
```
When the evaluation is done, you can use `parse_test_res.py` to automatically calculate the average results. For instance, after you finish the evaluation (including `base2new_train.sh` and `base2new_test.sh`) on ImageNet using the aforementioned commands, you would get
```
output
| base2new/
| | test_new/
| | | imagenet/
| | | | shots_16/
| | | | | CoCoOp/
| | | | | | vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
| | train_base/
| | | imagenet/
| | | | shots_16/
| | | | | CoCoOp/
| | | | | | vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
```
Then, to get the average performance on the base classes, run
```bash
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1
```
To get the average performance on the new classes, run
```bash
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1 --test-log
```
## Cross-Dataset Transfer
This corresponds to the experiments in Section 4.2, i.e., Table 2.
The relevant scripts are `scripts/cocoop/xd_train.sh` and `scripts/cocoop/xd_test.sh` where the `DATASET` variable is set to the default, namely `imagenet`. To train the model, run
```bash
# seed=1
bash scripts/cocoop/xd_train.sh 1
# seed=2
bash scripts/cocoop/xd_train.sh 2
# seed=3
bash scripts/cocoop/xd_train.sh 3
```
Then, you evaluate the model on other datasets, e.g.,
```bash
for SEED in 1 2 3
do
bash scripts/cocoop/xd_test.sh caltech101 ${SEED}
bash scripts/cocoop/xd_test.sh oxford_pets ${SEED}
bash scripts/cocoop/xd_test.sh stanford_cars ${SEED}
done
```
## Domain Generalization
This corresponds to the experiments in Section 4.3, i.e., Table 3.
The steps are similar to those discussed in "Cross-Dataset Transfer" except you evaluate the model on the variants of ImageNet, i.e., `imagenetv2`, `imagenet_sketch`, `imagenet_a` and `imagenet_r`.

233
docs/DATASETS.md Normal file
View File

@@ -0,0 +1,233 @@
# How to install datasets
### Acknowledgement: This readme file for installing datasets has been borrowed directly from [MaPLe's](https://github.com/muzairkhattak/multimodal-prompt-learning) official repository.
We recommend putting all datasets under the same folder (say `$DATA`) to ease management, and following the instructions below to organize the datasets so that the source code does not need to be modified. The file structure should look like:
```
$DATA/
| imagenet/
| caltech-101/
| oxford_pets/
| stanford_cars/
```
If you have some datasets already installed somewhere else, you can create symbolic links in `$DATA/dataset_name` that point to the original data to avoid duplicate download.
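For example, if ImageNet is already stored elsewhere, a symbolic link avoids re-downloading it (the source path is a placeholder):
```bash
ln -s /path/to/existing/imagenet $DATA/imagenet
```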
Datasets list:
- [ImageNet](#imagenet)
- [Caltech101](#caltech101)
- [OxfordPets](#oxfordpets)
- [StanfordCars](#stanfordcars)
- [Flowers102](#flowers102)
- [Food101](#food101)
- [FGVCAircraft](#fgvcaircraft)
- [SUN397](#sun397)
- [DTD](#dtd)
- [EuroSAT](#eurosat)
- [UCF101](#ucf101)
- [ImageNetV2](#imagenetv2)
- [ImageNet-Sketch](#imagenet-sketch)
- [ImageNet-A](#imagenet-a)
- [ImageNet-R](#imagenet-r)
The instructions to prepare each dataset are detailed below. To ensure reproducibility and fair comparison for future work, we provide fixed train/val/test splits for all datasets except ImageNet, where the validation set is used as the test set. The fixed splits are either taken from the original datasets (if available) or created by us.
### ImageNet
- Create a folder named `imagenet/` under `$DATA`.
- Create `images/` under `imagenet/`.
- Download the dataset from the [official website](https://image-net.org/index.php) and extract the training and validation sets to `$DATA/imagenet/images`. The directory structure should look like
```
imagenet/
| images/
| | train/ # contains 1,000 folders like n01440764, n01443537, etc.
| | val/
```
- If you had downloaded the ImageNet dataset before, you can create symbolic links to map the training and validation sets to `$DATA/imagenet/images`.
- Download the `classnames.txt` to `$DATA/imagenet/` from this [link](https://drive.google.com/file/d/1-61f_ol79pViBFDG_IDlUQSwoLcn2XXF/view?usp=sharing). The class names are copied from [CLIP](https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb).
### Caltech101
- Create a folder named `caltech-101/` under `$DATA`.
- Download `101_ObjectCategories.tar.gz` from http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz and extract the file under `$DATA/caltech-101`.
- Download `split_zhou_Caltech101.json` from this [link](https://drive.google.com/file/d/1hyarUivQE36mY6jSomru6Fjd-JzwcCzN/view?usp=sharing) and put it under `$DATA/caltech-101`.
The directory structure should look like
```
caltech-101/
| 101_ObjectCategories/
| split_zhou_Caltech101.json
```
### OxfordPets
- Create a folder named `oxford_pets/` under `$DATA`.
- Download the images from https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz.
- Download the annotations from https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz.
- Download `split_zhou_OxfordPets.json` from this [link](https://drive.google.com/file/d/1501r8Ber4nNKvmlFVQZ8SeUHTcdTTEqs/view?usp=sharing).
The directory structure should look like
```
oxford_pets/
| images/
| annotations/
| split_zhou_OxfordPets.json
```
### StanfordCars
- Create a folder named `stanford_cars/` under `$DATA`.
- Download the train images http://ai.stanford.edu/~jkrause/car196/cars_train.tgz.
- Download the test images http://ai.stanford.edu/~jkrause/car196/cars_test.tgz.
- Download the train labels https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
- Download the test labels http://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat.
- Download `split_zhou_StanfordCars.json` from this [link](https://drive.google.com/file/d/1ObCFbaAgVu0I-k_Au-gIUcefirdAuizT/view?usp=sharing).
The directory structure should look like
```
stanford_cars/
| cars_test/
| cars_test_annos_withlabels.mat
| cars_train/
| devkit/
| split_zhou_StanfordCars.json
```
### Flowers102
- Create a folder named `oxford_flowers/` under `$DATA`.
- Download the images and labels from https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz and https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat respectively.
- Download `cat_to_name.json` from [here](https://drive.google.com/file/d/1AkcxCXeK_RCGCEC_GvmWxjcjaNhu-at0/view?usp=sharing).
- Download `split_zhou_OxfordFlowers.json` from [here](https://drive.google.com/file/d/1Pp0sRXzZFZq15zVOzKjKBu4A9i01nozT/view?usp=sharing).
The directory structure should look like
```
oxford_flowers/
| cat_to_name.json
| imagelabels.mat
| jpg/
| split_zhou_OxfordFlowers.json
```
### Food101
- Download the dataset from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract the file `food-101.tar.gz` under `$DATA`, resulting in a folder named `$DATA/food-101/`.
- Download `split_zhou_Food101.json` from [here](https://drive.google.com/file/d/1QK0tGi096I0Ba6kggatX1ee6dJFIcEJl/view?usp=sharing).
The directory structure should look like
```
food-101/
| images/
| license_agreement.txt
| meta/
| README.txt
| split_zhou_Food101.json
```
### FGVCAircraft
- Download the data from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz.
- Extract `fgvc-aircraft-2013b.tar.gz` and keep only `data/`.
- Move `data/` to `$DATA` and rename the folder to `fgvc_aircraft/`.
The directory structure should look like
```
fgvc_aircraft/
| images/
| ... # a bunch of .txt files
```
### SUN397
- Create a folder named `sun397/` under `$DATA`.
- Download the images http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under `$DATA/sun397/`.
- Download `split_zhou_SUN397.json` from this [link](https://drive.google.com/file/d/1y2RD81BYuiyvebdN-JymPfyWYcd8_MUq/view?usp=sharing).
The directory structure should look like
```
sun397/
| SUN397/
| split_zhou_SUN397.json
| ... # a bunch of .txt files
```
### DTD
- Download the dataset from https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz and extract it to `$DATA`. This should lead to `$DATA/dtd/`.
- Download `split_zhou_DescribableTextures.json` from this [link](https://drive.google.com/file/d/1u3_QfB467jqHgNXC00UIzbLZRQCg2S7x/view?usp=sharing).
The directory structure should look like
```
dtd/
| images/
| imdb/
| labels/
| split_zhou_DescribableTextures.json
```
### EuroSAT
- Create a folder named `eurosat/` under `$DATA`.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to `$DATA/eurosat/`.
- Download `split_zhou_EuroSAT.json` from [here](https://drive.google.com/file/d/1Ip7yaCWFi0eaOFUGga0lUdVi_DDQth1o/view?usp=sharing).
The directory structure should look like
```
eurosat/
| 2750/
| split_zhou_EuroSAT.json
```
### UCF101
- Create a folder named `ucf101/` under `$DATA`.
- Download the zip file `UCF-101-midframes.zip` from [here](https://drive.google.com/file/d/10Jqome3vtUA2keJkNanAiFpgbyC9Hc2O/view?usp=sharing) and extract it to `$DATA/ucf101/`. This zip file contains the extracted middle video frames.
- Download `split_zhou_UCF101.json` from this [link](https://drive.google.com/file/d/1I0S0q91hJfsV9Gf4xDIjgDq4AqBNJb1y/view?usp=sharing).
The directory structure should look like
```
ucf101/
| UCF-101-midframes/
| split_zhou_UCF101.json
```
### ImageNetV2
- Create a folder named `imagenetv2/` under `$DATA`.
- Go to this github repo https://github.com/modestyachts/ImageNetV2.
- Download the matched-frequency dataset from https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz and extract it to `$DATA/imagenetv2/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenetv2/`.
The directory structure should look like
```
imagenetv2/
| imagenetv2-matched-frequency-format-val/
| classnames.txt
```
### ImageNet-Sketch
- Download the dataset from https://github.com/HaohanWang/ImageNet-Sketch.
- Extract the dataset to `$DATA/imagenet-sketch`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-sketch/`.
The directory structure should look like
```
imagenet-sketch/
| images/ # contains 1,000 folders whose names have the format of n*
| classnames.txt
```
### ImageNet-A
- Create a folder named `imagenet-adversarial/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/natural-adv-examples and extract it to `$DATA/imagenet-adversarial/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-adversarial/`.
The directory structure should look like
```
imagenet-adversarial/
| imagenet-a/ # contains 200 folders whose names have the format of n*
| classnames.txt
```
### ImageNet-R
- Create a folder named `imagenet-rendition/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/imagenet-r and extract it to `$DATA/imagenet-rendition/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-rendition/`.
The directory structure should look like
```
imagenet-rendition/
| imagenet-r/ # contains 200 folders whose names have the format of n*
| classnames.txt
```

149
docs/EVAL.md Normal file
View File

@@ -0,0 +1,149 @@
# Evaluating and Reproducing PromptSRC Results
We provide bash scripts in the [scripts/](../scripts) directory for evaluating PromptSRC and the independent V-L prompting baseline using the provided pre-trained model checkpoints.
Make sure to update the `DATA` variable with the dataset path in the script file and run the commands from the main directory `PromptSRC/`.
Below we provide evaluation instructions for the PromptSRC pre-trained models. The same instructions apply for reproducing results for the baseline *independent V-L prompting* and MaPLe.
## PromptSRC
#### (1) Base-to-Novel class generalization setting
The base-to-novel PromptSRC configuration is provided in the config file at `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml`. No hyper-parameters or other settings should be changed in the config file when evaluating the pre-trained models.
Below, we show an example of reproducing results for imagenet using our pre-trained model weights. Follow the instructions below:
* Download the zipped folder containing base-to-novel generalization pre-trained weights for a single dataset from this [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/Em_3tkSj6T9AmhVjmzKTL3gBYNehhvfJl8ke2pU3U0nabA?e=9ecjQA). After unzipping, the directory should look like this:
```
imagenet
| base/
| | seed1/
| | seed2/
| | seed3/
```
Now use the evaluation script `scripts/promptsrc/reproduce_base2novel_setting.sh` and run the commands below to calculate the results over 3 seeds:
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# evaluate on base and novel classes for SEED1
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 1 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED2
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 2 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED3
bash scripts/promptsrc/reproduce_base2novel_setting.sh imagenet 3 /path/to/imagenet/weights/folder
```
This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:
```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/test_base/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
# prints averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
```
The same steps can be repeated for other datasets by providing the respective dataset name and checkpoint path.
#### (2) Cross-dataset and domain generalization setting
In the cross-dataset and domain generalization settings, we first train PromptSRC on ImageNet-1k in a few-shot manner (16 shots) for all 3 seeds and then evaluate the trained model directly on the cross-dataset and out-of-distribution datasets.
Follow the instructions below to reproduce the cross-dataset and domain generalization results using our pre-trained imagenet model weights for PromptSRC:
* Download the zipped folder containing pre-trained weights for imagenet from this [link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/Ekr9qF0cSaVDr0X6OlP2JAEBG1xjlTMjHNLc28g1SjwW-w?e=AA5ABi). After unzipping, the directory should look like this:
```
imagenet
| seed1/
| seed2/
| seed3/
```
Now use the evaluation script `scripts/promptsrc/reproduce_xd.sh` and run the commands below to calculate the results for food101 dataset over 3 seeds:
```bash
# Other possible dataset values for cross-datasets include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# Possible dataset values for the domain generalization benchmark include [imagenetv2, imagenet_sketch, imagenet_a, imagenet_r]
# evaluate on given dataset for SEED1
bash scripts/promptsrc/reproduce_xd.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on given dataset for SEED2
bash scripts/promptsrc/reproduce_xd.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on given dataset for SEED3
bash scripts/promptsrc/reproduce_xd.sh food101 3 /path/to/imagenet/weights/folder
```
This should evaluate and save the log files in the `output/` directory. To obtain the results averaged over 3 seeds, run:
```bash
# prints averaged results for food101 dataset
python parse_test_res.py output/evaluation/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets_16shots/food101 --test-log
```
The same steps can be repeated for other datasets by providing the respective dataset name and checkpoint path.
#### (3) Few-shot setting
In this setting, PromptSRC is trained on all classes of individual datasets with different few-shot splits (K = 1, 2, 4, 8, 16). The PromptSRC config for the few-shot setting is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot.yaml`.
Follow the instructions below to reproduce PromptSRC few-shot results using our pre-trained models.
Use the evaluation script `scripts/promptsrc/reproduce_few_shot.sh` and run the commands below to calculate the results for the food101 dataset over 3 seeds:
```bash
# reproduce_few_shot.sh calculates results for all 3 seeds for a given K
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# evaluate on given dataset for K=1 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on given dataset for K=2 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on given dataset for K=4 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 4 /path/to/imagenet/weights/folder
# evaluate on given dataset for K=8 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 8 /path/to/imagenet/weights/folder
# evaluate on given dataset for K=16 shot
bash scripts/promptsrc/reproduce_few_shot.sh food101 16 /path/to/imagenet/weights/folder
```
This should evaluate and save the log files in the `output/` directory. To obtain the results averaged over 3 seeds for all shots, run:
```bash
# prints averaged results for food101 dataset for K=1
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_1shots/food101 --test-log
# prints averaged results for food101 dataset for K=2
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_2shots/food101 --test-log
# prints averaged results for food101 dataset for K=4
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_4shots/food101 --test-log
# prints averaged results for food101 dataset for K=8
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_8shots/food101 --test-log
# prints averaged results for food101 dataset for K=16
python parse_test_res.py output/few_shot/food101/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot_16shots/food101 --test-log
```
The same steps can be repeated for other datasets by providing the respective dataset name and checkpoint path.
<br>
## Training and Evaluating the independent V-L prompting baseline results
For the IVLP baseline method, we provide the corresponding default configs and evaluation scripts as follows.
```
configs
| datasets/
| trainers/
| | CoCoOp/
| | CoOp/
| | MaPLe/
| | IVLP/
| | PromptSRC/
```
```
scripts
| cocoop/
| coop/
| maple/
| independent-vlp/
| promptsrc/
```
Please use the corresponding config and script files and follow the same instructions as provided for PromptSRC in order to evaluate and reproduce results of the IVLP baseline approach. The pre-trained weights for the IVLP baseline are provided [at this link](https://mbzuaiac-my.sharepoint.com/:f:/g/personal/syed_wasim_mbzuai_ac_ae/EuIwh-yMh_JBqB2Y_o8Jl14BPDKDRHC0JBPE1BugIeZiSQ?e=oJnJwy).
This repository also supports using official [CoOp](CoOp.md) and [Co-CoOp](Co-CoOp.md) configs and models.

48
docs/INSTALL.md Normal file

@@ -0,0 +1,48 @@
# Installation
### Acknowledgement: This installation guide is adapted from [MaPLe's](https://github.com/muzairkhattak/multimodal-prompt-learning) official repository.
This codebase is tested on Ubuntu 20.04.2 LTS with Python 3.8. Follow the steps below to create the environment and install the dependencies.
* Setup conda environment (recommended).
```bash
# Create a conda environment
conda create -y -n promptsrc python=3.8
# Activate the environment
conda activate promptsrc
# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/ if you need a different cuda version
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
* Install dassl library.
```bash
# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation
# Clone this repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/
# Install dependencies
pip install -r requirements.txt
# Install this library (no need to re-build if the source code is modified)
python setup.py develop
cd ..
```
* Clone PromptSRC code repository and install requirements
```bash
# Clone PromptSRC code base
git clone https://github.com/muzairkhattak/PromptSRC.git
cd PromptSRC/
# Install requirements
pip install -r requirements.txt
# Update setuptools package
pip install setuptools==59.5.0
```

211
docs/MaPLe.md Normal file

@@ -0,0 +1,211 @@
# Training and Evaluation
We provide bash scripts in [scripts/](../scripts) for each prompting variant including MaPLe, vision, language and independent V-L prompting.
Make sure to configure the dataset paths in the environment variable `DATA` and run the commands from the main directory `multimodal-prompt-learning/`.
Below we provide training and evaluation instructions for MaPLe. The same instructions apply for all other variants, including *Vision (VPT), Language and independent V-L prompting*.
### Training time and compute
We train MaPLe on each dataset with a batch size of 4 using a **single** NVIDIA A100 GPU.
Training MaPLe on ImageNet for 5 epochs takes 1 hour for a single seed, so results for 3 seeds take around 3 hours. The remaining 10 datasets together take around 4 hours (for all 3 seeds) on a single A100 GPU. To ease reproduction of MaPLe results, we have provided [training logs](https://drive.google.com/drive/folders/1EvuvgR8566bL0T7ucvAL3LFVwuUPMRas?usp=sharing) for all datasets.
## MaPLe
#### (1) Base-to-Novel class generalization setting
The default training settings are provided in config file at `configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx.yaml`. All hyper-parameters such as prompt length, prompt depth, etc., can be modified using this config file.
Below, we provide instructions to train MaPLe on imagenet.
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# seed=1
# trains and evaluates on base classes
bash scripts/maple/base2new_train_maple.sh imagenet 1
# evaluates on novel classes
bash scripts/maple/base2new_test_maple.sh imagenet 1
# seed=2
# trains and evaluates on base classes
bash scripts/maple/base2new_train_maple.sh imagenet 2
# evaluates on novel classes
bash scripts/maple/base2new_test_maple.sh imagenet 2
# seed=3
# trains and evaluates on base classes
bash scripts/maple/base2new_train_maple.sh imagenet 3
# evaluates on novel classes
bash scripts/maple/base2new_test_maple.sh imagenet 3
```
#### Averaging results over 3 seeds:
Once the above training and evaluation runs are completed, the `output/` directory should have the following structure:
```
output
| base2new/
| | test_new/
| | | imagenet/
| | | | shots_16/
| | | | | MaPLe/
| | | | | | vit_b16_c2_ep5_batch4_2ctx/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
| | train_base/
| | | imagenet/
| | | | shots_16/
| | | | | MaPLe/
| | | | | | vit_b16_c2_ep5_batch4_2ctx/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
```
Now use the script `parse_test_res.py` and run the commands below to calculate the averaged results:
```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
# averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
```
The above steps can be repeated for other individual datasets.
#### Reproducing results using pre-trained weights for base-to-novel generalization setting
Below we show an example of reproducing results for imagenet using our pre-trained model weights. Follow the instructions below:
* Download the zipped folder containing pre-trained weights for a single dataset from this [link](https://drive.google.com/drive/folders/1-tB6BUDBzs9CXTOJ7p5hM4Svq1tL_mGz?usp=sharing). Additionally, we provide the log files for both training and evaluation. After unzipping, the directory should look like this:
```
imagenet
| base/
| | seed1/
| | seed2/
| | seed3/
| novel/
| | seed1/
| | seed2/
| | seed3/
```
Now use the evaluation script `scripts/maple/reproduce_maple.sh` and run the commands below to calculate the averaged results:
```bash
# evaluate on base and novel classes for SEED1
bash scripts/maple/reproduce_maple.sh imagenet 1 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED2
bash scripts/maple/reproduce_maple.sh imagenet 2 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED3
bash scripts/maple/reproduce_maple.sh imagenet 3 /path/to/imagenet/weights/folder
```
This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:
```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
# averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
```
#### (2) Cross-Dataset Transfer
We provide instructions to train MaPLe on ImageNet using all 1000 classes and then evaluate it directly on new downstream datasets.
The cross-dataset config for MaPLe is provided at `configs/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets.yaml`.
* First, train MaPLe on imagenet in a few-shot manner (for all 3 seeds).
```bash
# seed=1
bash scripts/maple/xd_train_maple.sh imagenet 1
# seed=2
bash scripts/maple/xd_train_maple.sh imagenet 2
# seed=3
bash scripts/maple/xd_train_maple.sh imagenet 3
```
* Now evaluate the ImageNet-trained model on the downstream datasets.
```bash
for SEED in 1 2 3
do
bash scripts/maple/xd_test_maple.sh caltech101 ${SEED}
bash scripts/maple/xd_test_maple.sh oxford_pets ${SEED}
bash scripts/maple/xd_test_maple.sh stanford_cars ${SEED}
done
```
#### (3) Domain Generalization
We use the imagenet-trained MaPLe model for the domain generalization experiments. The steps are similar to the cross-dataset experiments above; however, the model is evaluated on imagenet variants.
* Evaluate the ImageNet-trained model on variants of imagenet (domain-shift datasets).
```bash
for SEED in 1 2 3
do
bash scripts/maple/xd_test_maple.sh imagenetv2 ${SEED}
bash scripts/maple/xd_test_maple.sh imagenet_sketch ${SEED}
bash scripts/maple/xd_test_maple.sh imagenet_a ${SEED}
bash scripts/maple/xd_test_maple.sh imagenet_r ${SEED}
done
```
You can obtain averaged results by using the script `parse_test_res.py` and following similar steps to those provided for the base-to-novel generalization experiments.
<br>
#### Reproducing official results for cross-dataset and domain generalization setting
We provide the instructions below to reproduce domain generalization and cross-dataset results using our pre-trained imagenet model weights for MaPLe:
* Download the zipped folder containing pre-trained weights for imagenet from this [link](https://drive.google.com/drive/folders/1bmhvmNZc13WJ5U71qt0t8k91wyuoemVF?usp=sharing). Additionally, we provide the log files for both training and evaluation. After unzipping, the directory should look like this:
```
imagenet
| seed1/
| seed2/
| seed3/
```
Now use the evaluation script `scripts/maple/reproduce_maple_xd.sh` and run the commands below to calculate the averaged results:
```bash
# evaluate on given dataset for SEED1
bash scripts/maple/reproduce_maple_xd.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on given dataset for SEED2
bash scripts/maple/reproduce_maple_xd.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on given dataset for SEED3
bash scripts/maple/reproduce_maple_xd.sh food101 3 /path/to/imagenet/weights/folder
```
This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:
```bash
# prints averaged results for food101 dataset
python parse_test_res.py output/evaluation/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets_16shots/food101 --test-log
```
#### Training and Evaluating other variants
For other variants including vision, language and independent V-L prompting techniques, we provide their corresponding configs and scripts as follows.
```
configs
| datasets/
| trainers/
| | CoCoOp/
| | CoOp/
| | MaPLe/
| | IVLP/
| | VPT/
```
```
scripts
| cocoop/
| coop/
| language-prompting/
| maple/
| independent-vlp/
```
Please use the corresponding config and script files and follow the same instructions as provided for MaPLe in order to train and evaluate the other variants. The same instructions can be followed to reproduce results of the other variants using the provided pre-trained weights.

169
docs/TRAIN.md Normal file

@@ -0,0 +1,169 @@
# PromptSRC Training
We provide bash scripts in [scripts/](../scripts) for training PromptSRC and independent V-L prompting baseline.
Make sure to update the `DATA` variable with the dataset path in the script file and run the commands from the main directory `PromptSRC/`.
Below we provide training and testing instructions for PromptSRC. The same instructions are applicable for the baseline *independent V-L prompting* approach, MaPLe, CoOp and CoCoOp.
### Training time and compute
We train PromptSRC on each dataset with a batch size of 4 using a **single** NVIDIA A100 GPU.
Training PromptSRC on ImageNet for 20 epochs takes around 6 hours for a single seed, so results for 3 seeds take around 18 hours. The remaining 10 datasets together take around 8 hours (for all 3 seeds) on a single A100 GPU.
## PromptSRC
#### (1) Base-to-Novel class generalization setting
The base-to-novel PromptSRC configuration is provided in the config file at `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx.yaml`. All hyper-parameters, such as GPA STD, GPA mean, SCL loss weight coefficients, prompt length and prompt depth, can be modified using this config file.
Run the commands below to train PromptSRC on ImageNet.
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# seed=1
# trains and evaluates on base classes
bash scripts/promptsrc/base2new_train.sh imagenet 1
# evaluates on novel classes
bash scripts/promptsrc/base2new_test.sh imagenet 1
# seed=2
# trains and evaluates on base classes
bash scripts/promptsrc/base2new_train.sh imagenet 2
# evaluates on novel classes
bash scripts/promptsrc/base2new_test.sh imagenet 2
# seed=3
# trains and evaluates on base classes
bash scripts/promptsrc/base2new_train.sh imagenet 3
# evaluates on novel classes
bash scripts/promptsrc/base2new_test.sh imagenet 3
```
#### Averaging results over 3 seeds:
Once the above training and evaluation runs are completed, the `output/` directory should have the following structure:
```
output
| base2new/
| | test_new/
| | | imagenet/
| | | | shots_16/
| | | | | PromptSRC/
| | | | | | vit_b16_c2_ep20_batch4_4+4ctx/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
| | train_base/
| | | imagenet/
| | | | shots_16/
| | | | | PromptSRC/
| | | | | | vit_b16_c2_ep20_batch4_4+4ctx/
| | | | | | | seed1/
| | | | | | | seed2/
| | | | | | | seed3/
```
Now use the script `parse_test_res.py` and run the commands below to calculate the averaged results:
```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
# averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx --test-log
```
The above steps can be repeated for other individual datasets.
#### (2) Cross-Dataset Transfer setting
We provide instructions to train PromptSRC on ImageNet using all 1000 classes with 16 shots and then evaluate it directly on new downstream datasets.
The corresponding cross-dataset config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets.yaml`. All PromptSRC hyper-parameters can be modified in this config file.
* First, train PromptSRC on imagenet in a few-shot manner (for all 3 seeds).
```bash
# seed=1
bash scripts/promptsrc/xd_train.sh imagenet 1
# seed=2
bash scripts/promptsrc/xd_train.sh imagenet 2
# seed=3
bash scripts/promptsrc/xd_train.sh imagenet 3
```
* Now directly evaluate the ImageNet trained model on downstream cross-datasets.
```bash
# Other possible dataset values include [imagenet, food101, dtd, ucf101, oxford_flowers, fgvc_aircraft, sun397, eurosat]
for SEED in 1 2 3
do
bash scripts/promptsrc/xd_test.sh caltech101 ${SEED}
bash scripts/promptsrc/xd_test.sh oxford_pets ${SEED}
bash scripts/promptsrc/xd_test.sh stanford_cars ${SEED}
done
```
You can obtain averaged results by using the script `parse_test_res.py` and following similar steps to those provided for the base-to-novel generalization experiments.
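For example, assuming the cross-dataset evaluation output follows the same layout as the paths used in EVAL.md, the averaged results for a single target dataset could be obtained roughly as follows (the exact path depends on the output directory configured in `xd_test.sh`):
```bash
# hypothetical output path; adjust it to match what xd_test.sh actually writes
python parse_test_res.py output/evaluation/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets_16shots/caltech101 --test-log
```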
#### (3) Domain Generalization setting
We use the same ImageNet-trained PromptSRC model for the domain generalization experiments. The steps are similar to the cross-dataset experiments above; however, the trained model is now evaluated on ImageNet variants.
The corresponding domain generalization config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep20_batch4_4+4ctx_cross_datasets.yaml`.
* Evaluate ImageNet model on different variants of ImageNet (datasets with domain shifts).
```bash
for SEED in 1 2 3
do
bash scripts/promptsrc/xd_test.sh imagenetv2 ${SEED}
bash scripts/promptsrc/xd_test.sh imagenet_sketch ${SEED}
bash scripts/promptsrc/xd_test.sh imagenet_a ${SEED}
bash scripts/promptsrc/xd_test.sh imagenet_r ${SEED}
done
```
You can obtain averaged results by using the script `parse_test_res.py` and following similar steps to those provided for the base-to-novel generalization experiments.
#### (4) Few-shot setting
In this setting, PromptSRC is trained on all classes of individual datasets with different few-shot splits (K = 1, 2, 4, 8, 16). The corresponding few-shot setting config for PromptSRC is available at: `configs/trainers/PromptSRC/vit_b16_c2_ep50_batch4_4+4ctx_few_shot.yaml`.
Now use the training script `scripts/promptsrc/few_shot.sh` and run the commands below to calculate the results for the imagenet dataset for all shots over 3 seeds:
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]
# train and test on given dataset for K=1 shot
bash scripts/promptsrc/few_shot.sh imagenet 1
# train and test on given dataset for K=2 shot
bash scripts/promptsrc/few_shot.sh imagenet 2
# train and test on given dataset for K=4 shot
bash scripts/promptsrc/few_shot.sh imagenet 4
# train and test on given dataset for K=8 shot
bash scripts/promptsrc/few_shot.sh imagenet 8
# train and test on given dataset for K=16 shot
bash scripts/promptsrc/few_shot.sh imagenet 16
```
You can obtain averaged results by using the script `parse_test_res.py` and following similar steps to those provided for the base-to-novel generalization experiments.
<br>
#### Training and testing independent V-L prompting baseline approach
For training the independent V-L prompting baseline approach, we provide the corresponding configs and scripts as follows.
```
configs
| datasets/
| trainers/
| | CoCoOp/
| | CoOp/
| | IVLP/
| | PromptSRC/
```
```
scripts
| cocoop/
| coop/
| promptsrc/
| independent-vlp/
```
Please use the corresponding config and script files and follow the same instructions as provided for PromptSRC for training and testing.
This repository also supports using official [MaPLe](MaPLe.md), [CoOp](CoOp.md) and [Co-CoOp](Co-CoOp.md) configs and models.

BIN
docs/main_figure.png Normal file
Binary image (2.9 MiB), not shown in this diff.

File diff suppressed because it is too large.


@@ -0,0 +1,84 @@
import os
import sys
import argparse
import torch
from clip.simple_tokenizer import SimpleTokenizer
from clip import clip
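# This script interprets learned prompt vectors by mapping them to the nearest words in
# CLIP's token-embedding space: for each layer's context vectors it computes Euclidean
# distances to all token embeddings and prints the top-k closest words per vector.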
# "ViT-B/16"
# "RN50"
def load_clip_to_cpu(backbone_name="ViT-B/16"):
url = clip._MODELS[backbone_name]
model_path = clip._download(url)
try:
# loading JIT archive
model = torch.jit.load(model_path, map_location="cpu").eval()
state_dict = None
except RuntimeError:
state_dict = torch.load(model_path, map_location="cpu")
model = clip.build_model(state_dict or model.state_dict())
return model
# parser = argparse.ArgumentParser()
# parser.add_argument("fpath", type=str, help="Path to the learned prompt")
# parser.add_argument("topk", type=int, help="Select top-k similar words")
# args = parser.parse_args()
fpath = "./compound_prompt_weights/train_base/food101/shots_16/cocoop/vit_b16_c4_ep10_batch1_ctxv1/seed1/prompt_learner/model.pth.tar-5"
topk = 10
assert os.path.exists(fpath)
print(f"Return the top-{topk} matched words")
tokenizer = SimpleTokenizer()
clip_model = load_clip_to_cpu()
token_embedding = clip_model.token_embedding.weight
print(f"Size of token embedding: {token_embedding.shape}")
prompt_learner = torch.load(fpath, map_location="cpu")["state_dict"]
# Extract the input tokens
ctx = prompt_learner["prompt_learner.ctx"]
ctx = ctx.float()
# Now extract the intermediate tokens
intermediate_embeddings = []
depth = 9 - 1
for i in range(depth):
# Now extract the prompt embeddings and store it
query = 'prompt_learner.compound_prompts_text.' + str(i)
temp = prompt_learner[query].float()
intermediate_embeddings.append(temp)
print(f"Size of context: {ctx.shape}")
# Now repeat this for all layer context embeddings
all_layer_ctx = [ctx] + intermediate_embeddings
for idx, single_ctx in enumerate(all_layer_ctx):
print("SHOWING RESULTS FOR CTX Vectors of Layer: ", idx + 1)
ctx = single_ctx
if ctx.dim() == 2:
# Generic context
distance = torch.cdist(ctx, token_embedding)
print(f"Size of distance matrix: {distance.shape}")
sorted_idxs = torch.argsort(distance, dim=1)
sorted_idxs = sorted_idxs[:, :topk]
for m, idxs in enumerate(sorted_idxs):
words = [tokenizer.decoder[idx.item()] for idx in idxs]
dist = [f"{distance[m, idx].item():.4f}" for idx in idxs]
print(f"{m+1}: {words} {dist}")
elif ctx.dim() == 3:
# Class-specific context
raise NotImplementedError
print("##############################")
print("##############################")

17
lpclip/README.md Normal file

@@ -0,0 +1,17 @@
# Linear Probe CLIP
To run linear probe baselines, make sure that your current working directory is `lpclip/`.
Step 1: Extract Features using the CLIP Image Encoder
```bash
sh feat_extractor.sh
```
Step 2: Train few-shot linear probe
```bash
sh linear_probe.sh
```
We follow the instructions stated in Appendix A3 (p. 38) of [the original CLIP paper](https://arxiv.org/pdf/2103.00020.pdf), with a careful hyper-parameter sweep.
Note: please pull the latest Dassl (version >= `606a2c6`).
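If Dassl was installed from source in develop mode (as in the installation guide), one way to update it is sketched below; the path is a placeholder and this is only a suggestion, not part of the official scripts:
```bash
# assumes Dassl.pytorch was cloned and installed with `python setup.py develop`,
# so pulling the latest commits is enough (no re-installation needed)
cd /path/to/Dassl.pytorch
git pull
git log --oneline -1   # check that the current commit is 606a2c6 or later
```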

189
lpclip/feat_extractor.py Normal file

@@ -0,0 +1,189 @@
import os, argparse
import numpy as np
import torch
import sys
sys.path.append(os.path.abspath(".."))
from datasets.oxford_pets import OxfordPets
from datasets.oxford_flowers import OxfordFlowers
from datasets.fgvc_aircraft import FGVCAircraft
from datasets.dtd import DescribableTextures
from datasets.eurosat import EuroSAT
from datasets.stanford_cars import StanfordCars
from datasets.food101 import Food101
from datasets.sun397 import SUN397
from datasets.caltech101 import Caltech101
from datasets.ucf101 import UCF101
from datasets.imagenet import ImageNet
from datasets.imagenetv2 import ImageNetV2
from datasets.imagenet_sketch import ImageNetSketch
from datasets.imagenet_a import ImageNetA
from datasets.imagenet_r import ImageNetR
from dassl.utils import setup_logger, set_random_seed, collect_env_info
from dassl.config import get_cfg_default
from dassl.data.transforms import build_transform
from dassl.data import DatasetWrapper
import clip
# import pdb; pdb.set_trace()
def print_args(args, cfg):
print("***************")
print("** Arguments **")
print("***************")
optkeys = list(args.__dict__.keys())
optkeys.sort()
for key in optkeys:
print("{}: {}".format(key, args.__dict__[key]))
print("************")
print("** Config **")
print("************")
print(cfg)
def reset_cfg(cfg, args):
if args.root:
cfg.DATASET.ROOT = args.root
if args.output_dir:
cfg.OUTPUT_DIR = args.output_dir
if args.trainer:
cfg.TRAINER.NAME = args.trainer
if args.backbone:
cfg.MODEL.BACKBONE.NAME = args.backbone
if args.head:
cfg.MODEL.HEAD.NAME = args.head
def extend_cfg(cfg):
"""
Add new config variables.
E.g.
from yacs.config import CfgNode as CN
cfg.TRAINER.MY_MODEL = CN()
cfg.TRAINER.MY_MODEL.PARAM_A = 1.
cfg.TRAINER.MY_MODEL.PARAM_B = 0.5
cfg.TRAINER.MY_MODEL.PARAM_C = False
"""
from yacs.config import CfgNode as CN
cfg.TRAINER.OURS = CN()
cfg.TRAINER.OURS.N_CTX = 10 # number of context vectors
cfg.TRAINER.OURS.CSC = False # class-specific context
cfg.TRAINER.OURS.CTX_INIT = "" # initialize context vectors with given words
cfg.TRAINER.OURS.WEIGHT_U = 0.1 # weight for the unsupervised loss
def setup_cfg(args):
cfg = get_cfg_default()
extend_cfg(cfg)
# 1. From the dataset config file
if args.dataset_config_file:
cfg.merge_from_file(args.dataset_config_file)
# 2. From the method config file
if args.config_file:
cfg.merge_from_file(args.config_file)
# 3. From input arguments
reset_cfg(cfg, args)
cfg.freeze()
return cfg
def main(args):
cfg = setup_cfg(args)
if cfg.SEED >= 0:
print("Setting fixed seed: {}".format(cfg.SEED))
set_random_seed(cfg.SEED)
setup_logger(cfg.OUTPUT_DIR)
if torch.cuda.is_available() and cfg.USE_CUDA:
torch.backends.cudnn.benchmark = True
print_args(args, cfg)
print("Collecting env info ...")
print("** System info **\n{}\n".format(collect_env_info()))
######################################
# Setup DataLoader
######################################
dataset = eval(cfg.DATASET.NAME)(cfg)
if args.split == "train":
dataset_input = dataset.train_x
elif args.split == "val":
dataset_input = dataset.val
else:
dataset_input = dataset.test
tfm_train = build_transform(cfg, is_train=False)
data_loader = torch.utils.data.DataLoader(
DatasetWrapper(cfg, dataset_input, transform=tfm_train, is_train=False),
batch_size=cfg.DATALOADER.TRAIN_X.BATCH_SIZE,
sampler=None,
shuffle=False,
num_workers=cfg.DATALOADER.NUM_WORKERS,
drop_last=False,
pin_memory=(torch.cuda.is_available() and cfg.USE_CUDA),
)
########################################
# Setup Network
########################################
clip_model, _ = clip.load("RN50", "cuda", jit=False)
clip_model.eval()
###################################################################################################################
# Start Feature Extractor
feature_list = []
label_list = []
train_dataiter = iter(data_loader)
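# Iterate over the selected split once, extract CLIP image features batch by batch,
# and accumulate the features and labels on the CPU so they can be saved as one .npz file.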
for train_step in range(1, len(train_dataiter) + 1):
batch = next(train_dataiter)
data = batch["img"].cuda()
feature = clip_model.visual(data)
feature = feature.cpu()
for idx in range(len(data)):
feature_list.append(feature[idx].tolist())
label_list.extend(batch["label"].tolist())
save_dir = os.path.join(cfg.OUTPUT_DIR, cfg.DATASET.NAME)
os.makedirs(save_dir, exist_ok=True)
save_filename = f"{args.split}"
np.savez(
os.path.join(save_dir, save_filename),
feature_list=feature_list,
label_list=label_list,
)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--root", type=str, default="", help="path to dataset")
parser.add_argument("--output-dir", type=str, default="", help="output directory")
parser.add_argument("--config-file", type=str, default="", help="path to config file")
parser.add_argument(
"--dataset-config-file",
type=str,
default="",
help="path to config file for dataset setup",
)
parser.add_argument("--num-shot", type=int, default=1, help="number of shots")
parser.add_argument("--split", type=str, choices=["train", "val", "test"], help="which split")
parser.add_argument("--trainer", type=str, default="", help="name of trainer")
parser.add_argument("--backbone", type=str, default="", help="name of CNN backbone")
parser.add_argument("--head", type=str, default="", help="name of head")
parser.add_argument("--seed", type=int, default=-1, help="only positive value enables a fixed seed")
parser.add_argument("--eval-only", action="store_true", help="evaluation only")
args = parser.parse_args()
main(args)

20
lpclip/feat_extractor.sh Normal file

@@ -0,0 +1,20 @@
# sh feat_extractor.sh
DATA=/path/to/datasets
OUTPUT='./clip_feat/'
SEED=1
# oxford_pets oxford_flowers fgvc_aircraft dtd eurosat stanford_cars food101 sun397 caltech101 ucf101 imagenet
for DATASET in oxford_pets
do
for SPLIT in train val test
do
python feat_extractor.py \
--split ${SPLIT} \
--root ${DATA} \
--seed ${SEED} \
--dataset-config-file ../configs/datasets/${DATASET}.yaml \
--config-file ../configs/trainers/CoOp/rn50_val.yaml \
--output-dir ${OUTPUT} \
--eval-only
done
done

129
lpclip/linear_probe.py Normal file

@@ -0,0 +1,129 @@
import numpy as np
import os
from sklearn.linear_model import LogisticRegression
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default="", help="path to dataset")
parser.add_argument("--num_step", type=int, default=8, help="number of steps")
parser.add_argument("--num_run", type=int, default=10, help="number of runs")
parser.add_argument("--feature_dir", type=str, default="clip_feat", help="feature dir path")
args = parser.parse_args()
dataset = args.dataset
dataset_path = os.path.join(f"{args.feature_dir}", dataset)
train_file = np.load(os.path.join(dataset_path, "train.npz"))
train_feature, train_label = train_file["feature_list"], train_file["label_list"]
val_file = np.load(os.path.join(dataset_path, "val.npz"))
val_feature, val_label = val_file["feature_list"], val_file["label_list"]
test_file = np.load(os.path.join(dataset_path, "test.npz"))
test_feature, test_label = test_file["feature_list"], test_file["label_list"]
os.makedirs("report", exist_ok=True)
val_shot_list = {1: 1, 2: 2, 4: 4, 8: 4, 16: 4}
for num_shot in [1, 2, 4, 8, 16]:
test_acc_step_list = np.zeros([args.num_run, args.num_step])
for seed in range(1, args.num_run + 1):
np.random.seed(seed)
print(f"-- Seed: {seed} --------------------------------------------------------------")
# Sampling
all_label_list = np.unique(train_label)
selected_idx_list = []
for label in all_label_list:
label_collection = np.where(train_label == label)[0]
selected_idx = np.random.choice(label_collection, size=num_shot, replace=False)
selected_idx_list.extend(selected_idx)
fewshot_train_feature = train_feature[selected_idx_list]
fewshot_train_label = train_label[selected_idx_list]
val_num_shot = val_shot_list[num_shot]
val_selected_idx_list = []
for label in all_label_list:
label_collection = np.where(val_label == label)[0]
selected_idx = np.random.choice(label_collection, size=val_num_shot, replace=False)
val_selected_idx_list.extend(selected_idx)
fewshot_val_feature = val_feature[val_selected_idx_list]
fewshot_val_label = val_label[val_selected_idx_list]
# search initialization
search_list = [1e6, 1e4, 1e2, 1, 1e-2, 1e-4, 1e-6]
acc_list = []
for c_weight in search_list:
clf = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_weight).fit(fewshot_train_feature, fewshot_train_label)
pred = clf.predict(fewshot_val_feature)
acc_val = sum(pred == fewshot_val_label) / len(fewshot_val_label)
acc_list.append(acc_val)
print(acc_list, flush=True)
# binary search
peak_idx = np.argmax(acc_list)
c_peak = search_list[peak_idx]
c_left, c_right = 1e-1 * c_peak, 1e1 * c_peak
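# Coarse-to-fine search over the regularization strength C: each binary_search step fits
# a classifier at both ends of the bracket, keeps the end with the higher validation
# accuracy, and narrows the bracket towards it in log10 space for the next step.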
def binary_search(c_left, c_right, seed, step, test_acc_step_list):
clf_left = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_left).fit(fewshot_train_feature, fewshot_train_label)
pred_left = clf_left.predict(fewshot_val_feature)
acc_left = sum(pred_left == fewshot_val_label) / len(fewshot_val_label)
print("Val accuracy (Left): {:.2f}".format(100 * acc_left), flush=True)
clf_right = LogisticRegression(solver="lbfgs", max_iter=1000, penalty="l2", C=c_right).fit(fewshot_train_feature, fewshot_train_label)
pred_right = clf_right.predict(fewshot_val_feature)
acc_right = sum(pred_right == fewshot_val_label) / len(fewshot_val_label)
print("Val accuracy (Right): {:.2f}".format(100 * acc_right), flush=True)
# find maximum and update ranges
if acc_left < acc_right:
c_final = c_right
clf_final = clf_right
# range for the next step
c_left = 0.5 * (np.log10(c_right) + np.log10(c_left))
c_right = np.log10(c_right)
else:
c_final = c_left
clf_final = clf_left
# range for the next step
c_right = 0.5 * (np.log10(c_right) + np.log10(c_left))
c_left = np.log10(c_left)
pred = clf_final.predict(test_feature)
test_acc = 100 * sum(pred == test_label) / len(pred)
print("Test Accuracy: {:.2f}".format(test_acc), flush=True)
test_acc_step_list[seed - 1, step] = test_acc
saveline = "{}, seed {}, {} shot, weight {}, test_acc {:.2f}\n".format(dataset, seed, num_shot, c_final, test_acc)
with open(
"./report/{}_s{}r{}_details.txt".format(args.feature_dir, args.num_step, args.num_run),
"a+",
) as writer:
writer.write(saveline)
return (
np.power(10, c_left),
np.power(10, c_right),
seed,
step,
test_acc_step_list,
)
for step in range(args.num_step):
print(
f"{dataset}, {num_shot} Shot, Round {step}: {c_left}/{c_right}",
flush=True,
)
c_left, c_right, seed, step, test_acc_step_list = binary_search(c_left, c_right, seed, step, test_acc_step_list)
# save results of last step
test_acc_list = test_acc_step_list[:, -1]
acc_mean = np.mean(test_acc_list)
acc_std = np.std(test_acc_list)
save_line = "{}, {} Shot, Test acc stat: {:.2f} ({:.2f})\n".format(dataset, num_shot, acc_mean, acc_std)
print(save_line, flush=True)
with open(
"./report/{}_s{}r{}.txt".format(args.feature_dir, args.num_step, args.num_run),
"a+",
) as writer:
writer.write(save_line)

10
lpclip/linear_probe.sh Normal file

@@ -0,0 +1,10 @@
feature_dir=clip_feat
for DATASET in OxfordPets
do
python linear_probe.py \
--dataset ${DATASET} \
--feature_dir ${feature_dir} \
--num_step 8 \
--num_run 3
done

174
parse_test_res.py Normal file

@@ -0,0 +1,174 @@
"""
Goal
---
1. Read test results from log.txt files
2. Compute mean and std across different folders (seeds)
Usage
---
Assume the output files are saved under output/my_experiment,
which contains results of different seeds, e.g.,
my_experiment/
seed1/
log.txt
seed2/
log.txt
seed3/
log.txt
Run the following command from the root directory:
$ python tools/parse_test_res.py output/my_experiment
Add --ci95 to the argument if you wanna get 95% confidence
interval instead of standard deviation:
$ python tools/parse_test_res.py output/my_experiment --ci95
If my_experiment/ has the following structure,
my_experiment/
exp-1/
seed1/
log.txt
...
seed2/
log.txt
...
seed3/
log.txt
...
exp-2/
...
exp-3/
...
Run
$ python tools/parse_test_res.py output/my_experiment --multi-exp
"""
import re
import numpy as np
import os.path as osp
import argparse
from collections import OrderedDict, defaultdict
from dassl.utils import check_isfile, listdir_nohidden
def compute_ci95(res):
return 1.96 * np.std(res) / np.sqrt(len(res))
def parse_function(*metrics, directory="", args=None, end_signal=None):
print(f"Parsing files in {directory}")
subdirs = listdir_nohidden(directory, sort=True)
outputs = []
for subdir in subdirs:
fpath = osp.join(directory, subdir, "log.txt")
assert check_isfile(fpath)
good_to_go = False
output = OrderedDict()
with open(fpath, "r") as f:
lines = f.readlines()
for line in lines:
line = line.strip()
if line == end_signal:
good_to_go = True
for metric in metrics:
match = metric["regex"].search(line)
if match and good_to_go:
if "file" not in output:
output["file"] = fpath
num = float(match.group(1))
name = metric["name"]
output[name] = num
if output:
outputs.append(output)
assert len(outputs) > 0, f"Nothing found in {directory}"
metrics_results = defaultdict(list)
for output in outputs:
msg = ""
for key, value in output.items():
if isinstance(value, float):
msg += f"{key}: {value:.2f}%. "
else:
msg += f"{key}: {value}. "
if key != "file":
metrics_results[key].append(value)
print(msg)
output_results = OrderedDict()
print("===")
print(f"Summary of directory: {directory}")
for key, values in metrics_results.items():
avg = np.mean(values)
std = compute_ci95(values) if args.ci95 else np.std(values)
print(f"* {key}: {avg:.2f}% +- {std:.2f}%")
output_results[key] = avg
print("===")
return output_results
def main(args, end_signal):
metric = {
"name": args.keyword,
"regex": re.compile(fr"\* {args.keyword}: ([\.\deE+-]+)%"),
}
if args.multi_exp:
final_results = defaultdict(list)
for directory in listdir_nohidden(args.directory, sort=True):
directory = osp.join(args.directory, directory)
results = parse_function(
metric, directory=directory, args=args, end_signal=end_signal
)
for key, value in results.items():
final_results[key].append(value)
print("Average performance")
for key, values in final_results.items():
avg = np.mean(values)
print(f"* {key}: {avg:.2f}%")
else:
parse_function(
metric, directory=args.directory, args=args, end_signal=end_signal
)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("directory", type=str, help="path to directory")
parser.add_argument(
"--ci95", action="store_true", help=r"compute 95\% confidence interval"
)
parser.add_argument("--test-log", action="store_true", help="parse test-only logs")
parser.add_argument(
"--multi-exp", action="store_true", help="parse multiple experiments"
)
parser.add_argument(
"--keyword", default="accuracy", type=str, help="which keyword to extract"
)
args = parser.parse_args()
end_signal = "Finished training"
if args.test_log:
end_signal = "=> result"
main(args, end_signal)

3
requirements.txt Normal file

@@ -0,0 +1,3 @@
ftfy==6.1.1
regex
tqdm


@@ -0,0 +1,54 @@
#!/bin/bash
#cd ../..
# custom config
DATA="/path/to/dataset/folder"
TRAINER=CoCoOp
DATASET=$1
SEED=$2
CFG=vit_b16_c4_ep10_batch1_ctxv1
SHOTS=16
LOADEP=10
SUB=new
COMMON_DIR=${DATASET}/shots_${SHOTS}/${TRAINER}/${CFG}/seed${SEED}
MODEL_DIR=output/base2new/train_base/${COMMON_DIR}
DIR=output/base2new/test_${SUB}/${COMMON_DIR}
if [ -d "$DIR" ]; then
echo "Evaluating model"
echo "Results are available in ${DIR}. Resuming..."
python train.py \
--root ${DATA} \
--seed ${SEED} \
--trainer ${TRAINER} \
--dataset-config-file configs/datasets/${DATASET}.yaml \
--config-file configs/trainers/${TRAINER}/${CFG}.yaml \
--output-dir ${DIR} \
--model-dir ${MODEL_DIR} \
--load-epoch ${LOADEP} \
--eval-only \
DATASET.NUM_SHOTS ${SHOTS} \
DATASET.SUBSAMPLE_CLASSES ${SUB}
else
echo "Evaluating model"
echo "Runing the first phase job and save the output to ${DIR}"
python train.py \
--root ${DATA} \
--seed ${SEED} \
--trainer ${TRAINER} \
--dataset-config-file configs/datasets/${DATASET}.yaml \
--config-file configs/trainers/${TRAINER}/${CFG}.yaml \
--output-dir ${DIR} \
--model-dir ${MODEL_DIR} \
--load-epoch ${LOADEP} \
--eval-only \
DATASET.NUM_SHOTS ${SHOTS} \
DATASET.SUBSAMPLE_CLASSES ${SUB}
fi

Some files were not shown because too many files have changed in this diff.