Upload to Main

docs/Co-CoOp.md (new file, 99 lines)
# Conditional Prompt Learning for Vision-Language Models (Co-CoOp, CVPR'22)

[arXiv](https://arxiv.org/abs/2203.05557)

We provide the scripts in [scripts/cocoop](../scripts/cocoop) to reproduce the Co-CoOp results (CVPR'22).

Make sure to configure the dataset paths in the environment variable `DATA` and to run the commands from the main directory, `MaPLe/`.
## Generalization From Base to New Classes

This corresponds to the experiments in Section 4.1, i.e., Table 1.

You will need both `scripts/cocoop/base2new_train.sh` and `scripts/cocoop/base2new_test.sh`. The former trains a model on the base classes, while the latter evaluates the trained model on the new classes. Both scripts take two input arguments, i.e., `DATASET` and `SEED`.

`DATASET` takes as input a dataset name, such as `imagenet` or `caltech101`. The valid names are the file names in `CoOp/configs/datasets/`.
Below we provide an example of how to train and evaluate the model on ImageNet.

```bash
# seed=1
bash scripts/cocoop/base2new_train.sh imagenet 1
bash scripts/cocoop/base2new_test.sh imagenet 1

# seed=2
bash scripts/cocoop/base2new_train.sh imagenet 2
bash scripts/cocoop/base2new_test.sh imagenet 2

# seed=3
bash scripts/cocoop/base2new_train.sh imagenet 3
bash scripts/cocoop/base2new_test.sh imagenet 3
```
When the evaluation is done, you can use `parse_test_res.py` to automatically calculate the average results. For instance, after you finish the evaluation (including `base2new_train.sh` and `base2new_test.sh`) on ImageNet using the aforementioned commands, you would get

```
output
|–– base2new/
| |–– test_new/
| | |–– imagenet/
| | | |–– shots_16/
| | | | |–– CoCoOp/
| | | | | |–– vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | |–– seed1/
| | | | | | |–– seed2/
| | | | | | |–– seed3/
| |–– train_base/
| | |–– imagenet/
| | | |–– shots_16/
| | | | |–– CoCoOp/
| | | | | |–– vit_b16_c4_ep10_batch1_ctxv1/
| | | | | | |–– seed1/
| | | | | | |–– seed2/
| | | | | | |–– seed3/
```
Then, to get the average performance on the base classes, run

```bash
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1
```

To get the average performance on the new classes, run

```bash
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/CoCoOp/vit_b16_c4_ep10_batch1_ctxv1 --test-log
```
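The two parse commands differ only in the split name and the `--test-log` flag, so they can also be generated with a small loop. This is a dry-run sketch: the `echo` only prints the commands; remove it to execute them.

```shell
# Print the parse command for each split; test_new additionally needs --test-log.
RUN=vit_b16_c4_ep10_batch1_ctxv1
for SPLIT in train_base test_new; do
  EXTRA=""
  if [ "$SPLIT" = "test_new" ]; then EXTRA="--test-log"; fi
  echo python parse_test_res.py "output/base2new/${SPLIT}/imagenet/shots_16/CoCoOp/${RUN}" $EXTRA
done
```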
## Cross-Dataset Transfer

This corresponds to the experiments in Section 4.2, i.e., Table 2.

The relevant scripts are `scripts/cocoop/xd_train.sh` and `scripts/cocoop/xd_test.sh`, where the `DATASET` variable is set to the default, namely `imagenet`. To train the model, run

```bash
# seed=1
bash scripts/cocoop/xd_train.sh 1

# seed=2
bash scripts/cocoop/xd_train.sh 2

# seed=3
bash scripts/cocoop/xd_train.sh 3
```
Then, evaluate the model on other datasets, e.g.,

```bash
for SEED in 1 2 3
do
    bash scripts/cocoop/xd_test.sh caltech101 ${SEED}
    bash scripts/cocoop/xd_test.sh oxford_pets ${SEED}
    bash scripts/cocoop/xd_test.sh stanford_cars ${SEED}
done
```
## Domain Generalization

This corresponds to the experiments in Section 4.3, i.e., Table 3.

The steps are similar to those discussed in "Cross-Dataset Transfer", except that you evaluate the model on the variants of ImageNet, i.e., `imagenetv2`, `imagenet_sketch`, `imagenet_a` and `imagenet_r`.
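Assuming `xd_test.sh` accepts these dataset names in the same way as the cross-dataset evaluation above, the full sweep can be scripted as a dry run: the `echo` only prints the commands; remove it to execute them.

```shell
# Print the evaluation command for every ImageNet variant and seed.
for DATASET in imagenetv2 imagenet_sketch imagenet_a imagenet_r; do
  for SEED in 1 2 3; do
    echo bash scripts/cocoop/xd_test.sh "${DATASET}" "${SEED}"
  done
done
```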
docs/CoOp.md (new file, 110 lines)
# Learning to Prompt for Vision-Language Models (CoOp, IJCV'22)

[arXiv](https://arxiv.org/abs/2109.01134)

We provide the scripts in [scripts/coop](../scripts/coop) to reproduce the CoOp results (IJCV'22).

Make sure to configure the dataset paths in the environment variable `DATA` and to run the commands from the main directory, `MaPLe/`.
## Few-Shot Learning

All you need is `scripts/coop/main.sh`, which takes six input arguments.

`DATASET` takes as input a dataset name, such as `imagenet` or `caltech101`. The valid names are the file names in `configs/datasets/`.

`CFG` specifies which config file to use, such as `rn50`, `rn101` or `vit_b32` (see `configs/trainers/coop/`). Note that for ImageNet, we use `configs/trainers/coop/*_ep50.yaml` for all settings (please follow the implementation details given in the paper).

Below we provide examples of how to run CoOp on Caltech101.
**CLIP + CoOp (M=16, end)**:
- 1 shot: `bash scripts/coop/main.sh caltech101 rn50_ep50 end 16 1 False`
- 2 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 end 16 2 False`
- 4 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 end 16 4 False`
- 8 shots: `bash scripts/coop/main.sh caltech101 rn50 end 16 8 False`
- 16 shots: `bash scripts/coop/main.sh caltech101 rn50 end 16 16 False`

**CLIP + CoOp (M=16, mid)**:
- 1 shot: `bash scripts/coop/main.sh caltech101 rn50_ep50 middle 16 1 False`
- 2 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 middle 16 2 False`
- 4 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 middle 16 4 False`
- 8 shots: `bash scripts/coop/main.sh caltech101 rn50 middle 16 8 False`
- 16 shots: `bash scripts/coop/main.sh caltech101 rn50 middle 16 16 False`

**CLIP + CoOp (M=16, end, CSC)**:
- 1 shot: `bash scripts/coop/main.sh caltech101 rn50_ep50 end 16 1 True`
- 2 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 end 16 2 True`
- 4 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 end 16 4 True`
- 8 shots: `bash scripts/coop/main.sh caltech101 rn50 end 16 8 True`
- 16 shots: `bash scripts/coop/main.sh caltech101 rn50 end 16 16 True`

**CLIP + CoOp (M=16, mid, CSC)**:
- 1 shot: `bash scripts/coop/main.sh caltech101 rn50_ep50 middle 16 1 True`
- 2 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 middle 16 2 True`
- 4 shots: `bash scripts/coop/main.sh caltech101 rn50_ep100 middle 16 4 True`
- 8 shots: `bash scripts/coop/main.sh caltech101 rn50 middle 16 8 True`
- 16 shots: `bash scripts/coop/main.sh caltech101 rn50 middle 16 16 True`
After the experiments are finished, you can use `parse_test_res.py` to calculate the average results instead of manually looking into the log files. Say the structure of `output/` is

```
output
|–– caltech101/
| |–– CoOp/
| | |–– rn50_16shots/
| | | |–– nctx16_cscFalse_ctpend/
| | | | |–– seed1/
| | | | |–– seed2/
| | | | |–– seed3/
| | |–– rn50_8shots/
| | | |–– nctx16_cscFalse_ctpend/
| | | | |–– seed1/
| | | | |–– seed2/
| | | | |–– seed3/
```
To calculate the average results for the folder `rn50_16shots/nctx16_cscFalse_ctpend/`, you can run

```bash
python parse_test_res.py output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend
```

Then, you will see something like this in your terminal:

```
Parsing files in output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend
file: output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend/seed1/log.txt. accuracy: 91.81%. error: 8.19%.
file: output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend/seed2/log.txt. accuracy: 92.01%. error: 7.99%.
file: output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend/seed3/log.txt. accuracy: 92.17%. error: 7.83%.
===
Summary of directory: output/caltech101/CoOp/rn50_16shots/nctx16_cscFalse_ctpend
* accuracy: 92.00% +- 0.15%
* error: 8.00% +- 0.15%
===
```
**How to initialize the context tokens with pre-trained word vectors?** Specify the words for the parameter `TRAINER.COOP.CTX_INIT` in your config file. In our paper, we use `configs/trainers/rn50_ctxv1.yaml` (pass this file to `--config-file`; see `scripts/coop/main.sh`), which uses "a photo of a" as the initialization words.

**How to visualize the nearest words for the learned context tokens?** All you need is `interpret_prompt.py`. Say the learned tokens are saved in `a/b/c/prompt_learner/model.pth.tar` and you would like to see the top-3 nearest words for each token. In this case, run `python interpret_prompt.py a/b/c/prompt_learner/model.pth.tar 3`.
## Robustness to Distribution Shift

To reproduce the robustness experiments, you can simply load the models learned on ImageNet and evaluate them on the following datasets: `imagenetv2`, `imagenet-sketch`, `imagenet-a` and `imagenet-r`.

The command is provided in `scripts/coop/eval.sh`. The key arguments are `--model-dir`, `--load-epoch` and `--eval-only`. `--model-dir` indicates the directory where the models are saved (i.e., the entire folder containing `log.txt`, the tensorboard file and `prompt_learner/`). `--load-epoch` tells the code to load the model saved at a specific epoch, like `--load-epoch 50` for ImageNet (see the [source code](https://github.com/KaiyangZhou/Dassl.pytorch/blob/master/dassl/engine/trainer.py#L169) for more details).
For example, to evaluate `CLIP + CoOp (M=16, end)` on ImageNetV2, you can run

```bash
# No need to use rn50_ep50 here as no training is performed
bash scripts/coop/eval.sh imagenetv2 rn50
```

The default setting is `SHOTS=16`. Feel free to modify the script.
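To cover all four target datasets in one go, the evaluation can be wrapped in a loop. This is a dry-run sketch: the `echo` only prints the commands, and the underscore-style dataset names are an assumption; they must match the file names under `configs/datasets/`.

```shell
# Print the eval command for each ImageNet variant; remove `echo` to run.
for DATASET in imagenetv2 imagenet_sketch imagenet_a imagenet_r; do
  echo bash scripts/coop/eval.sh "${DATASET}" rn50
done
```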
Again, you can use `parse_test_res.py` to automate the calculation of the average performance. This time you should append `--test-log`, e.g., `python parse_test_res.py directory --test-log`.

## Zero-Shot CLIP

See `scripts/zsclip/zeroshot.sh`.

## Linear Probe CLIP

Please see [lpclip/](lpclip/).
docs/DATASETS.md (new file, 233 lines)
# How to install datasets

### Acknowledgement: this README for installing datasets is borrowed directly from [CoOp's](https://github.com/KaiyangZhou/CoOp/blob/main/DATASETS.md) official repository.

We suggest putting all datasets under the same folder (say `$DATA`) to ease management, and following the instructions below to organize the datasets so that the source code does not need to be modified. The file structure looks like
```
$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– stanford_cars/
```

If you already have some datasets installed elsewhere, you can create symbolic links in `$DATA/dataset_name` that point to the original data to avoid duplicate downloads.
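For instance, if a copy of ImageNet already exists elsewhere on disk, a single symbolic link avoids re-downloading it. The source path below is a placeholder for wherever your copy lives:

```shell
# Link an existing ImageNet copy into $DATA instead of duplicating it.
# /data/datasets/imagenet is a hypothetical path; adjust it to your setup.
ln -s /data/datasets/imagenet "$DATA/imagenet"
```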
Datasets list:
- [ImageNet](#imagenet)
- [Caltech101](#caltech101)
- [OxfordPets](#oxfordpets)
- [StanfordCars](#stanfordcars)
- [Flowers102](#flowers102)
- [Food101](#food101)
- [FGVCAircraft](#fgvcaircraft)
- [SUN397](#sun397)
- [DTD](#dtd)
- [EuroSAT](#eurosat)
- [UCF101](#ucf101)
- [ImageNetV2](#imagenetv2)
- [ImageNet-Sketch](#imagenet-sketch)
- [ImageNet-A](#imagenet-a)
- [ImageNet-R](#imagenet-r)

The instructions to prepare each dataset are detailed below. To ensure reproducibility and fair comparison for future work, we provide fixed train/val/test splits for all datasets except ImageNet, where the validation set is used as the test set. The fixed splits are either taken from the original datasets (if available) or created by us.
### ImageNet
- Create a folder named `imagenet/` under `$DATA`.
- Create `images/` under `imagenet/`.
- Download the dataset from the [official website](https://image-net.org/index.php) and extract the training and validation sets to `$DATA/imagenet/images`. The directory structure should look like
```
imagenet/
|–– images/
| |–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
| |–– val/
```
- If you have downloaded the ImageNet dataset before, you can create symbolic links to map the training and validation sets to `$DATA/imagenet/images`.
- Download `classnames.txt` to `$DATA/imagenet/` from this [link](https://drive.google.com/file/d/1-61f_ol79pViBFDG_IDlUQSwoLcn2XXF/view?usp=sharing). The class names are copied from [CLIP](https://github.com/openai/CLIP/blob/main/notebooks/Prompt_Engineering_for_ImageNet.ipynb).
### Caltech101
- Create a folder named `caltech-101/` under `$DATA`.
- Download `101_ObjectCategories.tar.gz` from http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz and extract the file under `$DATA/caltech-101`.
- Download `split_zhou_Caltech101.json` from this [link](https://drive.google.com/file/d/1hyarUivQE36mY6jSomru6Fjd-JzwcCzN/view?usp=sharing) and put it under `$DATA/caltech-101`.

The directory structure should look like
```
caltech-101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json
```

### OxfordPets
- Create a folder named `oxford_pets/` under `$DATA`.
- Download the images from https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz.
- Download the annotations from https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz.
- Download `split_zhou_OxfordPets.json` from this [link](https://drive.google.com/file/d/1501r8Ber4nNKvmlFVQZ8SeUHTcdTTEqs/view?usp=sharing).

The directory structure should look like
```
oxford_pets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json
```

### StanfordCars
- Create a folder named `stanford_cars/` under `$DATA`.
- Download the train images from http://ai.stanford.edu/~jkrause/car196/cars_train.tgz.
- Download the test images from http://ai.stanford.edu/~jkrause/car196/cars_test.tgz.
- Download the train labels from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
- Download the test labels from http://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat.
- Download `split_zhou_StanfordCars.json` from this [link](https://drive.google.com/file/d/1ObCFbaAgVu0I-k_Au-gIUcefirdAuizT/view?usp=sharing).

The directory structure should look like
```
stanford_cars/
|–– cars_test/
|–– cars_test_annos_withlabels.mat
|–– cars_train/
|–– devkit/
|–– split_zhou_StanfordCars.json
```
### Flowers102
- Create a folder named `oxford_flowers/` under `$DATA`.
- Download the images and labels from https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz and https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat respectively.
- Download `cat_to_name.json` from [here](https://drive.google.com/file/d/1AkcxCXeK_RCGCEC_GvmWxjcjaNhu-at0/view?usp=sharing).
- Download `split_zhou_OxfordFlowers.json` from [here](https://drive.google.com/file/d/1Pp0sRXzZFZq15zVOzKjKBu4A9i01nozT/view?usp=sharing).

The directory structure should look like
```
oxford_flowers/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json
```

### Food101
- Download the dataset from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract the file `food-101.tar.gz` under `$DATA`, resulting in a folder named `$DATA/food-101/`.
- Download `split_zhou_Food101.json` from [here](https://drive.google.com/file/d/1QK0tGi096I0Ba6kggatX1ee6dJFIcEJl/view?usp=sharing).

The directory structure should look like
```
food-101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json
```

### FGVCAircraft
- Download the data from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz.
- Extract `fgvc-aircraft-2013b.tar.gz` and keep only `data/`.
- Move `data/` to `$DATA` and rename the folder to `fgvc_aircraft/`.

The directory structure should look like
```
fgvc_aircraft/
|–– images/
|–– ... # a bunch of .txt files
```
### SUN397
- Create a folder named `sun397/` under `$DATA`.
- Download the images from http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions from https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under `$DATA/sun397/`.
- Download `split_zhou_SUN397.json` from this [link](https://drive.google.com/file/d/1y2RD81BYuiyvebdN-JymPfyWYcd8_MUq/view?usp=sharing).

The directory structure should look like
```
sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files
```

### DTD
- Download the dataset from https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz and extract it to `$DATA`. This should lead to `$DATA/dtd/`.
- Download `split_zhou_DescribableTextures.json` from this [link](https://drive.google.com/file/d/1u3_QfB467jqHgNXC00UIzbLZRQCg2S7x/view?usp=sharing).

The directory structure should look like
```
dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json
```

### EuroSAT
- Create a folder named `eurosat/` under `$DATA`.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to `$DATA/eurosat/`.
- Download `split_zhou_EuroSAT.json` from [here](https://drive.google.com/file/d/1Ip7yaCWFi0eaOFUGga0lUdVi_DDQth1o/view?usp=sharing).

The directory structure should look like
```
eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json
```

### UCF101
- Create a folder named `ucf101/` under `$DATA`.
- Download the zip file `UCF-101-midframes.zip` from [here](https://drive.google.com/file/d/10Jqome3vtUA2keJkNanAiFpgbyC9Hc2O/view?usp=sharing) and extract it to `$DATA/ucf101/`. This zip file contains the extracted middle video frames.
- Download `split_zhou_UCF101.json` from this [link](https://drive.google.com/file/d/1I0S0q91hJfsV9Gf4xDIjgDq4AqBNJb1y/view?usp=sharing).

The directory structure should look like
```
ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
```
### ImageNetV2
- Create a folder named `imagenetv2/` under `$DATA`.
- Go to this GitHub repo: https://github.com/modestyachts/ImageNetV2.
- Download the matched-frequency dataset from https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz and extract it to `$DATA/imagenetv2/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenetv2/`.

The directory structure should look like
```
imagenetv2/
|–– imagenetv2-matched-frequency-format-val/
|–– classnames.txt
```

### ImageNet-Sketch
- Download the dataset from https://github.com/HaohanWang/ImageNet-Sketch.
- Extract the dataset to `$DATA/imagenet-sketch`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-sketch/`.

The directory structure should look like
```
imagenet-sketch/
|–– images/ # contains 1,000 folders whose names have the format of n*
|–– classnames.txt
```

### ImageNet-A
- Create a folder named `imagenet-adversarial/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/natural-adv-examples and extract it to `$DATA/imagenet-adversarial/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-adversarial/`.

The directory structure should look like
```
imagenet-adversarial/
|–– imagenet-a/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
```

### ImageNet-R
- Create a folder named `imagenet-rendition/` under `$DATA`.
- Download the dataset from https://github.com/hendrycks/imagenet-r and extract it to `$DATA/imagenet-rendition/`.
- Copy `$DATA/imagenet/classnames.txt` to `$DATA/imagenet-rendition/`.

The directory structure should look like
```
imagenet-rendition/
|–– imagenet-r/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
```
docs/INSTALL.md (new file, 46 lines)
# Installation

This codebase is tested on Ubuntu 20.04.2 LTS with Python 3.8. Follow the steps below to create the environment and install the dependencies.

* Setup the conda environment (recommended).
```bash
# Create a conda environment
conda create -y -n dapt python=3.8

# Activate the environment
conda activate dapt

# Install torch (requires version >= 1.8.1) and torchvision
# Please refer to https://pytorch.org/ if you need a different CUDA version
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
* Install the dassl library.
```bash
# Instructions borrowed from https://github.com/KaiyangZhou/Dassl.pytorch#installation

# Clone the repo
git clone https://github.com/KaiyangZhou/Dassl.pytorch.git
cd Dassl.pytorch/

# Install dependencies
pip install -r requirements.txt

# Install this library (no need to re-build if the source code is modified)
python setup.py develop
cd ..
```
* Clone the DAPT code repository and install its requirements.
```bash
# Clone the MaPLe code base
git clone https://github.com/muzairkhattak/multimodal-prompt-learning.git
cd DAPT/

# Install requirements
pip install -r requirements.txt

# Update the setuptools package
pip install setuptools==59.5.0
```
docs/RUN.md (new file, 226 lines)
# Training and Evaluation

We provide bash scripts in [scripts/](../scripts) for each prompting variant, including MaPLe, vision, language and independent V-L prompting.
Make sure to configure the dataset paths in the environment variable `DATA` and run the commands from the main directory `DAPT/`.
Below we provide training and evaluation instructions for MaPLe. The same instructions apply to all other variants, including *vision (VPT), language and independent V-L prompting*.

### Training time and compute
We train MaPLe+DAPT on each dataset with a batch size of 4 using a **single** NVIDIA 3090 GPU.
## Quick Start
We provide an illustrative script in `scripts/dapt/dp_few_shot.sh`, which you can run directly for any dataset you like. The core command looks like:
```bash
python train.py --root /home/ubuntu/Data_file/few_shot_data --seed 1 --trainer MaPLe \
  --dataset-config-file configs/datasets/oxford_pets.yaml \
  --config-file configs/trainers/MaPLe/vit_b16_t.yaml \
  --output-dir output/DAPT --mode dapt-g \
  DATASET.NUM_SHOTS 1 \
  DATASET.SELECTION_RATIO 1.0
```
Here `--mode` controls whether GradCAM-generated (`dapt-g`) or SEEM-generated (`dapt-s`) masks are used for visual decoupling, and `DATASET.SELECTION_RATIO` controls the scale of data selection; you can set it anywhere in `[0, 1]`.
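As an illustration, a sweep over several selection ratios can be scripted around the command above. This is a dry-run sketch: the `echo` only prints the commands, and `--root` is a placeholder path.

```shell
# Print one training command per selection ratio; remove `echo` to launch.
for RATIO in 0.25 0.5 0.75 1.0; do
  echo python train.py --root /path/to/few_shot_data --seed 1 --trainer MaPLe \
    --dataset-config-file configs/datasets/oxford_pets.yaml \
    --config-file configs/trainers/MaPLe/vit_b16_t.yaml \
    --output-dir "output/DAPT/ratio_${RATIO}" --mode dapt-g \
    DATASET.NUM_SHOTS 1 DATASET.SELECTION_RATIO "${RATIO}"
done
```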
## Normal Pipeline of DAPT

> Below are the original MaPLe scripts; you may refer to the following settings to perform other experiments:

#### (1) Base-to-Novel class generalization setting
The default training settings are provided in the config file at `configs/trainers/MaPLe/vit_b16_c2_ep5_batch4_2ctx.yaml`. All hyper-parameters, such as prompt length and prompt depth, can be modified in this config file.

Below, we provide instructions to train MaPLe on ImageNet.
```bash
# Other possible dataset values include [caltech101, food101, dtd, ucf101, oxford_flowers, oxford_pets, fgvc_aircraft, stanford_cars, sun397, eurosat]

# seed=1
# trains and evaluates on base classes
bash scripts/dapt/base2new_train_maple.sh imagenet 1
# evaluates on novel classes
bash scripts/dapt/base2new_test_maple.sh imagenet 1

# seed=2
# trains and evaluates on base classes
bash scripts/dapt/base2new_train_maple.sh imagenet 2
# evaluates on novel classes
bash scripts/dapt/base2new_test_maple.sh imagenet 2

# seed=3
# trains and evaluates on base classes
bash scripts/dapt/base2new_train_maple.sh imagenet 3
# evaluates on novel classes
bash scripts/dapt/base2new_test_maple.sh imagenet 3
```
#### Averaging results over 3 seeds
Once the above trainings and evaluations are completed, the `output/` directory should have the following structure:

```
output
|–– base2new/
| |–– test_new/
| | |–– imagenet/
| | | |–– shots_16/
| | | | |–– MaPLe/
| | | | | |–– vit_b16_c2_ep5_batch4_2ctx/
| | | | | | |–– seed1/
| | | | | | |–– seed2/
| | | | | | |–– seed3/
| |–– train_base/
| | |–– imagenet/
| | | |–– shots_16/
| | | | |–– MaPLe/
| | | | | |–– vit_b16_c2_ep5_batch4_2ctx/
| | | | | | |–– seed1/
| | | | | | |–– seed2/
| | | | | | |–– seed3/
```
Now use the script `parse_test_res.py` and run the commands below to calculate the averaged results:
```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
# prints averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
```

The above steps can be repeated for other individual datasets.
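The repetition over datasets can itself be scripted. This is a dry-run sketch: the `echo` only prints the commands; remove it to actually train and evaluate.

```shell
# Print the base-to-novel train/test commands for several datasets and seeds.
for DATASET in caltech101 oxford_pets stanford_cars; do
  for SEED in 1 2 3; do
    echo bash scripts/dapt/base2new_train_maple.sh "${DATASET}" "${SEED}"
    echo bash scripts/dapt/base2new_test_maple.sh "${DATASET}" "${SEED}"
  done
done
```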
#### Reproducing results using pre-trained weights for the base-to-novel generalization setting

We show an example of reproducing the results for ImageNet. Follow the instructions below to reproduce results using our pre-trained model weights:
* Download the zipped folder containing pre-trained weights for a single dataset from this [link](https://drive.google.com/drive/folders/1-tB6BUDBzs9CXTOJ7p5hM4Svq1tL_mGz?usp=sharing). We also provide the log files for both training and evaluation. After unzipping, the directory should look like this:

```
imagenet
|–– base/
| |–– seed1/
| |–– seed2/
| |–– seed3/
|–– novel/
| |–– seed1/
| |–– seed2/
| |–– seed3/
```
Now use the evaluation script `scripts/dapt/reproduce_maple.sh` and run the commands below to calculate the averaged results:
```bash
# evaluate on base and novel classes for SEED1
bash scripts/dapt/reproduce_maple.sh imagenet 1 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED2
bash scripts/dapt/reproduce_maple.sh imagenet 2 /path/to/imagenet/weights/folder
# evaluate on base and novel classes for SEED3
bash scripts/dapt/reproduce_maple.sh imagenet 3 /path/to/imagenet/weights/folder
```

This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:

```bash
# prints averaged results for base classes
python parse_test_res.py output/base2new/train_base/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx
# prints averaged results for novel classes
python parse_test_res.py output/base2new/test_new/imagenet/shots_16/MaPLe/vit_b16_c2_ep5_batch4_2ctx --test-log
```
#### (2) Cross-Dataset Transfer
We provide instructions to train MaPLe on ImageNet using all 1000 classes and then evaluate it directly on new downstream datasets.
We provide a cross-dataset config for MaPLe: `configs/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets.yaml`.
* First, train MaPLe on ImageNet in a few-shot manner (for all 3 seeds).

```bash
# seed=1
bash scripts/dapt/xd_train_maple.sh imagenet 1
# seed=2
bash scripts/dapt/xd_train_maple.sh imagenet 2
# seed=3
bash scripts/dapt/xd_train_maple.sh imagenet 3
```

* Now evaluate the ImageNet model on downstream datasets.

```bash
for SEED in 1 2 3
do
    bash scripts/dapt/xd_test_maple.sh caltech101 ${SEED}
    bash scripts/dapt/xd_test_maple.sh oxford_pets ${SEED}
    bash scripts/dapt/xd_test_maple.sh stanford_cars ${SEED}
done
```
#### (3) Domain Generalization
We use the ImageNet-trained MaPLe model for the domain generalization experiments. The steps are similar to the cross-dataset experiments above; however, the model is evaluated on ImageNet variants.
* Evaluate the ImageNet model on the variants of ImageNet (domain-shift datasets).

```bash
for SEED in 1 2 3
do
    bash scripts/dapt/xd_test_maple.sh imagenetv2 ${SEED}
    bash scripts/dapt/xd_test_maple.sh imagenet_sketch ${SEED}
    bash scripts/dapt/xd_test_maple.sh imagenet_a ${SEED}
    bash scripts/dapt/xd_test_maple.sh imagenet_r ${SEED}
done
```

You can obtain averaged results by using the script `parse_test_res.py` and following steps similar to those in the base-to-novel generalization experiments.
<br>
#### Reproducing official results for the cross-dataset and domain generalization settings

We provide the instructions below to reproduce the domain-generalization and cross-dataset results using our pre-trained ImageNet model weights for MaPLe:
* Download the zipped folder containing pre-trained weights for ImageNet from this [link](https://drive.google.com/drive/folders/1bmhvmNZc13WJ5U71qt0t8k91wyuoemVF?usp=sharing). We also provide the log files for both training and evaluation. After unzipping, the directory should look like this:

```
imagenet
|–– seed1/
|–– seed2/
|–– seed3/
```
Now use the evaluation script `scripts/dapt/reproduce_maple_xd.sh` and run the commands below to calculate the averaged results:
```bash
# evaluate on the given dataset for SEED1
bash scripts/dapt/reproduce_maple_xd.sh food101 1 /path/to/imagenet/weights/folder
# evaluate on the given dataset for SEED2
bash scripts/dapt/reproduce_maple_xd.sh food101 2 /path/to/imagenet/weights/folder
# evaluate on the given dataset for SEED3
bash scripts/dapt/reproduce_maple_xd.sh food101 3 /path/to/imagenet/weights/folder
```

This should evaluate and save the log files in the `output/` directory. To obtain the averaged results, run:

```bash
# prints averaged results for the food101 dataset
python parse_test_res.py output/evaluation/MaPLe/vit_b16_c2_ep5_batch4_2ctx_cross_datasets_16shots/food101 --test-log
```
#### Training and Evaluating other variants

For the other variants, including vision, language and independent V-L prompting techniques, we provide the corresponding configs and scripts as follows.

```
configs
|–– datasets/
|–– trainers/
| |–– CoCoOp/
| |–– CoOp/
| |–– MaPLe/
| |–– IVLP/
| |–– VPT/
```

```
scripts
|–– cocoop/
|–– coop/
|–– language-prompting/
|–– maple/
|–– independent-vlp/
```

Please use the corresponding config and script files and follow the same instructions as provided for MaPLe in order to train and evaluate the other variants. The same instructions can be followed to reproduce the results of the other variants using the provided pre-trained weights.
This repository also supports using the official [CoOp](CoOp.md) and [Co-CoOp](Co-CoOp.md) configs and models.
docs/fig.png (new binary file, 147 KiB)
docs/main_fig.png (new binary file, 331 KiB)