How to install datasets
Acknowledgement: This readme file for installing datasets has been borrowed directly from CoOp's official repository.
We suggest putting all datasets under the same folder (say $DATA) to ease management, and following the instructions below to organize them so that the source code does not need to be modified. The file structure looks like
$DATA/
|–– imagenet/
|–– caltech-101/
|–– oxford_pets/
|–– stanford_cars/
If you already have some datasets installed somewhere else, you can create symbolic links in $DATA/dataset_name that point to the original data to avoid duplicate downloads.
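The symbolic-link approach can be sketched in Python as follows. This is a minimal sketch assuming Python 3 on a system where the current user may create symlinks; link_existing_dataset is a hypothetical helper, not part of the codebase.

```python
import os
from pathlib import Path


def link_existing_dataset(data_root: str, dataset_name: str, existing_path: str) -> Path:
    """Hypothetical helper: make $DATA/<dataset_name> a symlink to an existing copy.

    Raises FileExistsError if something already occupies the link location,
    so an existing folder is never silently replaced.
    """
    source = Path(existing_path).expanduser().resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"existing dataset folder not found: {source}")
    link = Path(data_root).expanduser() / dataset_name
    link.parent.mkdir(parents=True, exist_ok=True)  # create $DATA if needed
    # is_symlink() also catches dangling links, which exists() reports as False.
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists")
    # target_is_directory only matters on Windows; other platforms ignore it.
    os.symlink(source, link, target_is_directory=True)
    return link
```

For example, link_existing_dataset(os.environ["DATA"], "caltech-101", "/path/to/existing/caltech-101") would make $DATA/caltech-101 point at the existing copy; the second path is a placeholder.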
Datasets list:
- ImageNet
- Caltech101
- OxfordPets
- StanfordCars
- Flowers102
- Food101
- FGVCAircraft
- SUN397
- DTD
- EuroSAT
- UCF101
- ImageNetV2
- ImageNet-Sketch
- ImageNet-A
- ImageNet-R
The instructions to prepare each dataset are detailed below. To ensure reproducibility and fair comparison for future work, we provide fixed train/val/test splits for all datasets except ImageNet, where the validation set is used as the test set. The fixed splits are either taken from the original datasets (if available) or created by us.
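The format of the split_zhou_*.json files is not described in this guide. The sketch below assumes, based on CoOp-style dataset loaders, that each file holds "train", "val" and "test" lists of [relative image path, integer label, class name] entries, with paths relative to the dataset's image folder. read_split and Item are hypothetical names; check the format against the loaders in the codebase before relying on it.

```python
import json
from pathlib import Path
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    impath: Path
    label: int
    classname: str


def read_split(split_file: str, image_dir: str) -> Dict[str, List[Item]]:
    """Hypothetical reader for a split_zhou_*.json file.

    ASSUMED layout: {"train": [[relpath, label, classname], ...], "val": [...], "test": [...]},
    with relpath relative to image_dir.
    """
    with open(split_file, "r", encoding="utf-8") as f:
        raw = json.load(f)
    splits = {}
    for name in ("train", "val", "test"):
        splits[name] = [
            Item(Path(image_dir) / relpath, int(label), classname)
            for relpath, label, classname in raw[name]
        ]
    return splits
```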
ImageNet
- Create a folder named imagenet/ under $DATA.
- Create images/ under imagenet/.
- Download the dataset from the official website and extract the training and validation sets to $DATA/imagenet/images. The directory structure should look like
imagenet/
|–– images/
| |–– train/ # contains 1,000 folders like n01440764, n01443537, etc.
| |–– val/
- If you have already downloaded the ImageNet dataset, you can create symbolic links to map the training and validation sets to $DATA/imagenet/images.
- Download classnames.txt to $DATA/imagenet/ from this link. The class names are copied from CLIP.
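A quick consistency check between classnames.txt and the training folders could look like the sketch below. It assumes each line of classnames.txt has the form "<wnid> <class name>" (e.g. "n01440764 tench"), where the class name may contain spaces; read_classnames and missing_train_classes are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict, List


def read_classnames(path: str) -> Dict[str, str]:
    """Parse classnames.txt, ASSUMING one "<wnid> <class name>" entry per line."""
    classnames = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split only at the first space: class names may contain spaces.
            wnid, _, name = line.partition(" ")
            classnames[wnid] = name
    return classnames


def missing_train_classes(imagenet_root: str) -> List[str]:
    """Return wnid folders under images/train that have no entry in classnames.txt."""
    root = Path(imagenet_root)
    classnames = read_classnames(str(root / "classnames.txt"))
    folders = sorted(p.name for p in (root / "images" / "train").iterdir() if p.is_dir())
    return [wnid for wnid in folders if wnid not in classnames]
```

On a complete installation one would expect missing_train_classes to return an empty list and read_classnames to return 1,000 entries.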
Caltech101
- Create a folder named caltech-101/ under $DATA.
- Download 101_ObjectCategories.tar.gz from http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz and extract the file under $DATA/caltech-101.
- Download split_zhou_Caltech101.json from this link and put it under $DATA/caltech-101.
The directory structure should look like
caltech-101/
|–– 101_ObjectCategories/
|–– split_zhou_Caltech101.json
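The extraction step can be done from Python with the standard tarfile module, as in the sketch below. It assumes the archive has already been downloaded; extract_tar is a hypothetical helper that refuses archive members which would land outside the destination folder, and it uses tarfile's "data" extraction filter when the running Python provides it.

```python
import tarfile
from pathlib import Path


def extract_tar(archive: str, dest: str) -> None:
    """Hypothetical helper: extract a .tar/.tar.gz/.tgz archive into dest.

    Members that would resolve outside dest (absolute paths, "../") raise ValueError.
    """
    dest_path = Path(dest).resolve()
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar.getmembers():
            target = (dest_path / member.name).resolve()
            if target != dest_path and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Extraction filters exist in Python 3.12+ and some security backports.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
```

For Caltech101 this would be called as extract_tar("101_ObjectCategories.tar.gz", os.path.join(os.environ["DATA"], "caltech-101")).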
OxfordPets
- Create a folder named oxford_pets/ under $DATA.
- Download the images from https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz.
- Download the annotations from https://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz.
- Download split_zhou_OxfordPets.json from this link.
The directory structure should look like
oxford_pets/
|–– images/
|–– annotations/
|–– split_zhou_OxfordPets.json
StanfordCars
- Create a folder named stanford_cars/ under $DATA.
- Download the train images from http://ai.stanford.edu/~jkrause/car196/cars_train.tgz.
- Download the test images from http://ai.stanford.edu/~jkrause/car196/cars_test.tgz.
- Download the train labels from https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz.
- Download the test labels from http://ai.stanford.edu/~jkrause/car196/cars_test_annos_withlabels.mat.
- Download split_zhou_StanfordCars.json from this link.
The directory structure should look like
stanford_cars/
|–– cars_test/
|–– cars_test_annos_withlabels.mat
|–– cars_train/
|–– devkit/
|–– split_zhou_StanfordCars.json
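Because this layout mixes folders and loose files, a small checker can report what is still missing. The sketch below is hypothetical (missing_entries and the two constant lists are illustrative names); the expected entries are copied from the directory tree above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_entries(root: str, dirs: Iterable[str], files: Iterable[str]) -> List[str]:
    """Hypothetical helper: list expected sub-folders (suffixed "/") and files absent under root."""
    base = Path(root)
    missing = [d + "/" for d in dirs if not (base / d).is_dir()]
    missing += [f for f in files if not (base / f).is_file()]
    return missing


# Expected contents of $DATA/stanford_cars, taken from the tree above.
STANFORD_CARS_DIRS = ["cars_test", "cars_train", "devkit"]
STANFORD_CARS_FILES = ["cars_test_annos_withlabels.mat", "split_zhou_StanfordCars.json"]
```

An empty result from missing_entries(os.path.join(os.environ["DATA"], "stanford_cars"), STANFORD_CARS_DIRS, STANFORD_CARS_FILES) means every listed entry is in place; it does not check the contents of the folders.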
Flowers102
- Create a folder named oxford_flowers/ under $DATA.
- Download the images and labels from https://www.robots.ox.ac.uk/~vgg/data/flowers/102/102flowers.tgz and https://www.robots.ox.ac.uk/~vgg/data/flowers/102/imagelabels.mat respectively.
- Download cat_to_name.json from here.
- Download split_zhou_OxfordFlowers.json from here.
The directory structure should look like
oxford_flowers/
|–– cat_to_name.json
|–– imagelabels.mat
|–– jpg/
|–– split_zhou_OxfordFlowers.json
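The sketch below turns cat_to_name.json into a list of class names indexed by a 0-based label. It assumes the file maps 1-based label strings ("1" to "102") to flower names, matching 1-based labels in imagelabels.mat; that mapping is an assumption, not stated in this guide, and load_flower_classnames is a hypothetical name. Reading imagelabels.mat itself would need a MATLAB-file reader such as scipy.io.loadmat, which is left out here.

```python
import json
from typing import List


def load_flower_classnames(cat_to_name_path: str) -> List[str]:
    """Hypothetical helper: class names indexed by 0-based label.

    ASSUMES cat_to_name.json maps 1-based label strings ("1" .. "102") to names.
    """
    with open(cat_to_name_path, "r", encoding="utf-8") as f:
        cat_to_name = json.load(f)
    num_classes = len(cat_to_name)
    # Label k (1-based) becomes list index k - 1; a KeyError signals a gap in the keys.
    return [cat_to_name[str(k)] for k in range(1, num_classes + 1)]
```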
Food101
- Download the dataset from https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/ and extract the file food-101.tar.gz under $DATA, resulting in a folder named $DATA/food-101/.
- Download split_zhou_Food101.json from here.
The directory structure should look like
food-101/
|–– images/
|–– license_agreement.txt
|–– meta/
|–– README.txt
|–– split_zhou_Food101.json
FGVCAircraft
- Download the data from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/archives/fgvc-aircraft-2013b.tar.gz.
- Extract fgvc-aircraft-2013b.tar.gz and keep only data/.
- Move data/ to $DATA and rename the folder to fgvc_aircraft/.
The directory structure should look like
fgvc_aircraft/
|–– images/
|–– ... # a bunch of .txt files
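The extract, keep only data/, then move and rename sequence can be combined into one step. This sketch assumes the archive's top-level folder is fgvc-aircraft-2013b/ with data/ inside it; install_fgvc_aircraft is a hypothetical helper. It extracts into a scratch folder inside $DATA so the final move is a cheap rename, and the scratch folder (with everything outside data/) is deleted afterwards.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def install_fgvc_aircraft(archive: str, data_root: str) -> Path:
    """Hypothetical helper: extract the archive, keep only data/, move it to <data_root>/fgvc_aircraft.

    ASSUMES the archive's top-level folder is fgvc-aircraft-2013b/.
    """
    target = Path(data_root) / "fgvc_aircraft"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    Path(data_root).mkdir(parents=True, exist_ok=True)
    # Scratch folder on the same filesystem as the target.
    with tempfile.TemporaryDirectory(dir=data_root) as scratch:
        with tarfile.open(archive, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):  # Python 3.12+ and some backports
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        shutil.move(str(Path(scratch) / "fgvc-aircraft-2013b" / "data"), str(target))
        # Leaving this block deletes the scratch folder and everything else extracted.
    return target
```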
SUN397
- Create a folder named sun397/ under $DATA.
- Download the images from http://vision.princeton.edu/projects/2010/SUN/SUN397.tar.gz.
- Download the partitions from https://vision.princeton.edu/projects/2010/SUN/download/Partitions.zip.
- Extract these files under $DATA/sun397/.
- Download split_zhou_SUN397.json from this link.
The directory structure should look like
sun397/
|–– SUN397/
|–– split_zhou_SUN397.json
|–– ... # a bunch of .txt files
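SUN397 ships as a .tar.gz and a .zip, and both go into the same folder. The standard library's shutil.unpack_archive infers the format from the file extension, so one loop handles both. unpack_all is a hypothetical helper, and the archives are assumed to be downloaded already.

```python
import shutil
from pathlib import Path
from typing import Iterable


def unpack_all(archives: Iterable[str], dest: str) -> None:
    """Hypothetical helper: unpack each archive into dest.

    shutil.unpack_archive picks the format (.zip, .tar.gz, ...) from the extension.
    """
    Path(dest).mkdir(parents=True, exist_ok=True)
    for archive in archives:
        shutil.unpack_archive(archive, dest)
```

For this dataset the call would be unpack_all(["SUN397.tar.gz", "Partitions.zip"], os.path.join(os.environ["DATA"], "sun397")).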
DTD
- Download the dataset from https://www.robots.ox.ac.uk/~vgg/data/dtd/download/dtd-r1.0.1.tar.gz and extract it to $DATA. This should lead to $DATA/dtd/.
- Download split_zhou_DescribableTextures.json from this link.
The directory structure should look like
dtd/
|–– images/
|–– imdb/
|–– labels/
|–– split_zhou_DescribableTextures.json
EuroSAT
- Create a folder named eurosat/ under $DATA.
- Download the dataset from http://madm.dfki.de/files/sentinel/EuroSAT.zip and extract it to $DATA/eurosat/.
- Download split_zhou_EuroSAT.json from here.
The directory structure should look like
eurosat/
|–– 2750/
|–– split_zhou_EuroSAT.json
UCF101
- Create a folder named ucf101/ under $DATA.
- Download the zip file UCF-101-midframes.zip from here and extract it to $DATA/ucf101/. This zip file contains the extracted middle video frames.
- Download split_zhou_UCF101.json from this link.
The directory structure should look like
ucf101/
|–– UCF-101-midframes/
|–– split_zhou_UCF101.json
ImageNetV2
- Create a folder named imagenetv2/ under $DATA.
- Go to this GitHub repo: https://github.com/modestyachts/ImageNetV2.
- Download the matched-frequency dataset from https://s3-us-west-2.amazonaws.com/imagenetv2public/imagenetv2-matched-frequency.tar.gz and extract it to $DATA/imagenetv2/.
- Copy $DATA/imagenet/classnames.txt to $DATA/imagenetv2/.
The directory structure should look like
imagenetv2/
|–– imagenetv2-matched-frequency-format-val/
|–– classnames.txt
ImageNet-Sketch
- Download the dataset from https://github.com/HaohanWang/ImageNet-Sketch.
- Extract the dataset to $DATA/imagenet-sketch.
- Copy $DATA/imagenet/classnames.txt to $DATA/imagenet-sketch/.
The directory structure should look like
imagenet-sketch/
|–– images/ # contains 1,000 folders whose names have the format of n*
|–– classnames.txt
ImageNet-A
- Create a folder named imagenet-adversarial/ under $DATA.
- Download the dataset from https://github.com/hendrycks/natural-adv-examples and extract it to $DATA/imagenet-adversarial/.
- Copy $DATA/imagenet/classnames.txt to $DATA/imagenet-adversarial/.
The directory structure should look like
imagenet-adversarial/
|–– imagenet-a/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
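Since ImageNet-A covers only 200 of the 1,000 ImageNet classes while the copied classnames.txt lists all of them, a useful sanity check is that every class folder's wnid appears in classnames.txt. The sketch assumes classnames.txt lines start with the wnid followed by whitespace; unknown_wnids is a hypothetical helper, and it works the same way for imagenet-rendition/imagenet-r.

```python
from pathlib import Path
from typing import List


def unknown_wnids(variant_root: str, image_subdir: str) -> List[str]:
    """Hypothetical helper: class folders under <variant_root>/<image_subdir> whose
    wnid is missing from <variant_root>/classnames.txt (ASSUMED "<wnid> <name>" lines)."""
    root = Path(variant_root)
    with open(root / "classnames.txt", "r", encoding="utf-8") as f:
        known = {line.split()[0] for line in f if line.strip()}
    folders = sorted(p.name for p in (root / image_subdir).iterdir() if p.is_dir())
    return [wnid for wnid in folders if wnid not in known]
```

On a complete installation, unknown_wnids(os.path.join(os.environ["DATA"], "imagenet-adversarial"), "imagenet-a") would be expected to return an empty list.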
ImageNet-R
- Create a folder named imagenet-rendition/ under $DATA.
- Download the dataset from https://github.com/hendrycks/imagenet-r and extract it to $DATA/imagenet-rendition/.
- Copy $DATA/imagenet/classnames.txt to $DATA/imagenet-rendition/.
The directory structure should look like
imagenet-rendition/
|–– imagenet-r/ # contains 200 folders whose names have the format of n*
|–– classnames.txt
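ImageNetV2, ImageNet-Sketch, ImageNet-A and ImageNet-R all need a copy of $DATA/imagenet/classnames.txt. The sketch below does the four copies in one pass, using the folder names from this guide; copy_imagenet_classnames and VARIANT_DIRS are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import List

# Folder names under $DATA for the four ImageNet variants, as used in this guide.
VARIANT_DIRS = ["imagenetv2", "imagenet-sketch", "imagenet-adversarial", "imagenet-rendition"]


def copy_imagenet_classnames(data_root: str) -> List[Path]:
    """Hypothetical helper: copy $DATA/imagenet/classnames.txt into each variant folder."""
    root = Path(data_root)
    source = root / "imagenet" / "classnames.txt"
    if not source.is_file():
        raise FileNotFoundError(source)
    copied = []
    for name in VARIANT_DIRS:
        target_dir = root / name
        target_dir.mkdir(parents=True, exist_ok=True)  # no-op if already extracted
        copied.append(Path(shutil.copy2(source, target_dir / "classnames.txt")))
    return copied
```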