Who's Waldo – Dataset

Downloading the dataset (334 GB)

Once you've received access, install the huggingface-cli and download our dataset with the following script.


  mkdir whoswaldo && cd whoswaldo
  huggingface-cli login  # if not already logged in

  huggingface-cli download --repo-type dataset "apoorvkh/whoswaldo" --local-dir .
  
  for f in archives/whoswaldo_data_*.tar; do
      echo "Extracting $f..."
      tar -xf "$f"
  done
  
  rm -r archives  # if extraction was successful
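
If you would rather run the extraction step from Python (for instance on a machine without `tar`), the loop above can be approximated with the standard-library `tarfile` module. This is only a sketch and not part of the official instructions: the helper name `extract_archives` is hypothetical, and it assumes the archives have already been downloaded into `archives/` as in the script above.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archives(root: str = ".") -> List[str]:
    """Extract every archives/whoswaldo_data_*.tar under `root` into `root`.

    Hypothetical helper mirroring the shell loop above. Returns the file
    names of the archives that were extracted, in sorted order.
    """
    root_dir = Path(root)
    extracted = []
    for archive in sorted((root_dir / "archives").glob("whoswaldo_data_*.tar")):
        print(f"Extracting {archive}...")
        with tarfile.open(archive, "r") as tar:
            tar.extractall(path=root_dir)
        extracted.append(archive.name)
    return extracted
```

As with the shell version, only delete `archives/` after checking that extraction succeeded.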


Directory Structure

whoswaldo
├── data
│   ├── 000000
│   │   ├── caption.txt
│   │   ├── coreferences.json
│   │   ├── detections.json
│   │   ├── ground_truth.json
│   │   ├── image.jpg
│   │   └── license.json
│   ...
│   └── 271746/
└── splits
    ├── test.json
    ├── train.txt
    └── val.json

Dataset Splits

train.txt: # Line-separated list of image ids in the training set
{val,test}.json: { "102990" : [2, 1, 0, 3] }  # image id : ground_truth.json keys
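
The split files described above could be read along these lines. This is a minimal sketch: the helper names `load_train_ids` and `load_eval_split` are hypothetical, and it assumes `train.txt` holds one image id per line.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_train_ids(splits_dir: str) -> List[str]:
    """Return the image ids listed in train.txt, one per non-empty line."""
    path = Path(splits_dir) / "train.txt"
    with open(path, "r", encoding="utf-8") as file:
        return [line.strip() for line in file if line.strip()]


def load_eval_split(splits_dir: str, name: str) -> Dict[str, List[int]]:
    """Return {image id: ground_truth.json keys} from val.json or test.json."""
    if name not in ("val", "test"):
        raise ValueError("name must be 'val' or 'test'")
    path = Path(splits_dir) / f"{name}.json"
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)
```

In the examples shown in this README, the keys listed in the split files are integers while the keys of `ground_truth.json` are strings, so a conversion such as `str(key)` may be needed when looking them up.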

During evaluation, we compute accuracy as an average over independent ground truth links (i.e. over each image-link pair). In other words, do not compute accuracy per image and then average over images; pool all ground truth links across images instead.
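
To make the difference concrete, here is a small sketch of link-level accuracy. It is an illustration under assumptions, not the official evaluation script: the function name `link_accuracy` is hypothetical, and both inputs are assumed to map each image id to a `{coreference idx: detection idx}` dictionary containing only the links being evaluated.

```python
from typing import Dict


def link_accuracy(
    ground_truth: Dict[str, Dict[str, int]],
    predictions: Dict[str, Dict[str, int]],
) -> float:
    """Fraction of ground truth links predicted correctly, pooled over images.

    Both arguments map image id -> {coreference idx: detection idx}.
    A ground truth link with no prediction counts as incorrect.
    """
    correct = 0
    total = 0
    for image_id, links in ground_truth.items():
        predicted = predictions.get(image_id, {})
        for coref_idx, detection_idx in links.items():
            total += 1
            if predicted.get(coref_idx) == detection_idx:
                correct += 1
    if total == 0:
        raise ValueError("no ground truth links to evaluate")
    return correct / total


# Image "a" has three links (all correct); image "b" has one link (wrong).
gt = {"a": {"0": 1, "1": 0, "2": 2}, "b": {"0": 3}}
pred = {"a": {"0": 1, "1": 0, "2": 2}, "b": {"0": 1}}
print(link_accuracy(gt, pred))  # 0.75; a per-image average would give 0.5
```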

Please refer to "Dataset Size and Splits" in Section 4 of our paper to learn more about how our splits were generated.

Example Directory

whoswaldo/data/002629

image.jpg : 1874 x 1500 px

caption.txt : "Portola Valley, Calif., native, Maj. Gen. [NAME], Commanding general of the Multi-National Division-Baghdad briefs the new U.S. Ambassador to Iraq, [NAME] (center), on the day's plan to take a driving tour of Haifa Street and a walking tour of the Sayliah Market in central Baghdad June 26."

coreferences.json : [ [[153, 159]], [[42, 48]] ]  # clusters of co-referring name tokens

detections.json : [{
    "keypoints" : [(x, y, score), ... ],
    "bbox" : [x1, y1, x2, y2, score]
}, ... ]  # bounding boxes, COCO whole body landmarks, relative to image dimensions

ground_truth.json : { "0" : 2 }  # coreference idx : detection idx

license.json : {
    "commons_url": "https://commons.wikimedia.org/?curid=39335624",
    "license": "Public domain"
}   # "license_url" and "artist" keys are also often present
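
The files in an example directory can be combined to recover, for each ground truth link, the caption mentions and the box of the linked person. The sketch below is illustrative only: the helper names (`read_json`, `bbox_to_pixels`, `load_links`) are hypothetical, and it assumes that "relative to image dimensions" means x values are fractions of the image width and y values are fractions of the image height.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple


def read_json(path: Path) -> Any:
    """Read a JSON file as UTF-8."""
    with open(path, "r", encoding="utf-8") as file:
        return json.load(file)


def bbox_to_pixels(
    bbox: List[float], width: int, height: int
) -> Tuple[float, float, float, float]:
    """Scale a relative [x1, y1, x2, y2, score] box to pixel coordinates.

    Assumption: x values are fractions of the width and y values are
    fractions of the height. The trailing score is dropped.
    """
    x1, y1, x2, y2 = bbox[:4]
    return (x1 * width, y1 * height, x2 * width, y2 * height)


def load_links(example_dir: str, width: int, height: int) -> List[Dict[str, Any]]:
    """Pair each ground truth link with its mention spans and pixel box."""
    root = Path(example_dir)
    coreferences = read_json(root / "coreferences.json")
    detections = read_json(root / "detections.json")
    ground_truth = read_json(root / "ground_truth.json")

    links = []
    for coref_idx, detection_idx in ground_truth.items():
        box = detections[detection_idx]["bbox"]
        links.append({
            # ground_truth.json keys are strings; coreference clusters are a list
            "mentions": coreferences[int(coref_idx)],
            "bbox_px": bbox_to_pixels(box, width, height),
        })
    return links
```

The image width and height are passed in explicitly to keep the sketch dependency-free; they could be read from `image.jpg` with an imaging library of your choice (for the example above, 1874 and 1500).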

Reading JSON Files

As Who's Waldo includes textual data from a real-world (i.e. messy) data source, it is important to read all files with the UTF-8 encoding so that special characters are handled properly. We recommend using the following Python 3 code to read files:


import json

with open('path/to/file.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

Dataset License

The images in our dataset are provided by Wikimedia Commons under various free licenses. These licenses permit the use, study, derivation, and redistribution of these images—sometimes with restrictions, e.g. requiring attribution and with copyleft. We provide source links, full license text, and attribution (when available) for all images, make no modifications to any image, and release these images under their original licenses. The associated captions are provided as a part of unstructured text in Wikimedia Commons, with rights to the original writers under the CC BY-SA 3.0 license. We modify these (as specified in our paper) and release such derivatives under the same license. We provide the rest of our dataset (i.e. detections, coreferences, and ground truth correspondences) under a CC BY-NC-SA 4.0 license.

Ethical Statement

People-centric datasets pose ethical challenges. For example, ImageNet has been scrutinized based on issues inherited from the “person” category in WordNet. Our task and dataset were created with careful attention to ethical questions, which we encountered throughout our work. Access to our dataset is provided for research purposes only and with restrictions on redistribution. Additionally, as we mask all names in captions, our dataset cannot be easily repurposed for unintended tasks, such as identification of people by name. Due to biases in our data source, we do not consider the data appropriate for developing non-research systems without further processing or augmentation. More details on distribution and intended uses are provided in a supplemental datasheet (motivated by Datasheets for Datasets).

Datasheet for Who's Waldo

Citation

@InProceedings{Cui_2021_ICCV,
    author    = {Cui, Yuqing and Khandelwal, Apoorv and Artzi, Yoav and Snavely, Noah and Averbuch-Elor, Hadar},
    title     = {Who's Waldo? Linking People Across Text and Images},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {1374-1384}
}
