Still frames from the QMUL GRID CCTV pedestrian surveillance dataset

QMUL underGround Re-IDentification (GRID) Dataset

GRID is a dataset of 256 pedestrian image pairs captured from the London Underground CCTV system and used for research and development of person re-identification surveillance algorithms. The images were captured from 8 cameras in a "busy underground station" in London. An additional 775 images are included to act as distractors. Notably, the CCTV video footage was made available to the researchers by the UK Ministry of Defence.

"We would like to thank the UK MOD who have made the video footage available to the Queen Mary University of London." [1]

The dataset website states that it "is intended for research purposes only and as such cannot be used commercially." However, publicly available research papers show that the GRID dataset was eventually used in at least two projects affiliated with commercial organizations: Microsoft Research Asia and UBTECH Robotics, a Shenzhen-based household robotics company.

The disparity between the dataset's origin in the London Underground and its eventual use in research affiliated with Microsoft Research Asia and a household robotics company in China illustrates the impossibility of knowing how and where biometric data will be used.

Who used GRID?

The bar chart below presents a ranking of the top countries where dataset citations originated, with yearly totals for each country. The chart shows at most the top 10 countries.
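The ranking behind such a chart amounts to counting verified citations per country and keeping the largest counts. The sketch below is a minimal illustration under stated assumptions: the `Citation` record, its fields, and the functions `top_countries` and `yearly_totals` are hypothetical and are not the MegaPixels project's actual code or data format.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    # Hypothetical record: one paper citing the dataset, geocoded to the
    # country of its institutional affiliation.
    title: str
    country: str
    year: int
    verified: bool = True  # True once dataset usage has been manually confirmed


def top_countries(citations, limit=10):
    """Rank countries by number of verified citations, keeping at most `limit`."""
    counts = Counter(c.country for c in citations if c.verified)
    # most_common sorts by count, highest first; ties keep first-seen order.
    return counts.most_common(limit)


def yearly_totals(citations, country):
    """Return {year: count} of verified citations for a single country."""
    totals = defaultdict(int)
    for c in citations:
        if c.verified and c.country == country:
            totals[c.year] += 1
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Placeholder records for illustration only; not real citation data.
    sample = [
        Citation("Paper 1", "Country A", 2015),
        Citation("Paper 2", "Country A", 2016),
        Citation("Paper 3", "Country B", 2016),
        Citation("Paper 4", "Country B", 2017, verified=False),
    ]
    print(top_countries(sample))
    print(yearly_totals(sample, "Country A"))
```

Unverified records are filtered out before counting, mirroring the text's point that only papers confirmed to have used the dataset contribute to the overview.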

Information Supply Chain

To help understand how GRID has been used around the world by commercial, military, and academic organizations, existing publicly available research citing the QMUL UnderGround Re-IDentification Dataset was collected, verified, and geocoded to show how AI training data has proliferated globally.

Citation data is collected using Semantic Scholar; dataset usage is then verified and geolocated. Citations are used to provide an estimated overview of how and where images were used, based on institutional affiliations.

Dataset Citations

The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.

Cite the dataset authors' work

If you reference or use any data from GRID, cite the authors' work:

    author = "Liu, C. and Gong, S. and Loy, Chen Change",
    title = "On-the-fly feature importance mining for person re-identification",
    journal = "Pattern Recognit.",
    year = "2014",
    volume = "47",
    pages = "1602--1615"

    author = "Gong, Shaogang",
    title = "Person re-identification",
    publisher = "Springer",
    year = "2014",
    address = "London",
    isbn = "978-1-4471-6296-4"

    author = "Liu, C. and Gong, S. and Loy, Chen Change and Lin, X.",
    title = "Evaluating Feature Importance for Re-identification",
    booktitle = "Person Re-Identification",
    year = "2014"

Cite Our Work

If you reference this research project or use any data from the MegaPixels project, cite our research as follows:

  author = {Harvey, Adam and LaPlace, Jules},
  title = {MegaPixels: Origins and Endpoints of Datasets Created "In The Wild"},
  year = {2019--2020},
  url = {},
  urldate = {2020-09-05}