MegaPixels
WILDTRACK
Images from the WILDTRACK dataset were "acquired in a non-actor but realistic environment", raising questions about non-consensual data collection. Original image from WILDTRACK dataset camera 1 by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)

WILDTRACK Dataset

WILDTRACK is a surveillance video dataset of students recorded outside the ETH university main building in Zurich. The videos were acquired in an "unscripted", "non-actor but realistic environment". 1 2 In total, seven 35-minute videos containing thousands of students were surreptitiously recorded and made publicly available for any type of research. According to this analysis, informed consent was not obtained from the majority of students.

From these videos, over 1,000 students, faculty, and bystanders were then annotated through Mechanical Turk to mark their position in each frame. Due to the large number of students in the videos, "annotating one frame [took] on average 10 minutes for a trained person." 1 These annotations were then used in the research papers The WILDTRACK Multi-Camera Person Dataset and WILDTRACK: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection, which address multi-camera pedestrian detection with applications to security, surveillance, remote person identification, robotics, autonomous driving, and crowdsourcing. 2 5

UAV Aerial Surveillance

Although the dataset was originally intended for research in Switzerland and within the EU, two years after the WILDTRACK researchers publicly released it, the dataset appeared in a research paper on UAV surveillance at the International Conference on Systems and Informatics.

In the paper Human Detection Under UAV: an Improved Faster R-CNN Approach, two authors, affiliated with Nanjing University of Aeronautics and Astronautics and the University of Leicester, proposed a new method for detecting and tracking small targets from UAV surveillance feeds with applications for "conducting aerial surveillance." 3 Figures published in their research paper confirm that video recordings of students at ETH Zurich were used for the research and development of foreign UAV surveillance technologies.

To be clear, there is no evidence showing that WILDTRACK images were explicitly used for any foreign military-related applications. However, the affiliation with Nanjing University of Aeronautics and Astronautics (NUAA) and other research papers funded by the same grant, "Electro-optic Control Laboratory and Aviation Science Foundation Project (No 20175152036)", do leave open the possibility of the WILDTRACK dataset contributing towards such research. NUAA has produced over 40 unmanned aerial vehicles (UAVs) for China, most of which are small or micro-sized UAVs with consumer or industrial surveillance capabilities. However, a limited number of these were made specifically for military reconnaissance. Additionally, a paper by the same author, funded under the same project number (20175152036), mentions applications to "reconnaissance and other tasks." 4 Collectively, these hints should be taken with caution, as they are merely illustrative of the larger issue with transnational data flows that have been documented in other datasets on this site. Once data is shared it can no longer be controlled and often does end up being used for military research, though sometimes indirectly. This is as true of UAV surveillance technologies in the Western world as of those in China.

Retail Surveillance

Another example of unexpected use of the WILDTRACK dataset can be seen in a research paper published the following year, in 2019. In Priming Deep Pedestrian Detection with Geometric Context, researchers affiliated with Microsoft AI and Wormpex AI used the WILDTRACK student images for developing person detection algorithms with future applications "to multi-view tracking and people re-identification." 5

Because both Wormpex AI and Microsoft AI develop computer vision technologies for commercial applications, their use of the WILDTRACK dataset in this research project could be considered commercial research. Wormpex AI, a lesser-known name than Microsoft, develops computer vision for retail and warehouse surveillance. Cloaking commercial research in seemingly academic-styled papers is a common theme among research projects reviewed on this site. It would be more accurate to consider these papers as industrial research since there is no apparent connection to any academic or educational institution for either of the authors, though the Wormpex AI author was partly supported by an NSFC grant.

Informed Consent?

Recording videos "in a non-actor but realistic environment" with "unscripted dense groups of pedestrians standing and walking" brings into question whether students were not merely "unscripted" but also uninformed of their inclusion in a computer vision training dataset. 1 2

In the seven 35-minute videos, it is evident that the vast majority of students were either entirely unaware of the cameras or unaware of the research experiment. Even though the researchers did place notices underneath the cameras, the notices were only apparent to people who walked up to the cameras to investigate further. Students recorded into the WILDTRACK dataset could not have provided informed consent without at least reading these notices, and the vast majority did not even see them.

This type of "forced consent" was also seen in the Duke MTMC dataset, where students at Duke University were similarly co-opted into a pedestrian detection dataset while walking to classes on campus. After an internal investigation at Duke around the ethics of the dataset, the researcher responsible for creating Duke MTMC eventually made a public apology and removed public access to his dataset.

Both the Duke MTMC and WILDTRACK student datasets raise new questions about the ethics of collecting data in public and especially at universities. To be recorded into one of these datasets means that the students will forever be used as training data with no possible recourse for redaction in the countless copies already downloaded around the world.

Unless more meaningful restrictions and ethical frameworks are created for the collection and consent of artificial intelligence training data in public spaces, a public space in Zurich affords no more privacy protection than a public place in China or the United States.

Dates and Metadata

WILDTRACK DATASET

WILDTRACK camera 1. Original image from WILDTRACK dataset by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)
WILDTRACK camera 2. Original image from WILDTRACK dataset by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)
WILDTRACK camera 3. Original image from WILDTRACK dataset by Chavdarova et al. Faces redacted by MegaPixels
WILDTRACK camera 4. Original image from WILDTRACK dataset by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)
WILDTRACK camera 5. Original image from WILDTRACK dataset by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)
WILDTRACK camera 6. Original image from WILDTRACK dataset by Chavdarova et al. (2017). Data visualization by Adam Harvey / megapixels.cc (2020)

Usage of these images in press or other publications is only permitted with full attribution to all authors as displayed in the image captions. Scroll down for further information on the WILDTRACK authors' original citation.

Who used WILDTRACK?

The bar chart below presents a ranking of the top countries where dataset citations originated. Mouse over individual columns to see yearly totals. These charts show at most the top 10 countries.

Information Supply Chain

To help understand how WILDTRACK has been used around the world by commercial, military, and academic organizations, existing publicly available research citing The WILDTRACK Seven-Camera HD Dataset was collected, verified, and geocoded to show how AI training data has proliferated around the world. Click on the markers to reveal research projects at that location.

Citation data is collected using SemanticScholar.org; dataset usage is then verified and geolocated. Citations are used to provide an estimated overview of how and where images were used based on institutional affiliations.

Dataset Citations

The dataset citations used in the visualizations were collected from Semantic Scholar, a website which aggregates and indexes research papers. Each citation was geocoded using names of institutions found in the PDF front matter, or as listed on other resources. These papers have been manually verified to show that researchers downloaded and used the dataset to train or test machine learning algorithms. If you use our data, please cite our work.
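As a rough illustration of how verified, geocoded citations could be turned into a country ranking with per-year counts, here is a minimal Python sketch. It is not the MegaPixels pipeline: the record layout, the function names `rank_countries` and `yearly_totals`, and the placeholder entries are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (country, year) pair per verified, geocoded citation.
# These are placeholder values for illustration, not MegaPixels data.
citations = [
    ("Country A", 2018),
    ("Country B", 2019),
    ("Country A", 2019),
    ("Country C", 2019),
    ("Country A", 2019),
]

def rank_countries(records, limit=10):
    """Return up to `limit` (country, total) pairs, most citations first."""
    totals = Counter(country for country, _ in records)
    return totals.most_common(limit)

def yearly_totals(records):
    """Return {country: {year: count}} for a per-year breakdown."""
    by_year = defaultdict(Counter)
    for country, year in records:
        by_year[country][year] += 1
    return {country: dict(years) for country, years in by_year.items()}

ranking = rank_countries(citations)
breakdown = yearly_totals(citations)
```

The `limit` parameter caps the ranking at ten countries by default, and the per-year breakdown is kept separate so that totals and yearly counts can be displayed independently.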

Supplementary Information

Cite Our Work

If you find this analysis helpful, please cite our work:

@online{megapixels,
  author = {Harvey, Adam and LaPlace, Jules},
  title = {MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets},
  year = 2019,
  url = {https://megapixels.cc/},
  urldate = {2019-04-18}
}

Cite the Original Authors' Work

If you find the WILDTRACK dataset useful or reference it in your work, please cite the authors' original work as:

@article{Chavdarova2017TheWM,
 title={The WILDTRACK Multi-Camera Person Dataset},
 author={Tatjana Chavdarova and Pierre Baqu{\'e} and St{\'e}phane Bouquet and Andrii Maksai and Cijo Jose and Louis Lettry and Pascal Fua and Luc Van Gool and François Fleuret},
 journal={ArXiv},
 year={2017},
 volume={abs/1707.09299}
}

References

  • 1 Chavdarova, T., Baqué, P., Bouquet, S., Maksai, A., Jose, C., Lettry, L., Fua, P., Gool, L.V., & Fleuret, F. (2017). The WILDTRACK Multi-Camera Person Dataset. ArXiv, abs/1707.09299.
  • 2 Chavdarova, Tatjana et al. “WILDTRACK: A Multi-camera HD Dataset for Dense Unscripted Pedestrian Detection.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (2018): 5030-5039.
  • 3 Zhu, Hanshan et al. “Human Detection Under UAV: an Improved Faster R-CNN Approach.” 2018 5th International Conference on Systems and Informatics (ICSAI) (2018): 367-372.
  • 4 Xiang, C.L., Shi, H., Li, N., Ding, M., & Zhou, H. (2019). Pedestrian Detection Under Unmanned Aerial Vehicle an Improved Single-Stage Detector Based on RetinaNet. 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 1-6.
  • 5 Chakraborty, I., & Hua, G. (2019). Priming Deep Pedestrian Detection with Geometric Context. 2019 International Conference on Robotics and Automation (ICRA), 5516-5522.