The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification

Dante Wasmuht, Otto Brookes, Maximilian Schall, Pablo Palencia, Christopher Beirne, Tilo Burghardt, Majid Mirmehdi, Hjalmar Kühl, Mimi Arandjelovic, Sam Pottie, Peter Bermant, Brandon Asheim, Yi Jin Toh, Adam Elzinga, Jason Allan Holmberg, Andrew Whitworth, Eleanor Flatt, Laura Gustafson, Chaitanya Ryali, Yuan-Ting Hu, Baishan Guo, Andrew Westbury, Kate Saenko, Didac Suris; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 21679-21689

Abstract


Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity, leaving no suitable benchmark for training general-purpose MAT models applicable to wild animals. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera trap videos collected over 10 years (2014-2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in 46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing multi-animal tracking in the wild. The dataset is available at conservationxlabs.com/SA-FARI.

Related Material


[bibtex]
@InProceedings{Wasmuht_2026_CVPR,
    author    = {Wasmuht, Dante and Brookes, Otto and Schall, Maximilian and Palencia, Pablo and Beirne, Christopher and Burghardt, Tilo and Mirmehdi, Majid and K\"uhl, Hjalmar and Arandjelovic, Mimi and Pottie, Sam and Bermant, Peter and Asheim, Brandon and Toh, Yi Jin and Elzinga, Adam and Holmberg, Jason Allan and Whitworth, Andrew and Flatt, Eleanor and Gustafson, Laura and Ryali, Chaitanya and Hu, Yuan-Ting and Guo, Baishan and Westbury, Andrew and Saenko, Kate and Suris, Didac},
    title     = {The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {21679-21689}
}