@InProceedings{Crall_2026_WACV,
    author    = {Crall, Jonathan},
    title     = {ScatSpotter: A Dog Poop Detection Dataset},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {March},
    year      = {2026},
    pages     = {549-558}
}
ScatSpotter: A Dog Poop Detection Dataset
Abstract
Small, amorphous waste objects such as biological droppings and microtrash can be difficult to see, especially in cluttered scenes, yet they matter for environmental cleanliness, public health, and autonomous cleanup. We introduce "ScatSpotter": a new dataset of phone images annotated with polygons around dog feces, collected to train and study object detection and segmentation systems for small, potentially camouflaged outdoor waste. We gathered data in mostly urban environments, using a "before/after/negative" (BAN) protocol: for a given location, we capture an image with the object present, an image from the same viewpoint after removal, and a nearby negative scene that often contains visually similar confusers. Image collection began in late 2020. This paper focuses on two dataset checkpoints from 2025 and 2024. The dataset contains over 9000 full-resolution images and 6000 polygon annotations. Of the author-captured images, we held out 691 for validation and used the rest for training. Via community participation we obtained a 121-image test set that, while small, is independent of the author-collected images and provides some evidence of generalization across photographers, devices, and locations. Because the test set is small, we report both validation and test results. We explore the difficulty of the dataset using off-the-shelf ViT, Mask R-CNN, YOLO-v9, and DINO-v2 models. Zero-shot DINO performs poorly, indicating limited foundation-model coverage of this category. Tuned DINO is the best model, with a box-level average precision of 0.69 on the 691-image validation set and 0.70 on the test set. These results establish strong baselines and quantify the remaining difficulty of detecting small, camouflaged waste objects. To support open access to models and data (CC-BY 4.0 license), we compare centralized and decentralized distribution mechanisms and discuss trade-offs for sharing scientific data. Code for experiments and project details are hosted on GitHub.
