DigiDogs: Single-View 3D Pose Estimation of Dogs Using Synthetic Training Data

Moira Shooter, Charles Malleson, Adrian Hilton; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 80-89

Abstract


We propose an approach to automatically extract the 3D pose of dogs from single-view RGB images using only synthetic data for training. Due to the lack of suitable 3D datasets, previous approaches have predominantly relied on 2D weakly supervised methods. While these approaches demonstrate promising results, depth ambiguities persist, indicating that the neural network has a limited understanding of the 3D environment. To tackle these depth ambiguities, we generate a synthetic 3D pose dataset (DigiDogs) by modifying the popular video game Grand Theft Auto. Additionally, to address the domain gap between synthetic and real data, we use Meta's foundation model DINOv2, chosen for its generalisation capability, and fine-tune it for 3D pose estimation. Through a combination of qualitative and quantitative analyses, we demonstrate the viability of estimating the 3D pose of dogs from real-world images using synthetic training data.

Related Material


@InProceedings{Shooter_2024_WACV,
    author    = {Shooter, Moira and Malleson, Charles and Hilton, Adrian},
    title     = {DigiDogs: Single-View 3D Pose Estimation of Dogs Using Synthetic Training Data},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2024},
    pages     = {80-89}
}