Accidental Turntables: Learning 3D Pose by Watching Objects Turn

Zezhou Cheng, Matheus Gadelha, Subhransu Maji; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 2113-2122

Abstract


We propose a technique for learning single-view 3D object pose estimation models from a new source of data: in-the-wild videos in which objects turn. Such videos are prevalent in practice (e.g., cars in roundabouts, airplanes near runways) and easy to collect. We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos. We propose a multi-stage training scheme that first learns a canonical pose across a collection of videos and then supervises a model for single-view pose estimation. The proposed technique achieves performance competitive with the existing state of the art on standard benchmarks for 3D pose estimation, without requiring any pose labels during training. We also contribute the Accidental Turntables Dataset, a challenging set of 41,212 images of cars with cluttered backgrounds, motion blur, and illumination changes, which serves as a benchmark for 3D pose estimation.
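
To make the notion of relative 3D pose between two frames of a turning object concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame already has a 3x3 rotation matrix (for example, one recovered by a structure-from-motion tool) and computes the rotation between two frames and its angle. The function names and the synthetic rotations about the vertical axis are hypothetical, chosen only for illustration.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """Rotation matrix for a turn of angle_rad about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def relative_rotation(R_i, R_j):
    """Rotation that maps the pose in frame i to the pose in frame j."""
    return R_j @ R_i.T

def geodesic_angle_deg(R):
    """Angle of rotation R in degrees, from the trace identity tr(R) = 1 + 2 cos(theta)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

# Hypothetical poses of the same object in two video frames.
R_a = rotation_about_y(np.radians(10.0))
R_b = rotation_about_y(np.radians(55.0))

R_rel = relative_rotation(R_a, R_b)
angle = geodesic_angle_deg(R_rel)
print(f"relative turn between frames: {angle:.1f} degrees")
```

Relative rotations of this kind are defined only within a single video; relating them across videos requires a shared canonical pose, which is the role of the first training stage described in the abstract.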

Related Material


@InProceedings{Cheng_2023_ICCV,
    author    = {Cheng, Zezhou and Gadelha, Matheus and Maji, Subhransu},
    title     = {Accidental Turntables: Learning 3D Pose by Watching Objects Turn},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {2113-2122}
}