Robust Object Detection via Instance-Level Temporal Cycle Confusion

Xin Wang, Thomas E. Huang, Benlin Liu, Fisher Yu, Xiaolong Wang, Joseph E. Gonzalez, Trevor Darrell; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9143-9152


Building reliable object detectors that are robust to domain shifts, such as various changes in context, viewpoint, and object appearances, is critical for real-world applications. In this work, we study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf), which operates on the region features of the object detectors. For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision. CycConf encourages the object detector to explore invariant structures across instances under various motions, which leads to improved model robustness in unseen domains at test time. We observe consistent out-of-domain performance improvements when training object detectors in tandem with self-supervised tasks on various domain adaptation benchmarks with static images (Cityscapes, Foggy Cityscapes, Sim10K) and large-scale video datasets (BDD100K and Waymo open data). The code and models are released at
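The cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' released implementation: the function name `cycle_confusion_loss`, the `temperature` parameter, the cosine-similarity measure, the soft (softmax-weighted) selection of the "most different" proposals, and the similarity-based return step are all assumptions chosen for illustration. The sketch takes region features from two adjacent frames, moves each object toward its most dissimilar proposals in the next frame, and then scores how well that soft target cycles back to the starting object.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    # Unit-normalize so that dot products are cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def cycle_confusion_loss(feats_t, feats_next, temperature=0.1):
    """Hypothetical cycle loss over region features of two adjacent frames.

    feats_t:    list of N feature vectors (proposals in frame t)
    feats_next: list of M feature vectors (proposals in the adjacent frame)
    """
    feats_t = [l2_normalize(f) for f in feats_t]
    feats_next = [l2_normalize(f) for f in feats_next]
    losses = []
    for i, query in enumerate(feats_t):
        # Forward step (assumed form): weight proposals in the adjacent
        # frame by NEGATED similarity, so the most different ones dominate.
        fwd = softmax([-dot(query, f) / temperature for f in feats_next])
        # Soft target: weighted average of the adjacent-frame features.
        target = [
            sum(w * f[d] for w, f in zip(fwd, feats_next))
            for d in range(len(query))
        ]
        # Backward step (assumed form): match the soft target against
        # frame-t features by ordinary similarity.
        back = softmax([dot(target, f) / temperature for f in feats_t])
        # Self-supervision: the cycle should return to the starting index i.
        losses.append(-math.log(max(back[i], 1e-12)))
    return sum(losses) / len(losses)


# Toy example: two objects per frame, 2-D features.
frame_t = [[1.0, 0.0], [0.0, 1.0]]
frame_next = [[0.9, 0.1], [0.1, 0.9]]
loss = cycle_confusion_loss(frame_t, frame_next)
```

In the toy example the most different proposal for each object is the other object, so the cycle tends to land on the wrong starting instance and the loss is large. Minimizing such a loss would push features of distinct instances toward shared structure, which is one reading of the invariance the abstract describes.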

Related Material

@InProceedings{Wang_2021_ICCV,
    author    = {Wang, Xin and Huang, Thomas E. and Liu, Benlin and Yu, Fisher and Wang, Xiaolong and Gonzalez, Joseph E. and Darrell, Trevor},
    title     = {Robust Object Detection via Instance-Level Temporal Cycle Confusion},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {9143-9152}
}