Pixel-Level Contrastive Learning of Driving Videos With Optical Flow

Tomoya Takahashi, Shingo Yashima, Kohta Ishikawa, Ikuro Sato, Rio Yokota; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 3180-3187

Abstract


Recognition of the external environment through cameras and LiDAR plays a central role in the safety of autonomous driving. The accuracy of such recognition has improved drastically with the advent of deep learning, but it is still insufficient for fully autonomous driving. Although it is possible to collect large amounts of driving data [1, 11, 14, 18], the cost of annotating such data is prohibitive. Recent efforts have therefore focused on self-supervised learning, which does not require annotated data. In this work, we improve the accuracy of self-supervised learning on driving data by combining pixel-wise contrastive learning (PixPro) with optical flow. Unlike most self-supervised methods, PixPro is trained on pixel-level pretext tasks, which yields better accuracy on downstream tasks requiring dense pixel predictions. However, PixPro does not account for the large scale changes of objects that are common in driving data. We show that incorporating optical flow into the pixel-wise contrastive pre-training improves the performance of downstream tasks such as semantic segmentation on Cityscapes. We found that using the optical flow between temporally distant frames helps the model learn invariance to large scale changes, which allows us to exceed the performance of the original PixPro method. Our code will be released upon acceptance.
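As a rough illustration of the core idea, the sketch below pairs pixels across two temporally distant frames via their optical-flow correspondence and applies a PixPro-style pixel consistency loss. This is a minimal sketch under stated assumptions, not the authors' released implementation: the function names, tensor shapes, and the simplified cosine-similarity loss (which omits PixPro's pixel propagation module and momentum encoder) are illustrative.

    # Minimal sketch (assumed names/shapes): flow-based pixel correspondence
    # feeding a PixPro-style pixel consistency loss.
    import torch
    import torch.nn.functional as F

    def warp_coords(flow):
        """Map each pixel (x, y) in frame t to (x + u, y + v) in frame t+k.

        flow: (B, 2, H, W) optical flow from frame t to frame t+k.
        Returns a sampling grid in [-1, 1] for F.grid_sample.
        """
        B, _, H, W = flow.shape
        ys, xs = torch.meshgrid(
            torch.arange(H, dtype=flow.dtype, device=flow.device),
            torch.arange(W, dtype=flow.dtype, device=flow.device),
            indexing="ij",
        )
        x_tgt = xs + flow[:, 0]  # (B, H, W)
        y_tgt = ys + flow[:, 1]
        # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
        grid = torch.stack(
            [2.0 * x_tgt / (W - 1) - 1.0, 2.0 * y_tgt / (H - 1) - 1.0],
            dim=-1,
        )  # (B, H, W, 2)
        return grid

    def pixel_consistency_loss(feat_t, feat_tk, flow):
        """Cosine-similarity loss between flow-corresponding pixel features.

        feat_t, feat_tk: (B, C, H, W) dense features of frames t and t+k.
        flow: (B, 2, H, W) optical flow from t to t+k.
        """
        grid = warp_coords(flow)
        # Sample frame t+k features at the flow-displaced locations,
        # so feat_tk_warped[b, :, y, x] corresponds to feat_t[b, :, y, x].
        feat_tk_warped = F.grid_sample(feat_tk, grid, align_corners=True)
        # Corresponding pixels are positives; maximize their cosine similarity.
        sim = F.cosine_similarity(feat_t, feat_tk_warped, dim=1)  # (B, H, W)
        return -sim.mean()

In practice the flow would come from an off-the-shelf estimator, and the loss above would replace the spatial-distance matching rule of the original PixPro loss within its asymmetric two-branch training setup; occluded or out-of-frame correspondences would also need to be masked out.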

Related Material


[pdf]
[bibtex]
@InProceedings{Takahashi_2023_CVPR,
    author    = {Takahashi, Tomoya and Yashima, Shingo and Ishikawa, Kohta and Sato, Ikuro and Yokota, Rio},
    title     = {Pixel-Level Contrastive Learning of Driving Videos With Optical Flow},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {3180-3187}
}