Towards Human-Like Invariance: Self-Supervised Learning with Feature-Level Rotation Alignment

Sangjun Han, Woojin Cheong, Chang Hoon Song, Myungjoo Kang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 4761-4770

Abstract


Self-supervised learning (SSL) has made significant progress through joint-embedding methods that learn invariant representations across transformed views. However, achieving robustness to image rotations remains challenging, as naively incorporating rotation augmentations often degrades performance. Inspired by cognitive studies on human mental rotation, we propose FRTAlign, an SSL framework with feature-level alignment that explicitly mitigates rotation-induced shifts in the representation space. FRTAlign introduces a unified module that learns rotation-equivariant feature transformations and combines them with a lightweight rotation predictor to produce human-inspired rotation-invariant representations. This design enables the model to preserve performance on non-rotated samples while significantly improving robustness to rotated inputs. Through extensive experiments on STL10, ImageNet100, and EMNIST, we demonstrate that FRTAlign consistently outperforms baselines in both standard and rotated settings. Further analysis reveals that our method mitigates distributional shifts caused by rotation and is robust to architectural and hyperparameter variations.
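The abstract describes a feature-level alignment objective: features of rotated views are mapped back through learned per-rotation transforms and pulled toward the non-rotated view's features, alongside a lightweight rotation predictor. The sketch below is a minimal toy illustration of that idea, not the paper's implementation; the encoder, the per-rotation transform matrices `T`, and the L2 alignment loss are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy linear encoder: flatten the image and project to feature space."""
    return W @ x.reshape(-1)

def rotation_views(x):
    """The four 90-degree rotations of an image (k = 0..3)."""
    return [np.rot90(x, k) for k in range(4)]

def alignment_loss(x, W, T):
    """Hypothetical feature-level alignment: map each rotated view's feature
    back through a per-rotation transform T[k], then penalize the L2 gap to
    the non-rotated view's feature (a stand-in for the paper's objective)."""
    f0 = encode(x, W)
    views = rotation_views(x)
    return sum(np.sum((T[k] @ encode(v, W) - f0) ** 2)
               for k, v in enumerate(views)) / 4.0

d, h = 8, 4                     # image side length, feature dimension
x = rng.normal(size=(d, d))     # a random "image"
W = rng.normal(size=(h, d * d)) # untrained encoder weights
T = [np.eye(h) for _ in range(4)]  # identity transforms before training

loss = alignment_loss(x, W, T)
```

In this sketch, training would update `T` (and the encoder) so that the transformed features of rotated views collapse onto the canonical feature, which is one plausible reading of "mitigating rotation-induced shifts in the representation space"; the k = 0 term is exactly zero since the identity transform leaves the unrotated feature unchanged.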

Related Material


[bibtex]
@InProceedings{Han_2025_ICCV,
    author    = {Han, Sangjun and Cheong, Woojin and Song, Chang Hoon and Kang, Myungjoo},
    title     = {Towards Human-Like Invariance: Self-Supervised Learning with Feature-Level Rotation Alignment},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {4761-4770}
}