@InProceedings{Han_2025_ICCV,
  author    = {Han, Sangjun and Cheong, Woojin and Song, Chang Hoon and Kang, Myungjoo},
  title     = {Towards Human-Like Invariance: Self-Supervised Learning with Feature-Level Rotation Alignment},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2025},
  pages     = {4761-4770}
}
Towards Human-Like Invariance: Self-Supervised Learning with Feature-Level Rotation Alignment
Abstract
Self-supervised learning (SSL) has made significant progress through joint-embedding methods that learn invariant representations across transformed views. However, achieving robustness to image rotations remains challenging, as naively incorporating rotation augmentations often degrades performance. Inspired by cognitive studies on human mental rotation, we propose FRTAlign, an SSL framework with feature-level alignment that explicitly mitigates rotation-induced shifts in the representation space. FRTAlign introduces a unified module that learns rotation-equivariant feature transformations and combines them with a lightweight rotation predictor to produce human-inspired rotation-invariant representations. This design enables the model to preserve performance on non-rotated samples while significantly improving robustness to rotated inputs. Through extensive experiments on STL10, ImageNet100, and EMNIST, we demonstrate that FRTAlign consistently outperforms baselines in both standard and rotated settings. Further analysis reveals that our method mitigates distributional shifts caused by rotation and is robust to architectural and hyperparameter variations.
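The abstract describes the mechanism only at a high level. A minimal PyTorch-style sketch of the general idea, a lightweight rotation predictor combined with a feature-level alignment loss between a rotated view and its upright counterpart, might look as follows. The module name, feature dimension, per-rotation linear transforms, and the specific loss form are all assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch of feature-level rotation alignment (not the authors' code).
# Assumes a backbone encoder producing feat_dim features and rotations limited
# to {0, 90, 180, 270} degrees, indexed by an integer label.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationAlignmentHead(nn.Module):
    def __init__(self, feat_dim=512, num_rotations=4):
        super().__init__()
        # Lightweight rotation predictor: classifies which rotation was applied.
        self.rot_predictor = nn.Linear(feat_dim, num_rotations)
        # One learned feature-space transform per rotation (an assumed form of the
        # "rotation-equivariant feature transformation" described in the abstract).
        self.align = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim, bias=False) for _ in range(num_rotations)]
        )

    def forward(self, feat_rot, feat_upright, rot_label):
        # Predict the applied rotation from the rotated view's features.
        rot_logits = self.rot_predictor(feat_rot)
        loss_rot = F.cross_entropy(rot_logits, rot_label)

        # Map rotated-view features back toward the upright view in feature space.
        aligned = torch.stack(
            [self.align[int(r)](f) for f, r in zip(feat_rot, rot_label)]
        )
        loss_align = 1.0 - F.cosine_similarity(
            aligned, feat_upright.detach(), dim=-1
        ).mean()

        return loss_rot + loss_align
```

Detaching the upright-view features here acts as a stop-gradient target, a common joint-embedding design choice; whether FRTAlign uses this particular formulation is not stated in the abstract.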
