[bibtex]
@InProceedings{Xiao_2025_CVPR,
    author    = {Xiao, Yunfeng and Bai, Xiaowei and Chen, Baojun and Su, Hao and He, Hao and Xie, Liang and Yin, Erwei},
    title     = {De{\textasciicircum}2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {3091-3100}
}
De^2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation
Abstract
3D gaze estimation is a challenging task due to two main issues. First, existing methods focus on analyzing dense features (e.g., large pixel regions), which are sensitive to local noise (e.g., light spots, blurs) and increase computational complexity. Second, a single eyeball model can correspond to multiple gaze directions, and the entangled representation of gazes and models increases the learning difficulty. To address these issues, we propose De^2Gaze, a lightweight and accurate model-aware 3D gaze estimation method. De^2Gaze introduces two key innovations for deformable and decoupled representation learning. First, we propose a deformable sparse attention mechanism that adapts sparse sampling points to attention areas, avoiding the influence of local noise. Second, we propose a spatial decoupling network with a dual-branch decoding architecture that disentangles invariant (e.g., eyeball radius, position) and variable (e.g., gaze, pupil, iris) features in the latent space. Compared with existing methods, De^2Gaze requires fewer sparse features and achieves faster convergence, lower computational complexity, and higher accuracy in 3D gaze estimation. Qualitative and quantitative experiments demonstrate that De^2Gaze achieves state-of-the-art accuracy and high-quality semantic segmentation for 3D gaze estimation on the TEyeD dataset.
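To make the two ideas in the abstract concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' code; the paper's exact layer designs, dimensions, and outputs are not given here). It illustrates (a) a deformable sparse attention layer that predicts sampling offsets from a query, samples a small set of feature points, and attends over them, and (b) a dual-branch head that decodes invariant (eyeball radius, position) and variable (gaze direction) factors separately. All module and parameter names are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableSparseAttention(nn.Module):
    """Toy deformable sparse attention: sample a few offset points and attend over them."""
    def __init__(self, dim, num_points=8):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(dim, num_points * 2)   # (dx, dy) offset per sampling point
        self.attn_head = nn.Linear(dim, num_points)         # attention logit per sampling point
        self.value_proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, feat, ref_xy):
        # query: (B, C) per-query feature; feat: (B, C, H, W); ref_xy: (B, 2) in [-1, 1]
        B, C, H, W = feat.shape
        offsets = self.offset_head(query).view(B, self.num_points, 2).tanh()   # bounded offsets
        sample_xy = (ref_xy.unsqueeze(1) + offsets).clamp(-1.0, 1.0)           # (B, P, 2)
        values = self.value_proj(feat)
        # grid_sample expects a (B, H_out, W_out, 2) grid; treat the P points as a 1 x P grid.
        sampled = F.grid_sample(values, sample_xy.unsqueeze(1), align_corners=True)
        sampled = sampled.squeeze(2).transpose(1, 2)                           # (B, P, C)
        attn = self.attn_head(query).softmax(dim=-1).unsqueeze(-1)             # (B, P, 1)
        return self.out_proj((attn * sampled).sum(dim=1))                      # (B, C)

class DualBranchDecoder(nn.Module):
    """Toy dual-branch decoding: separate heads for invariant and variable factors."""
    def __init__(self, dim):
        super().__init__()
        self.invariant_branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))
        self.variable_branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, z):
        eyeball = self.invariant_branch(z)                     # e.g. radius + 3D center (assumed parameterization)
        gaze = F.normalize(self.variable_branch(z), dim=-1)    # unit 3D gaze direction
        return eyeball, gaze

The intent of this sketch is only to show how sparse, offset-based sampling can replace dense feature aggregation, and how separate decoding branches can keep eyeball-model parameters and gaze-dependent parameters disentangled.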