Self-Attention With Convolution and Deconvolution for Efficient Eye Gaze Estimation From a Full Face Image

Jun O Oh, Hyung Jin Chang, Sang-Il Choi; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4992-5000

Abstract


This paper proposes a novel full-face image-based eye gaze estimation network that addresses low generalization performance. Due to the high variance of facial appearance and environmental conditions, conventional gaze estimation methods generalize poorly and easily overfit to the training subjects. To address this problem, we adopt a self-attention mechanism, which offers better generalization performance. However, applying self-attention directly to an image incurs a high computational cost. We therefore introduce a new projection that applies convolution to the entire face image to accurately model the local context and reduce the computational cost of self-attention. The proposed model also includes a deconvolution that transforms the down-sampled global context back to the input size so that spatial information is not lost. Experiments confirm that the proposed method achieves state-of-the-art results on the EYEDIAP, MPIIFaceGaze, Gaze360, and RT-GENE datasets, improving on the previous state-of-the-art models by 0.02° to 0.30°. In addition, we demonstrate the generalization performance of the proposed model through a cross-dataset evaluation.
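The idea of the convolution/deconvolution pair around self-attention can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: a strided average-pool stands in for the learned convolutional projection, and nearest-neighbor upsampling stands in for the learned deconvolution; the function names and the stride value are assumptions for illustration. The point it shows is the cost reduction: attention over the down-sampled tokens scales with (HW/s²)² instead of (HW)², and the final upsampling restores the input's spatial size.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv_projection(x, stride=2):
    # Stand-in for the learned convolutional projection:
    # average spatial patches to shrink the token grid by `stride`.
    H, W, C = x.shape
    x = x[:H - H % stride, :W - W % stride]
    return x.reshape(H // stride, stride, W // stride, stride, C).mean(axis=(1, 3))

def deconv_upsample(x, stride=2):
    # Stand-in for the learned deconvolution: nearest-neighbor
    # upsampling back to the original spatial resolution.
    return x.repeat(stride, axis=0).repeat(stride, axis=1)

def conv_self_attention(x, stride=2):
    # Project the feature map to a smaller grid before attention,
    # so attention cost drops from (H*W)^2 to (H*W/stride^2)^2.
    H, W, C = x.shape
    q = conv_projection(x, stride)
    kv = conv_projection(x, stride)
    h, w, _ = q.shape
    Q = q.reshape(h * w, C)
    K = V = kv.reshape(h * w, C)
    attn = softmax(Q @ K.T / np.sqrt(C))          # (h*w, h*w) attention map
    out = (attn @ V).reshape(h, w, C)             # down-sampled global context
    return deconv_upsample(out, stride)           # restore input spatial size

# Usage: an 8x8 feature map with 4 channels keeps its spatial size.
x = np.random.rand(8, 8, 4)
y = conv_self_attention(x, stride=2)
print(y.shape)  # (8, 8, 4)
```

With stride 2 the attention matrix here is 16×16 instead of the 64×64 that full-resolution self-attention would require, which is the efficiency trade the abstract describes.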

Related Material


[pdf]
[bibtex]
@InProceedings{O_Oh_2022_CVPR, author = {O Oh, Jun and Chang, Hyung Jin and Choi, Sang-Il}, title = {Self-Attention With Convolution and Deconvolution for Efficient Eye Gaze Estimation From a Full Face Image}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, month = {June}, year = {2022}, pages = {4992-5000} }