A Unified Approach for Occlusion Tolerant 3D Facial Pose Capture and Gaze Estimation Using MocapNETs

Qammaz, Ammar; Argyros, Antonis A.

Ammar Qammaz, Antonis A. Argyros; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 3178-3188

Abstract

We tackle the challenging problems of 3D facial capture, head pose estimation, and gaze estimation by extending MocapNET, a highly effective deep learning motion capture framework. By leveraging state-of-the-art RGB/2D joint estimators, the proposed network ensemble converts 2D facial keypoints into a 3D Bio-Vision Hierarchy (BVH) skeleton in real time and in an end-to-end fashion, incorporating inverse kinematics computations. Our approach achieves satisfactory performance on benchmark datasets, and its architecture copes well with challenging scenarios involving significant facial occlusions. Moreover, it runs in real time on a CPU, which makes it well suited to applications requiring low-latency interaction. Overall, our unified approach to facial capture, head pose, and gaze estimation provides a robust solution for capturing facial expressions and visual focus, with considerable potential in HCI and AR/VR applications. Notably, our approach integrates naturally with MocapNETs for 3D human body and hand pose estimation, offering one of the few state-of-the-art unified approaches that enable holistic recovery of 3D information about human gaze, face, upper and lower body, hands, and feet.

Related Material

[bibtex]

@InProceedings{Qammaz_2023_ICCV,
    author    = {Qammaz, Ammar and Argyros, Antonis A.},
    title     = {A Unified Approach for Occlusion Tolerant 3D Facial Pose Capture and Gaze Estimation Using MocapNETs},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3178-3188}
}