@InProceedings{Qammaz_2023_ICCV,
    author    = {Qammaz, Ammar and Argyros, Antonis A.},
    title     = {A Unified Approach for Occlusion Tolerant 3D Facial Pose Capture and Gaze Estimation Using MocapNETs},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {3178-3188}
}
A Unified Approach for Occlusion Tolerant 3D Facial Pose Capture and Gaze Estimation Using MocapNETs
Abstract
We tackle the challenging problems of 3D facial capture, head pose estimation, and gaze estimation. We do so by extending MocapNET, a highly effective deep learning motion capture framework. By leveraging state-of-the-art RGB/2D joint estimators, the proposed network ensemble converts 2D facial keypoints into a 3D Bio-Vision Hierarchy (BVH) skeleton in real time and in an end-to-end fashion, incorporating inverse kinematics computations. Our approach achieves satisfactory performance on benchmark datasets, and its architecture is well suited to challenging scenarios with significant facial occlusions. Moreover, it runs in real time on a CPU, which makes it an ideal choice for applications requiring low-latency interaction. Overall, our unified approach to facial capture, head pose estimation, and gaze estimation provides a robust solution for capturing facial expressions and visual focus, with strong potential in human-computer interaction (HCI) and AR/VR applications. Notably, our approach integrates naturally with MocapNETs for 3D human body and hand pose estimation, offering one of the few state-of-the-art unified approaches that enable holistic recovery of 3D information regarding human gaze, face, upper/lower body, hands, and feet.