Face Parsing From RGB and Depth Using Cross-Domain Mutual Learning
Existing face parsing methods have proven effective at classifying each pixel of an RGB image into facial components. However, face parsing research that utilizes the depth domain is lacking. To the best of our knowledge, we present the first study to exploit 2.5D data for face parsing. We introduce a novel framework that jointly learns (1) RGB face parsing, (2) depth face parsing, and (3) RGB-to-depth domain translation, and remains effective even when only a small amount of annotated depth data is available for training. To this end, we also create the first RGB-D face parsing benchmarks, based on CelebAMask-HQ, LaPa, and Helen, by utilizing an off-the-shelf 3D head reconstruction model. Overall, our approach makes two main contributions. First, it leverages mutual learning between RGB and depth face parsing, enabling bidirectional knowledge distillation between the two data domains. Second, it performs end-to-end learning of RGB-to-depth domain translation and depth face parsing, which helps overcome the scarcity of annotated depth data. Extensive experiments validate the effectiveness of our method, which achieves state-of-the-art results in RGB face parsing. As far as we know, we also report the first results on face parsing from depth data. All experiments are conducted on our new RGB-D face parsing datasets, which are publicly available at https://github.com/jyunlee/CelebAMask-HQ-D_LaPa-D_Helen-D.
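The core of the mutual-learning contribution, bidirectional knowledge distillation between the RGB and depth parsing branches, can be sketched as a symmetric KL term between the two branches' softened per-pixel class distributions. This is only an illustrative sketch under standard deep-mutual-learning assumptions: the function names, the temperature `T`, and the use of a symmetric KL are ours, not details stated in the abstract.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-8):
    # Mean per-pixel KL(p || q) between two class-probability maps.
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def mutual_learning_loss(rgb_logits, depth_logits, T=2.0):
    """Bidirectional distillation between two parsing branches.

    Each branch is pushed to mimic the other's temperature-softened
    per-pixel class distribution (symmetric KL). Logits are arrays of
    shape (H, W, num_classes); all hyperparameters here are illustrative.
    """
    p_rgb = softmax(rgb_logits / T)
    p_depth = softmax(depth_logits / T)
    return kl_div(p_rgb, p_depth) + kl_div(p_depth, p_rgb)

# Toy usage with 19 classes (the CelebAMask-HQ label count):
rng = np.random.default_rng(0)
rgb_out = rng.normal(size=(8, 8, 19))
depth_out = rng.normal(size=(8, 8, 19))
loss = mutual_learning_loss(rgb_out, depth_out)
```

In training, this term would be added to each branch's supervised segmentation loss, so gradients flow in both directions and each domain acts as a soft teacher for the other.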