D2FP: Learning Implicit Prior for Human Parsing
Abstract
Human parsing aims to segment human images into fine-grained semantic parts. Considering the underlying structure of the human body, state-of-the-art methods typically depend on prior assumptions to represent intrinsic body relationships. However, leveraging the same structural prior knowledge across various scenarios makes stable prediction difficult and requires additional network design effort. To address these issues, we introduce a novel method, the Dynamic Dual Transformer for Parsing (D2FP), which dynamically learns the implicit prior structures of the human body. Specifically, we derive input-dependent prior features from the learnable semantics of human images and accordingly generate prior-embedded object queries before feeding them into the Transformer decoders. Our model includes three major components to effectively learn prior object queries: a prior extraction module, a prior embedding module, and a multi-scale dual Transformer decoder. Furthermore, a novel prior enhancement strategy is introduced, in which the final decoded object queries provide structural clues to enhance the initial prior features. Experimental results demonstrate the superiority and effectiveness of the proposed method across two well-known human parsing benchmarks: LIP and CIHP. Code and models are available at https://github.com/cvlab-yongin/D2FP.
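The abstract describes a pipeline in which input-dependent prior features are first extracted from image features, then embedded into object queries, decoded by a (multi-scale) dual Transformer decoder, and finally fed back to enhance the initial priors. The snippet below is a minimal, hedged sketch of that data flow only; every module name, dimension, and internal design choice (PriorExtraction, PriorEmbedding, DualDecoder, the mean-pooling fusion) is an illustrative assumption and not the authors' implementation, which is available at the repository linked above.

```python
# Minimal sketch of the data flow described in the abstract.
# All names, shapes, and module internals are illustrative assumptions;
# the authors' actual code is at https://github.com/cvlab-yongin/D2FP.
import torch
import torch.nn as nn


class PriorExtraction(nn.Module):
    """Derive input-dependent prior features from image features (assumed design)."""
    def __init__(self, dim, num_priors):
        super().__init__()
        self.semantic_tokens = nn.Parameter(torch.randn(num_priors, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats):                        # feats: (B, N, dim)
        B = feats.size(0)
        q = self.semantic_tokens.unsqueeze(0).expand(B, -1, -1)
        prior, _ = self.attn(q, feats, feats)        # (B, num_priors, dim)
        return prior


class PriorEmbedding(nn.Module):
    """Inject prior features into object queries before decoding (assumed design)."""
    def __init__(self, dim, num_queries):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.proj = nn.Linear(dim, dim)

    def forward(self, prior):                        # prior: (B, num_priors, dim)
        B = prior.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        # Broadcast pooled prior information into every query (one simple choice).
        return q + self.proj(prior.mean(dim=1, keepdim=True))


class DualDecoder(nn.Module):
    """Stand-in for the multi-scale dual Transformer decoder (single scale here)."""
    def __init__(self, dim, num_layers=3):
        super().__init__()
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, queries, feats):
        return self.decoder(queries, feats)          # (B, num_queries, dim)


# Toy forward pass: extract priors, embed them into queries, decode, then
# feed the decoded queries back as structural clues to enhance the priors.
dim, B, N = 256, 2, 1024
feats = torch.randn(B, N, dim)                       # flattened backbone features
extract, embed, decode = PriorExtraction(dim, 8), PriorEmbedding(dim, 20), DualDecoder(dim)
prior = extract(feats)
out = decode(embed(prior), feats)
enhanced_prior = prior + out.mean(dim=1, keepdim=True)   # prior enhancement (assumed form)
print(out.shape, enhanced_prior.shape)
```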
Related Material
[pdf]
[bibtex]
@InProceedings{Hong_2025_WACV,
  author    = {Hong, Junyoung and Yang, Hyeri and Kim, Ye Ju and Kim, Haerim and Kim, Shinwoong and Shim, Euna and Lee, Kyungjae},
  title     = {D2FP: Learning Implicit Prior for Human Parsing},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {9096-9106}
}