@InProceedings{Lee_2025_WACV,
  author    = {Lee, Jeonghwan and Yun, Heywon and Kim, Jimin and Fashandi, Homa},
  title     = {Improving Human Pose-Conditioned Generation: Fine-tuning ControlNet Models with Reinforcement Learning},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops},
  month     = {February},
  year      = {2025},
  pages     = {140-149}
}
Improving Human Pose-Conditioned Generation: Fine-tuning ControlNet Models with Reinforcement Learning
Abstract
Advancements in diffusion-based text-to-image generation models have made it possible to create high-quality human images. However, generating humans in desired poses using text prompts alone remains challenging. Image-to-image generation methods that use additional image conditions can address this issue; however, they often struggle to generate images that accurately match the conditioning images. Inspired by the success of Denoising Diffusion Policy Optimization (DDPO) in leveraging reinforcement learning to train generative models with the support of Large Language Models (LLMs), we propose a novel fine-tuning framework that effectively understands pose-conditioning images. Our framework introduces newly designed reward functions specifically aimed at enhancing pose accuracy. We demonstrate that our method improves human generation by enhancing pose accuracy and by correctly generating body parts without omissions or additions. Furthermore, we show that using a more detailed pose dataset along with our proposed reward functions leads to improved training results.
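The abstract does not specify the reward functions themselves. As a rough illustration only, a keypoint-based pose-accuracy reward of the kind described could be sketched as an OKS-style Gaussian similarity between predicted and conditioning keypoints; the function name, array layout, and tolerance value below are assumptions, not the paper's formulation:

```python
import numpy as np

def pose_reward(pred_kpts, target_kpts, sigma=0.05):
    """Hypothetical pose-accuracy reward sketch (not the paper's exact method).

    pred_kpts, target_kpts: (K, 2) arrays of (x, y) keypoints,
    normalized to [0, 1]. Returns a scalar in (0, 1]; 1.0 means
    the generated pose matches the conditioning pose exactly.
    """
    pred = np.asarray(pred_kpts, dtype=float)
    target = np.asarray(target_kpts, dtype=float)
    # Squared Euclidean distance per keypoint.
    d2 = np.sum((pred - target) ** 2, axis=1)
    # Gaussian similarity: near-1 for close keypoints, near-0 for distant ones.
    # sigma (assumed value) controls how sharply mismatches are penalized.
    per_kpt = np.exp(-d2 / (2.0 * sigma ** 2))
    # Average over keypoints so every body part contributes equally,
    # discouraging omitted or misplaced parts.
    return float(per_kpt.mean())
```

A reward of this shape could be maximized per-sample during DDPO-style fine-tuning, with keypoints extracted from the generated image by a pose estimator.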