@InProceedings{Isoda_2024_ACCV,
    author    = {Isoda, Yuki and Kobayashi, Daisuke},
    title     = {Separate Guided Denoising Training for Human-Object Interaction Detection},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV) Workshops},
    month     = {December},
    year      = {2024},
    pages     = {405-420}
}
Separate Guided Denoising Training for Human-Object Interaction Detection
Abstract
Understanding scenes requires not only the detection of objects but also the recognition of the interactions between them. Human-Object Interaction (HOI) detection plays a crucial role in enhancing contextual comprehension by identifying the interactions between humans and objects, which is essential for building more robust and intelligent vision systems. While DETR-based models have shown significant success in HOI detection, they are hindered by slow training convergence. The SOV-STG method attempted to address this challenge in previous research. To further improve the learning efficiency and accuracy of SOV-STG, we introduce a novel Separate Guided Denoising training strategy specifically designed for HOI detection. Our approach separates the denoising of noised ground-truth data for the human-object decoder and the verb decoder, enabling more efficient and targeted training. Furthermore, we enhance training performance by merging redundant human-object pair annotations and by filtering and regenerating noised bounding boxes. The proposed method was validated on the HICO-DET dataset, achieving state-of-the-art results. Our contributions include a novel training strategy that improves accuracy and ablation studies demonstrating its effectiveness.