Difformer for Action Segmentation

Nicolas Aziere, Tieqiao Wang, Sinisa Todorovic; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 2986-2995

Abstract


We propose a novel approach to supervised action segmentation that explicitly models uncertainty over framewise class predictions using the Dirichlet distribution. In contrast to most state-of-the-art methods, which rely on multi-stage refinement of initially proposed frame labels, our approach recalibrates frame-level class distributions through a Dirichlet diffusion process that is analytically tractable (closed-form) and hence computationally efficient. Diffusion parameters are estimated only at a sparse set of keyframes using a lightweight module, further reducing memory and runtime costs. Experiments on four benchmark datasets (Breakfast, GTEA, 50Salads, and Assembly101) show that our approach achieves superior accuracy with fewer parameters and lower computational complexity than existing approaches.
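
As a rough illustration of what modeling uncertainty over framewise class predictions with a Dirichlet distribution can mean, the sketch below computes the standard closed-form mean and per-class variance of a Dirichlet distribution for a single frame. It is not the paper's diffusion process or its keyframe module. The function name `dirichlet_summary` and the concentration values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def dirichlet_summary(alpha: List[float]) -> Tuple[List[float], List[float]]:
    """Return the mean and per-class variance of Dirichlet(alpha).

    alpha holds one positive concentration parameter per action class.
    Both quantities are closed-form; no sampling is needed.
    """
    if not alpha or any(a <= 0 for a in alpha):
        raise ValueError("all concentration parameters must be positive")
    alpha0 = sum(alpha)  # total concentration ("evidence") for this frame
    mean = [a / alpha0 for a in alpha]  # expected class probabilities
    # Var[p_k] = mean_k * (1 - mean_k) / (alpha0 + 1)
    var = [m * (1.0 - m) / (alpha0 + 1.0) for m in mean]
    return mean, var


# Two hypothetical frames with the same expected class distribution
# but different total concentration (illustrative values only).
confident_mean, confident_var = dirichlet_summary([40.0, 5.0, 5.0])
uncertain_mean, uncertain_var = dirichlet_summary([4.0, 0.5, 0.5])

print(confident_mean, confident_var)
print(uncertain_mean, uncertain_var)
```

Both frames have the same expected class probabilities, but the frame with the smaller total concentration has a larger variance for every class. This is the kind of distinction a single softmax vector cannot express.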

Related Material


@InProceedings{Aziere_2025_ICCV,
    author    = {Aziere, Nicolas and Wang, Tieqiao and Todorovic, Sinisa},
    title     = {Difformer for Action Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {2986-2995}
}