Generic Event Boundary Detection via Denoising Diffusion

Jaejun Hwang, Dayoung Gong, Manjin Kim, Minsu Cho; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 14084-14094

Abstract


Generic event boundary detection (GEBD) aims to identify natural boundaries in a video, segmenting it into distinct and meaningful chunks. Despite the inherent subjectivity of event boundaries, previous methods have focused on deterministic predictions, overlooking the diversity of plausible solutions. In this paper, we introduce a novel diffusion-based boundary detection model, dubbed DiffGEBD, that tackles the problem of GEBD from a generative perspective. The proposed model encodes relevant changes across adjacent frames via temporal self-similarity and then iteratively decodes random noise into plausible event boundaries, conditioned on the encoded features. Classifier-free guidance allows the degree of diversity to be controlled during denoising. In addition, we introduce a new evaluation metric that assesses the quality of predictions in terms of both diversity and fidelity. Experiments show that our method achieves strong performance on two standard benchmarks, Kinetics-GEBD and TAPOS, generating diverse and plausible event boundaries.
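The sampling procedure described above (iteratively denoising random noise into boundary predictions, with classifier-free guidance trading off diversity against fidelity) can be sketched as follows. This is a minimal toy illustration, not the authors' DiffGEBD implementation: `toy_denoiser`, the scalar condition `cond`, and the Euler-style update are all hypothetical stand-ins for the learned noise predictor, the encoded self-similarity features, and the actual diffusion schedule.

```python
import numpy as np

def toy_denoiser(x, t, cond):
    # Hypothetical stand-in for a learned noise predictor eps_theta(x, t, cond);
    # cond plays the role of the encoded frame-change features (None = unconditional).
    bias = 0.0 if cond is None else cond
    return 0.1 * x + bias

def cfg_sample(cond, w=2.0, steps=10, length=8, seed=0):
    """Classifier-free-guided denoising of a per-frame boundary-score sequence.

    Starts from Gaussian noise and iteratively removes predicted noise, mixing
    conditional and unconditional estimates with guidance scale w:
        eps = eps_uncond + w * (eps_cond - eps_uncond)
    w = 0 ignores the condition (maximum diversity); larger w follows the
    encoded video features more closely (higher fidelity).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(length)        # start from pure noise
    for t in range(steps, 0, -1):
        eps_c = toy_denoiser(x, t, cond)   # conditional noise estimate
        eps_u = toy_denoiser(x, t, None)   # unconditional noise estimate
        eps = eps_u + w * (eps_c - eps_u)  # classifier-free guidance
        x = x - (1.0 / steps) * eps        # simplified Euler-style denoising step
    return x

scores = cfg_sample(cond=0.5, w=2.0)       # one plausible boundary-score sample
```

With guidance scale `w = 0` the condition is ignored entirely, so the sample coincides with an unconditional one; varying the seed yields different plausible boundary configurations, which is the diversity the paper's metric evaluates.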

Related Material


@InProceedings{Hwang_2025_ICCV,
  author    = {Hwang, Jaejun and Gong, Dayoung and Kim, Manjin and Cho, Minsu},
  title     = {Generic Event Boundary Detection via Denoising Diffusion},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
  pages     = {14084-14094}
}