Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement

Jianxiang Dong, Zhaozheng Yin; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 3943-3959

Abstract


Temporal sentence grounding (TSG) in videos aims to localize, from an untrimmed video, the temporal interval relevant to a given query sentence. In this paper, we introduce an effective proposal-based approach to the TSG problem. A Boundary-aware Feature Enhancement (BAFE) module is proposed to enhance proposal features with boundary information by imposing a new temporal difference loss. Meanwhile, we introduce a Boundary-aware Feature Aggregation (BAFA) module to aggregate boundary features, and propose a Proposal-level Contrastive Learning (PCL) method that learns query-related content features by maximizing the mutual information between the query and the proposals. Furthermore, we introduce a Proposal Interaction (PI) module with Adaptive Proposal Selection (APS) strategies to effectively refine proposal representations and make the final localization. Extensive experiments on the Charades-STA, ActivityNet-Captions and TACoS datasets show the effectiveness of our solution. Our code is available at https://github.com/DJX1995/BAN-APR.
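Maximizing mutual information between a query and its matching proposal, as in the PCL method above, is commonly done with an InfoNCE-style lower bound: the ground-truth proposal serves as the positive and the remaining proposals as negatives. The sketch below illustrates that general recipe only; the function name, cosine similarity and temperature value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def proposal_infonce_loss(query, proposals, pos_idx, temperature=0.1):
    """InfoNCE-style contrastive loss over a set of proposal features.

    The proposal at `pos_idx` is the positive for `query`; all other
    proposals act as negatives. Returns a non-negative scalar loss.
    """
    # Cosine similarity between the query and every proposal feature.
    q = query / np.linalg.norm(query)
    p = proposals / np.linalg.norm(proposals, axis=1, keepdims=True)
    logits = p @ q / temperature

    # Numerically stable log-softmax over the proposal set.
    logits = logits - logits.max()
    log_prob = logits - np.log(np.exp(logits).sum())

    # Negative log-likelihood of picking the positive proposal.
    return -log_prob[pos_idx]

# Toy usage: a proposal aligned with the query yields a lower loss
# than a mismatched one.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
proposals = rng.normal(size=(5, 8))
proposals[2] = query  # make proposal 2 the true positive

loss_pos = proposal_infonce_loss(query, proposals, pos_idx=2)
loss_neg = proposal_infonce_loss(query, proposals, pos_idx=0)
```

Minimizing this loss pulls the query embedding toward its matched proposal while pushing it away from the negatives, which is one standard way to realize the mutual-information maximization described in the abstract.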

Related Material


[pdf] [code]
[bibtex]
@InProceedings{Dong_2022_ACCV,
    author    = {Dong, Jianxiang and Yin, Zhaozheng},
    title     = {Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2022},
    pages     = {3943-3959}
}