- [pdf] [code]
Boundary-aware Temporal Sentence Grounding with Adaptive Proposal Refinement
Temporal sentence grounding (TSG) in videos aims to localize the temporal interval from an untrimmed video that is relevant to a given query sentence. In this paper, we introduce an effective proposal-based approach to solve the TSG problem. A Boundary-aware Feature Enhancement (BAFE) module is proposed to enhance the proposal feature with its boundary information, by imposing a new temporal difference loss. Meanwhile, we introduce a Boundary-aware Feature Aggregation (BAFA) module to aggregate boundary features and propose a Proposal-level Contrastive Learning (PCL) method to learn query-related content features by maximizing the mutual information between the query and proposals. Furthermore, we introduce a Proposal Interaction (PI) module with Adaptive Proposal Selection (APS) strategies to effectively refine proposal representations and make the final localization. Extensive experiments on Charades-STA, ActivityNet-Captions and TACoS datasets show the effectiveness of our solution. Our code is available at https://github.com/DJX1995/BAN-APR.