DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval

Ziruo Sun, Xinfang Liu, Xiaopeng Bi, Xiushan Nie, Yilong Yin; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 4061-4067

Abstract


Retrieving vehicles that match natural language descriptions from collections of videos is a novel and uniquely challenging task, requiring consideration not only of vehicle types and colors, but also of temporal relations, e.g., "A white crossover keeping straight behind a silver hatchback." To perform this task, we propose the Dual-path Temporal Matching Network (DUN). DUN uses a pre-trained CNN and GloVe to extract visual and text features, respectively, and GRUs to mine temporal relationships in videos and sentences. Furthermore, the proposed network can attain superior performance by including techniques such as re-ranking. With its simple structure, DUN achieved second place in Track 5 of the AI City Challenge 2021.
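As a rough illustration of the matching step in a dual-path design, the sketch below ranks candidate vehicle tracks against a single text query. It assumes that each path has already produced a fixed-length embedding (in DUN these would come from the CNN/GRU and GloVe/GRU paths) and that similarity is measured with cosine similarity. The abstract does not state the similarity measure, so that choice, the function names `cosine` and `rank_tracks`, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_tracks(query_vec, track_vecs):
    """Return track ids ordered from best to worst match for the query embedding."""
    scores = {tid: cosine(query_vec, vec) for tid, vec in track_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings, invented for illustration only (hypothetical values).
query = [1.0, 0.0, 1.0]          # stand-in for the text-path output
tracks = {                       # stand-ins for the visual-path outputs
    "track_a": [0.9, 0.1, 1.1],
    "track_b": [-1.0, 0.5, 0.0],
    "track_c": [0.0, 1.0, 0.2],
}

ranking = rank_tracks(query, tracks)
print(ranking)
```

A re-ranking stage of the kind the abstract mentions would take an initial ordering like `ranking` and refine it; that stage is not shown here.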

Related Material


[bibtex]
@InProceedings{Sun_2021_CVPR,
    author    = {Sun, Ziruo and Liu, Xinfang and Bi, Xiaopeng and Nie, Xiushan and Yin, Yilong},
    title     = {DUN: Dual-Path Temporal Matching Network for Natural Language-Based Vehicle Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {4061-4067}
}