A Multi-Granularity Retrieval System for Natural Language-Based Vehicle Retrieval
We address the natural language-based vehicle track retrieval task of the 6th AI City Challenge. Retrieving a target vehicle from natural language descriptions is a comprehensive task: a model must first understand the semantics of both the language and vision modalities and then match them to produce accurate retrieval results. The task poses three challenges: (1) the ambiguity of natural language descriptions of a target vehicle; (2) the matching between the linguistic semantics of the descriptions and the corresponding static and dynamic properties of the target vehicle; (3) the shortage of annotated language and target vehicle pairs. Addressing only a subset of these problems does not yield a robust retrieval model. We therefore propose a multi-granularity retrieval system consisting of three main modules: (1) a language parsing module that extracts fine-grained vehicle attributes (e.g., color, type, and motion) from the language descriptions; (2) a language-augmented multi-query vehicle track retrieval module that serves as our baseline model and incorporates information from multiple imperfect queries; (3) a target vehicle attributes enhancement module that explicitly fuses the static and dynamic properties of the target vehicle to generate the final retrieval results. Our system achieved 1st place in the 6th AI City Challenge, with strong performance on the private test set.
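
To make the role of the language parsing module concrete, the following is a minimal sketch of attribute extraction from a single description. It is not the parser used in our system: the keyword vocabularies, the motion labels, and the function names `first_match`, `parse_motion`, and `parse_description` are all hypothetical and chosen only for illustration, and simple keyword matching stands in for whatever parsing technique a real implementation would use.

```python
import re
from typing import Dict, List, Optional

# Hypothetical vocabularies (assumptions for illustration, not the system's own).
COLORS: List[str] = ["black", "white", "red", "blue", "green", "gray", "silver"]
TYPES: List[str] = ["sedan", "suv", "pickup", "truck", "van", "bus", "hatchback"]
MOTIONS: Dict[str, List[str]] = {
    "left": ["turns left", "turning left", "left turn"],
    "right": ["turns right", "turning right", "right turn"],
    "stop": ["stops", "stopped", "stopping"],
    "straight": ["goes straight", "going straight", "keeps straight"],
}


def first_match(text: str, vocab: List[str]) -> Optional[str]:
    """Return the vocabulary word that appears earliest in text, or None."""
    best_pos, best_word = None, None
    for word in vocab:
        m = re.search(r"\b" + re.escape(word) + r"\b", text)
        if m and (best_pos is None or m.start() < best_pos):
            best_pos, best_word = m.start(), word
    return best_word


def parse_motion(text: str) -> Optional[str]:
    """Return the first motion label whose phrase occurs in text, or None."""
    for label, phrases in MOTIONS.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return label
    return None


def parse_description(description: str) -> Dict[str, Optional[str]]:
    """Extract color, type, and motion attributes from one description."""
    text = description.lower()
    return {
        "color": first_match(text, COLORS),
        "type": first_match(text, TYPES),
        "motion": parse_motion(text),
    }


if __name__ == "__main__":
    print(parse_description("A red SUV turns left at the intersection."))
```

An attribute missing from a description is returned as `None`. Under this sketch, that is how ambiguous or incomplete queries would surface to the downstream retrieval and enhancement modules.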