Towards Accurate Visual and Natural Language-Based Vehicle Retrieval Systems
In this work, we consider two tracks of the 2021 NVIDIA AI City Challenge, the City-Scale Multi-Camera Vehicle Re-identification and Natural language-based Vehicle Retrieval. For the vehicle re-identification task, we employ the state-of-art Excited Vehicle Re-Identification deep representation learning model coupled with best training practices and domain adaptation techniques to obtain robust embeddings. We further refine the re-identification results through a series of post-processing steps to remove camera and vehicle orientation bias that is inherent in the task of re-identification. We also take advantage of multiple observations of a vehicle using track-level information and finally obtain fine-grained retrieval results. For the task of Natural language-based vehicle retrieval we leverage the recently proposed Contrastive Language-Image Pre-training model and propose a simple yet effective text-based vehicle retrieval system. We compare our performance against the top submissions to the challenge and our systems are ranked 8^\text th in the public leaderboard for both tracks.