Tracked-Vehicle Retrieval by Natural Language Descriptions With Domain Adaptive Knowledge
This paper presents our solution for Track 2 of the AI City Challenge 2022. The Track 2 task is Tracked-Vehicle Retrieval by Natural Language Descriptions on a real-world dataset covering diverse scenarios and multiple cameras. We focus on developing a robust natural language-based vehicle retrieval system that addresses the domain bias caused by unseen scenarios and multi-view, multi-camera vehicle tracks. Specifically, we apply CLIP to extract both visual and textual representations for contrastive representation learning. Furthermore, since the test set contains new scenarios, we propose a Domain Adaptive Training method that transfers information from labeled data to unlabeled data to generate pseudo labels. With this simple and effective strategy, we not only bridge the domain gap between the training set and the test set but also require less computation and data than previous top-performing methods. Finally, we use a post-processing method called pruning to eliminate wrongly retrieved vehicle tracks. Going one step further, we also investigate the impact of different text formats and of the number of pseudo-labeled samples used in fine-tuning. Our proposed method achieved 3rd place in the AI City Challenge 2022, with a competitive MRR of 47.73% on the private test set, which verifies the effectiveness and scalability of the proposed solution.
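
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Mean Reciprocal Rank (MRR) is commonly computed for a retrieval task of this kind: each text query yields a ranked list of candidate tracks, and the score is the average of 1/rank of the correct track. The function name, the query and track identifiers, and the toy rankings are hypothetical and chosen only for illustration; this is not the challenge's official evaluation code.

```python
from typing import Dict, List


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, str],
) -> float:
    """Average of 1/rank of the correct track over all queries.

    A query whose correct track is missing from its ranked list
    contributes 0 (an assumption made for this sketch).
    """
    total = 0.0
    for query_id, ranked_tracks in rankings.items():
        target = ground_truth[query_id]
        if target in ranked_tracks:
            # Ranks are 1-based: the first retrieved track has rank 1.
            total += 1.0 / (ranked_tracks.index(target) + 1)
    return total / len(rankings)


# Toy example (hypothetical IDs): three queries over three candidate tracks.
rankings = {
    "q1": ["t3", "t1", "t2"],  # correct track t1 is at rank 2
    "q2": ["t2", "t1", "t3"],  # correct track t2 is at rank 1
    "q3": ["t1", "t3", "t2"],  # correct track t2 is at rank 3
}
ground_truth = {"q1": "t1", "q2": "t2", "q3": "t2"}

score = mean_reciprocal_rank(rankings, ground_truth)
print(f"MRR = {score:.4f}")
```

In this toy case the reciprocal ranks are 1/2, 1, and 1/3, so the score is their mean. A higher MRR means the correct track tends to appear nearer the top of each ranked list.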