TextAug: Test Time Text Augmentation for Multimodal Person Re-Identification

Mulham Fawakherji, Eduard Vazquez, Pasquale Giampa, Binod Bhattarai; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops, 2024, pp. 320-329


Multimodal person re-identification is gaining popularity in the research community because it is more effective than its unimodal counterparts. However, the bottleneck for multimodal deep learning is the need for a large volume of multimodal training examples. Data augmentation techniques such as cropping, flipping, and rotation are often employed in the image domain to improve the generalization of deep learning models. Augmenting modalities other than images, such as text, is challenging and requires significant computational resources and external data sources. In this study, we investigate the effectiveness of two computer vision data augmentation techniques, namely "Cutout" and "CutMix", for text augmentation in multimodal person re-identification. Our approach merges these two strategies into a single one called "CutMixOut", which randomly removes words or sub-phrases from a sentence (Cutout) and blends parts of two or more sentences to create diverse examples (CutMix), with a certain probability assigned to each operation. This augmentation is applied at inference time and requires no prior training. Our results show that these techniques are simple and effective, improving performance on multiple multimodal person re-identification benchmarks.
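The abstract describes CutMixOut only at a high level, so the following is a minimal sketch of one possible reading rather than the authors' implementation. It assumes whitespace tokenization, that Cutout removes a single contiguous span of one to three words, that CutMix splices a prefix of the query with a suffix of one other sentence (a two-sentence simplification of "two or more sentences"), and that a pool of other textual descriptions is available as the mixing source. The function names (`cutout`, `cutmix`, `cutmixout`, `augment_query`) and the default probabilities are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def cutout(words: List[str], rng: random.Random, max_span: int = 3) -> List[str]:
    """Cutout-style: drop one random contiguous span of 1..max_span words.

    Assumption: at least one word is always kept.
    """
    if len(words) <= 1:
        return list(words)
    span = rng.randint(1, min(max_span, len(words) - 1))
    start = rng.randint(0, len(words) - span)
    return words[:start] + words[start + span:]


def cutmix(words_a: List[str], words_b: List[str], rng: random.Random) -> List[str]:
    """CutMix-style (two-sentence simplification): prefix of A + suffix of B."""
    if not words_a or not words_b:
        return list(words_a or words_b)
    cut_a = rng.randint(1, len(words_a))      # keep at least one word of A
    cut_b = rng.randint(0, len(words_b) - 1)  # keep at least one word of B
    return words_a[:cut_a] + words_b[cut_b:]


def cutmixout(
    sentence: str,
    pool: Sequence[str],
    p_cutout: float = 0.5,   # hypothetical default probability
    p_cutmix: float = 0.5,   # hypothetical default probability
    rng: Optional[random.Random] = None,
) -> str:
    """Apply CutMix and Cutout, each with its own probability."""
    rng = rng or random.Random()
    words = sentence.split()
    if pool and rng.random() < p_cutmix:
        other = rng.choice(list(pool)).split()
        words = cutmix(words, other, rng)
    if rng.random() < p_cutout:
        words = cutout(words, rng)
    return " ".join(words)


def augment_query(
    sentence: str, pool: Sequence[str], n_views: int = 4, seed: int = 0, **kwargs
) -> List[str]:
    """Return the original query followed by n_views augmented variants."""
    rng = random.Random(seed)
    return [sentence] + [
        cutmixout(sentence, pool, rng=rng, **kwargs) for _ in range(n_views)
    ]
```

For example, `augment_query("a man in a red jacket", ["a woman carrying a black bag"], n_views=3)` yields the original description plus three perturbed variants. No training is involved, which matches the test-time setting described above. How the augmented views are combined at retrieval time (for instance, averaging their text embeddings) is not stated in the abstract, so that step is left out.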


@InProceedings{Fawakherji_2024_WACV,
    author    = {Fawakherji, Mulham and Vazquez, Eduard and Giampa, Pasquale and Bhattarai, Binod},
    title     = {TextAug: Test Time Text Augmentation for Multimodal Person Re-Identification},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},
    month     = {January},
    year      = {2024},
    pages     = {320-329}
}