Mutual Quantization for Cross-Modal Search With Noisy Labels

Erkun Yang, Dongren Yao, Tongliang Liu, Cheng Deng; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 7551-7560

Abstract


Deep cross-modal hashing has become an essential tool for supervised multimodal search. Such models are typically optimized on large, curated multimodal datasets in which most labels have been manually verified. Unfortunately, in many scenarios such accurate labeling is not available; instead, only datasets with low-quality annotations can be acquired, which inevitably contain numerous mistakes, i.e., label noise, and therefore degrade search performance. To address this challenge, we present a general robust cross-modal hashing framework that correlates distinct modalities and combats noisy labels simultaneously. More specifically, we propose a proxy-based contrastive (PC) loss to mitigate the gap between different modalities, and we jointly train the modality-specific networks on small-loss samples selected according to the PC loss combined with a mutual quantization loss. Selecting small-loss samples with this joint loss helps choose confident examples to guide model training, while the mutual quantization loss maximizes the agreement between different modalities and further improves the effectiveness of sample selection. Experiments on three widely used multimodal datasets show that our method significantly outperforms existing state-of-the-art methods.
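
The sketch below illustrates, in PyTorch-style Python, how a joint per-sample loss (proxy-based contrastive plus mutual quantization) could drive small-loss sample selection as described in the abstract. The module names (img_net, txt_net), the specific quantization formulation, the selection scheme, and all hyperparameters are illustrative assumptions rather than the authors' reference implementation.

    # Minimal sketch of small-loss sample selection with a joint proxy-based
    # contrastive (PC) + mutual quantization loss. Names, loss weights, and the
    # selection scheme are illustrative assumptions, not the paper's exact code.
    import torch
    import torch.nn.functional as F

    def pc_loss(codes, labels, proxies, temperature=0.5):
        """Per-sample proxy-based contrastive loss.

        codes:   (B, K) continuous hash codes from one modality network
        labels:  (B,)   (possibly noisy) class indices
        proxies: (C, K) learnable class proxies shared across modalities
        """
        logits = F.normalize(codes, dim=1) @ F.normalize(proxies, dim=1).t()
        return F.cross_entropy(logits / temperature, labels, reduction="none")

    def mutual_quant_loss(codes_a, codes_b):
        """Per-sample mutual quantization loss: each modality's continuous code
        is pulled toward the sign (binary code) of the other modality,
        encouraging cross-modal agreement in Hamming space (one possible
        formulation of 'mutual quantization')."""
        loss_ab = ((codes_a - torch.sign(codes_b.detach())) ** 2).mean(dim=1)
        loss_ba = ((codes_b - torch.sign(codes_a.detach())) ** 2).mean(dim=1)
        return loss_ab + loss_ba

    def select_small_loss(joint_loss, keep_ratio):
        """Indices of the keep_ratio fraction of samples with the smallest loss."""
        num_keep = max(1, int(keep_ratio * joint_loss.numel()))
        return torch.topk(joint_loss, num_keep, largest=False).indices

    def training_step(img_net, txt_net, proxies, images, texts, labels,
                      keep_ratio=0.7, quant_weight=0.1):
        """One joint update: compute the joint loss per sample, keep only the
        small-loss subset, and back-propagate through that subset."""
        img_codes = img_net(images)   # (B, K) continuous codes, e.g. tanh outputs
        txt_codes = txt_net(texts)

        per_sample = (pc_loss(img_codes, labels, proxies)
                      + pc_loss(txt_codes, labels, proxies)
                      + quant_weight * mutual_quant_loss(img_codes, txt_codes))

        keep = select_small_loss(per_sample.detach(), keep_ratio)
        return per_sample[keep].mean()  # optimize img_net, txt_net, proxies jointly

In small-loss selection schemes of this kind, keep_ratio is typically scheduled to shrink after a warm-up phase, so that progressively more suspected noisy samples are excluded as the networks become more reliable.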

Related Material


[pdf]
[bibtex]
@InProceedings{Yang_2022_CVPR,
    author    = {Yang, Erkun and Yao, Dongren and Liu, Tongliang and Deng, Cheng},
    title     = {Mutual Quantization for Cross-Modal Search With Noisy Labels},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {7551-7560}
}