Multi-View Multi-Label Canonical Correlation Analysis for Cross-Modal Matching and Retrieval

Rushil Sanghavi, Yashaswi Verma; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4701-4710

Abstract


In this paper, we address the problem of cross-modal retrieval in the presence of multi-view and multi-label data. To this end, we present Multi-view Multi-label Canonical Correlation Analysis (MVMLCCA), a generalization of CCA to multi-view data that also exploits the high-level semantic information available in the form of multi-label annotations in each view. Whereas CCA relies on explicit pairings/associations of samples between two views (or modalities), MVMLCCA uses the available multi-label annotations to establish correspondence across multiple (two or more) views without the need for explicit pairing of multi-view samples. Extensive experiments on two multi-modal datasets demonstrate that the proposed approach offers much more flexibility than related approaches without compromising scalability or cross-modal retrieval performance. Our code and precomputed features are available at https://github.com/Rushil231100/MVMLCCA.
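
As a rough illustration of the idea summarized above, the sketch below contrasts standard two-view CCA, which requires explicitly paired samples, with a label-mediated cross-covariance in the same spirit as MVMLCCA. This is a minimal sketch, not the authors' released implementation: the matrix names, the synthetic data, and the cca_directions helper are illustrative assumptions.

```python
# Minimal sketch (NOT the authors' code): standard two-view CCA vs. a
# label-mediated cross-covariance, where correspondence between views comes
# from shared multi-label annotations instead of explicit sample pairing.
import numpy as np
from scipy.linalg import eigh

def cca_directions(Cxx, Cyy, Cxy, reg=1e-3):
    """Canonical directions from view covariances and a cross-covariance."""
    dx, dy = Cxx.shape[0], Cyy.shape[0]
    Cxx = Cxx + reg * np.eye(dx)  # regularize for invertibility
    Cyy = Cyy + reg * np.eye(dy)
    # Coupled symmetric generalized eigenvalue problem:
    #   [0 Cxy; Cyx 0] w = rho [Cxx 0; 0 Cyy] w
    A = np.block([[np.zeros((dx, dx)), Cxy], [Cxy.T, np.zeros((dy, dy))]])
    B = np.block([[Cxx, np.zeros((dx, dy))], [np.zeros((dy, dx)), Cyy]])
    vals, vecs = eigh(A, B)
    W = vecs[:, np.argsort(vals)[::-1]]   # largest correlations first
    return W[:dx], W[dx:]                 # projections for view 1 and view 2

rng = np.random.default_rng(0)
n, dx, dy, n_labels = 200, 50, 40, 10

# Standard CCA: row i of X and row i of Y must describe the same sample.
X = rng.standard_normal((n, dx)); X -= X.mean(0)
Y = rng.standard_normal((n, dy)); Y -= Y.mean(0)
Wx, Wy = cca_directions(X.T @ X / n, Y.T @ Y / n, X.T @ Y / n)

# Label-mediated variant (illustrative): each view has its own samples and
# its own multi-hot label matrix, so the views need not be paired or even
# have the same number of samples.
n2 = 300
Y2 = rng.standard_normal((n2, dy)); Y2 -= Y2.mean(0)
Lx = rng.integers(0, 2, (n, n_labels)).astype(float)
Ly = rng.integers(0, 2, (n2, n_labels)).astype(float)
# Cross-view covariance weighted by label agreement S = Lx @ Ly.T
Cxy_label = X.T @ (Lx @ Ly.T) @ Y2 / (n * n2)
Wx2, Wy2 = cca_directions(X.T @ X / n, Y2.T @ Y2 / n2, Cxy_label)
```

The label-agreement matrix here simply counts shared labels between cross-view sample pairs; the paper should be consulted for the exact weighting and multi-view formulation used by MVMLCCA.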

Related Material


[bibtex]
@InProceedings{Sanghavi_2022_CVPR,
    author    = {Sanghavi, Rushil and Verma, Yashaswi},
    title     = {Multi-View Multi-Label Canonical Correlation Analysis for Cross-Modal Matching and Retrieval},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4701-4710}
}