Bi-Encoder Cascades for Efficient Image Search

Robert Hönig, Jan Ackermann, Mingyuan Chi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1358-1363

Abstract


Modern neural encoders offer unprecedented text-image retrieval (TIR) accuracy, but their high computational cost impedes their adoption in large-scale image search. To lower this cost, model cascades use an expensive encoder to refine the ranking of a cheap encoder. However, existing cascading algorithms focus on cross-encoders, which jointly process text-image pairs, and do not consider cascades of bi-encoders, which process texts and images separately. We introduce the small-world search scenario as a realistic setting where bi-encoder cascades can reduce costs. We then propose a cascading algorithm that leverages the small-world search scenario to reduce the lifetime image encoding costs of a TIR system. Our experiments show cost reductions of up to 6x.
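
The cascade idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no implementation details, so the class `BiEncoderCascade`, its parameters (`shortlist_size`, the four encoder callables), and the dot-product scoring are all hypothetical. The sketch assumes that every image is embedded once by the cheap encoder, and that the expensive image encoder runs lazily, only on images that reach a query's shortlist, with its outputs cached. Under that assumption, if queries keep hitting a small subset of the collection, most images never incur the expensive encoding cost.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


class BiEncoderCascade:
    """Two-stage bi-encoder retrieval (illustrative sketch, names are hypothetical)."""

    def __init__(
        self,
        images: Sequence[object],
        cheap_image_encoder: Callable[[object], Vector],
        expensive_image_encoder: Callable[[object], Vector],
        cheap_text_encoder: Callable[[object], Vector],
        expensive_text_encoder: Callable[[object], Vector],
        shortlist_size: int,
    ) -> None:
        self.images = list(images)
        self.expensive_image_encoder = expensive_image_encoder
        self.cheap_text_encoder = cheap_text_encoder
        self.expensive_text_encoder = expensive_text_encoder
        self.shortlist_size = shortlist_size
        # Assumption: every image is embedded once with the cheap encoder.
        self.cheap_index: List[Vector] = [cheap_image_encoder(im) for im in self.images]
        # Expensive embeddings are computed on demand and cached.
        self.expensive_cache: Dict[int, Vector] = {}
        self.expensive_calls = 0  # counts expensive image encodings so far

    def _expensive_embedding(self, i: int) -> Vector:
        if i not in self.expensive_cache:
            self.expensive_cache[i] = self.expensive_image_encoder(self.images[i])
            self.expensive_calls += 1
        return self.expensive_cache[i]

    def search(self, query: object, top_k: int) -> List[int]:
        """Return indices of the top_k images for the query."""
        # Stage 1: rank the whole collection with the cheap encoder.
        q_cheap = self.cheap_text_encoder(query)
        ranked = sorted(
            range(len(self.images)),
            key=lambda i: dot(q_cheap, self.cheap_index[i]),
            reverse=True,
        )
        shortlist = ranked[: self.shortlist_size]
        # Stage 2: re-rank only the shortlist with the expensive encoder.
        q_exp = self.expensive_text_encoder(query)
        reranked = sorted(
            shortlist,
            key=lambda i: dot(q_exp, self._expensive_embedding(i)),
            reverse=True,
        )
        return reranked[:top_k]


# Toy usage: "images" and "queries" are already 2-D vectors; the cheap
# encoder keeps only the first coordinate, the expensive one keeps both.
toy_images = [(1.0, 0.0), (0.9, 0.5), (0.0, 1.0), (-1.0, 0.2)]
cascade = BiEncoderCascade(
    toy_images,
    cheap_image_encoder=lambda im: (im[0],),
    expensive_image_encoder=lambda im: im,
    cheap_text_encoder=lambda q: (q[0],),
    expensive_text_encoder=lambda q: q,
    shortlist_size=2,
)
best = cascade.search((1.0, 1.0), top_k=1)
```

In this toy run the cheap stage shortlists images 0 and 1, and the expensive stage promotes image 1 to the top. Only two of the four images are ever passed to the expensive encoder, and repeating the query triggers no further expensive encodings because the cache is reused.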

Related Material


[bibtex]
@InProceedings{Honig_2023_ICCV,
    author    = {H\"onig, Robert and Ackermann, Jan and Chi, Mingyuan},
    title     = {Bi-Encoder Cascades for Efficient Image Search},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1358-1363}
}