@InProceedings{Leotescu_2025_WACV,
  author    = {Leotescu, George and Popa, Alin-Ionut and Grigore, Diana-Nicoleta N and Voinea, Daniel and Perona, Pietro},
  title     = {Self-Supervised Incremental Learning of Object Representations from Arbitrary Image Sets},
  booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  month     = {February},
  year      = {2025},
  pages     = {8133-8143}
}
Self-Supervised Incremental Learning of Object Representations from Arbitrary Image Sets
Abstract
Computing a comprehensive and robust visual representation of an arbitrary object or category of objects is a complex problem. The difficulty increases when one starts from a set of uncalibrated images obtained from different sources. We propose Multi-Image Latent Embedding (MILE), a self-supervised approach that computes a single representation from such an image set. MILE operates incrementally, considering one image at a time, and processes the various depictions of the class through a shared gated cross-attention mechanism. The representation is progressively refined as more images become available, without requiring additional training. Our experiments on Amazon Berkeley Objects (ABO) and iNaturalist demonstrate its effectiveness on two tasks: object- or category-specific image retrieval and unsupervised context-conditioned object segmentation. Moreover, the proposed multi-image input setup opens new frontiers for the task of object retrieval. Our studies indicate that our models capture descriptive representations that better encapsulate the intrinsic characteristics of the objects. Our code is available at https://github.com/amazon-science/mile.
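The incremental scheme described above (one shared gated cross-attention block refining a single latent as each image arrives) can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the single-head attention, the dimension `dim`, the zero-initialized latent, and the scalar `tanh` gate are all assumptions made for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class GatedCrossAttention:
    """Illustrative single-head gated cross-attention block (hypothetical,
    not the MILE code): a latent vector attends over one image's features
    and is updated through a gated residual connection."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.normal(0.0, s, (dim, dim))
        self.Wk = rng.normal(0.0, s, (dim, dim))
        self.Wv = rng.normal(0.0, s, (dim, dim))
        # tanh(0) = 0, so the block starts as the identity; in a trained
        # model this gate would be a learned parameter.
        self.gate = 0.0

    def __call__(self, latent, feats):
        # latent: (dim,) running object embedding; feats: (n_patches, dim)
        q = latent @ self.Wq
        k = feats @ self.Wk
        v = feats @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(len(q)))  # weights over patches
        update = attn @ v
        return latent + np.tanh(self.gate) * update  # gated residual

def encode_image_set(images, dim=16):
    """Fold an arbitrary set of per-image features into one embedding,
    one image at a time, through a single shared block."""
    block = GatedCrossAttention(dim)  # shared across all images
    z = np.zeros(dim)
    for feats in images:
        z = block(z, feats)  # refine as each new image is incorporated
    return z
```

Because the same block is reused for every image, additional images refine the embedding at inference time without any retraining, which mirrors the incremental property the abstract highlights.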