Unsupervised Multi-Object Segmentation Using Attention and Soft-Argmax

Bruno Sauvalle, Arnaud de La Fortelle; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023, pp. 3267-3276

Abstract


We introduce a new architecture for unsupervised object-centric representation learning and multi-object detection and segmentation. It uses a translation-equivariant attention mechanism to predict the coordinates of the objects present in the scene and to associate a feature vector with each object. A transformer encoder handles occlusions and redundant detections, and a convolutional autoencoder reconstructs the background. We show that this architecture significantly outperforms the state of the art on complex synthetic benchmarks.
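The soft-argmax operation named in the title can be illustrated with a small sketch. This is not the authors' implementation: it only shows the general idea of turning a 2D score (attention) map into differentiable coordinates by taking the softmax-weighted mean of the pixel positions. The function name `soft_argmax_2d` and the `temperature` parameter are assumptions introduced here for illustration.

```python
import math


def soft_argmax_2d(scores, temperature=1.0):
    """Return the softmax-weighted mean (row, col) of a 2D score map.

    `scores` is a list of equal-length rows. `temperature` is a
    hypothetical knob: lower values bring the result closer to a hard argmax.
    """
    width = len(scores[0])
    flat = [s / temperature for row in scores for s in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in flat]
    total = sum(weights)
    # Weighted mean of the row index (i // width) and column index (i % width).
    row = sum(w * (i // width) for i, w in enumerate(weights)) / total
    col = sum(w * (i % width) for i, w in enumerate(weights)) / total
    return row, col


# A 4x5 map with one strong peak at row 1, column 2.
score_map = [[0.0] * 5 for _ in range(4)]
score_map[1][2] = 20.0
peak_row, peak_col = soft_argmax_2d(score_map)

# A flat map has no preferred location, so the result is the map centre.
flat_row, flat_col = soft_argmax_2d([[0.0] * 5 for _ in range(4)])
```

With a sharply peaked map the output lies very close to the peak location, while a flat map yields the centre of the grid. Because the output is a smooth function of the scores, gradients can flow through the predicted coordinates, which is what makes this kind of operation usable inside a network trained end to end.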

Related Material


@InProceedings{Sauvalle_2023_WACV,
    author    = {Sauvalle, Bruno and de La Fortelle, Arnaud},
    title     = {Unsupervised Multi-Object Segmentation Using Attention and Soft-Argmax},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {3267-3276}
}