Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning

Kim, Jinwoo; Choi, Janghyuk; Choi, Ho-Jin; Kim, Seon Joo

Jinwoo Kim, Janghyuk Choi, Ho-Jin Choi, Seon Joo Kim; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 19198-19207

Abstract

Object-centric learning (OCL) aspires general and com- positional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets to apply various data-driven inductive biases by utilizing geometric or temporal information in the multi-image data. Single-view images carry less information about how to disentangle a given scene than videos or multi-view im- ages do. Hence, owing to the difficulty of applying induc- tive biases, OCL for single-view images still remains chal- lenging, resulting in inconsistent learning of object-centric representation. To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHep- herding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. The new modules, At- tention Refining Kernel (ARK) and Intermediate Point Pre- dictor and Encoder (IPPE), respectively, prevent slots from being distracted by the background noise and indicate lo- cations for slots to focus on to facilitate learning of object- centric representation. We also propose a weak- and semi- supervision approach for OCL, whilst our proposed frame- work can be used without any assistant annotation during the inference. Experiments show that our proposed method enables consistent learning of object-centric representa- tion and achieves strong performance across four datasets. Code is available at https://github.com/object- understanding/SLASH.

Related Material

[pdf] [supp] [arXiv]

[bibtex]

@InProceedings{Kim_2023_CVPR, author = {Kim, Jinwoo and Choi, Janghyuk and Choi, Ho-Jin and Kim, Seon Joo}, title = {Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning}, booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2023}, pages = {19198-19207} }