Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood, Muhammad Haris Khan; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 23041-23051

Abstract


Unsupervised landmark discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms known as diffusion models. Recent works have shown that these models implicitly contain important correspondence cues. To harness the potential of diffusion models for the ULD task, we make the following core contributions. First, we propose a ZeroShot ULD baseline based on simple clustering of random pixel locations with nearest-neighbour matching; it delivers better results than existing ULD methods. Second, motivated by the ZeroShot performance, we develop a ULD algorithm based on diffusion features using self-training and clustering, which also outperforms prior methods by notable margins. Third, we introduce a new proxy task based on generating latent pose codes and propose a two-stage clustering mechanism to facilitate effective pseudo-labeling, resulting in a significant performance improvement. Overall, our approach consistently outperforms state-of-the-art methods by significant margins on four challenging benchmarks: AFLW, MAFL, CatHeads, and LS3D.
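The abstract names only the ingredients of the ZeroShot baseline: clustering features sampled at random pixel locations, then nearest-neighbour matching. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes dense per-pixel features (for example, extracted from a diffusion model) are already available as NumPy arrays of shape (N, H, W, C). The function names `fit_prototypes` and `predict_landmarks`, the farthest-point k-means initialisation, the use of cosine similarity, and the toy quadrant features are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch of a "cluster random pixels, then nearest-neighbour match"
# landmark baseline. Not the paper's code; all names are illustrative.


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation (assumed choice)."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        # Next seed: the sample farthest from every seed chosen so far.
        d = np.min([((x - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.stack(centers).astype(float)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def fit_prototypes(feature_maps, num_landmarks, samples_per_image=64, seed=0):
    """Cluster features taken at random pixel locations into landmark prototypes."""
    rng = np.random.default_rng(seed)
    _, h, w, _ = feature_maps.shape
    samples = []
    for fmap in feature_maps:
        ys = rng.integers(0, h, samples_per_image)
        xs = rng.integers(0, w, samples_per_image)
        samples.append(fmap[ys, xs])  # (samples_per_image, C)
    samples = l2_normalize(np.concatenate(samples, axis=0))
    return l2_normalize(kmeans(samples, num_landmarks, seed=seed))


def predict_landmarks(feature_map, prototypes):
    """For each prototype, return the (row, col) of its most similar pixel."""
    h, w, c = feature_map.shape
    flat = l2_normalize(feature_map.reshape(-1, c))
    sim = flat @ prototypes.T  # (H*W, K) cosine similarities
    idx = sim.argmax(axis=0)  # best-matching pixel per prototype
    return np.stack([idx // w, idx % w], axis=1)  # (K, 2)


def toy_maps(n, size=8, noise=0.01, seed=0):
    """Toy stand-in for real features: each image quadrant has its own one-hot feature."""
    rng = np.random.default_rng(seed)
    rows, cols = np.indices((size, size))
    region = (rows >= size // 2) * 2 + (cols >= size // 2)  # quadrant ids 0..3
    base = np.eye(4)[region]  # (size, size, 4)
    maps = base[None] + noise * rng.standard_normal((n, size, size, 4))
    return maps, region


train_maps, region = toy_maps(5)
protos = fit_prototypes(train_maps, num_landmarks=4)
test_maps, _ = toy_maps(1, seed=1)
pts = predict_landmarks(test_maps[0], protos)
found = sorted(int(region[r, c]) for r, c in pts)
```

On the toy data each prototype settles on one quadrant's feature, so every predicted point falls in a different quadrant. With real diffusion features the prototypes would instead be expected to capture recurring semantic parts, and the number of clusters plays the role of the number of landmarks.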

Related Material


@InProceedings{Tourani_2024_CVPR,
    author    = {Tourani, Siddharth and Alwheibi, Ahmed and Mahmood, Arif and Khan, Muhammad Haris},
    title     = {Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {23041-23051}
}