MOVES: Manipulated Objects in Video Enable Segmentation

Richard E. L. Higgins, David F. Fouhey; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 6334-6343

Abstract


We present a method that uses manipulation to learn to understand the objects people hold as well as hand-object contact. We train a system that takes a single RGB image and produces a pixel embedding that can be used to answer grouping questions (do these two pixels go together) as well as hand-association questions (is this hand holding that pixel). Rather than painstakingly annotating segmentation masks, we observe people in realistic video data. We show that pairing epipolar geometry with modern optical flow produces simple and effective pseudo-labels for grouping. Given segmentations of people, we can further associate pixels with hands to understand contact. Our system achieves competitive results on hand and hand-held object tasks.
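To make the pseudo-labeling idea concrete, below is a minimal sketch (not the authors' code) of how epipolar geometry paired with dense optical flow can flag independently moving pixels, which can then serve as grouping pseudo-labels. It assumes a dense flow field from any modern flow network (e.g., RAFT); the function name, threshold, and OpenCV-based fundamental-matrix fit are illustrative choices, not details from the paper.

```python
import numpy as np
import cv2

def epipolar_pseudo_labels(flow, thresh=2.0):
    """Label pixels whose flow violates the dominant epipolar constraint.

    flow: HxWx2 dense optical flow between two frames.
    Returns an HxW boolean mask of pixels that cannot be explained by
    camera motion alone (e.g., manipulated objects and hands).
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts1 = np.stack([xs, ys], axis=-1).reshape(-1, 2).astype(np.float32)
    pts2 = pts1 + flow.reshape(-1, 2)

    # Fit a fundamental matrix robustly; the static background dominates,
    # so F explains camera motion and moving objects become outliers.
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)

    # Sampson distance of every correspondence to the epipolar constraint.
    p1 = np.concatenate([pts1, np.ones((pts1.shape[0], 1), np.float32)], axis=1)
    p2 = np.concatenate([pts2, np.ones((pts2.shape[0], 1), np.float32)], axis=1)
    Fp1 = p1 @ F.T    # epipolar lines F x1
    Ftp2 = p2 @ F     # epipolar lines F^T x2
    num = np.sum(p2 * Fp1, axis=1) ** 2
    den = Fp1[:, 0] ** 2 + Fp1[:, 1] ** 2 + Ftp2[:, 0] ** 2 + Ftp2[:, 1] ** 2
    sampson = num / (den + 1e-8)

    # Pixels with large residual move inconsistently with the camera and
    # are treated as pseudo-labels for the manipulated-object grouping.
    return (sampson > thresh).reshape(h, w)
```

In this sketch the threshold trades off label precision against coverage; in practice one would subsample correspondences before the RANSAC fit for speed.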

Related Material


[pdf]
[bibtex]
@InProceedings{Higgins_2023_CVPR,
    author    = {Higgins, Richard E. L. and Fouhey, David F.},
    title     = {MOVES: Manipulated Objects in Video Enable Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {6334-6343}
}