Joint Discovery of Object States and Manipulation Actions

Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic, Simon Lacoste-Julien; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2127-2136


Many human activities involve object manipulations that aim to modify the state of an object. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of seven manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations. We show that our joint formulation improves object state discovery when action recognition is available, and vice versa.
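
To make the phrase "discriminative clustering cost with constraints" more concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes a square-loss (ridge regression) form of the clustering cost, a single object track with two states, and an ordering constraint that state 1 always precedes state 2. The function names, the toy features, and the brute-force search over change points are all illustrative assumptions; the abstract itself refers to new optimization techniques that are not reproduced here.

```python
import numpy as np


def ridge_clustering_cost(X, Y, lam=1e-2):
    """Square-loss clustering cost for a fixed assignment Y (assumed form).

    cost(Y) = min_W  1/(2N) * ||Y - X W||_F^2  +  lam/2 * ||W||_F^2
    The inner minimization over W has a closed-form ridge solution.
    """
    n, d = X.shape
    W = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ Y)
    residual = Y - X @ W
    return (residual ** 2).sum() / (2 * n) + lam / 2 * (W ** 2).sum()


def best_change_point(X, lam=1e-2):
    """Brute-force search over assignments that respect the temporal order.

    Frames [0, t) get state 1 and frames [t, T) get state 2, so the only
    free variable is the change point t. Both states must occur at least once.
    """
    T = X.shape[0]
    costs = {}
    for t in range(1, T):
        Y = np.zeros((T, 2))
        Y[:t, 0] = 1.0  # state 1 before the change
        Y[t:, 1] = 1.0  # state 2 after the change
        costs[t] = ridge_clustering_cost(X, Y, lam)
    return min(costs, key=costs.get)


# Hypothetical 1-D appearance feature for 12 frames of one object track:
# the first six frames look alike, and so do the last six.
feature = np.array([0.0, 0.1, -0.1, 0.0, 0.1, -0.1,
                    5.0, 5.1, 4.9, 5.0, 5.1, 4.9])
X = np.column_stack([feature, np.ones_like(feature)])  # add a bias column

print(best_change_point(X))  # → 6
```

The ordering constraint is what keeps the search tractable in this toy case: instead of all 2^T labelings, only T - 1 change points are admissible. In the joint setting described above, the action localization would add a further constraint tying the action's time interval to this change point.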

Related Material

@InProceedings{Alayrac_2017_ICCV,
author = {Alayrac, Jean-Baptiste and Laptev, Ivan and Sivic, Josef and Lacoste-Julien, Simon},
title = {Joint Discovery of Object States and Manipulation Actions},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}