The IKEA ASM Dataset: Understanding People Assembling Furniture Through Actions, Objects and Pose

Ben-Shabat, Yizhak; Yu, Xin; Saleh, Fatemeh; Campbell, Dylan; Rodriguez-Opazo, Cristian; Li, Hongdong; Gould, Stephen

Yizhak Ben-Shabat, Xin Yu, Fatemeh Saleh, Dylan Campbell, Cristian Rodriguez-Opazo, Hongdong Li, Stephen Gould; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, pp. 847-859

Abstract

The availability of a large labelled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks. In the context of understanding human activities, existing public datasets, while large in size, are often limited to a single RGB camera and provide only per-frame or per-clip action annotations. To enable richer analysis and understanding of human activities, we introduce IKEA ASM, a three-million-frame, multi-view furniture assembly video dataset that includes depth, atomic actions, object segmentation, and human poses. Additionally, we benchmark prominent methods for video action recognition, object segmentation, and human pose estimation on this challenging dataset. The dataset enables the development of holistic methods that integrate multi-modal and multi-view data to perform better on these tasks.

Related Material

[bibtex]

@InProceedings{Ben-Shabat_2021_WACV,
    author    = {Ben-Shabat, Yizhak and Yu, Xin and Saleh, Fatemeh and Campbell, Dylan and Rodriguez-Opazo, Cristian and Li, Hongdong and Gould, Stephen},
    title     = {The IKEA ASM Dataset: Understanding People Assembling Furniture Through Actions, Objects and Pose},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2021},
    pages     = {847-859}
}