OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 445-456

Abstract


We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. To organize complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction: Affordance, Primitive Task, and Complex Task. OAKINK2 takes an object-centric perspective for decoding complex tasks, treating them as a sequence of object affordance fulfillments. The first level, Affordance, outlines the functionalities that objects in the scene can afford; the second level, Primitive Task, describes the minimal interaction units through which humans interact with an object to achieve its affordance; and the third level, Complex Task, illustrates how Primitive Tasks are composed and interdependent. The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands, and various interacting objects. This extensive collection supports applications such as interaction reconstruction and motion synthesis. Based on the three-level abstraction of OAKINK2, we explore a task-oriented framework for Complex Task Completion (CTC). CTC aims to generate a sequence of bimanual manipulations to achieve task objectives. Within the CTC framework, we employ Large Language Models (LLMs) to decompose complex task objectives into sequences of Primitive Tasks, and we develop a Motion Fulfillment Model that generates bimanual hand motion for each Primitive Task. The OAKINK2 dataset and models are available at https://oakink.net/v2.
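As a reading aid, the three-level abstraction can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class names, fields, and the example objects and tasks are hypothetical and are not taken from the OAKINK2 toolkit or its annotation format. It models a Complex Task as a dependency graph over Primitive Tasks, each tied to one Affordance, and uses a topological sort to recover an order that respects the dependencies.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Affordance:
    """Level 1: a functionality that an object in the scene can afford."""
    object_name: str
    function: str


@dataclass(frozen=True)
class PrimitiveTask:
    """Level 2: a minimal interaction unit that fulfills one affordance."""
    name: str
    affordance: Affordance


@dataclass
class ComplexTask:
    """Level 3: Primitive Tasks plus the dependencies between them."""
    objective: str
    primitives: dict[str, PrimitiveTask] = field(default_factory=dict)
    # Maps a primitive's name to the names of the primitives it depends on.
    depends_on: dict[str, set[str]] = field(default_factory=dict)

    def add(self, task, after=()):
        """Register a Primitive Task and the tasks that must precede it."""
        self.primitives[task.name] = task
        self.depends_on[task.name] = set(after)

    def execution_order(self):
        """Return the Primitive Tasks in an order that respects dependencies."""
        order = TopologicalSorter(self.depends_on).static_order()
        return [self.primitives[name] for name in order]


# Hypothetical example; the objects and task names are made up for illustration.
task = ComplexTask(objective="prepare a drink")
task.add(PrimitiveTask("uncap_bottle", Affordance("bottle", "contain liquid")))
task.add(
    PrimitiveTask("pour_into_cup", Affordance("bottle", "pour liquid")),
    after=["uncap_bottle"],
)
task.add(
    PrimitiveTask("stir_cup", Affordance("spoon", "stir")),
    after=["pour_into_cup"],
)

plan = [p.name for p in task.execution_order()]
print(plan)
```

In a pipeline shaped like CTC, the decomposition step would produce a structure of this kind from a task objective, and a motion model would then be invoked once per Primitive Task in the returned order. Both stages are left out here because the abstract does not specify their interfaces.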

Related Material


[bibtex]
@InProceedings{Zhan_2024_CVPR,
    author    = {Zhan, Xinyu and Yang, Lixin and Zhao, Yifei and Mao, Kangrui and Xu, Hanlin and Lin, Zenan and Li, Kailin and Lu, Cewu},
    title     = {OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {445-456}
}