Trial-Oriented Visual Rearrangement

Yuyi Liu, Xinhang Song, Tianliang Qi, Shuqiang Jiang; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, pp. 8022-8031

Abstract


For visual room rearrangement with embodied agents, this paper tackles the challenge of restoring a disarranged scene to its intended goal state. The task demands a range of capabilities: efficient spatial navigation, precise object interaction, sensitive scene change detection, and careful restoration. Its difficulty stems from the diverse nature of possible object changes, encompassing movements within the space, alterations in appearance, and changes in existence, where objects may be introduced to or removed from the scene. Previous methods, whether end-to-end reinforcement learning or modular approaches, struggle to handle these changes in a unified manner because their inference spaces are heterogeneous. To address this, this paper proposes a Trial-Oriented Visual Rearrangement (TOR) framework, which leverages the principle of stronger embodiment to prune the joint reasoning space and identify a smaller shared space for processing the various object changes. TOR maintains a differential point cloud representation to capture environmental changes and applies two core mechanisms, assessment and refinement, to iteratively restore the scene to the goal state. Experimental results demonstrate the effectiveness of TOR in restoring both object movement and appearance changes and show that it generalizes to complex multi-room environments.
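The sketch below is a minimal illustration of the trial-style assess-and-refine idea described in the abstract: maintain a differential point cloud between the goal and current scans, score the mismatch, and keep a trial action only if the score drops. It is an assumption-laden reading of the abstract, not the authors' implementation; the names differential_point_cloud, assess, rearrange, and the env interface (scan, propose_restore_action, execute, undo) are all hypothetical.

    # Hypothetical sketch of a trial-oriented assess-and-refine loop.
    # All names (differential_point_cloud, assess, rearrange, env) are
    # illustrative assumptions, not the paper's implementation.
    import numpy as np

    def differential_point_cloud(goal_pts: np.ndarray,
                                 current_pts: np.ndarray,
                                 tol: float = 0.05) -> np.ndarray:
        """Keep points of the current scan with no close neighbor in the goal scan;
        these mark regions that changed (moved, recolored, added, or removed)."""
        # Brute-force nearest-neighbor distance from each current point to the goal cloud.
        d = np.linalg.norm(current_pts[:, None, :] - goal_pts[None, :, :], axis=-1)
        changed = d.min(axis=1) > tol
        return current_pts[changed]

    def assess(goal_pts: np.ndarray, current_pts: np.ndarray, tol: float = 0.05) -> float:
        """Scalar mismatch score: fraction of current points unexplained by the goal."""
        diff = differential_point_cloud(goal_pts, current_pts, tol)
        return len(diff) / max(len(current_pts), 1)

    def rearrange(env, goal_pts: np.ndarray, max_trials: int = 20, tol: float = 0.05) -> float:
        """Iteratively propose a restoration action and keep it only if the mismatch drops."""
        current_pts = env.scan()                      # assumed: returns an (N, 3) array
        score = assess(goal_pts, current_pts, tol)
        for _ in range(max_trials):
            if score < 1e-3:                          # scene considered restored
                break
            diff = differential_point_cloud(goal_pts, current_pts, tol)
            action = env.propose_restore_action(diff) # assumed: targets a changed region
            env.execute(action)                       # trial: apply the candidate action
            new_pts = env.scan()
            new_score = assess(goal_pts, new_pts, tol)
            if new_score < score:                     # keep the trial if it helps
                current_pts, score = new_pts, new_score
            else:
                env.undo(action)                      # otherwise revert and try again
        return score

The brute-force nearest-neighbor comparison is only for clarity; any point-cloud differencing scheme could stand in for it without changing the assess-then-refine structure.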

Related Material


[pdf] [supp]
[bibtex]
@InProceedings{Liu_2025_ICCV,
    author    = {Liu, Yuyi and Song, Xinhang and Qi, Tianliang and Jiang, Shuqiang},
    title     = {Trial-Oriented Visual Rearrangement},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {8022-8031}
}