Efficient 3D Video Engine Using Frame Redundancy
Traditional 3D video understanding methods process videos frame by frame. We argue that much of the computation in this mechanism is redundant, based on a key observation: adjacent frames in 3D videos have visually similar geometric structure. To handle this redundancy, we propose the Efficient 3D Video Engine (EVE), which aims to avoid computation on redundant points. It consists of two modules: 1) a redundancy removing module, designed to detect and remove redundancy; and 2) a residual learning module, which extracts features on the non-redundant points. As a simple plug-and-play framework, EVE can be easily incorporated into mainstream 3D models. Experiments demonstrate that EVE can significantly reduce computation without performance loss on large-scale datasets. Conversely, at a similar computational cost, EVE outperforms the strong baseline by up to 4.1 mIoU on SemanticKITTI. The code is available at https://github.com/ecr23xx/eve.
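
To make the notion of redundancy between adjacent frames concrete, the following is a minimal illustrative sketch and not the EVE implementation. It assumes a simple voxel-occupancy criterion: a point in the current frame is treated as redundant if its voxel was already occupied in the previous frame. The function names `voxel_keys` and `non_redundant_mask`, the `voxel_size` parameter, and the occupancy criterion itself are all hypothetical choices made for illustration.

```python
import numpy as np


def voxel_keys(points, voxel_size):
    """Map each 3D point (N x 3 array) to its integer voxel coordinate."""
    return np.floor(points / voxel_size).astype(np.int64)


def non_redundant_mask(prev_points, curr_points, voxel_size=0.5):
    """Return a boolean mask over curr_points.

    True marks points whose voxel was empty in the previous frame,
    i.e. points that this (assumed) criterion treats as non-redundant.
    """
    prev_voxels = {tuple(v) for v in voxel_keys(prev_points, voxel_size)}
    curr_voxels = voxel_keys(curr_points, voxel_size)
    return np.array([tuple(v) not in prev_voxels for v in curr_voxels], dtype=bool)


# Two toy frames: the first two points of the current frame fall into
# voxels already occupied in the previous frame; the third is new geometry.
prev_frame = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
curr_frame = np.array([[0.02, 0.01, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]])

mask = non_redundant_mask(prev_frame, curr_frame)
new_points = curr_frame[mask]  # only these would need fresh feature extraction
```

Under this assumed criterion, only `new_points` would be passed to a feature extractor, while features for the remaining points could in principle be reused from the previous frame. How EVE actually detects redundancy and learns residual features is defined in the paper and the linked repository.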