Mixed Neural Voxels for Fast Multi-view Video Synthesis

Feng Wang, Sinan Tan, Xinghang Li, Zeyue Tian, Yafei Song, Huaping Liu; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 19706-19716

Abstract


Synthesizing high-fidelity videos from real-world multiview input is challenging due to the complexities of real-world environments and high-dynamic movements. Previous works based on neural radiance fields have demonstrated high-quality reconstructions of dynamic scenes. However, training such models on real-world scenes is time-consuming, usually taking days or weeks. In this paper, we present a novel method named MixVoxels to efficiently represent dynamic scenes which leads to fast training and rendering speed. The proposed MixVoxels represents the 4D dynamic scenes as a mixture of static and dynamic voxels and processes them with different networks. In this way, the computation of the required modalities for static voxels can be processed by a lightweight model, which essentially reduces the amount of computation as many daily dynamic scenes are dominated by static backgrounds. To distinguish the two kinds of voxels, we propose a novel variation field to estimate the temporal variance of each voxel. For the dynamic representations, we design an inner-product time query method to efficiently query multiple time steps, which is essential to recover the high-dynamic movements. As a result, with 15 minutes of training for dynamic scenes with inputs of 300-frame videos, MixVoxels achieves better PSNR than previous methods. For rendering, MixVoxels can render a novel view video with 1K resolution at 37 fps. Codes and trained models are available at https://github.com/fengres/mixvoxels.

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Wang_2023_ICCV, author = {Wang, Feng and Tan, Sinan and Li, Xinghang and Tian, Zeyue and Song, Yafei and Liu, Huaping}, title = {Mixed Neural Voxels for Fast Multi-view Video Synthesis}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = {October}, year = {2023}, pages = {19706-19716} }