Mind the Gap! Sparse Novel View Synthesis as Natural Video Completion

The University of Michigan

TL;DR: We reinterpret sparse-input novel view synthesis as a video completion problem: recovering the missing frames between wide-baseline inputs. This formulation lets us leverage the powerful generative priors of pretrained video diffusion models to synthesize plausible intermediate views, providing additional constraints for under-observed regions during 3D Gaussian Splatting (3D-GS) training.
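To make the video-completion framing concrete, the minimal sketch below packs two wide-baseline input views into a short clip whose intermediate frames are masked out, which is the kind of input a frame-completion video diffusion model would fill in. The tensor layout, the `make_completion_input` helper, and `num_frames` are illustrative assumptions, not the exact interface used here.

```python
import torch

# Hypothetical illustration of the video-completion framing:
# two known wide-baseline views become the first and last frames
# of a short clip; everything in between is "missing" and left
# for a video diffusion model to complete.
def make_completion_input(view_a: torch.Tensor,
                          view_b: torch.Tensor,
                          num_frames: int = 16):
    """view_a, view_b: (3, H, W) images in [0, 1].
    Returns frames (T, 3, H, W) and a mask (T,) where 1 = observed."""
    c, h, w = view_a.shape
    frames = torch.zeros(num_frames, c, h, w)
    mask = torch.zeros(num_frames)
    frames[0], frames[-1] = view_a, view_b   # known endpoint views
    mask[0] = mask[-1] = 1.0                 # only the endpoints are observed
    return frames, mask

# Example with random stand-in images:
a, b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
frames, mask = make_completion_input(a, b)
print(frames.shape, mask)  # torch.Size([16, 3, 64, 64]); endpoints marked 1
```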

Our Framework

Overview of the framework. After initializing 3D-GS from the sparse input images (①), ② we create guidance images and estimate their uncertainty from the current 3D-GS renderings. ③ The guidance images steer the diffusion process through uncertainty-aware modulation, which regenerates highly uncertain regions while preserving reliable ones. ④ The generated pseudo-view images are then used to densify the Gaussian primitives and to constrain 3D-GS training. For illustration, we show pseudo-view generation for a single image pair; in practice, all pairs are processed sequentially.
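As a rough sketch of step ③, the snippet below shows one plausible form of uncertainty-aware modulation inside a DDPM-style sampling loop: at each step, the sample is softly blended toward a noised copy of the 3D-GS guidance render, weighted by a per-pixel uncertainty map, so reliable pixels stay close to the render while uncertain pixels are regenerated by the model. The noise schedule, the `denoiser` stub, and the RePaint-style soft blending rule are assumptions for illustration, not the exact procedure.

```python
import torch

# DDPM noise schedule (assumed constants, for illustration only).
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t: torch.Tensor, t: int) -> torch.Tensor:
    # Stand-in for a pretrained video diffusion model's noise prediction.
    return torch.zeros_like(x_t)

@torch.no_grad()
def sample_with_guidance(guidance: torch.Tensor,
                         uncertainty: torch.Tensor) -> torch.Tensor:
    """guidance: (3, H, W) current 3D-GS render of a pseudo view.
    uncertainty: (1, H, W) in [0, 1]; 1 = unreliable pixel."""
    x = torch.randn_like(guidance)
    for t in reversed(range(T)):
        eps = denoiser(x, t)
        # Standard DDPM posterior mean for x_{t-1}.
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        x = (mean + torch.sqrt(betas[t]) * torch.randn_like(x)) if t > 0 else mean
        if t > 0:
            # Noise the guidance render to the matching diffusion level t-1,
            # then softly replace reliable pixels (RePaint-style, but with a
            # continuous uncertainty map instead of a binary mask).
            ab = alpha_bars[t - 1]
            g = torch.sqrt(ab) * guidance \
                + torch.sqrt(1 - ab) * torch.randn_like(guidance)
            x = uncertainty * x + (1.0 - uncertainty) * g
        else:
            # At the final step, blend with the clean guidance render.
            x = uncertainty * x + (1.0 - uncertainty) * guidance
    return x

# Toy usage with a random render and uncertainty map:
render = torch.rand(3, 64, 64)
unc = torch.rand(1, 64, 64)
pseudo_view = sample_with_guidance(render, unc)
print(pseudo_view.shape)  # torch.Size([3, 64, 64])
```

In this sketch, the generated pseudo views would then serve step ④: supervising under-observed regions and seeding densification of the Gaussian primitives.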

Results on LLFF

Results on DTU