Mind the Gap! Sparse Novel View Synthesis as Natural Video Completion

The University of Michigan

TL;DR: We reinterpret sparse-input novel view synthesis as a video completion problem: recovering the missing frames between wide-baseline inputs. This formulation lets us leverage the powerful generative priors of pretrained video diffusion models to synthesize plausible intermediate views, providing additional constraints for under-observed regions during 3D Gaussian Splatting (3D-GS) training.
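To make the video-completion framing concrete, the minimal sketch below packs two wide-baseline input views into a short clip whose intermediate frames are masked out, which is the kind of input a frame-completion video diffusion model would fill in. The tensor layout, the `make_completion_input` helper, and `num_frames` are illustrative assumptions, not the exact interface used here.

```python
import torch

# Hypothetical illustration of the video-completion framing:
# two known wide-baseline views become the first and last frames
# of a short clip; everything in between is "missing" and left
# for a video diffusion model to complete.
def make_completion_input(view_a: torch.Tensor,
                          view_b: torch.Tensor,
                          num_frames: int = 16):
    """view_a, view_b: (3, H, W) images in [0, 1].
    Returns frames (T, 3, H, W) and a mask (T,) where 1 = observed."""
    c, h, w = view_a.shape
    frames = torch.zeros(num_frames, c, h, w)
    mask = torch.zeros(num_frames)
    frames[0], frames[-1] = view_a, view_b   # known endpoint views
    mask[0] = mask[-1] = 1.0                 # only the endpoints are observed
    return frames, mask

# Example with random stand-in images:
a, b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
frames, mask = make_completion_input(a, b)
print(frames.shape, mask)  # torch.Size([16, 3, 64, 64]); endpoints marked 1
```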

Our Framework

Overview of the framework. After initializing 3D-GS from the sparse input images (①), ② we create guidance images and estimate their uncertainty from the current 3D-GS renderings. ③ The guidance images steer the diffusion process through uncertainty-aware modulation, which regenerates highly uncertain regions while preserving reliable ones. ④ The generated pseudo-view images are then used to densify the Gaussian primitives and to constrain 3D-GS training. For illustration, we show pseudo-view generation for a single image pair; in practice, all pairs are processed sequentially.
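As a rough sketch of step ③, the snippet below shows one plausible form of uncertainty-aware modulation inside a DDPM-style sampling loop: at each step, the sample is softly blended toward a noised copy of the 3D-GS guidance render, weighted by a per-pixel uncertainty map, so reliable pixels stay close to the render while uncertain pixels are regenerated by the model. The noise schedule, the `denoiser` stub, and the RePaint-style soft blending rule are assumptions for illustration, not the exact procedure.

```python
import torch

# DDPM noise schedule (assumed constants, for illustration only).
T = 50
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def denoiser(x_t: torch.Tensor, t: int) -> torch.Tensor:
    # Stand-in for a pretrained video diffusion model's noise prediction.
    return torch.zeros_like(x_t)

@torch.no_grad()
def sample_with_guidance(guidance: torch.Tensor,
                         uncertainty: torch.Tensor) -> torch.Tensor:
    """guidance: (3, H, W) current 3D-GS render of a pseudo view.
    uncertainty: (1, H, W) in [0, 1]; 1 = unreliable pixel."""
    x = torch.randn_like(guidance)
    for t in reversed(range(T)):
        eps = denoiser(x, t)
        # Standard DDPM posterior mean for x_{t-1}.
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        x = (mean + torch.sqrt(betas[t]) * torch.randn_like(x)) if t > 0 else mean
        if t > 0:
            # Noise the guidance render to the matching diffusion level t-1,
            # then softly replace reliable pixels (RePaint-style, but with a
            # continuous uncertainty map instead of a binary mask).
            ab = alpha_bars[t - 1]
            g = torch.sqrt(ab) * guidance \
                + torch.sqrt(1 - ab) * torch.randn_like(guidance)
            x = uncertainty * x + (1.0 - uncertainty) * g
        else:
            # At the final step, blend with the clean guidance render.
            x = uncertainty * x + (1.0 - uncertainty) * guidance
    return x

# Toy usage with a random render and uncertainty map:
render = torch.rand(3, 64, 64)
unc = torch.rand(1, 64, 64)
pseudo_view = sample_with_guidance(render, unc)
print(pseudo_view.shape)  # torch.Size([3, 64, 64])
```

In this sketch, the generated pseudo views would then serve step ④: supervising under-observed regions and seeding densification of the Gaussian primitives.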

Results on LLFF

Results on DTU