Consistent View Synthesis with Pose-Guided Diffusion Models

We propose a framework based on diffusion models for consistent and realistic long-term novel view synthesis. Diffusion models have achieved impressive performance on many content creation applications, such as image-to-image translation and text-to-image generation.


Breaking the Curse of Quality Saturation with User-Centric Ranking

We introduce an alternative formulation called “user-centric ranking” based on a transposed view, which casts ‘users’ as ‘tokens’ and ‘items’ as ‘documents’ instead. We show that this formulation has a number of advantages and shows fewer signs of quality saturation when trained on substantially larger data sets.
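
The transposed view can be illustrated with a small sketch. It assumes an interaction log of (user, item) pairs; the function name `build_documents`, the `user_centric` flag, and the toy data are hypothetical and only show how the same log yields two different document/token groupings, not how the paper's models are trained.

```python
from collections import defaultdict


def build_documents(interactions, user_centric=False):
    """Group (user, item) interaction pairs into 'documents' made of 'tokens'.

    Conventional view: each user is a document and its items are the tokens.
    User-centric (transposed) view: each item is a document and its users
    are the tokens.
    """
    docs = defaultdict(list)
    for user, item in interactions:
        if user_centric:
            docs[item].append(user)
        else:
            docs[user].append(item)
    return dict(docs)


# Hypothetical toy interaction log of (user, item) pairs.
interactions = [
    ("u1", "i1"),
    ("u1", "i2"),
    ("u2", "i1"),
    ("u3", "i2"),
    ("u3", "i3"),
]

conventional_docs = build_documents(interactions)
user_centric_docs = build_documents(interactions, user_centric=True)
```

In the transposed grouping, the token vocabulary is the set of users rather than the set of items, which is the structural change the formulation relies on.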


Robust Dynamic Radiance Fields

We introduce RoDynRF, an algorithm for reconstructing dynamic radiance fields from casual videos. Unlike existing approaches, we do not require accurate camera poses as input. Our method jointly optimizes camera poses and two radiance fields, one modeling static elements and one modeling dynamic elements. Our approach combines a coarse-to-fine strategy and epipolar geometry to exclude moving pixels with deformation fields, time-dependent appearance models, and regularization losses for improved consistency.
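
As a rough illustration of how epipolar geometry can flag moving pixels, the sketch below measures how far a matched pixel lies from its epipolar line and thresholds that distance. The abstract does not say how correspondences are obtained or which distance is used, so the point-to-line distance, the 1-pixel threshold, the function names, and the example fundamental matrix (a pure horizontal camera translation) are all assumptions made for illustration.

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (image 2) to the epipolar line of p1 (image 1).

    F is a 3x3 fundamental matrix given as nested lists. The line F @ x1 must
    be non-degenerate, i.e. its first two coefficients are not both zero.
    """
    x1 = (p1[0], p1[1], 1.0)
    # Epipolar line l = F x1, stored as (a, b, c) with a*x + b*y + c = 0.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)


def moving_mask(F, matches, threshold=1.0):
    """Mark a match as moving when it violates the epipolar constraint."""
    return [epipolar_distance(F, p1, p2) > threshold for p1, p2 in matches]


# Assumed example: camera translating along x, so static points keep their row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

# Hypothetical correspondences (pixel in frame 1, matched pixel in frame 2).
matches = [
    ((10.0, 20.0), (15.0, 20.0)),  # stays on its epipolar line: static
    ((30.0, 40.0), (32.0, 47.0)),  # 7 pixels off its epipolar line: moving
]

mask = moving_mask(F, matches)
```

Pixels flagged in `mask` would be left out when estimating camera poses, so that only static scene content constrains the camera.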