Propose a framework based on diffusion models for consistent and realistic long-term novel view synthesis. Diffusion models have achieved impressive performance on many content creation applications, such as image-to-image translation and text-to- image generation.