Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting


Duochao Shi1  Weijie Wang1,4  Donny Y. Chen2  Zeyu Zhang2,4  Jia-Wang Bian3  Bohan Zhuang1  Chunhua Shen1

1Zhejiang University, China   2Monash University, Australia   3MBZUAI   4GigaAI

* Equal contribution

TL;DR


We introduce PM-Loss, a novel regularization loss for feed-forward 3DGS based on a learned pointmap, leading to smoother 3D geometry and better rendering.


Abstract


Depth maps are widely used in feed-forward 3D Gaussian Splatting (3DGS) pipelines by unprojecting them into 3D point clouds for novel view synthesis. This approach offers advantages such as efficient training, the use of known camera poses, and accurate geometry estimation. However, depth discontinuities at object boundaries often lead to fragmented or sparse point clouds, degrading rendering quality: a well-known limitation of depth-based representations. To tackle this issue, we introduce PM-Loss, a novel regularization loss based on a pointmap predicted by a pre-trained transformer. Although the pointmap itself may be less accurate than the depth map, it effectively enforces geometric smoothness, especially around object boundaries. With the resulting improved depth maps, our method consistently strengthens feed-forward 3DGS across various architectures and scenes, delivering better rendering results.
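For readers unfamiliar with this setup, the snippet below sketches the standard unprojection step that lifts a per-view depth map into a world-space point cloud given known camera intrinsics and pose. This is an illustrative PyTorch sketch, not the released code; the function name unproject_depth and the tensor conventions (pinhole intrinsics, camera-to-world pose, z-depth) are our assumptions.

import torch

def unproject_depth(depth, K, cam2world):
    """Lift a depth map (H, W) into a world-space point cloud (H*W, 3).

    depth:     (H, W) per-pixel z-depth
    K:         (3, 3) pinhole camera intrinsics
    cam2world: (4, 4) camera-to-world pose
    """
    H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).reshape(-1, 3)
    # Back-project through the inverse intrinsics, then scale by depth.
    pts_cam = (pix @ torch.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Move from camera space to world space with the camera pose.
    return pts_cam @ cam2world[:3, :3].T + cam2world[:3, 3]

Depth discontinuities at object boundaries translate directly into gaps or stray points in this unprojected cloud, which is the failure mode PM-Loss targets.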

Method


Overview of our PM-Loss. The process begins by estimating a dense pointmap of the scene with a pre-trained model. This pointmap then serves as direct 3D supervision for training a feed-forward 3D Gaussian Splatting model, as sketched below. Crucially, unlike conventional methods that rely predominantly on 2D supervision, our approach leverages explicit 3D geometric cues, leading to enhanced 3D shape fidelity.
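To make the idea concrete, here is a minimal sketch of how such a pointmap could supervise the unprojected Gaussian centers. It is not the official PM-Loss implementation: the per-point L1 distance, the median-norm scale alignment, and the optional confidence weighting are all assumptions for illustration; the exact formulation is defined in the paper.

import torch

def pm_loss(pred_points, ref_pointmap, conf=None):
    """Illustrative pointmap regularizer (not the official PM-Loss).

    pred_points:  (N, 3) Gaussian centers from the unprojected predicted depth
    ref_pointmap: (N, 3) pixel-aligned points from a pre-trained pointmap model
    conf:         optional (N,) confidence weights for the reference points
    """
    # Pre-trained pointmaps are often only defined up to scale; a global
    # median-norm ratio is one simple way to align them (an assumption here).
    scale = (pred_points.norm(dim=-1).median()
             / ref_pointmap.norm(dim=-1).median().clamp(min=1e-8))
    ref = ref_pointmap * scale
    # Both point sets are pixel-aligned, so a per-point distance suffices;
    # no nearest-neighbour matching is needed.
    dist = (pred_points - ref).abs().sum(dim=-1)
    if conf is not None:
        # Down-weight points where the pre-trained model is less confident.
        dist = dist * conf
    return dist.mean()

Because this supervision acts directly in 3D, it can penalize floating points that a purely 2D photometric loss leaves unconstrained, which matches the boundary improvements shown in the comparisons below.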

Comparison Experiments


Point Cloud Visualization


Rendering Results



Qualitative comparisons on DL3DV (top two rows) and RealEstate10K (bottom two rows) under the 2-view extrapolation setting. Adding PM-Loss leads to significant improvements around object boundaries.



Rendering results under view extrapolation


Results on DTU with varying numbers of input views

Citation


@article{shi2025pmloss,
  title={Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting},
  author={Shi, Duochao and Wang, Weijie and Chen, Donny Y. and Zhang, Zeyu and Bian, Jia-Wang and Zhuang, Bohan and Shen, Chunhua},
  journal={arXiv preprint arXiv:2506.05327},
  year={2025}
}