Research Paper

SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM

A dense SLAM paper that uses 3D Gaussians for online tracking, mapping, and real-time rendering from an RGB-D camera.

December 2023 · RGB-D SLAM · arXiv:2312.02126

Detailed Reading

SplaTAM treats Gaussians as a live map. In a normal 3DGS pipeline, camera poses are computed first and reconstruction happens offline. In SLAM, pose and map must be estimated together as frames arrive. SplaTAM uses RGB-D input so depth can constrain this online process.

The pipeline alternates between tracking and mapping. For tracking, it renders the current Gaussian map and optimizes camera pose to match the incoming frame. For mapping, it adds or updates Gaussians where the current view reveals new surfaces, using silhouettes and depth cues to avoid uncontrolled growth.
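The silhouette-and-depth growth criterion described above can be sketched as a per-pixel mask. This is an illustrative reconstruction, not the paper's exact rule: `densification_mask`, the threshold values, and the median-scaled depth test are assumptions modeled on the description of growing the map only where the current view reveals new surfaces.

```python
import numpy as np

def densification_mask(silhouette, rendered_depth, observed_depth,
                       sil_thresh=0.5, depth_err_factor=50.0):
    """Pixels where the Gaussian map should grow (illustrative sketch):
    either the current map does not cover this pixel (low rendered
    silhouette), or the sensor sees a surface clearly in front of what
    the map renders (large depth disagreement). Thresholds are assumed."""
    depth_err = np.abs(observed_depth - rendered_depth)
    med = np.median(depth_err)
    uncovered = silhouette < sil_thresh
    wrong_depth = (observed_depth < rendered_depth) & \
                  (depth_err > depth_err_factor * med)
    return uncovered | wrong_depth
```

Gating growth on both cues is what keeps expansion controlled: a low silhouette alone flags unmapped regions, while the depth test catches newly revealed surfaces that the existing map occludes incorrectly.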

This paper is useful because it connects 3DGS to robotics and AR. A Gaussian map can be photorealistic for visualization and also useful for tracking. The trade-off is online robustness: noisy depth, fast motion, and map expansion must all be handled in real time.

Mapping does more than insert primitives: after adding Gaussians with depth anchoring the geometry, SplaTAM optimizes the local map so that future renders explain the incoming sensor stream. This shifts 3DGS from a final offline renderer to a live mapping representation that is refined as the camera moves.
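Tracking fits naturally into a masked rendering residual: compare the render of the current map against the incoming RGB-D frame, but only over pixels the map renders confidently, so unmapped regions do not pull the pose estimate. A minimal sketch of such a loss, assuming L1 color and depth terms and a high silhouette cutoff (the function name and exact weighting are illustrative):

```python
import numpy as np

def tracking_loss(rendered_rgb, rendered_depth, silhouette,
                  observed_rgb, observed_depth, sil_thresh=0.99):
    """L1 photometric + depth residual over confidently rendered pixels
    (illustrative sketch; weighting and threshold are assumptions).
    A pose optimizer would minimize this by re-rendering under candidate
    camera poses."""
    mask = silhouette > sil_thresh
    color_err = np.abs(rendered_rgb - observed_rgb).sum(axis=-1)
    depth_err = np.abs(rendered_depth - observed_depth)
    return float(((color_err + depth_err) * mask).sum() / max(mask.sum(), 1))
```

Because the renderer is differentiable, gradients of this residual with respect to the camera pose drive the tracking update, while the same residual with respect to Gaussian parameters drives mapping.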

The algorithmic benefit of RGB-D is that depth greatly reduces ambiguity. Vanilla 3DGS from RGB depends on SfM initialization and multi-view photometric cues, but SLAM needs incremental decisions. Depth lets the system place Gaussians more reliably and use rendering residuals for pose refinement.
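The "place Gaussians more reliably" point comes down to back-projection: each depth pixel lifts directly to a 3D point through the pinhole intrinsics, so new Gaussian means can be seeded immediately instead of waiting for SfM triangulation. A self-contained sketch (function name and intrinsics layout are illustrative):

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift each depth pixel to a 3D point in camera coordinates using
    pinhole intrinsics. The resulting points can seed new Gaussian means
    directly, which is the key incremental advantage over RGB-only 3DGS."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)  # (h, w, 3) points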

The paper is important because it connects splatting with robotics-style state estimation. Its limits are typical of SLAM: noisy depth, fast motion, loop closure, and dynamic objects can still cause drift or map artifacts. Still, it shows why Gaussians are attractive for dense maps: they can be optimized, rendered, and inspected directly.

What The Paper Does

SplaTAM brings 3DGS into embodied mapping. Instead of assuming offline COLMAP poses, it tracks and maps online from a moving RGB-D camera.

The method uses differentiable rendering and silhouette-guided optimization to expand and refine a dense Gaussian map while estimating camera pose.

Core Ideas

  • Uses Gaussians as the map representation for both tracking and rendering.
  • Adds online map expansion as new regions are observed.
  • Shows strong pose estimation, map construction, and novel-view synthesis results.

Why It Matters

  • It is one of the most visible early Gaussian SLAM systems and has a large open-source footprint.
  • It connects 3DGS with robotics, embodied AI, and real-time mapping rather than offline capture.
  • It demonstrates that splats can be a live map representation, not only an offline render asset.

Read This If

  • You work on RGB-D mapping, robotics, AR scanning, or online reconstruction.
  • You want to avoid an offline SfM prerequisite.
  • You are comparing Gaussian maps with neural or TSDF maps.

Limitations And Caveats

  • The method relies on RGB-D input, so it is not a pure monocular solution.
  • Online systems must balance tracking robustness, map growth, and compute budget.
  • Fast camera motion and poor depth can still hurt tracking and reconstruction.