Pipeline Tutorial

Alignment Tutorial For 3D Gaussian Splatting

How to turn a photo set into reliable cameras and sparse points before training a splat.

Step 2: Alignment

Practical Workflow

Tutorial Scope

What This Page Covers

Alignment is where images become a reconstruction problem. The output is not the final splat; it is the camera intrinsics, camera poses, and sparse geometry that tell the trainer where every image was captured.

When alignment is wrong, the final splat usually shows floaters, doubled surfaces, smeared walls, or broken scale. This guide focuses on practical SfM choices, quality checks, and fixes before you spend time training.

What Alignment Produces

A 3DGS trainer needs two things before optimization: images and a camera model for each image. Alignment estimates intrinsics such as focal length and distortion, extrinsics such as camera position and rotation, and a sparse point cloud that provides initialization. The sparse cloud is not the final scene, but it gives Gaussians a reasonable place to start.

The common open-source route is COLMAP. It detects local features, matches features across image pairs, verifies geometry, incrementally registers cameras, triangulates points, and bundle-adjusts the result. GLOMAP uses a global SfM strategy that can be faster for large datasets, while Metashape and RealityCapture provide polished visual tools and export paths.

  • Cameras: one pose and calibration per registered image.
  • Sparse points: triangulated feature points used for initialization and diagnostics.
  • Undistorted images: often required by trainers so rendering uses a simple camera model.
  • Transforms file or COLMAP database: the bridge between alignment and training.
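As a concrete illustration of these outputs, the sketch below reads a finished COLMAP sparse model and prints what the trainer will consume. It assumes the pycolmap Python bindings (not required by any of the tools above, just a convenient inspector) and the usual sparse/0 folder; paths are placeholders and attribute names can differ slightly between pycolmap versions.

    # Minimal sketch: inspect a COLMAP sparse model with pycolmap (pip install pycolmap).
    # The dataset path and sparse/0 layout are placeholders; adjust to your capture.
    import pycolmap

    rec = pycolmap.Reconstruction("dataset/sparse/0")

    print("camera models:    ", len(rec.cameras))    # intrinsics (focal length, distortion)
    print("registered images:", len(rec.images))     # one estimated pose per registered image
    print("sparse points:    ", len(rec.points3D))   # triangulated initialization points

    # Each registered image carries a pose; print a few world-space camera centers.
    for image in list(rec.images.values())[:5]:
        print(image.name, image.projection_center())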

Choose The Right Alignment Path

For most research and local 3DGS workflows, use COLMAP first because almost every trainer understands its output. For very large photo sets, test GLOMAP or a global SfM workflow because incremental reconstruction can become slow or get stuck in local ordering choices. For survey, drone, or commercial photogrammetry projects, Metashape and RealityCapture can be better because they expose masks, control points, GPS weighting, and visual repair tools.

Do not treat mobile app poses as automatically better or worse than SfM poses. ARKit, LiDAR-assisted capture, and visual-inertial SLAM can be faster and more stable in low-texture indoor spaces, but they may drift or have scale quirks. SfM can be very accurate when photos have texture and overlap, but it struggles with blur, repeated patterns, glossy surfaces, and disconnected trajectories.

  • COLMAP: default open 3DGS path, best compatibility.
  • GLOMAP: useful for larger datasets and global SfM experiments.
  • Metashape: strong GUI, masking, GCP, drone, and professional photogrammetry workflows.
  • RealityCapture: fast commercial alignment and export for production pipelines.
  • Nerfstudio ns-process-data: convenient wrapper when you accept its defaults.

COLMAP Workflow In Practice

A practical COLMAP run has four phases: feature extraction, matching, sparse reconstruction, and image undistortion. Automatic reconstruction is fine for a first pass, but command-line steps make failures easier to diagnose. If only a few images register, do not train yet. Fix capture, matching, camera model, or image selection first.

Feature extraction should use a camera model that matches your data. Single phone or fixed-lens camera sets are easier than mixed lenses. Sequential video frames often benefit from sequential matching; unordered object photos often use exhaustive matching; large datasets may need vocabulary-tree or spatial matching. After sparse reconstruction, undistort images for the trainer and keep the sparse model. A minimal command sequence is sketched after the list below.

  • If images are from one lens, use single-camera assumptions when appropriate.
  • Use sequential matching for extracted video frames and walking sequences.
  • Use exhaustive matching for smaller unordered object sets.
  • For drone imagery with GPS, use spatial or guided matching when available.
  • After reconstruction, inspect the sparse model before undistorting and training.
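Put together, a first COLMAP pass on a video-derived capture might look like the Python sketch below, which simply shells out to the colmap binary. The paths are placeholders and the flags shown are standard COLMAP CLI options, but verify them against colmap help for your installed version; swap sequential_matcher for exhaustive_matcher on small unordered sets.

    # Four-phase COLMAP run driven from Python. Paths are placeholders.
    import os
    import subprocess

    DB, IMAGES, SPARSE, UNDIST = "db.db", "images", "sparse", "undistorted"

    def colmap(*args):
        subprocess.run(["colmap", *args], check=True)

    # 1. Feature extraction: one fixed-lens camera assumed, so share intrinsics.
    colmap("feature_extractor",
           "--database_path", DB,
           "--image_path", IMAGES,
           "--ImageReader.single_camera", "1",
           "--ImageReader.camera_model", "OPENCV")

    # 2. Matching: sequential matching suits ordered video frames.
    colmap("sequential_matcher", "--database_path", DB)

    # 3. Incremental sparse reconstruction; models land in sparse/0, sparse/1, ...
    os.makedirs(SPARSE, exist_ok=True)
    colmap("mapper",
           "--database_path", DB,
           "--image_path", IMAGES,
           "--output_path", SPARSE)

    # 4. Undistort images so the trainer can assume a simple pinhole camera.
    colmap("image_undistorter",
           "--image_path", IMAGES,
           "--input_path", os.path.join(SPARSE, "0"),
           "--output_path", UNDIST,
           "--output_type", "COLMAP")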

Quality Checks Before Training

The first number to check is the registered image count. A perfect count is not required, but a dataset where only 30 percent of images register usually has coverage gaps or poor matching. The second check is spatial distribution: cameras should follow your capture path, not cluster into a collapsed knot or split into unrelated islands.

Look at sparse points and camera frustums. The sparse cloud should outline the subject or space with enough coverage to initialize Gaussians. Repeated patterns, blank walls, trees, and reflective objects can create false matches. If a region has cameras but no sparse points, training may still render it poorly because initialization is weak. A small script for these checks is sketched after the list below.

  • Registered image ratio should be high enough to cover every important area.
  • Camera path should match how you moved during capture.
  • Sparse points should not form a flat sheet unless the scene is actually flat.
  • There should be no large disconnected component unless you plan to train separate scenes.
  • Check that image orientation, scale, and up direction are not wildly wrong.
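The checks above can be scripted. The sketch below uses pycolmap (an assumption; any sparse-model reader works) to report the registration ratio and the spread of camera centers. The 70 percent threshold is only an illustrative trigger for a closer look, not a hard rule.

    # Illustrative pre-training checks on a COLMAP sparse model.
    import os
    import numpy as np
    import pycolmap

    rec = pycolmap.Reconstruction("dataset/sparse/0")
    total = len(os.listdir("dataset/images"))        # all captured images
    registered = len(rec.images)                     # images with an estimated pose

    ratio = registered / max(total, 1)
    print(f"registered {registered}/{total} images ({ratio:.0%})")
    if ratio < 0.7:  # example threshold only
        print("warning: low registration ratio, revisit capture or matching")

    # Camera centers should span the capture path rather than collapse into a knot.
    centers = np.array([img.projection_center() for img in rec.images.values()])
    print("camera-center extent (x, y, z):", centers.max(axis=0) - centers.min(axis=0))

    print("sparse points:", len(rec.points3D))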

Fixing Common Alignment Failures

If too few images register, first remove blurry frames and near-duplicates, then try a different matching strategy. If the dataset is video-derived, extract fewer, sharper frames. If a scene has repeated walls or windows, add images that connect distinctive features, or use masks to suppress moving or background clutter in object captures. A simple sharpness filter is sketched after the list below.

If cameras register but the reconstruction bends or duplicates surfaces, inspect the camera model and distortion settings. Wide-angle phones, action cameras, and 360 cameras often need special handling. If you process 360 imagery as normal perspective photos without converting it or selecting the correct camera type, poses can be unstable and downstream training will inherit the error.

  • Blur failure: remove bad frames, extract cleaner frames, reshoot if necessary.
  • Low texture failure: add angled close views of textured regions or use depth/SLAM-assisted capture.
  • Repeated pattern failure: add bridge photos with unique context and use guided matching carefully.
  • Object-on-table failure: mask background or use a turntable workflow with consistent framing.
  • 360 failure: convert to rectilinear views or use a pipeline that supports equirectangular cameras.
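For the blur failure in particular, frame selection can be automated. The sketch below scores frames with the variance of the Laplacian, a common sharpness proxy, using OpenCV; the folder name and threshold are placeholders and should be tuned by inspecting the scores rather than trusted blindly.

    # Keep only reasonably sharp frames from a video-derived capture (requires opencv-python).
    import pathlib
    import cv2

    def sharpness(path):
        gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
        return cv2.Laplacian(gray, cv2.CV_64F).var()  # higher means sharper

    frames = sorted(pathlib.Path("frames").glob("*.jpg"))
    scores = {p: sharpness(p) for p in frames}

    THRESHOLD = 100.0  # illustrative; tune per capture by looking at the score distribution
    keep = [p for p, s in scores.items() if s >= THRESHOLD]
    print(f"keeping {len(keep)} of {len(frames)} frames")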

Exporting For Trainers

Different trainers expect different dataset structures. The original Inria implementation expects COLMAP sparse data and images in a familiar folder layout. Nerfstudio's ns-process-data produces a processed dataset with a transforms metadata file. gsplat examples can fit a COLMAP capture directly. OpenSplat can read several formats, including COLMAP, OpenSfM, ODM, and nerfstudio-style inputs.

Keep both the raw alignment result and the processed training folder. If training fails, you may need to revisit camera registration, image downscaling, distortion, or sparse point initialization. A clean handoff makes debugging possible without rerunning every step from scratch. A quick layout check is sketched after the list below.

  • Save original images separately from resized or undistorted images.
  • Keep COLMAP sparse model, database, and any exported transforms file.
  • Document which images failed to register and whether they were removed.
  • If using Metashape or RealityCapture, export camera parameters in a format your trainer can convert.
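Before handing the dataset to a trainer, it is cheap to verify the folder layout. The sketch below checks for the images folder plus the sparse/0 model files that the original Inria implementation reads; adapt the expected paths if your trainer uses a different structure, for example a nerfstudio transforms file.

    # Quick layout check for a COLMAP-style 3DGS dataset. Adjust paths per trainer.
    import pathlib

    def check_layout(scene_dir):
        scene = pathlib.Path(scene_dir)
        expected = [
            scene / "images",
            scene / "sparse" / "0" / "cameras.bin",
            scene / "sparse" / "0" / "images.bin",
            scene / "sparse" / "0" / "points3D.bin",
        ]
        for path in expected:
            print("ok     " if path.exists() else "MISSING", path)

    check_layout("my_scene")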

Common Failure Modes

  • Training from bad poses usually wastes time; visual artifacts often come from alignment, not the trainer.
  • Changing image order or filenames after processing can break dataset references.
  • Mixed focal lengths and automatic digital zoom can confuse camera calibration.
  • Downscaling too early can remove features needed for matching.
  • A visually dense sparse cloud can still be wrong if cameras are bent, flipped, or split.

Handoff To The Next Step

  • Provide registered camera poses, intrinsics, undistorted images, and sparse points.
  • Keep a note with registered image count and any removed frames.
  • Confirm coordinate scale and orientation if the next stage needs metric or engine placement.
  • Use a trainer-compatible folder layout, such as COLMAP sparse plus images or nerfstudio processed data.
  • Only move to training when alignment covers the important scene regions.

Reference Tutorials And Docs

These sources were used as research input. The guide above is written as a consolidated 3DGS workflow rather than copied from any single tutorial.