Pipeline Tutorial

Capture Tutorial For 3D Gaussian Splatting

How to shoot source images or video that actually align, train, and render well in a 3DGS pipeline.

Step 1: Capture (Practical Workflow)

What This Page Covers

Capture quality determines almost everything downstream. A better trainer can reduce artifacts, but it cannot recover surfaces that were never seen, frames that are blurred, or lighting changes that confuse camera pose estimation during alignment.

This guide covers object capture, rooms, outdoor scenes, drone paths, mobile apps, 360 cameras, and video extraction. The goal is to produce clean overlap, stable exposure, enough parallax, and a dataset that the alignment stage can trust.

Decide What Kind Of Scene You Are Capturing

Object, room, street, landscape, and drone captures need different movement patterns. A small object benefits from multiple orbits at different heights. A room needs wall passes, corner passes, and coverage of floors, ceilings, and occluded furniture sides. A large outdoor scene needs broad baseline variation and often oblique views rather than only front-facing photos.

Before shooting, mark what must survive in the final splat. If the final viewer needs a clean front facade, put more images around edges, windows, vegetation, and reflective areas. If the final output is a walkable room, spend extra time on doorways, corners, glossy floors, under tables, and transition spaces between rooms.

  • Small object: 60 to 120 photos, three orbits, plus high and low angles.
  • Single room: 150 to 300 photos or clean extracted frames, with perimeter and center passes.
  • Building exterior: multiple loops at different distances, plus side and elevated oblique views.
  • Drone scene: combine nadir, oblique, and lower side passes when safe and legal.
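The photo-count targets above can be kept as a small planning lookup. This is just a sketch that encodes the same heuristics listed above; the dictionary keys and function name are illustrative, and scene types without a fixed count (such as building exteriors, which scale with size) return nothing:

```python
# Heuristic photo-count targets matching the scene types above.
# These are starting points, not hard requirements.
CAPTURE_PLANS = {
    "small_object": (60, 120),
    "single_room": (150, 300),
}

def suggested_photo_count(scene_type):
    """Return a (min, max) photo-count heuristic, or None when the scene
    type has no fixed number (e.g. exteriors scale with building size)."""
    return CAPTURE_PLANS.get(scene_type)
```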

Photos Versus Video

Still photos are usually the safest choice because they are sharper, higher resolution, and easier to inspect. Video can be faster, but it often introduces rolling shutter, compression, motion blur, and too many near-duplicate frames. If you use video, record at high frame rate, move slowly, and extract frames at a rate that keeps overlap without flooding COLMAP with redundant images.

For video extraction, do not simply take every frame. You want frames that differ enough to add parallax but still overlap heavily with neighbors. A practical starting point is one to three frames per second for a slow walk, then reduce or increase based on scene size and motion. When the camera turns quickly, discard frames from the turn because they often contain blur.

  • Use photos for final-quality object, product, architecture, and research captures.
  • Use video for quick rooms, creator workflows, or when mobile apps expect it.
  • Avoid panorama mode because stitched panoramas break normal SfM assumptions.
  • Avoid shallow depth of field because blurred backgrounds create weak feature matches.
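The "one to three frames per second" extraction advice above can be turned into a simple index-selection sketch. This is an assumption-laden helper, not a standard tool: it only computes which frame numbers to keep, and the actual decoding would be done by a video library or ffmpeg (roughly `ffmpeg -i walk.mp4 -vf fps=2 frames/%04d.png`):

```python
def frame_indices(video_fps, duration_s, extract_fps):
    """Indices of frames to keep when subsampling a video recorded at
    video_fps down to extract_fps, spacing them evenly so neighbors
    still overlap but near-duplicates are dropped."""
    step = video_fps / extract_fps          # source frames per kept frame
    total = int(video_fps * duration_s)     # total frames in the clip
    return [int(i * step) for i in range(int(total / step))]

# Example: 30 fps video, 10 s slow walk, extracted at 2 fps
idx = frame_indices(30, 10, 2)
```

For a quick turn in the camera path, you would additionally drop the indices that fall inside the turn, as noted above.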

Lighting, Exposure, And Surface Preparation

The alignment stage wants stable visual features; the training stage wants consistent colors. Lock exposure and white balance when you can. If the camera changes exposure every few frames, the trainer may bake the lighting change into Gaussians and produce flicker, color bands, or floaters.

Soft, even lighting is usually better than dramatic lighting. Harsh shadows move as you walk and can become false texture. Reflections are not always fatal for 3DGS, but mirror-like surfaces still need many angles and stable surroundings. For small objects, remove cluttered backgrounds or use masks later if the background dominates feature matching.

  • Keep ISO low and shutter speed high enough to avoid motion blur.
  • Use overcast outdoor light or large diffuse indoor light when possible.
  • Do not move objects, chairs, curtains, or doors during capture.
  • Avoid people, pets, screens, cars, and water unless they are meant to be part of the scan.
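The "shutter speed high enough" rule can be made concrete with a back-of-envelope blur estimate. This is a simplified geometric sketch under stated assumptions (sideways motion, subject at a known distance, pinhole camera); the function and its parameters are illustrative:

```python
import math

def motion_blur_px(speed_m_s, exposure_s, distance_m, hfov_deg, width_px):
    """Approximate horizontal motion blur in pixels for a camera moving
    sideways at speed_m_s, shooting a subject at distance_m."""
    # Width of the scene covered by the frame at that distance.
    footprint_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    px_per_m = width_px / footprint_m
    return speed_m_s * exposure_s * px_per_m

# Walking at 1 m/s with a 1/60 s shutter, 2 m from a wall,
# 70 degree horizontal FOV, 4000 px wide sensor:
blur = motion_blur_px(1.0, 1 / 60, 2.0, 70, 4000)
# tens of pixels of blur: this shutter is too slow for that walk speed
```

The takeaway matches the bullet above: either slow down, stop for each shot, or raise the shutter speed.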

Movement Patterns That Work

The golden rule is overlap plus parallax. Overlap means two neighboring images see the same features. Parallax means the camera moved enough that the alignment system can recover depth. Rotating in place gives overlap but weak parallax; walking too far between shots gives parallax but poor overlap. Good capture lives between those extremes.

For objects, start with a base orbit where the full subject stays in frame. Then add a higher orbit, a lower orbit, and close detail passes. For rooms, walk the perimeter first, then add a center pass and point the camera into corners and behind furniture. For buildings, combine near detail passes with wider context views so COLMAP can connect local details into one global model.

  • Aim for 60 to 80 percent overlap between neighboring images.
  • Pause briefly before each still photo if shooting handheld.
  • Keep the camera path continuous; avoid teleporting between disconnected clusters.
  • Add bridge photos when moving from one room or facade side to another.
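The 60 to 80 percent overlap target above implies a maximum camera step between shots. A minimal sketch, assuming sideways translation and a pinhole camera (the helper name is hypothetical):

```python
import math

def max_step_for_overlap(distance_m, hfov_deg, overlap=0.7):
    """Largest sideways camera step (meters) that keeps `overlap`
    fraction of the previous frame's footprint in the next frame."""
    footprint_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return (1 - overlap) * footprint_m

# 3 m from a wall, 70 degree HFOV, 70% overlap -> step of roughly 1.3 m
step = max_step_for_overlap(3.0, 70, overlap=0.7)
```

Closer subjects shrink the footprint, so detail passes need proportionally smaller steps.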

Mobile App And 360 Capture Notes

Apps such as Scaniverse, Polycam, KIRI Engine, Luma AI, and RealityScan can hide parts of the pipeline, but the same capture physics still apply. Slow motion, complete coverage, and stable light matter more than the brand of app. If the app can export PLY, SPLAT, SPZ, or raw posed images, keep the highest-quality source as your archive.

360 cameras are useful for rooms and fast coverage, but equirectangular images often need special handling. Some pipelines split panoramas into rectilinear views; others can process equirectangular images directly. Avoid putting the tripod, operator, or black camera nadir region in the final field of view, or be ready to mask/crop those areas before alignment.

  • For mobile cloud apps, validate the exported splat in an independent viewer before deleting source captures.
  • For LiDAR or depth-assisted apps, keep the raw posed capture if you plan to train in Nerfstudio later.
  • For 360 capture, use slow movement and stable camera height, then extract enough rectilinear views for COLMAP.
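When a pipeline splits panoramas into rectilinear views, the yaw angles can be chosen so neighboring views overlap. This is a sketch of that spacing calculation only, assuming fixed pitch and a given rectilinear FOV; the actual reprojection would be done by the pipeline's panorama tooling:

```python
import math

def view_yaws(hfov_deg=90, overlap=0.3):
    """Yaw angles (degrees) for rectilinear views that cover a full 360
    panorama with `overlap` fraction shared between neighbors."""
    step = hfov_deg * (1 - overlap)   # effective advance per view
    n = math.ceil(360 / step)         # views needed for full coverage
    return [round(i * 360 / n, 1) for i in range(n)]

# 90 degree views with 30% overlap -> six views at 60 degree spacing
yaws = view_yaws()
```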

Before You Leave The Location

Always do a field review before packing up. Look for motion blur, missing backsides, unreadable dark areas, and areas only visible from one angle. If you see a hole in the capture path, shoot a small bridge sequence from a known area into the missing region.

The most expensive capture mistake is discovering missing coverage after the location is unavailable. A five-minute validation pass can save hours of alignment and training. When in doubt, shoot extra views with stable light instead of hoping the trainer will invent the missing detail.

  • Review thumbnails for blur, accidental floor/sky shots, and exposure jumps.
  • Check that thin objects, corners, and occluded sides have multiple angles.
  • Keep original images, extracted frames, app exports, and any camera metadata together.
  • Write a short note about camera, lens, app, frame extraction rate, and scene constraints.
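The "exposure jumps" check in the review list can be automated crudely by comparing mean frame brightness between neighbors. A minimal sketch, assuming you have already computed a per-frame mean brightness (the threshold and function name are illustrative):

```python
def exposure_jumps(mean_brightness, max_step=10.0):
    """Indices where mean brightness jumps more than max_step between
    consecutive frames, a common sign that auto exposure kicked in."""
    return [i for i in range(1, len(mean_brightness))
            if abs(mean_brightness[i] - mean_brightness[i - 1]) > max_step]

# A brightness series with one auto-exposure jump at index 3
jumps = exposure_jumps([120, 122, 121, 145, 144])
```

Flagged frames are candidates for removal, or a cue to reshoot that segment with locked exposure.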

Common Failure Modes

  • Too many near-duplicate video frames slow matching and can still fail to add useful parallax.
  • Fast handheld motion creates blur that looks acceptable in a video but fails as reconstruction input.
  • Disconnected capture clusters can align separately and never merge into one scene.
  • Auto exposure and moving shadows create color changes that become training artifacts.
  • Transparent, mirror, water, and screen surfaces require extra angles and still may need cleanup.

Handoff To The Next Step

  • Deliver a folder of curated images or extracted frames with obvious bad frames removed.
  • Keep filenames in capture order when possible; sequence can help tools and humans debug.
  • Record camera type, focal length if fixed, video FPS, extraction rate, and any app export settings.
  • If using mobile apps, export both the splat and the raw or posed capture when available.
  • Proceed to alignment only after the image set has coverage, sharpness, and lighting consistency.
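The metadata note above can simply be a small JSON file kept next to the images. The field names below are illustrative, not a standard schema; record whatever your camera and app actually report:

```python
import json

# Minimal capture note matching the handoff checklist above.
manifest = {
    "camera": "example mirrorless body",   # hypothetical placeholder
    "focal_length_mm": 24,
    "video_fps": 30,
    "extraction_fps": 2,
    "app_export": None,
    "notes": "locked exposure; avoided moving cars",
}

with open("capture_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```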
