Detailed Reading
SharpTimeGS is built around a simple but powerful observation: not every Gaussian in a dynamic scene should live for the same amount of time. Background walls, floors, and static objects may persist throughout a sequence, while hands, cloth folds, or moving objects may only need short intervals of support. Treating them with one temporal decay model creates either drift in static regions or insufficient detail in dynamic regions.
The paper introduces a learnable lifespan parameter for each primitive. Instead of using a Gaussian-shaped temporal decay where visibility quickly fades around a center time, it reformulates temporal visibility with a flat-top profile. A primitive can stay consistently active over its intended interval, reducing redundant densification and avoiding flicker around long-lived content.
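The contrast between the two temporal profiles can be made concrete with a minimal sketch. The super-Gaussian form below is an assumption for illustration (the paper's exact formula may differ); the point is that it stays near 1 across the lifespan interval and then drops sharply, unlike a Gaussian that starts fading immediately away from the center.

```python
import numpy as np

def gaussian_visibility(t, center, sigma):
    # Standard Gaussian temporal decay: fades as soon as t leaves the center.
    return np.exp(-0.5 * ((t - center) / sigma) ** 2)

def flat_top_visibility(t, center, lifespan, k=8):
    # Hypothetical flat-top profile (a super-Gaussian, assumed for
    # illustration): approximately 1 across the lifespan interval,
    # then a fast falloff at the edges. `k` controls edge sharpness.
    return np.exp(-((t - center) / (0.5 * lifespan)) ** (2 * k))

# A primitive centered at t=0.5 with lifespan 0.4 stays fully visible
# well inside its interval and vanishes outside it.
print(flat_top_visibility(0.6, 0.5, 0.4))  # near 1 (inside the interval)
print(flat_top_visibility(0.9, 0.5, 0.4))  # near 0 (outside the interval)
```

With a Gaussian profile, matching the same support width forces the primitive to dim noticeably even inside its intended interval, which is exactly the flicker the flat-top formulation avoids.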
Lifespan also modulates motion. Long-lived primitives should not drift just because the deformation field has freedom to move them, while short-lived primitives should remain flexible enough to capture rapid motion. By decoupling motion magnitude from temporal duration, the method reduces background instability without freezing genuinely dynamic regions.
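One way to picture this coupling is a lifespan-dependent gate on the deformation field's output. The gate below is an assumption, not the paper's exact rule: it attenuates predicted displacements for long-lived primitives so static content cannot drift, while short-lived primitives keep nearly their full motion budget.

```python
import numpy as np

def damped_displacement(raw_delta, lifespan, alpha=4.0):
    # Illustrative gate (an assumption, not the paper's formulation):
    # a long lifespan shrinks the displacement the deformation field is
    # allowed to apply, while a short lifespan leaves it almost untouched.
    gate = 1.0 / (1.0 + alpha * lifespan)
    return gate * np.asarray(raw_delta, dtype=float)

# The same predicted displacement is mostly preserved for a short-lived
# primitive and strongly damped for a background-like, long-lived one.
print(damped_displacement([0.1, 0.0, 0.0], lifespan=0.05))
print(damped_displacement([0.1, 0.0, 0.0], lifespan=1.0))
```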
The densification strategy uses both lifespan and velocity. Regions with pronounced motion receive more capacity, while stable areas remain compact. This is a more temporally aware version of the original 3DGS density control idea: add primitives where the sequence needs explanatory power, not just where a single image has high error.
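A sketch of what "temporally aware" density control could look like, with names and thresholds that are assumptions rather than the paper's values: fast-moving primitives face a lower effective gradient threshold, so dynamic regions are densified more readily while static regions keep a standard 3DGS-style criterion.

```python
import numpy as np

def densify_mask(grad_norm, speed, g_thresh=2e-4, v_ref=0.05):
    # Illustrative velocity-aware densification criterion (thresholds and
    # names are assumptions). Each primitive has an accumulated positional
    # gradient norm and an estimated speed; higher speed lowers the
    # effective gradient threshold, granting dynamic regions extra capacity.
    eff_thresh = g_thresh / (1.0 + np.asarray(speed) / v_ref)
    return np.asarray(grad_norm) > eff_thresh

# Two primitives with the same gradient: the static one is left alone,
# the fast-moving one is selected for cloning/splitting.
grad = np.array([1.5e-4, 1.5e-4])
speed = np.array([0.0, 0.2])
print(densify_mask(grad, speed))
```

The design choice mirrors the text: capacity is allocated where the sequence (motion over time) needs explanatory power, not merely where one frame shows high error.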
The paper is important because it models time as a property of primitives, not only as an input to a deformation network. That makes the representation more interpretable: a Gaussian has a spatial role, an appearance role, a motion role, and a temporal existence interval. This is a useful mental model for future 4DGS compression and editing systems.
The main limitation is that lifespan learning introduces another coupled variable into an already complex optimization: incorrect lifespans can hide motion or cause popping, and very complex topology changes may still need richer models. Even so, the contribution is valuable because it directly targets the stability-sharpness tradeoff that viewers notice in dynamic playback.