Detailed Reading
FRoG belongs to the canonical-field family of dynamic 3DGS methods. These methods keep Gaussian attributes in a canonical space and use a deformation field to transform them over time. The approach is powerful, but it can be slow, overly dependent on initialization, and unstable when dim lighting or weak texture yields weak photometric gradients.
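To make the canonical-plus-deformation idea concrete, here is a minimal sketch. It is not FRoG's model: the names CanonicalGaussian, toy_deformation, and deform are hypothetical, and the hand-written sinusoidal offset only stands in for a learned deformation field.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CanonicalGaussian:
    """Hypothetical container: attributes are stored once, in canonical space."""
    position: Vec3
    opacity: float


def toy_deformation(position: Vec3, t: float) -> Vec3:
    """Stand-in for a learned deformation field (assumption, illustrative only).

    Returns a position offset: a sway along x whose amplitude grows with z.
    """
    _, _, z = position
    return (0.1 * z * math.sin(2.0 * math.pi * t), 0.0, 0.0)


def deform(g: CanonicalGaussian, t: float) -> CanonicalGaussian:
    """Produce the Gaussian at time t; the canonical Gaussian is left unchanged."""
    dx, dy, dz = toy_deformation(g.position, t)
    x, y, z = g.position
    return CanonicalGaussian((x + dx, y + dy, z + dz), g.opacity)
```

The point of the sketch is the division of labor: the canonical set is shared across all frames, and everything time-dependent lives in the deformation function. If the canonical set is missing geometry, the deformation function is the only place left to absorb the error.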
The first contribution is an embedding strategy for faster dynamic rendering. Per-Gaussian embeddings and coarse-to-fine temporal embeddings give the deformation model a more direct handle on time. Early fusion of temporal embeddings reduces the cost of repeatedly computing dynamic attributes and helps the network learn motion at different temporal scales.
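One way to picture coarse-to-fine temporal embeddings with early fusion is sketched below. The grid layout, linear interpolation, and plain concatenation are assumptions chosen for illustration; interp_temporal and fused_input are hypothetical names, and the paper's actual architecture may differ.

```python
from typing import List, Sequence


def interp_temporal(grid: Sequence[Sequence[float]], t: float) -> List[float]:
    """Linearly interpolate a temporal grid at normalized time t in [0, 1].

    `grid` holds one feature vector per time knot and needs at least two knots.
    """
    n = len(grid)
    pos = min(max(t, 0.0), 1.0) * (n - 1)
    i = min(int(pos), n - 2)  # index of the left knot
    w = pos - i               # blend weight toward the right knot
    return [(1.0 - w) * a + w * b for a, b in zip(grid[i], grid[i + 1])]


def fused_input(
    gaussian_embedding: Sequence[float],
    temporal_grids: Sequence[Sequence[Sequence[float]]],
    t: float,
) -> List[float]:
    """Early fusion (as assumed here): concatenate the per-Gaussian embedding
    with temporal features from every level, ordered coarse to fine."""
    features = list(gaussian_embedding)
    for grid in temporal_grids:
        features.extend(interp_temporal(grid, t))
    return features
```

In this sketch the temporal features depend only on t, so they can be computed once per frame and reused for every Gaussian. Grids with few knots capture slow motion, and grids with many knots capture fast changes.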
The second contribution addresses sparse or poor initialization. If the canonical field starts with missing support, the deformation field has to compensate by moving the wrong primitives, which makes optimization harder. Depth- and error-guided sampling inserts new Gaussians at low-deviation positions where the current model needs capacity, reducing the burden on deformation.
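A rough sketch of depth- and error-guided sampling follows, under stated assumptions: a per-pixel error map and depth map are available, the camera is a simple pinhole, and new points are placed at the k highest-error pixels with valid depth. The function names and the top-k rule are hypothetical, and the sketch does not model how the paper selects low-deviation positions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: int, v: int, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of pixel (u, v) at `depth` into camera space."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def sample_new_points(
    error_map: Sequence[Sequence[float]],
    depth_map: Sequence[Sequence[float]],
    intrinsics: Tuple[float, float, float, float],
    k: int,
) -> List[Vec3]:
    """Return camera-space points for the k largest-error pixels with valid depth."""
    fx, fy, cx, cy = intrinsics
    candidates = [
        (err, u, v)
        for v, row in enumerate(error_map)
        for u, err in enumerate(row)
        if depth_map[v][u] > 0.0  # skip pixels without a usable depth value
    ]
    candidates.sort(reverse=True)  # highest rendering error first
    return [
        backproject(u, v, depth_map[v][u], fx, fy, cx, cy)
        for _, u, v in candidates[:k]
    ]
```

The returned points would seed new Gaussians, so that regions the current model explains poorly gain primitives of their own instead of borrowing deformed ones.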
The third contribution targets local optima in dim scenes. Low light and weak texture can make color and opacity updates misleading, so the method modulates opacity variation to avoid bad explanations that trap optimization. This is a useful reminder that dynamic reconstruction failures often come from the coupling of geometry, appearance, and visibility, not from motion alone.
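The paper's exact modulation rule is not reproduced here. As one simple illustration of limiting opacity variation, the sketch below bounds how far opacity may move in a single update; bounded_opacity_step and max_delta are hypothetical names, and the clamp is a swapped-in technique, not FRoG's formulation.

```python
def bounded_opacity_step(opacity: float, proposed: float,
                         max_delta: float = 0.05) -> float:
    """Move opacity toward `proposed`, changing it by at most `max_delta`.

    Illustrative assumption: a hard per-step clamp stands in for whatever
    modulation the method actually applies.
    """
    delta = max(-max_delta, min(max_delta, proposed - opacity))
    return min(1.0, max(0.0, opacity + delta))  # keep opacity in [0, 1]
```

With a bound like this, a single misleading gradient in a dim frame cannot make a Gaussian vanish or saturate at once; it takes sustained evidence over many steps.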
FRoG is valuable because it improves both speed and robustness in a setting where many methods report good quality but are fragile in real scenes. It is particularly relevant when initial point clouds are sparse, scenes have static and dynamic detail mixed together, or lighting makes photometric optimization unreliable.
The main limitation is that FRoG still inherits the canonical deformation assumption. Large topology changes, severe occlusion, or highly non-rigid motion can remain difficult. The paper should be read as a practical strengthening of deformable 3DGS rather than a complete replacement for all dynamic-scene representations.