The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning

The Discovery: We identify Temporal Entanglement as a critical flaw in Pre-Trained Visual Representations (PVRs), where models fundamentally fail to perceive task progression, leading to poor policy performance.

The Evidence: We show that traditional fixes, including frame stacking, latent flow (FLARE), Causal Transformers, and even Video-PVRs, are insufficient to resolve this issue. A simple timestamp baseline drastically outperforms these more sophisticated methods, demonstrating that current models lack the temporal disentanglement required for robust robotic control.
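To make the idea of a timestamp baseline concrete, here is a minimal sketch of one plausible reading: append a normalized time index to each frame's frozen PVR feature vector before it reaches the policy. This is an illustrative assumption, not the paper's implementation. The function name `append_timestamp`, the `horizon` parameter, and the normalization to [0, 1] are all hypothetical.

```python
from typing import List, Optional, Sequence


def append_timestamp(
    features: Sequence[Sequence[float]],
    horizon: Optional[int] = None,
) -> List[List[float]]:
    """Append a normalized timestep to each per-frame feature vector.

    features: one feature vector per frame, in temporal order
              (assumed to come from a frozen PVR encoder).
    horizon:  assumed episode length used for normalization;
              defaults to the number of frames provided.
    """
    n = len(features)
    if n == 0:
        return []
    # Map step t to t / (horizon - 1); guard against a single-step episode.
    denom = max((horizon if horizon is not None else n) - 1, 1)
    return [list(f) + [min(t / denom, 1.0)] for t, f in enumerate(features)]


# Two visually identical frames (same features) at the start and end of an
# episode become distinguishable once the time index is appended.
episode = [[0.5, 0.5], [0.1, 0.2], [0.5, 0.5]]
augmented = append_timestamp(episode)
print(augmented[0] != augmented[2])
```

In this sketch the first and last frames share the same visual features but receive different time indices, so a downstream policy can tell apart states that the encoder alone leaves entangled.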

BibTeX

@inproceedings{tsagkas2025temporaltrapentanglementpretrained,
    title={The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning},
    author={Tsagkas, Nikolaos and Sochopoulos, Andreas and Danier, Duolikun and Lu, Chris Xiaoxuan and Mac Aodha, Oisin},
    booktitle={International Conference on Robotics and Automation (ICRA)},
    year={2026}
}