The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning
TL;DR
The Discovery: We identify Temporal Entanglement as a critical flaw in Pre-Trained Visual Representations (PVRs), where models fundamentally fail to perceive task progression, leading to poor policy performance.
The Evidence: We show that traditional fixes, namely frame stacking, latent flow (FLARE), Causal Transformers, and even Video-PVRs, are insufficient to resolve this issue. By introducing a simple timestamp baseline that drastically outperforms these sophisticated methods, we demonstrate that current models lack the essential temporal disentanglement required for robust robotic control.
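To make the idea of a timestamp baseline concrete, here is a minimal sketch of one way such a signal could be attached to frozen PVR features: a normalized step index is concatenated to the feature vector before it reaches the policy. This page does not specify how the baseline is implemented, so the function name `append_timestamp`, the `step / (horizon - 1)` normalization, and the example feature values are all illustrative assumptions rather than the authors' method.

```python
from typing import List, Sequence


def append_timestamp(features: Sequence[float], step: int, horizon: int) -> List[float]:
    """Concatenate a normalized time index in [0, 1] to a frozen feature vector.

    Hypothetical helper: the normalization scheme is an assumption,
    not taken from the paper.
    """
    if horizon < 2:
        raise ValueError("horizon must be at least 2")
    if not 0 <= step < horizon:
        raise ValueError("step must lie in [0, horizon)")
    progress = step / (horizon - 1)  # 0.0 at the first step, 1.0 at the last
    return list(features) + [progress]


# Two visually identical frames (made-up feature values) seen at different
# points in an episode: the visual part is the same, the time index differs.
same_view = [0.12, -0.40, 0.88]
early = append_timestamp(same_view, step=0, horizon=50)
late = append_timestamp(same_view, step=49, horizon=50)
```

In this sketch the two policy inputs share the same visual features and differ only in the appended time index, which is the kind of progress signal the TL;DR argues the visual representation fails to provide by itself.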
BibTeX
@inproceedings{tsagkas2025temporaltrapentanglementpretrained,
title={The Temporal Trap: Entanglement in Pre-Trained Visual Representations for Visuomotor Policy Learning},
author={Tsagkas, Nikolaos and Sochopoulos, Andreas and Danier, Duolikun and Lu, Chris Xiaoxuan and Mac Aodha, Oisin},
booktitle={International Conference on Robotics and Automation (ICRA)},
year={2026}
}
Updated December 2025