MIM4D: How Self-Supervised 4D Learning Revolutionizes Autonomous Driving Perception

2 days ago 高效码农

MIM4D: Masked Multi-View Video Modeling for Autonomous Driving Representation Learning

Why Autonomous Driving Needs Better Visual Representation Learning

In autonomous driving systems, multi-view video captured by cameras forms the backbone of environmental perception. However, current approaches face two critical challenges:

Dependency on Expensive 3D Annotations: Traditional supervised learning requires massive labeled 3D datasets, which limits scalability.
Ignored Temporal Dynamics: Single-frame or monocular methods fail to capture motion patterns in dynamic scenes.

MIM4D (Masked Modeling with Multi-View Video for Autonomous Driving) targets both challenges. Through dual-path masked modeling (spatial + temporal) and 3D volumetric rendering, it learns robust geometric representations …