SpatialTree: Decoding the Hidden Hierarchy of Spatial Intelligence in AI

6 hours ago 高效码农

SpatialTree: How Spatial Abilities Hierarchically Develop in Multimodal LLMs Have you ever wondered how AI perceives the size of objects, judges distances, or predicts movement when looking at an image? In cognitive science, human spatial ability develops progressively—from basic perception to complex reasoning and real-world interaction. Yet for multimodal large language models (MLLMs), this hierarchical structure has long been poorly understood, with most research focusing on isolated tasks rather than the bigger picture. Today, we’ll explore SpatialTree—a cognitive science-inspired framework that organizes AI’s spatial abilities into four distinct layers. It also introduces the first capability-centric hierarchical benchmark, allowing us to …