# CLaRa: Teaching a Language Model to Compress, Retrieve, and Answer in One Breath

How to shrink Wikipedia 128× and still beat full-text baselines—without ever labeling “relevant” documents.

## TL;DR

CLaRa (Continuous Latent Reasoning) unifies retrieval and generation inside a single LLM by:

- Offline-compressing every document into 32–256 “memory tokens”;
- Learning to retrieve with a differentiable top-k operator (a sketch of one such relaxation follows below);
- Training everything end-to-end with nothing more than next-token prediction loss.

On four open QA datasets the framework matches or outperforms full-text RAG while using 1–2% of the usual context length.

## Table of Contents

The Two Walls Hitting Every RAG …
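The article does not spell out CLaRa's exact relaxation here, but a common way to make top-k retrieval trainable with only the generator's loss is a straight-through soft top-k: hard selection in the forward pass, softmax gradients in the backward pass. Below is a minimal PyTorch sketch of that idea; the function name `soft_top_k`, the temperature `tau`, and the toy usage are illustrative assumptions, not CLaRa's actual API.

```python
import torch
import torch.nn.functional as F

def soft_top_k(scores: torch.Tensor, k: int, tau: float = 1.0) -> torch.Tensor:
    """Differentiable top-k selection via a straight-through estimator (illustrative, not CLaRa's exact operator).

    Forward pass: a hard 0/1 mask over the k highest-scoring documents.
    Backward pass: gradients flow through a softmax relaxation, so retriever
    scores receive signal from the generator's next-token loss alone.
    """
    # Soft relaxation: each candidate document gets a selection probability.
    probs = F.softmax(scores / tau, dim=-1)

    # Hard selection of the k best-scoring documents.
    topk_idx = scores.topk(k, dim=-1).indices
    hard = torch.zeros_like(probs).scatter_(-1, topk_idx, 1.0)

    # Straight-through trick: hard mask forward, soft gradients backward.
    return hard + probs - probs.detach()


# Toy usage: 8 candidate documents, keep 2; only a downstream loss supervises the scores.
scores = torch.randn(8, requires_grad=True)
mask = soft_top_k(scores, k=2)
# `mask` would weight each document's compressed memory tokens before they
# enter the generator's context; here a random projection stands in for the LM loss.
loss = (mask * torch.randn(8)).sum()
loss.backward()
print(scores.grad)  # non-zero gradients for every document's retrieval score
```

The point of the trick is that no “relevant document” labels are ever needed: whichever selection lowers the answer's next-token loss gets reinforced through the relaxed gradients.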
> In an era where AI models are ballooning to trillions of parameters, a model smaller than two smartphone photos is defeating giants like DeepSeek-R1 and Gemini 2.5 Pro in the ARC-AGI challenge.

“Is bigger always better?” This question has lingered in artificial intelligence for years. While major tech companies race to release increasingly larger models, Samsung SAIL Montreal’s Alexia Jolicoeur-Martineau took the opposite path. Her Tiny Recursive Model (TRM) uses just 7 million parameters—smaller than many image classification models—yet achieves 45% accuracy on ARC-AGI-1 and 8% on the more challenging ARC-AGI-2, outperforming competitors with thousands of times more parameters. …