MIT’s ‘RL’s Razor’ Reveals Why Reinforcement Learning Fine-Tuning Beats SFT in Knowledge Retention

6 hours ago 高效码农

Why Reinforcement Learning Fine-Tuning Forgets Less: Inside MIT’s “RL’s Razor”

What makes RL forget less than supervised fine-tuning? It stays closest to the original model in KL-divergence on the new task: every update is a small, on-policy re-weighting rather than a lunge toward an arbitrary label distribution.

1 The Catastrophic-Forgetting Pain Is Still Real

One-sentence takeaway
Foundation models learn new tricks quickly, but they also lose old ones, unless you train with on-policy RL.

Summary
Post-training is now the default path to adapt large models. Supervised Fine-Tuning (SFT) is easy to implement but notorious for erasing prior capabilities. Previous remedies (weight regularizers, …