When Residual Connections Go Rogue: How We Tamed Hyper-Connections with Geometry

Hyper-Connections promised better performance but delivered training instability. Manifold-Constrained Hyper-Connections fix this by forcing residual mappings onto the Birkhoff polytope, restoring stability while preserving all performance gains with only 6.7% overhead.

Introduction: The Hidden Cost of Wider Residual Streams

What happens when you try to increase a model’s capacity by widening its residual connections without adding constraints? You get unpredictable signal explosions that crash training runs. We learned this the hard way while training a 27-billion-parameter model. For a decade, residual connections have been the quiet heroes of …