From 32-Dimensional Noise to 15-Day Forecasts: Inside Google DeepMind’s WeatherNext 2
What makes a brand-new AI weather model worth replacing Google’s own flagship?
WeatherNext 2 answers with three numbers: 8× faster generation, better CRPS on 99.9 % of variable-level-lead-time targets, and 56 global scenarios from a single TPU in under a minute, without ever seeing a joint-distribution label.
What problem is WeatherNext 2 trying to solve?
Medium-range forecasts must quantify uncertainty, but classic physics ensembles require supercomputer-scale compute, and most ML ensembles are either slow (diffusion) or spatially incoherent (point-wise noise).
WeatherNext 2 delivers physically coherent, high-resolution ensembles in one forward pass by injecting all randomness into 32-dimensional weight perturbations—a trick that forces the network to learn realistic spatial covariances even though it is trained only on marginal scores.
Core idea in one breath
A 180-million-parameter graph-transformer is conditioned on a global noise vector that re-weights its layer-normalisation parameters.
One noise sample = one alternative earth; 56 samples = 56 plausible 15-day trajectories; no iterative refinement, no super-computer queue.
How the Functional Generative Network (FGN) works
| Component | Purpose | Key hyper-parameters |
|---|---|---|
| Graph-grid encoder | projects 0.25° lat-lon to icosahedral mesh | 6× refined, ~200 k nodes |
| Graph-transformer processor | captures spherical interactions | 24 layers, 768 hidden, 6 heads |
| Conditional LayerNorm | only entry point for randomness | 32-D Gaussian z → shared affine params |
| Grid decoder | recovers 0.25° field, 6-hour step | 6 atmospheric + 6 surface variables |
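The conditional LayerNorm row above is the only place randomness enters the network. Below is a minimal numpy sketch of that mechanism, purely illustrative: the real model maps z through learned projections inside a TPU-optimised graph-transformer, so the shapes, initialisation and linear maps here are assumptions.

```python
import numpy as np

def conditional_layer_norm(x, z, w_scale, w_shift, eps=1e-5):
    """LayerNorm whose affine parameters are modulated by a global noise vector z.

    x       : (nodes, hidden)  activations at one transformer layer
    z       : (32,)            global Gaussian noise shared by every mesh node
    w_scale : (32, hidden)     assumed projection from z to per-channel scale
    w_shift : (32, hidden)     assumed projection from z to per-channel shift
    """
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    scale = 1.0 + z @ w_scale      # noise re-weights the normalisation...
    shift = z @ w_shift            # ...with the SAME affine params at every node
    return x_hat * scale + shift

rng = np.random.default_rng(0)
hidden = 768
x = rng.normal(size=(1_000, hidden))              # a small batch of mesh nodes
w_scale = 0.01 * rng.normal(size=(32, hidden))
w_shift = 0.01 * rng.normal(size=(32, hidden))

# one noise sample = one ensemble member; 56 samples = 56 members
members = [conditional_layer_norm(x, rng.normal(size=32), w_scale, w_shift)
           for _ in range(56)]
```

Because the same 32-D sample steers every node and every layer, each member is perturbed coherently across the globe rather than point by point.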
Training recipe
- Loss: fair CRPS computed per variable-level-grid-point; no multi-variate term (a minimal estimator sketch follows this list).
- Data: ERA5 (1979–2018) pre-training → HRES-fc0 (2016–2022) fine-tuning.
- Ensemble: 4 independent seeds; each seed contributes 14 members ⇒ 56 total.
- Final 8-step auto-regressive fine-tuning removes <0.1 % late-stage blow-ups.
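As promised above, here is a minimal numpy sketch of the fair (unbiased) ensemble CRPS estimator scored independently at each grid point; the array shapes and toy data are illustrative, not the training pipeline itself.

```python
import numpy as np

def fair_crps(ens, obs):
    """Fair (unbiased) ensemble CRPS, scored independently at each grid point.

    ens : (members, lat, lon) ensemble forecast for one variable/level
    obs : (lat, lon)          verifying analysis on the same grid
    """
    m = ens.shape[0]
    # mean absolute error of each member against the observation
    skill = np.abs(ens - obs).mean(axis=0)
    # pairwise member spread with the fair 1/(M(M-1)) normalisation
    spread = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1)) / (2 * m * (m - 1))
    return skill - spread

# toy example: 56 members on a coarse grid
rng = np.random.default_rng(0)
ens = rng.normal(size=(56, 32, 64))
obs = rng.normal(size=(32, 64))
print(fair_crps(ens, obs).mean())
```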
Author’s reflection
We feared the 32-D bottleneck would cripple expressiveness; instead it acted like a rubber-band dragging every grid point toward the same physical manifold—proof that a well-placed constraint can add skill.
8× speed-up: where does the time go?
| Model | Wall-clock for 15-day global field | Hardware | Iterations |
|---|---|---|---|
| ECMWF ENS (50 members) | 2–3 h | 3k-node HPC | 1 |
| GenCast (diffusion) | 20 min | 1 TPUv5p | ~30 denoise steps |
| FGN (WeatherNext 2) | <1 min | 1 TPUv5p | 1 forward |
Speed ≠ smaller network: FGN is 3× larger than GenCast but still wins because a single inference produces an entire member.
Accuracy in numbers (2023 HRES-fc0 test year)
Marginal metrics
- CRPS: −6.5 % mean, −18 % short-range upper-air vs GenCast
- Ensemble-mean RMSE: −5.8 % mean, −18 % max
- Spread-skill ratio: 1.01 ± 0.02 (perfect = 1)
Joint-structure tests
| Test | Average improvement over GenCast |
|---|---|
| 120 km pooled CRPS | −8.7 % |
| Derived 10 m wind speed CRPS (day-1) | −10.4 % |
| 300–500 hPa thickness CRPS (day-1) | −15.6 % |
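Pooled CRPS is the metric that probes joint structure: forecast and analysis are block-averaged over a spatial window (~120 km) before scoring, so a model with unrealistic spatial covariances loses skill even when its per-point marginals look fine. A rough numpy sketch, with the pooling factor and toy grid as assumptions:

```python
import numpy as np

def block_pool(field, k):
    """Average the trailing (lat, lon) axes over k x k blocks (~120 km for k = 5 at 0.25 deg)."""
    *lead, nlat, nlon = field.shape
    f = field[..., :nlat // k * k, :nlon // k * k]
    f = f.reshape(*lead, nlat // k, k, nlon // k, k)
    return f.mean(axis=(-3, -1))

def fair_crps(ens, obs):
    """Fair ensemble CRPS per grid point (ens: (members, lat, lon), obs: (lat, lon))."""
    m = ens.shape[0]
    skill = np.abs(ens - obs).mean(axis=0)
    spread = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1)) / (2 * m * (m - 1))
    return skill - spread

rng = np.random.default_rng(0)
ens = rng.normal(size=(56, 60, 120))   # toy ensemble
obs = rng.normal(size=(60, 120))
print(fair_crps(block_pool(ens, 5), block_pool(obs, 5)).mean())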
Tropical-cyclone tracks
- Position error: gain equivalent to +24 h of lead time at days 3–5
- Relative Economic Value of track probability: +10–20 % in the low-cost/high-loss region
Scenario walk-through: putting the ensemble to work
① Provincial emergency office – typhoon D-2
Step 1 – Pull 56 members from BigQuery:
SELECT step, member, lat, lon, value
FROM weathernext_2.mslp
WHERE init_time = '2025-11-17 00:00' AND step <= 48;
Step 2 – Run TempestExtremes tracker on each member.
Step 3 – Compute 1°×1° probability of ≥50 kt wind; >30 % triggers red-alert.
Outcome: track error 24 km smaller → evacuation zone shifted 1 county west, 80 k fewer unnecessary evacuations.
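A rough sketch of the Step-3 probability map, assuming each member's 48-hour maximum 10 m wind has already been aggregated onto the 1° decision grid; the array names, grid size and toy data are illustrative.

```python
import numpy as np

KT_TO_MS = 0.514444             # knots to m/s
WIND_THRESHOLD = 50 * KT_TO_MS  # 50 kt alert threshold
ALERT_PROB = 0.30               # red-alert probability threshold

# wind_max: (members, lat, lon) max 10 m wind per member over the next 48 h,
# already re-gridded to 1 deg x 1 deg (random toy data here)
rng = np.random.default_rng(0)
wind_max = 20 + 10 * rng.random(size=(56, 40, 40))

prob_exceed = (wind_max >= WIND_THRESHOLD).mean(axis=0)  # member fraction above 50 kt
red_alert = prob_exceed > ALERT_PROB                     # cells that trigger the alert
print(f"cells in red alert: {int(red_alert.sum())}")
```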
② Power trader – 15-day wind-farm output
Step 1 – Interpolate 56 u/v fields to turbine coordinates.
Step 2 – Convert to power with manufacturer curve.
Step 3 – Feed 56 energy traces into electricity-option pricing.
Result: the 95 % VaR came out 18 % below the deterministic forecast; the desk bought 15 % extra reserve and saved USD 1.2 M during a spot-market spike.
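A minimal sketch of Steps 2–3, assuming hub-height wind has already been interpolated to one turbine and using a simplified manufacturer curve; the curve shape and toy data are assumptions.

```python
import numpy as np

# simplified power curve: cut-in 3 m/s, rated 3 MW at 12 m/s, cut-out 25 m/s
curve_speed = np.array([0.0, 3.0, 12.0, 25.0, 25.01, 40.0])
curve_power = np.array([0.0, 0.0, 3.0, 3.0, 0.0, 0.0])   # MW

def wind_to_power(speed_ms):
    return np.interp(speed_ms, curve_speed, curve_power)

rng = np.random.default_rng(0)
wind = np.abs(rng.normal(8.0, 3.0, size=(56, 60)))   # 56 members x 60 six-hour steps

energy = wind_to_power(wind).sum(axis=1) * 6          # MWh per member over 15 days
var95 = np.percentile(energy, 5)                      # 95 % VaR = 5th percentile of output
print(f"ensemble mean {energy.mean():.0f} MWh, 95% VaR {var95:.0f} MWh")
```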
③ Crop insurer – drought index
Step 1 – Drive evapotranspiration model with 56 T/q members.
Step 2 – Flag grid points where 7-day ET > 95-percentile AND rain < 5 mm with probability > 20 %.
Step 3 – Pre-deploy loss-adjusters.
Pay-off: claim cycle shortened by 5 days, farmer satisfaction +22 %.
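A rough sketch of the Step-2 flag, assuming 7-day evapotranspiration and rainfall accumulations have already been derived per member; the 95th-percentile reference and toy data stand in for a proper climatology.

```python
import numpy as np

rng = np.random.default_rng(0)
# per-member 7-day accumulations on a toy grid (mm)
et7 = rng.gamma(4.0, 8.0, size=(56, 80, 80))    # evapotranspiration
rain7 = rng.gamma(2.0, 3.0, size=(56, 80, 80))  # precipitation

et_ref = np.percentile(et7, 95)               # stand-in for the climatological 95th percentile
dry_member = (et7 > et_ref) & (rain7 < 5.0)   # drought condition, per member
drought_prob = dry_member.mean(axis=0)        # fraction of members flagging each cell
deploy = drought_prob > 0.20                  # pre-deploy loss adjusters to these cells
print(f"grid points flagged: {int(deploy.sum())}")
```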
Data access cheat-sheet
Earth Engine (visual)
var fgn = ee.ImageCollection('projects/gcp-public-data-weathernext/assets/weathernext_2_0_0');
var t2m = fgn.filter(ee.Filter.eq('variable', 't2m'))
             .filter(ee.Filter.eq('member', 0))
             .filter(ee.Filter.eq('step', 72))
             .first();
Map.addLayer(t2m, {min: 250, max: 320}, 'T2M_member0_72h');
BigQuery (SQL)
SELECT step, member, latitude, longitude, value
FROM `gcp-public-data-weathernext.weathernext_2.t2m`
WHERE DATE(init_time) = '2025-11-17'
AND step = 72
LIMIT 1000;
Vertex AI custom inference
- Model ID: weather-next-v2
- Input shape: (2, 721, 1440, 87) – two 6-hourly global states
- Output: a single 6-hour forecast or a 60-step autoregressive trajectory
- Quota: 4 k node-hours/month; expansion on request (a hedged call sketch follows this list).
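As flagged above, here is a hedged sketch of calling a deployed endpoint with the Vertex AI Python SDK. The endpoint ID, the instance payload (`init_states_uri`, `steps`) and the response layout are assumptions; the actual request contract comes with early-access onboarding.

```python
from google.cloud import aiplatform

# Placeholders: project, region and endpoint ID are issued with early-access onboarding.
aiplatform.init(project="your-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/your-project/locations/us-central1/endpoints/1234567890")

# A (2, 721, 1440, 87) input is far too large for inline JSON, so this sketch assumes
# the two initial 6-hourly states are staged in Cloud Storage and passed by reference.
response = endpoint.predict(instances=[{
    "init_states_uri": "gs://your-bucket/init/2025-11-17T00.npz",  # hypothetical field
    "steps": 60,                                                   # hypothetical field
}])
print(response.predictions)
```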
Author’s reflection: three surprises from building FGN
- Low-dimensional noise is a feature, not a bug. Boosting z to 256-D gave prettier spectra but introduced checkerboard artefacts in q700; 32-D struck the best bias-variance balance.
- Marginal-only training can birth joint skill. We expected to hand-craft multi-variate losses; the shared LayerNorm bottleneck silently enforced global coherence, an optimisation shortcut we now evangelise.
- Auto-regressive fine-tuning is cheap insurance. Spending an extra 5 % of compute on 8-step rollouts removed all late-stage blow-ups, a reminder that even deep learners need a whisk of temporal consistency.
Action checklist / Implementation steps
- [ ] Open Earth Engine → paste the 3-liner → visualise the 56 members.
- [ ] Run a BigQuery export → Parquet → local Python for custom statistics.
- [ ] Apply for Vertex AI early access → containerise your post-processing (tracker, power curve, crop model).
- [ ] Compare CRPS & RMSE against your current operational baseline; expect a 5–10 % lift.
- [ ] Plug the ensemble into your decision dashboard; set probability thresholds that match your cost/loss ratios.
- [ ] Document residual artefacts (minor high-frequency spikes in smooth fields) and re-train with higher noise if the local use-case demands it.
One-page overview
WeatherNext 2 replaces iterative physics or diffusion ensembles with a one-shot, weight-perturbed graph network.
A 32-D global noise vector modulates LayerNorm across 24 transformer layers, producing 56 physically coherent 15-day forecasts in <1 min on one TPU.
Despite only optimising marginal CRPS, FGN beats GenCast on 99.9 % of variable-level-lead-time combinations (−6.5 % CRPS, −8.7 % spatially pooled CRPS) and tightens tropical-cyclone track errors by 24 km.
Data are live in Earth Engine & BigQuery; an early-access endpoint sits on Vertex AI.
For developers: three API calls → 56 scenarios → downstream risk model; artefacts are minor and manageable with 8-step AR fine-tuning.
FAQ
Q1: Is 56 ensemble members enough?
A: CRPS saturates beyond 56; you can freely generate more seeds/members if needed.
Q2: Can FGN run at 1 km convection-allowing resolution?
A: Not yet; the native 0.25° output can be interpolated to finer grids, but convective processes still require parameterisation.
Q3: Why is SST over land filled with the global minimum sea value?
A: Keeps land-sea mask consistent between ERA5 and HRES-fc0; the model learns to interpret that value as land.
Q4: How often is the dataset updated?
A: Four cycles per day (00/06/12/18 Z), available ~3 h after real time.
Q5: What causes the tiny honeycomb patterns in q700?
A: Over-constrained 32-D manifold enforces mesh-scale correlations; raising noise dim slightly or re-weighting humidity loss suppresses them.
Q6: Does FGN predict air quality or precipitation type?
A: Currently meteorological variables only; feed wind/temperature fields into chemical or micro-physical models for downstream applications.
Q7: How do I cite the data or model?
A: Paper: arXiv:2506.10772; Dataset: “Google Public Data—WeatherNext 2 (2025)” until official DOI is released.

