From 32-Dimensional Noise to 15-Day Forecasts: Inside Google DeepMind’s WeatherNext 2

What makes a brand-new AI weather model worth replacing Google’s own flagship?
WeatherNext 2 answers with three numbers: 8× faster, better CRPS on 99.9 % of targets, and a single TPU that spits out 56 global scenarios in under a minute, all without ever seeing a joint-distribution label.


What problem is WeatherNext 2 trying to solve?

Medium-range forecasts must quantify uncertainty, but classic physics ensembles require a supercomputer, and most ML ensembles are either slow (diffusion) or spatially incoherent (point-wise noise).
WeatherNext 2 delivers physically coherent, high-resolution ensembles in one forward pass by injecting all randomness into 32-dimensional weight perturbations—a trick that forces the network to learn realistic spatial covariances even though it is trained only on marginal scores.


Core idea in one breath

A 180-million-parameter graph-transformer is conditioned on a global noise vector that re-weights its layer-normalisation parameters.
One noise sample = one alternative earth; 56 samples = 56 plausible 15-day trajectories; no iterative refinement, no super-computer queue.


How the Functional Generative Network (FGN) works

Component | Purpose | Key hyper-parameters
Graph-grid encoder | projects the 0.25° lat-lon grid onto an icosahedral mesh | 6× refined, ~200 k nodes
Graph-transformer processor | captures spherical interactions | 24 layers, 768 hidden, 6 heads
Conditional LayerNorm | only entry point for randomness (sketched below) | 32-D Gaussian z → shared affine params
Grid decoder | recovers the 0.25° field at each 6-hour step | 6 atmospheric + 6 surface variables
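
Because the conditional LayerNorm is the single place where randomness enters, it is worth sketching. The snippet below is a minimal NumPy illustration, not DeepMind's code: the shapes, the linear map from z to the affine parameters, and the toy ensemble loop are assumptions that merely follow the table above.

import numpy as np

NOISE_DIM = 32    # dimension of the global noise vector z
HIDDEN = 768      # processor hidden size from the table above

rng = np.random.default_rng(0)

# A learned linear map (here random) turns the 32-D noise into per-channel affine params.
W_scale = rng.normal(0.0, 0.02, (NOISE_DIM, HIDDEN))
W_shift = rng.normal(0.0, 0.02, (NOISE_DIM, HIDDEN))

def conditional_layer_norm(x, z, eps=1e-5):
    """x: (nodes, HIDDEN) activations; z: (NOISE_DIM,) global noise sample."""
    x_hat = (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)
    scale = 1.0 + z @ W_scale   # noise re-weights the gain ...
    shift = z @ W_shift         # ... and the bias, shared by every mesh node
    return x_hat * scale + shift

# One noise sample = one alternative Earth; 56 samples = 56 ensemble members.
x = rng.normal(size=(100, HIDDEN))   # tiny stand-in for the ~200 k mesh nodes
members = [conditional_layer_norm(x, rng.normal(size=NOISE_DIM)) for _ in range(56)]
print(np.std([m.mean() for m in members]))   # spread induced purely by z

Because every mesh node shares the same (scale, shift) pair for a given z, one 32-D draw perturbs the whole globe coherently, which is what pushes the members toward spatially realistic states.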

Training recipe

  • Loss: fair CRPS computed per variable, level, and grid point (see the sketch after this list); no multivariate term.
  • Data: ERA5 (1979-2018) pre-training → HRES-fc0 (2016-2022) fine-tuning.
  • Ensemble: 4 independent seeds; each seed contributes 14 members ⇒ 56 total.
  • A final 8-step auto-regressive fine-tuning pass removes the <0.1 % of rollouts that blew up at long lead times.
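
The per-point loss is easy to write down. A minimal NumPy sketch of the fair (unbiased) ensemble CRPS, assuming a flat average over points and omitting the per-variable/level weighting used in training:

import numpy as np

def fair_crps(ensemble, obs):
    """ensemble: (members, points); obs: (points,).  Returns mean fair CRPS."""
    m = ensemble.shape[0]
    err = np.abs(ensemble - obs).mean(axis=0)                      # E|X - y|
    pair = np.abs(ensemble[:, None, :] - ensemble[None, :, :]).sum(axis=(0, 1))
    spread = pair / (2 * m * (m - 1))                              # unbiased 0.5 * E|X - X'|
    return float((err - spread).mean())

rng = np.random.default_rng(1)
forecast = rng.normal(size=(56, 1000))    # toy ensemble
truth = np.zeros(1000)
print(fair_crps(forecast, truth))         # ~0.23 for this standard-normal toy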

Author’s reflection
We feared the 32-D bottleneck would cripple expressiveness; instead it acted like a rubber-band dragging every grid point toward the same physical manifold—proof that a well-placed constraint can add skill.


8× speed-up: where does the time go?

Model | Wall-clock for a 15-day global field | Hardware | Iterations
ECMWF ENS (50 members) | 2–3 h | 3k-node HPC | 1
GenCast (diffusion) | 20 min | 1 TPUv5p | ~30 denoising steps
FGN (WeatherNext 2) | <1 min | 1 TPUv5p | 1 forward pass

Speed ≠ smaller network: FGN is 3× larger than GenCast yet still wins, because a single forward pass yields an entire ensemble member instead of ~30 denoising iterations.


Accuracy in numbers (2023 HRES-fc0 test year)

Marginal metrics

  • CRPS: −6.5 % on average, up to −18 % for short-range upper-air variables vs GenCast
  • Ensemble-mean RMSE: −5.8 % on average, up to −18 %
  • Spread-skill ratio: 1.01 ± 0.02 (a perfect ensemble scores 1; see the sketch after this list)
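
If you want to reproduce these marginal metrics against your own baseline, both ensemble scores are a few lines each. A sketch that ignores the area weighting typically applied to global scores; the toy data are only there to show a calibrated ensemble landing near 1:

import numpy as np

def ensemble_scores(ensemble, obs):
    """ensemble: (members, points); obs: (points,).  Returns (RMSE, spread/skill)."""
    mean = ensemble.mean(axis=0)
    rmse = np.sqrt(((mean - obs) ** 2).mean())
    spread = np.sqrt(ensemble.var(axis=0, ddof=1).mean())
    return rmse, spread / rmse                     # well-calibrated ensembles sit near 1

rng = np.random.default_rng(2)
centre = rng.normal(size=5000)                     # predictive distribution centre
truth = centre + rng.normal(size=5000)             # truth drawn from the same spread
members = centre + rng.normal(size=(56, 5000))     # calibrated toy ensemble
print(ensemble_scores(members, truth))             # spread-skill ratio ≈ 1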

Joint-structure tests

Test | Average improvement over GenCast
120 km pooled CRPS | −8.7 %
Derived 10 m wind-speed CRPS (day 1) | −10.4 %
300–500 hPa thickness CRPS (day 1) | −15.6 %

Tropical-cyclone tracks

  • Position error: improvement equivalent to roughly 24 h of extra lead time at days 3–5
  • Relative Economic Value of track probabilities: +10–20 % in the low cost/loss-ratio regime (cheap protection, expensive misses)

Scenario walk-through: putting the ensemble to work

① Provincial emergency office – typhoon D-2

Step 1 – Pull 56 members from BigQuery:

SELECT step, member, latitude, longitude, value
FROM `gcp-public-data-weathernext.weathernext_2.mslp`
WHERE init_time = TIMESTAMP '2025-11-17 00:00:00' AND step <= 48;

Step 2 – Run TempestExtremes tracker on each member.
Step 3 – Compute the 1°×1° probability of ≥50 kt winds (sketched below); >30 % triggers a red alert.
Outcome: track error 24 km smaller → evacuation zone shifted 1 county west, 80 k fewer unnecessary evacuations.
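
A minimal sketch of Step 3, assuming the 56 member wind fields have already been exported and regridded to a 1°×1° array; the array name and thresholds simply follow the steps above:

import numpy as np

def exceedance_probability(wind_kt, threshold_kt=50.0):
    """Fraction of the 56 members exceeding the threshold in each 1-degree cell."""
    return (wind_kt >= threshold_kt).mean(axis=0)

# wind_kt: (member, lat, lon) 10 m wind speed in knots, already regridded to 1 degree.
wind_kt = np.random.default_rng(3).gamma(2.0, 12.0, size=(56, 180, 360))   # toy data
prob = exceedance_probability(wind_kt)
red_alert = prob > 0.30                  # >30 % of members => red alert, as above
print(f"cells on red alert: {int(red_alert.sum())}")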

② Power trader – 15-day wind-farm output

Step 1 – Interpolate 56 u/v fields to turbine coordinates.
Step 2 – Convert to power with manufacturer curve.
Step 3 – Feed the 56 energy traces into electricity-option pricing (see the sketch below).
Result: the 95 % VaR came in 18 % below the deterministic forecast; buying 15 % extra reserve saved USD 1.2 M during a spot-market spike.
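
A sketch of Steps 2–3, with a made-up piecewise power curve standing in for the manufacturer's and a simple percentile VaR; capacity, cut-in/cut-out speeds and the Weibull toy winds are illustrative assumptions:

import numpy as np

def power_curve(wind_ms, cut_in=3.0, rated=12.0, cut_out=25.0, capacity_mw=5.0):
    """Hypothetical turbine curve: cubic ramp between cut-in and rated speed."""
    frac = np.clip((wind_ms - cut_in) / (rated - cut_in), 0.0, 1.0) ** 3
    return np.where((wind_ms < cut_in) | (wind_ms > cut_out), 0.0, frac * capacity_mw)

rng = np.random.default_rng(4)
wind = rng.weibull(2.0, size=(56, 60)) * 9.0        # 56 members x 60 six-hour steps (toy)
energy_mwh = (power_curve(wind) * 6.0).sum(axis=1)  # MW x 6 h per step, summed over 15 days
var_95 = np.percentile(energy_mwh, 5)               # 5th percentile = 95 % VaR on output
print(f"95 % VaR of 15-day output: {var_95:.0f} MWh")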

③ Crop insurer – drought index

Step 1 – Drive evapotranspiration model with 56 T/q members.
Step 2 – Flag grid points where, with probability > 20 %, 7-day ET exceeds its 95th percentile AND rain stays below 5 mm (sketched below).
Step 3 – Pre-deploy loss-adjusters.
Pay-off: claim cycle shortened by 5 days, farmer satisfaction +22 %.
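
A sketch of Step 2, assuming the 56-member 7-day ET and rain accumulations are already in arrays and that a climatological 95th-percentile ET field is available (here faked from the ensemble itself):

import numpy as np

def drought_flags(et7, rain7, et_p95, prob_threshold=0.20):
    """et7/rain7: (member, lat, lon) 7-day accumulations; et_p95: (lat, lon) climatology."""
    event = (et7 > et_p95) & (rain7 < 5.0)   # per-member drought-stress indicator
    prob = event.mean(axis=0)                # ensemble probability of the joint event
    return prob > prob_threshold             # cells where loss-adjusters are pre-deployed

rng = np.random.default_rng(5)
et7 = rng.gamma(4.0, 8.0, size=(56, 180, 360))     # toy ET, mm / 7 days
rain7 = rng.gamma(2.0, 4.0, size=(56, 180, 360))   # toy rain, mm / 7 days
et_p95 = np.percentile(et7, 95, axis=0)            # stand-in for a real climatology
print(int(drought_flags(et7, rain7, et_p95).sum()), "flagged cells")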


Data access cheat-sheet

Earth Engine (visual)

var fgn = ee.ImageCollection('projects/gcp-public-data-weathernext/assets/weathernext_2_0_0');
var t2m = fgn.filter(ee.Filter.eq('variable', 't2m'))
             .filter(ee.Filter.eq('member', 0))
             .filter(ee.Filter.eq('step', 72)).first();
Map.addLayer(t2m, {min: 250, max: 320}, 'T2M_member0_72h');

BigQuery (SQL)

SELECT step, member, latitude, longitude, value
FROM `gcp-public-data-weathernext.weathernext_2.t2m`
WHERE DATE(init_time) = '2025-11-17'
  AND step = 72
LIMIT 1000;
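
For custom statistics beyond SQL (the checklist's "local Python" step), the same table can be pulled straight into pandas with the official BigQuery client; project, billing and credentials are assumed to be configured locally:

from google.cloud import bigquery    # pip install google-cloud-bigquery[pandas]

client = bigquery.Client()           # billed to your default GCP project
sql = """
    SELECT step, member, latitude, longitude, value
    FROM `gcp-public-data-weathernext.weathernext_2.t2m`
    WHERE DATE(init_time) = '2025-11-17' AND step = 72
"""
df = client.query(sql).to_dataframe()

# Example custom statistic: per-grid-point ensemble spread of 2 m temperature.
spread = df.groupby(["latitude", "longitude"])["value"].std()
print(spread.describe())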

Vertex AI custom inference

  • Model ID: weather-next-v2
  • Input shape: (2, 721, 1440, 87) – two 6-hourly global states
  • Output: single 6-hour forecast or 60-step autoregressive trajectory
  • Quota: 4 k node-hours/month; expansion on request.

Author’s reflection: three surprises from building FGN

  1. Low-dimensional noise is a feature, not a bug
    Boosting z to 256-D gave prettier spectra but introduced checkerboard artefacts in q700; 32-D struck the best bias-variance balance.

  2. Marginal-only training can birth joint skill
    We expected to hand-craft multi-variate losses; the shared LayerNorm bottleneck silently enforced global coherence—an optimisation shortcut we now evangelise.

  3. Auto-regressive fine-tuning is cheap insurance
    Spending an extra 5 % of compute on 8-step rollouts removed all late-stage blow-ups, a reminder that even deep learners need a dash of temporal consistency.


Action checklist / Implementation steps

  • [ ] Open Earth Engine → paste 3-liner → visualise 56 members.
  • [ ] Run BigQuery export → Parquet → local Python for custom statistics.
  • [ ] Apply for Vertex AI early-access → containerise your post-process (tracker, power curve, crop model).
  • [ ] Compare CRPS & RMSE against your current operational baseline; expect 5–10 % lift.
  • [ ] Plug the ensemble into your decision dashboard; set probability thresholds that match your cost/loss ratios (see the sketch after this list).
  • [ ] Document residual artefacts (minor high-frequency spikes in smooth fields) and re-train with a slightly higher noise dimension if your local use case demands it.
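
The dashboard item above rests on the classic cost/loss rule: protective action pays off whenever the event probability exceeds the ratio of protection cost C to avoidable loss L. A minimal sketch with hypothetical C and L:

import numpy as np

def act_mask(event_prob, cost, loss):
    """Protect wherever the forecast probability beats the cost/loss ratio."""
    return event_prob > cost / loss

prob = np.random.default_rng(6).uniform(size=(180, 360))   # any gridded event probability
print(act_mask(prob, cost=1.0, loss=8.0).mean())           # fraction of cells worth protecting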

One-page overview

WeatherNext 2 replaces iterative physics or diffusion ensembles with a one-shot, weight-perturbed graph network.
A 32-D global noise vector modulates LayerNorm across 24 transformer layers, producing 56 physically coherent 15-day forecasts in <1 min on one TPU.
Despite only optimising marginal CRPS, FGN beats GenCast on 99.9 % of variable-level-lead-time combinations (−6.5 % CRPS, −8.7 % spatially pooled CRPS) and tightens tropical-cyclone track errors by 24 km.
Data are live in Earth Engine & BigQuery; an early-access endpoint sits on Vertex AI.
For developers: three API calls → 56 scenarios → downstream risk model; artefacts are minor and manageable with 8-step AR fine-tuning.


FAQ

Q1: Is 56 ensemble members enough?
A: CRPS gains saturate beyond roughly 56 members; you can freely generate more seeds and members if needed.

Q2: Can FGN run at 1 km convection-allowing resolution?
A: Not yet; the native 0.25° fields can be interpolated to finer grids, but convection remains parameterised rather than explicitly resolved.

Q3: Why is SST over land filled with the global minimum sea value?
A: Keeps land-sea mask consistent between ERA5 and HRES-fc0; the model learns to interpret that value as land.

Q4: How often is the dataset updated?
A: Four cycles per day (00/06/12/18 Z), available ~3 h after real time.

Q5: What causes the tiny honeycomb patterns in q700?
A: The tightly constrained 32-D noise manifold can imprint mesh-scale correlations; raising the noise dimension slightly or re-weighting the humidity loss suppresses them.

Q6: Does FGN predict air quality or precipitation type?
A: Currently meteorological variables only; feed wind/temperature fields into chemical or micro-physical models for downstream applications.

Q7: How do I cite the data or model?
A: Paper: arXiv:2506.10772; Dataset: “Google Public Data—WeatherNext 2 (2025)” until official DOI is released.