Galileo: One Model to Map the World

A practical guide to the open-source, all-in-one remote-sensing foundation model


Table of Contents

  1. Why another remote-sensing model?
  2. What Galileo can “see”
  3. Inside the model — building blocks made simple
  4. How Galileo teaches itself without labels
  5. The 127,155 training scenes that keep Galileo honest
  6. Benchmarks that matter — 11 tasks, one winner
  7. Quick start: load, run and fine-tune in minutes
  8. Frequently asked questions

1. Why another remote-sensing model?

Remote sensing is noisy.
Images arrive in different wavelengths, resolutions and schedules.
Objects of interest range from a two-pixel fishing boat to a thousand-pixel glacier.
Most pretrained models are locked to a single sensor or a single task.

Galileo was built to solve exactly this fragmentation.

  • One checkpoint handles optical, radar, elevation, weather, night-lights and more.
  • One architecture digests single images, image time-series or pure pixel time-series.
  • One training recipe teaches global context and local detail at the same time.

The result: state-of-the-art performance on 11 public benchmarks, without hand-crafted rules or extra labels.


2. What Galileo can “see”

Data type                | Source example                                   | Spatial resolution | Temporal cadence
Multispectral imagery    | Sentinel-2 (all bands except B1, B9, B10)        | 10 m               | 5-day revisit
Synthetic-aperture radar | Sentinel-1 (VV, VH)                              | 10 m               | 6-day revisit
Vegetation index         | Sentinel-2 NDVI                                  | 10 m               | 5-day revisit
Elevation & slope        | SRTM 30 m DEM, resampled to 10 m                 | 10 m               | static
Land-cover probability   | Dynamic World                                    | 10 m               | monthly aggregate
Crop-type map            | World Cereal                                     | 10 m               | seasonal
Weather                  | ERA5 precipitation & temperature                 | 0.25°              | daily
Climate water balance    | TerraClimate soil moisture & evapotranspiration  | 4 km               | monthly
Night-time lights        | VIIRS                                            | 500 m              | monthly
Population density       | LandScan                                         | 1 km               | annual
Location meta-data       | Latitude, longitude, month index                 | —                  | —

You can feed Galileo any subset of the above. Missing bands are simply zero-filled; the model still works.
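Zero-filling can be pictured as stacking whatever channel groups you have and padding the absent ones with zeros. A minimal sketch (the helper name and layout are illustrative, not the repo's API):

```python
import numpy as np

def zero_fill(available: dict, all_bands: list, shape: tuple) -> np.ndarray:
    """Stack available channel groups; zero-fill any that are missing.

    `available` maps a band name to an (H, W) array; names listed in
    `all_bands` but absent from `available` become all-zero channels.
    (Illustrative helper, not part of the Galileo API.)
    """
    channels = [available.get(band, np.zeros(shape, dtype=np.float32))
                for band in all_bands]
    return np.stack(channels, axis=-1)  # (H, W, C)

# Only SAR VV is on hand; VH and the optical band are zero-filled.
x = zero_fill({"VV": np.ones((4, 4))}, ["VV", "VH", "B2"], (4, 4))
```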


3. Inside the model — building blocks made simple

3.1 From pixels to tokens

Think of an image as a Lego board.
Galileo chops the board into small square bricks (patches).
Each brick is 10 m × 10 m in the real world and may contain several wavelengths or months of data.

  • Spatial patches → H/P × W/P tokens
  • Time steps → stacked along the sequence
  • Channel groups → RGB, red-edge, SAR, weather variables, etc.

This design allows the same code to ingest:

  • a single 64 × 64 image,
  • a 96 × 96 tile with 24 months, or
  • a single pixel measured 24 times.
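The patch arithmetic is simple: an H × W scene with patch size P yields an H/P × W/P grid of spatial tokens. A sketch, assuming a hypothetical patch size of 8 pixels for illustration:

```python
def token_grid(height: int, width: int, patch_size: int) -> tuple:
    """Number of spatial tokens along each axis: (H / P, W / P)."""
    assert height % patch_size == 0 and width % patch_size == 0
    return height // patch_size, width // patch_size

# A 64 x 64 image with 8-pixel patches gives an 8 x 8 token grid,
# i.e. 64 spatial tokens per time step and channel group.
rows, cols = token_grid(64, 64, 8)
```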

3.2 Three model sizes

Model | Parameters | GPU hours (H100) | Everyday analogy
Nano  | 0.8 M      | 200              | Runs on a modern laptop
Tiny  | 5.3 M      | 259              | Runs on a desktop GPU
Base  | 85 M       | 573              | Runs on a server or cloud

Choose the trade-off between accuracy and compute that matches your project budget.


4. How Galileo teaches itself without labels

Galileo uses self-supervised learning — no human labels during pre-training.

4.1 Two complementary tasks

Task   | Masking style                        | Target depth                | Best suited for
Global | Mask large spatial/temporal chunks   | Deep transformer layer      | Image-level classification
Local  | Randomly mask 5 % of tokens          | Linear projection (shallow) | Pixel-level segmentation

Global task in plain words

  1. Hide a big rectangle of the image.
  2. Ask the model to guess what should be there using the rest of the scene.
  3. Reward answers that are close to a “moving-average” version of the model itself.

Local task in plain words

  1. Hide random small patches.
  2. Ask the model to reproduce the exact pixel values.
  3. Penalise answers that look like any other random patch.

Because the two tasks alternate every batch, Galileo learns both context and detail without ever seeing a labelled image.
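The two ingredients above can be sketched in a few lines: an exponential-moving-average teacher that supplies the global-task targets, and a batch-level switch between objectives. The decay value and the alternation scheduler below are illustrative assumptions, not the paper's exact hyper-parameters:

```python
import numpy as np

def ema_update(teacher: np.ndarray, student: np.ndarray,
               decay: float = 0.996) -> np.ndarray:
    """One step of the 'moving-average' teacher used as the global target."""
    return decay * teacher + (1.0 - decay) * student

def objective_for_batch(step: int) -> str:
    """Alternate the two objectives batch by batch (hypothetical scheduler)."""
    return "global" if step % 2 == 0 else "local"
```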


5. The 127,155 training scenes that keep Galileo honest

5.1 Sampling the planet

  1. Divide the global 10 m land-cover map into 10 km × 10 km tiles.
  2. Compute two features for each tile:

    • pixel-level class histogram
    • latitude/longitude centroid
  3. Run k-means (k = 150,000) on these features.
  4. Download the tile nearest to each centroid.
  5. About 85 % of the tiles export successfully, yielding 127,155 training scenes.

This two-feature clustering guarantees geographical diversity (from tundra to tropics) and semantic diversity (cities, forests, deserts, croplands).
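Step 2 can be sketched as a per-tile feature extractor: a normalised class histogram concatenated with scaled centroid coordinates. The class count (Dynamic World has 9 classes) and the coordinate scaling here are illustrative; the paper's exact feature weighting may differ:

```python
import numpy as np

def tile_features(landcover: np.ndarray, lat: float, lon: float,
                  n_classes: int = 9) -> np.ndarray:
    """Feature vector for one 10 km tile: class histogram + centroid coords.

    `landcover` holds integer class IDs per pixel. These are the two
    features the k-means sampling step clusters on.
    """
    hist = np.bincount(landcover.ravel(), minlength=n_classes).astype(np.float32)
    hist /= hist.sum()
    # Scale coordinates roughly into [-1, 1] so neither feature dominates.
    return np.concatenate([hist, [lat / 90.0, lon / 180.0]])
```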

5.2 Each scene contains

  • 96 × 96 pixels at 10 m (9.6 km × 9.6 km)
  • 24 monthly time steps
  • 9 data modalities (up to 17 channel groups)

6. Benchmarks that matter — 11 tasks, one winner

6.1 Image classification (Top-1 accuracy)

Dataset | Train split  | Galileo-Base | Closest rival
EuroSAT | 100 % labels | 93.0 %       | CROMA, 85.6 %
EuroSAT | 1 % labels   | 56.6 %       | SoftCon, 27.2 %

Low-label regimes reveal Galileo’s transfer power.

6.2 Semantic segmentation (mean IoU)

Dataset               | Modality       | Galileo-Base | Next best
MADOS (marine debris) | Sentinel-2     | 67.6 %       | CROMA, 64.2 %
Sen1Floods11 (flood)  | Sentinel-1 SAR | 79.4 %       | CROMA, 78.9 %

6.3 Pixel time-series crop classification

Task                   | Galileo-Tiny | Presto (specialist)
Togo: crop vs non-crop | 74.7 %       | 75.5 %
Brazil: coffee vs rest | 97.2 %       | 98.8 %
Kenya: maize vs rest   | 85.4 %       | 84.0 %

A generalist model matches or beats a purpose-built time-series model.

6.4 Model ranking summary

Across 11 benchmarks and 4 training fractions (100 %, 20 %, 5 %, 1 %):

  • Galileo-Base ranks 1st overall.
  • Galileo-Tiny ranks 2nd overall.
  • No other single model appears in the top-3 for both image and time-series tasks.

7. Quick start: load, run and fine-tune in minutes

7.1 Install

git clone https://github.com/nasaharvest/galileo.git
cd galileo
pip install -e .

7.2 Load the smallest model

from single_file_galileo import Encoder
model = Encoder.load_from_folder("data/models/nano", device="cpu")

7.3 Prepare one sample

import torch
from src.data.utils import S2_BANDS, construct_galileo_input

# Example: 2 months, 4 × 4 pixels, all Sentinel-2 bands
s2 = torch.randn(2, 4, 4, len(S2_BANDS))
inputs = construct_galileo_input(s2=s2, normalize=True)

7.4 Extract embeddings

with torch.no_grad():
    embedding = model(inputs)  # shape [B, L, D]
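The [B, L, D] token embeddings are commonly mean-pooled into one vector per scene before attaching a classifier. A minimal NumPy sketch (mean pooling is a common default, not prescribed by the repo):

```python
import numpy as np

def pool_tokens(embedding: np.ndarray) -> np.ndarray:
    """Average token embeddings [B, L, D] into scene vectors [B, D]."""
    return embedding.mean(axis=1)

# One scene, 64 tokens, 128-dim embeddings -> one 128-dim scene vector.
scene_vec = pool_tokens(np.random.randn(1, 64, 128))
```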

7.5 Fine-tune with your labels

  • Frozen features + linear probe: 30 s training on EuroSat 1 %.
  • Full fine-tune: 5 min on EuroSat 100 % (single RTX 4090).

8. Frequently asked questions

Q1: I only have RGB and radar. Is Galileo still useful?

Yes. The model gracefully handles missing modalities by zero-filling. Experiments show less than 2 % accuracy drop when half the modalities are removed.

Q2: Can I run Galileo on drone imagery at 1 m?

Yes.

  1. Resize or tile your 1 m imagery to 10 m patches.
  2. Keep the channel order; add dummy bands if needed.
  3. Fine-tune with 1 %–5 % labels — Galileo’s low-label performance is strong.
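Step 1 amounts to average-pooling blocks of 10 × 10 pixels. A sketch, assuming the image dimensions divide evenly by the factor:

```python
import numpy as np

def downsample(img: np.ndarray, factor: int = 10) -> np.ndarray:
    """Average-pool an (H, W) band by `factor`, e.g. 1 m pixels -> 10 m pixels.

    Assumes H and W divide evenly by `factor`; crop the edges first if not.
    """
    h, w = img.shape
    assert h % factor == 0 and w % factor == 0
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```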

Q3: How do I map tokens back to pixel locations?

Tokens follow a row-major order:

row = token_id // (image_width // patch_size)
col = token_id %  (image_width // patch_size)
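As a runnable check of the formula above (a 64-pixel-wide image and 8-pixel patches are assumed for illustration):

```python
def token_to_rowcol(token_id: int, image_width: int, patch_size: int) -> tuple:
    """Map a row-major token index back to its (row, col) patch position."""
    tokens_per_row = image_width // patch_size
    return token_id // tokens_per_row, token_id % tokens_per_row

# Token 10 in an 8-token-wide grid sits at row 1, column 2.
pos = token_to_rowcol(10, 64, 8)
```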

Q4: Which size should I choose?

  • Nano: real-time apps, edge devices, <1 GB RAM.
  • Tiny: research prototypes, single GPU.
  • Base: production services, multi-GPU.

Q5: Is Galileo open-source?

Yes. The code and pretrained weights are available at https://github.com/nasaharvest/galileo.

Q6: How do I cite Galileo?

@misc{tseng2025galileolearninggloballocal,
  title={Galileo: Learning Global and Local Features in Pretrained Remote Sensing Models},
  author={Gabriel Tseng and Anthony Fuller and Marlena Reil and Henry Herzog and Patrick Beukema and Favyen Bastani and James R. Green and Evan Shelhamer and Hannah Kerner and David Rolnick},
  year={2025},
  eprint={2502.09356},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.09356}
}

Closing thoughts

Remote sensing should not require a new model for every sensor, every task and every region.
With Galileo, a single checkpoint learns from the planet’s diversity and hands practitioners a robust starting point for crops, floods, deforestation or any other downstream use case.
Download the weights, run your first tile, and let Galileo do the heavy lifting.