Galileo: One Model to Map the World
A practical guide to the open-source, all-in-one remote-sensing foundation model
Table of Contents

- Why another remote-sensing model?
- What Galileo can “see”
- Inside the model — building blocks made simple
- How Galileo teaches itself without labels
- The 127 155 training scenes that keep Galileo honest
- Benchmarks that matter — 11 tasks, one winner
- Quick start: load, run and fine-tune in minutes
- Frequently asked questions
1. Why another remote-sensing model?
Remote sensing is noisy.
Images arrive in different wavelengths, resolutions and schedules.
Objects of interest range from a two-pixel fishing boat to a thousand-pixel glacier.
Most pretrained models are locked to a single sensor or a single task.
Galileo was built to solve exactly this fragmentation.
- One checkpoint handles optical, radar, elevation, weather, night-lights and more.
- One architecture digests single images, image time-series or pure pixel time-series.
- One training recipe teaches global context and local detail at the same time.
The result: state-of-the-art results on 11 public benchmarks without hand-crafted rules or extra labels.
2. What Galileo can “see”
| Data type | Source example | Spatial resolution | Temporal cadence |
|---|---|---|---|
| Multispectral imagery | Sentinel-2 (all bands except B1, B9, B10) | 10 m | 5-day revisit |
| Synthetic-aperture radar | Sentinel-1 (VV, VH) | 10 m | 6-day revisit |
| Vegetation index | Sentinel-2 NDVI | 10 m | 5-day revisit |
| Elevation & slope | SRTM 30 m DEM, resampled | 10 m | static |
| Land-cover probability | Dynamic World 10 m | 10 m | monthly aggregate |
| Crop-type map | World Cereal 10 m | 10 m | seasonal |
| Weather | ERA5 precipitation & temperature | 0.25° | daily |
| Climate water balance | TerraClimate soil moisture & evapotranspiration | 4 km | monthly |
| Night-time lights | VIIRS | 500 m | monthly |
| Population density | LandScan | 1 km | annual |
| Location metadata | Latitude, longitude, month index | — | — |
You can feed Galileo any subset of the above. Missing bands are simply zero-filled; the model still works.
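A minimal sketch of this zero-filling convention, using NumPy and a made-up channel layout (the band counts, shapes and variable names here are illustrative, not the model's actual configuration):

```python
import numpy as np

# Hypothetical channel layout: 2 Sentinel-1 bands + 11 Sentinel-2 bands.
S1_BANDS, S2_BANDS = 2, 11
H = W = 4          # spatial size in pixels
T = 3              # time steps

# Suppose only Sentinel-2 is available for this tile.
s2 = np.random.rand(T, H, W, S2_BANDS)

# Zero-fill the missing Sentinel-1 channels so the input shape stays fixed.
s1 = np.zeros((T, H, W, S1_BANDS))

# Stack along the channel axis to form the full multi-modal input.
x = np.concatenate([s1, s2], axis=-1)
print(x.shape)  # (3, 4, 4, 13)
```

In the real pipeline the helper `construct_galileo_input` (section 7.3) takes care of this for you.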
3. Inside the model — building blocks made simple
3.1 From pixels to tokens
Think of an image as a Lego board.
Galileo chops the board into small square bricks (patches).
Each brick is 10 m × 10 m in the real world and may contain several wavelengths or months of data.
- Spatial patches → H/P × W/P tokens
- Time steps → stacked along the sequence
- Channel groups → RGB, red-edge, SAR, weather variables, etc.
This design allows the same code to ingest:
- a single 64 × 64 image,
- a 96 × 96 tile with 24 months, or
- a single pixel measured 24 times.
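The token count for each of these cases follows directly from the H/P × W/P rule. A quick sketch, assuming a hypothetical patch size of 8 pixels (the model's actual patch size may differ):

```python
def num_spatial_tokens(height, width, patch_size):
    """Tokens per time step for an image chopped into P x P patches."""
    assert height % patch_size == 0 and width % patch_size == 0
    return (height // patch_size) * (width // patch_size)

# Illustrative patch size; the real model's value may differ.
P = 8
print(num_spatial_tokens(64, 64, P))        # single 64 x 64 image -> 64
print(num_spatial_tokens(96, 96, P) * 24)   # 96 x 96 tile, 24 months -> 3456
print(num_spatial_tokens(1, 1, 1) * 24)     # one pixel, 24 time steps -> 24
```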
3.2 Three model sizes
| Model | Parameters | GPU hours (H100) | Everyday analogy |
|---|---|---|---|
| Nano | 0.8 M | 200 | Runs on a modern laptop |
| Tiny | 5.3 M | 259 | Runs on a desktop GPU |
| Base | 85 M | 573 | Runs on a server or cloud |
Choose the trade-off between accuracy and compute that matches your project budget.
4. How Galileo teaches itself without labels
Galileo uses self-supervised learning — no human labels during pre-training.
4.1 Two complementary tasks
| Task | Masking style | Target depth | Best suited for |
|---|---|---|---|
| Global | Mask large spatial or temporal chunks | Deep transformer layer | Image-level classification |
| Local | Randomly mask 5 % of tokens | Linear projection (shallow) | Pixel-level segmentation |
Global task in plain words
- Hide a big rectangle of the image.
- Ask the model to guess what should be there using the rest of the scene.
- Reward answers that are close to a “moving-average” version of the model itself.
Local task in plain words
- Hide random small patches.
- Ask the model to reproduce the exact pixel values.
- Penalise answers that look like any other random patch.
Because the two tasks alternate every batch, Galileo learns both context and detail without ever seeing a labelled image.
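The two masking styles can be sketched on a toy token grid. The 12 × 12 grid and the particular rectangle below are illustrative choices, not the model's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 12  # a grid of tokens, not pixels

# Global task: hide one large contiguous block of tokens.
global_mask = np.zeros((H, W), dtype=bool)
global_mask[2:8, 3:9] = True            # a 6 x 6 rectangle is masked

# Local task: hide 5 % of tokens chosen uniformly at random.
local_mask = np.zeros(H * W, dtype=bool)
idx = rng.choice(H * W, size=int(0.05 * H * W), replace=False)
local_mask[idx] = True
local_mask = local_mask.reshape(H, W)

print(global_mask.sum(), local_mask.sum())  # 36 7
```

The global mask removes whole regions, forcing the model to reason from context; the local mask removes scattered tokens, forcing it to interpolate fine detail.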
5. The 127 155 training scenes that keep Galileo honest
5.1 Sampling the planet
1. Divide the global 10 m land-cover map into 10 km × 10 km tiles.
2. Compute two features for each tile:
   - a pixel-level land-cover class histogram, and
   - the latitude/longitude centroid.
3. Run k-means (k = 150 000) on these features.
4. Download the tile nearest to each centroid.
5. 85 % of tiles export successfully → 127 155 training scenes.
This two-feature clustering guarantees geographical diversity (from tundra to tropics) and semantic diversity (cities, forests, deserts, croplands).
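The sampling idea can be sketched with a toy version of the pipeline: plain Lloyd's-algorithm k-means over made-up tile features. The feature dimensions, tile count and k below are illustrative stand-ins (real runs use k = 150 000 and real land-cover histograms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 500 tiles, each described by a 9-class land-cover
# histogram plus a normalised lat/lon centroid (11 features total).
n_tiles, k = 500, 10
feats = np.hstack([
    rng.dirichlet(np.ones(9), n_tiles),   # class histogram (sums to 1)
    rng.uniform(-1, 1, (n_tiles, 2)),     # normalised lat/lon
])

# A few iterations of plain k-means (Lloyd's algorithm).
centroids = feats[rng.choice(n_tiles, k, replace=False)]
for _ in range(10):
    d = np.linalg.norm(feats[:, None] - centroids[None], axis=-1)
    labels = d.argmin(axis=1)
    centroids = np.stack([
        feats[labels == i].mean(axis=0) if (labels == i).any() else centroids[i]
        for i in range(k)
    ])

# "Download" the single tile nearest to each centroid.
d = np.linalg.norm(feats[:, None] - centroids[None], axis=-1)
chosen = d.argmin(axis=0)
print(chosen.shape)  # (10,): one representative tile per cluster
```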
5.2 Each scene contains
- 96 × 96 pixels at 10 m (9.6 km × 9.6 km)
- 24 monthly time steps
- 9 data modalities (up to 17 channel groups)
6. Benchmarks that matter — 11 tasks, one winner
6.1 Image classification (Top-1 accuracy)
| Dataset | Train split | Galileo-Base | Closest rival |
|---|---|---|---|
| EuroSAT | 100 % labels | 93.0 % | CROMA 85.6 % |
| EuroSAT | 1 % labels | 56.6 % | SoftCon 27.2 % |
Low-label regimes reveal Galileo’s transfer power.
6.2 Semantic segmentation (mean IoU)
| Dataset | Modality | Galileo-Base | Next best |
|---|---|---|---|
| MADOS marine debris | Sentinel-2 | 67.6 % | CROMA 64.2 % |
| Sen1Floods11 flood | Sentinel-1 SAR | 79.4 % | CROMA 78.9 % |
6.3 Pixel time-series crop classification
| Task | Galileo-Tiny | Presto (specialist) |
|---|---|---|
| Togo crop vs non-crop | 74.7 % | 75.5 % |
| Brazil coffee vs rest | 97.2 % | 98.8 % |
| Kenya maize vs rest | 85.4 % | 84.0 % |
A generalist model matches or beats a purpose-built time-series model.
6.4 Model ranking summary
Across 11 benchmarks and 4 training fractions (100 %, 20 %, 5 %, 1 %):
- Galileo-Base ranks 1st overall.
- Galileo-Tiny ranks 2nd overall.
- No other single model appears in the top 3 for both image and time-series tasks.
7. Quick start: load, run and fine-tune in minutes
7.1 Install
```shell
git clone https://github.com/nasaharvest/galileo.git
cd galileo
pip install -e .
```
7.2 Load the smallest model
```python
from single_file_galileo import Encoder

model = Encoder.load_from_folder("data/models/nano", device="cpu")
```
7.3 Prepare one sample
```python
import torch

from src.data.utils import S2_BANDS, construct_galileo_input

# Example: 2 months, 4 x 4 pixels, all Sentinel-2 bands
s2 = torch.randn(2, 4, 4, len(S2_BANDS))
inputs = construct_galileo_input(s2=s2, normalize=True)
```
7.4 Extract embeddings
```python
with torch.no_grad():
    embedding = model(inputs)  # shape [B, L, D]
```
7.5 Fine-tune with your labels
- Frozen features + linear probe: 30 s training on EuroSAT 1 %.
- Full fine-tune: 5 min on EuroSAT 100 % (single RTX 4090).
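A linear probe is just one linear layer trained on top of frozen embeddings. A self-contained sketch with random stand-in features and a closed-form ridge-regression probe (illustrative only; in practice the features would come from the Galileo encoder and any linear classifier works):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen encoder embeddings: 200 samples, 64-dim, 2 classes.
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, 200)
X[y == 1] += 0.5                      # shift class 1 to make it learnable

# Ridge-regression probe on one-hot targets has a closed-form solution.
Y = np.eye(2)[y]                      # one-hot labels, shape (200, 2)
lam = 1e-2                            # small L2 regulariser
W = np.linalg.solve(X.T @ X + lam * np.eye(64), X.T @ Y)

# The probe never updates the encoder: only W is learned.
pred = (X @ W).argmax(axis=1)
print((pred == y).mean())             # training accuracy of the probe
```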
8. Frequently asked questions
Q1: I only have RGB and radar. Is Galileo still useful?
Yes. The model gracefully handles missing modalities by zero-filling. Experiments show less than 2 % accuracy drop when half the modalities are removed.
Q2: Can I run Galileo on drone imagery at 1 m?
Yes.
- Resize or tile your 1 m imagery to 10 m patches.
- Keep the channel order; add dummy bands if needed.
- Fine-tune with 1 %–5 % labels — Galileo’s low-label performance is strong.
Q3: How do I map tokens back to pixel locations?
Tokens follow a row-major order over the patch grid:

```python
tokens_per_row = image_width // patch_size  # integer division
row = token_id // tokens_per_row
col = token_id % tokens_per_row
```
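A worked example of the row-major mapping, wrapped in a small helper (the 8-pixel patch size and function name are illustrative):

```python
def token_to_pixel(token_id, image_width, patch_size):
    """Top-left pixel coordinates of a row-major token."""
    tokens_per_row = image_width // patch_size
    row, col = divmod(token_id, tokens_per_row)
    return row * patch_size, col * patch_size

# 64 x 64 image with 8-pixel patches -> 8 tokens per row.
print(token_to_pixel(0, 64, 8))    # (0, 0)   first token, top-left corner
print(token_to_pixel(9, 64, 8))    # (8, 8)   second row, second column
```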
Q4: Which size should I choose?
- Nano: real-time apps, edge devices, < 1 GB RAM.
- Tiny: research prototypes, single GPU.
- Base: production services, multi-GPU.
Q5: Is Galileo open-source?
Yes.
- Code & weights: GitHub
- Hugging Face: nasaharvest/galileo
- Paper: arXiv 2502.09356
Q6: Citation
```bibtex
@misc{tseng2025galileolearninggloballocal,
  title={Galileo: Learning Global and Local Features in Pretrained Remote Sensing Models},
  author={Gabriel Tseng and Anthony Fuller and Marlena Reil and Henry Herzog and Patrick Beukema and Favyen Bastani and James R. Green and Evan Shelhamer and Hannah Kerner and David Rolnick},
  year={2025},
  eprint={2502.09356},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.09356}
}
```
Closing thoughts
Remote sensing should not require a new model for every sensor, every task and every region.
With Galileo, a single checkpoint learns from the planet’s diversity and hands practitioners a robust starting point for crops, floods, deforestation or any other downstream use case.
Download the weights, run your first tile, and let Galileo do the heavy lifting.