From Data Chaos to Tissue Atlases: How SpaSEG Makes Spatial Transcriptomics Simple

1. Why Spatial Transcriptomics Matters (and Where It Hurts)

Imagine cutting a thin slice of brain or tumor tissue and asking, “Which genes are where?”
Spatially resolved transcriptomics (SRT) does exactly that. Instead of grinding tissue into single-cell soup, it keeps every cell in its original neighborhood and records gene activity in situ.

The payoff: you can see immune cells swarming around a tumor margin, or layer-specific neurons sitting exactly where they should.
The pain: a single experiment can produce half a million data points, each carrying thousands of gene counts. Traditional tools choke on size, lose spatial context, or refuse to work across different SRT platforms (10x Visium, Stereo-seq, MERFISH, etc.).

2. Meet SpaSEG: A Four-in-One Toolkit

SpaSEG is an unsupervised deep-learning framework built by BGI-Research and published in Genome Biology (2025).
In one pipeline it does:

Spatial domain identification – finds tissue regions with similar gene patterns.
Multi-section alignment – stitches neighboring slices into a 3-D map.
Spatially-variable gene (SVG) discovery – genes that switch on/off between regions.
Cell–cell interaction inference – guesses who is talking to whom, based on ligand–receptor pairs.

The trick: SpaSEG treats every spot as a pixel in a multi-channel image and runs a lightweight convolutional neural network (CNN).
No manual tuning, no platform-specific hacks.
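
To make the "spot as pixel" idea concrete, here is a minimal NumPy sketch. It assumes each spot already has integer grid coordinates and a short feature vector (for example PCA scores); the function name spots_to_image is made up for illustration and is not part of SpaSEG or Stereopy.

```python
import numpy as np

def spots_to_image(coords, features):
    """Place each spot's feature vector at its (row, col) pixel.

    coords:   (n_spots, 2) integer array of grid positions
    features: (n_spots, n_channels) float array, e.g. PCA scores
    Returns an (n_channels, height, width) array; pixels without a spot stay 0.
    """
    coords = np.asarray(coords, dtype=int)
    features = np.asarray(features, dtype=float)
    # Shift so the smallest row/col index becomes 0.
    shifted = coords - coords.min(axis=0)
    height, width = shifted.max(axis=0) + 1
    image = np.zeros((features.shape[1], height, width))
    # One assignment fills every occupied pixel with its channel values.
    image[:, shifted[:, 0], shifted[:, 1]] = features.T
    return image

coords = np.array([[10, 5], [10, 6], [12, 5]])
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
image = spots_to_image(coords, features)
print(image.shape)  # (2, 3, 2)
```

Once the data has this shape, any image-style CNN can slide 3×3 filters over it, which is why the same approach can serve different platforms.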

3. How It Works in Plain English

SpaSEG step	Image-processing analogy
Remove low-quality spots & genes	Crop and clean the image
PCA + z-score normalization	Compress color channels
CNN with 3×3 filters	Look at local neighborhoods
Edge-strength loss	Keep boundaries smooth, not pixelated
Two-stage training	“Preview” mode → “polish” mode
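
The "compress color channels" row can be sketched in a few lines of NumPy. This is an illustration only, assuming a log-transformed spots × genes matrix as input and assuming that z-scoring is applied to each principal component after PCA; the helper name pca_zscore is hypothetical.

```python
import numpy as np

def pca_zscore(expr, n_comps):
    """Reduce a (n_spots, n_genes) matrix to n_comps z-scored components."""
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_comps].T
    # z-score every component so all channels share one scale.
    std = scores.std(axis=0)
    std[std == 0] = 1.0
    return (scores - scores.mean(axis=0)) / std

rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(200, 30)).astype(float)  # toy count matrix
reduced = pca_zscore(np.log1p(expr), n_comps=5)
print(reduced.shape)  # (200, 5)
```

Each column of the result becomes one "color channel" of the tissue image.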

3.1 Two-Stage Training Cheat-Sheet

Stage	Epochs	Loss	Purpose
Warm-up	400	Mean-squared error	Initialize sensible weights
Refinement	≤ 2,000	α × cross-entropy + β × edge-strength	Final clusters with crisp edges

Recommended weights

Single slice: α = 0.4, β = 0.7
Multiple slices: α = 0.2, β = 0.4
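
Here is a rough NumPy sketch of how the two refinement terms could be combined. It rests on assumptions: cross-entropy is taken against each pixel's own argmax label, and edge strength is taken as the mean absolute difference between neighboring pixels. The exact definitions in SpaSEG may differ, and refinement_loss is a made-up name.

```python
import numpy as np

def refinement_loss(logits, alpha=0.4, beta=0.7):
    """logits: (n_classes, height, width) network output for one slice."""
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    # Assumption: pseudo-labels are the current argmax per pixel.
    pseudo = logits.argmax(axis=0)
    ce = -np.take_along_axis(log_prob, pseudo[None], axis=0).mean()
    # Assumption: edge strength = mean |difference| between adjacent pixels.
    edge = (np.abs(np.diff(logits, axis=1)).mean()
            + np.abs(np.diff(logits, axis=2)).mean())
    return alpha * ce + beta * edge, ce, edge

smooth = np.zeros((2, 4, 4))
smooth[0] = 5.0                      # one domain everywhere, no edges
rng = np.random.default_rng(0)
noisy = rng.normal(scale=5.0, size=(2, 4, 4))

total_s, ce_s, edge_s = refinement_loss(smooth)
total_n, ce_n, edge_n = refinement_loss(noisy)
print(edge_s, total_n > total_s)
```

A perfectly uniform map pays no edge penalty, while a speckled one is penalized heavily, so raising β pushes the result toward smoother domains.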

4. Quick Installation & Mini-Workflow

Environment

Python ≥ 3.9
PyTorch ≥ 1.12 (GPU optional but recommended)

One-line install

pip install stereopy

Five-line starter notebook

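import stereo as st  # the stereopy package is imported under the name "stereo"

# Function names below are kept as written in this walkthrough; confirm them
# against the Stereopy docs for your installed version before running.
data = st.io.read_10x_h5('my_visium_file.h5')  # 1. load
st.pp.normalize_total(data)                    # 2. normalize
st.pp.pca(data, n_comps=50)                    # 3. reduce
st.tl.spa_seg(data, n_domains=6)               # 4. segment
st.pl.domain(data, color='spa_seg')            # 5. visualize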

5. Benchmark Highlights (What You Actually Get)

Dataset	Platform	Spots	Speed-up vs. SpaGCN	Memory peak
Human DLPFC	10x Visium	3,000	~3×	< 2 GB
Mouse whole brain	Stereo-seq	526,716	26×	9 GB
Mouse embryo	seqFISH	6,400	30×	< 1 GB
Breast IDC	10x Visium	4,000	5×	< 2 GB

6. Tutorial 1: Identify Tissue Layers in Human DLPFC

Goal: reproduce the famous six-layer cortex + white-matter map.

Download spatialLIBD sample 151673.
Run the 5-line starter above.
Compare to manual labels:
- ARI = 0.554 (higher than BayesSpace, SpaGCN, Leiden)
- Layers 2–6 clearly separated; layer 4 slightly fuzzy (known issue).
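
If you want to see what the ARI comparison does, here is a small dependency-free sketch of the adjusted Rand index. The labels are toy values, not the DLPFC annotations.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same spots."""
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Count pairs of spots that fall together in both, or in each, labeling.
    sum_pairs = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)

manual = ["L1", "L1", "L2", "L2", "WM", "WM"]
perfect = [0, 0, 1, 1, 2, 2]
shifted = [0, 0, 1, 2, 2, 2]
print(adjusted_rand_index(manual, perfect))            # 1.0
print(round(adjusted_rand_index(manual, shifted), 3))  # 0.444
```

Cluster IDs do not need to match the manual names; only the grouping of spots matters.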

7. Tutorial 2: Half-Million-Spot Mouse Brain Without Tears

Goal: handle Stereo-seq Bin20 (10 µm spots) on a single GPU.

Pre-binning: aggregate DNB counts into 10 µm bins → 526 k spots.
PCA: 50 components (explains >80 % variance).
SpaSEG finishes in 8 minutes; SpaGCN runs out of memory; Leiden takes 20 minutes and smears boundaries.
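
The pre-binning step is simple to sketch in NumPy. This assumes DNB coordinates are integers on a 0.5 µm pitch, so that 20 × 20 DNBs make one 10 µm bin; bin_counts is a hypothetical helper, not a Stereopy function.

```python
import numpy as np

def bin_counts(x, y, counts, bin_size=20):
    """Sum per-DNB gene counts into square bins of bin_size DNBs per side.

    x, y:   integer DNB coordinates, length n_dnb
    counts: (n_dnb, n_genes) count matrix
    Returns (bins, binned): unique (bin_x, bin_y) pairs and their summed counts.
    """
    counts = np.asarray(counts)
    keys = np.stack([np.asarray(x) // bin_size, np.asarray(y) // bin_size], axis=1)
    bins, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # keep a flat index across NumPy versions
    binned = np.zeros((len(bins), counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add: DNBs that share a bin accumulate into the same row.
    np.add.at(binned, inverse, counts)
    return bins, binned

x = [0, 5, 19, 20, 39]
y = [0, 3, 19, 0, 25]
counts = np.array([[1, 0], [2, 1], [0, 1], [4, 0], [1, 1]])
bins, binned = bin_counts(x, y, counts)
print(bins.tolist())    # [[0, 0], [1, 0], [1, 1]]
print(binned.tolist())  # [[3, 2], [4, 0], [1, 1]]
```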

8. Tutorial 3: Stitch Four Adjacent Slices into 3-D

Goal: align mouse olfactory-bulb sections without external alignment software.

Load four consecutive Stereo-seq slices.
Concatenate into one AnnData object; add batch_key='slice_id'.
Run multi-slice SpaSEG (alpha=0.2, beta=0.4).
The granular cell layer (GCL) and subependymal zone (SEZ) line up automatically, and the F1 score of LISI is 25 % higher than with Harmony or LIGER.
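
For intuition about LISI-style scores, here is a simplified sketch of the inverse Simpson's index over each spot's nearest neighbors. It is a simplification by design: real LISI weights neighbors with a perplexity-based kernel, and the F1 combination is not reproduced here. The name simple_lisi is made up.

```python
import numpy as np

def simple_lisi(embedding, labels, k=4):
    """Unweighted inverse Simpson's index of labels among k nearest neighbors."""
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    # Dense pairwise distances: fine for a toy example, too slow for real data.
    dist = np.linalg.norm(embedding[:, None] - embedding[None], axis=2)
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    neighbours = np.argsort(dist, axis=1)[:, :k]
    scores = np.empty(len(embedding))
    for i, idx in enumerate(neighbours):
        _, counts = np.unique(labels[idx], return_counts=True)
        p = counts / counts.sum()
        scores[i] = 1.0 / np.sum(p ** 2)
    return scores

# Two slices interleaved along a line: well mixed.
mixed_xy = np.arange(10, dtype=float)[:, None]
mixed_labels = np.array(["A", "B"] * 5)
# Two slices far apart: not mixed at all.
apart_xy = np.array([0, 1, 2, 3, 4, 100, 101, 102, 103, 104], dtype=float)[:, None]
apart_labels = np.array(["A"] * 5 + ["B"] * 5)

print(simple_lisi(mixed_xy, mixed_labels).mean())
print(simple_lisi(apart_xy, apart_labels).mean())
```

With slice IDs as labels, a score near the number of slices means the slices are well mixed; with domain labels, a score near 1 means domains stay clean.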

9. Tutorial 4: Find Region-Specific Genes

Goal: discover genes that turn on only in one region, whether that is the hippocampus in a brain section or a whole tissue such as those in the table below.

After segmentation:

svg = st.tl.spatial_variable_genes(data, domain_key='spa_seg')
st.pl.gene(data, genes=['Nnat','Krt10','Ibsp'])

Gene	Domain	Known role
Nnat	Brain	Neuron development
Krt10	Epidermis	Keratinization
Ibsp	Cartilage	Bone formation

All hits pass:

log2FC > 1.5
in-domain expression ratio > 75 %
FDR < 0.05
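
The three cut-offs above amount to a simple filter. The sketch below uses made-up placeholder genes and values purely to show the logic; the field names are assumptions, not the output format of any tool.

```python
def passes_svg_filters(record, min_log2fc=1.5, min_ratio=0.75, max_fdr=0.05):
    """True if a candidate gene clears all three thresholds."""
    return (record["log2fc"] > min_log2fc
            and record["in_domain_ratio"] > min_ratio
            and record["fdr"] < max_fdr)

# Hypothetical candidates, one failing each criterion in turn.
candidates = [
    {"gene": "GeneA", "log2fc": 2.3, "in_domain_ratio": 0.91, "fdr": 0.001},
    {"gene": "GeneB", "log2fc": 1.2, "in_domain_ratio": 0.95, "fdr": 0.001},
    {"gene": "GeneC", "log2fc": 3.0, "in_domain_ratio": 0.60, "fdr": 0.010},
    {"gene": "GeneD", "log2fc": 2.0, "in_domain_ratio": 0.80, "fdr": 0.200},
]

hits = [r["gene"] for r in candidates if passes_svg_filters(r)]
print(hits)  # ['GeneA']
```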

10. Tutorial 5: Map Who Talks to Whom

Goal: predict ligand–receptor pairs that drive tumor-immune crosstalk.

Workflow:

Spatial domains → SpaSEG
Cell fractions → cell2location deconvolution
L-R list → Squidpy curates CellPhoneDB + OmniPath pairs
Score per spot → geometric mean co-expression × neighbor entropy
Validation → correlate spot score with downstream gene expression
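
The co-expression part of the per-spot score can be sketched as follows. This is one possible reading, assuming the score is the geometric mean of a spot's ligand expression and the mean receptor expression in its neighbors; the neighbor-entropy factor is left out, and lr_coexpression is a hypothetical name.

```python
import numpy as np

def lr_coexpression(ligand, receptor, neighbours):
    """Geometric mean of ligand in each spot and receptor in its neighbors.

    ligand, receptor: per-spot expression values
    neighbours:       list of neighbor-index lists, one per spot
    """
    ligand = np.asarray(ligand, dtype=float)
    receptor = np.asarray(receptor, dtype=float)
    scores = np.zeros(len(ligand))
    for i, idx in enumerate(neighbours):
        if len(idx) == 0:
            continue  # isolated spot: score stays 0
        scores[i] = np.sqrt(ligand[i] * receptor[idx].mean())
    return scores

ligand = [4.0, 0.0, 1.0]
receptor = [0.0, 9.0, 1.0]
neighbours = [[1], [0, 2], [1]]
scores = lr_coexpression(ligand, receptor, neighbours)
print(scores.tolist())  # [6.0, 0.0, 3.0]
```

The geometric mean drops to zero when either partner is absent, so a pair only scores where ligand and receptor actually meet.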

Example from breast IDC:

CXCL12–CXCR4 between CAFs and T cells
LTB–LTBR at tumor border
Spearman correlation 0.78 vs. known downstream targets.

11. FAQ – Troubleshooting in Real Projects

Q1: I only have 8 GB of RAM. Can I still run half-million-spot data?
Yes. Reduce batch_size or switch to CPU mode. Runtime increases ~2× but stays within hours.

Q2: How do I choose the number of spatial domains?
Start with anatomical knowledge (e.g., 6 cortical layers).
Check an NMI/ARI elbow plot; SpaSEG merges over-clustered regions automatically during refinement (up to 2,000 epochs).

Q3: My Stereo-seq file is not a perfect grid—will accuracy suffer?
SpaSEG rescales coordinates to [0, 1] and zero-pads empty pixels. Empirical ARI loss < 0.02.

Q4: Can I combine Visium and MERFISH in one run?
Not yet. Cross-platform batch correction is on the roadmap. For now, analyze separately and compare SVG lists.

12. Limitations & Roadmap

H&E images: not used in current release; multimodal version planned.
Sparse matrices: PCA denoising is default; more aggressive imputation in testing.
Cross-platform batch: manual harmonization required today.

13. When to Choose SpaSEG – Decision Table

Need	Recommendation
Stereo-seq >100 k spots	Use SpaSEG for speed
Multi-section 3-D atlas	Use multi-slice mode
Clinical tumor heterogeneity	Use SVG + L-R pipeline
Teaching demo	5-line notebook is enough

14. Key Takeaway

SpaSEG turns gigabytes of chaotic spot-level counts into interpretable tissue maps, all with a few dozen lines of Python.
Whether you study brain layers, tumor margins, or embryonic development, you get:

Speed: minutes instead of hours
Accuracy: highest reported ARI/NMI across 12 benchmark datasets
Simplicity: one package, one function call per task

Try the notebook today and spend your saved time on biology, not code.

Quick Links

Paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03697-1
Docs & tutorials: https://stereopy.readthedocs.io/en/v1.6.0/Tutorials(Multi-sample)/SpaSEG.html
