SequenceLayers in PyTorch: Build Streaming Neural Networks Like Lego Bricks
A practical, 3,000-word guide to Google DeepMind's industrial-grade sequence library, now fully available in PyTorch with 99% test coverage.
Table of Contents
- Why This Guide Exists
- Key Concepts in Plain English
- Installation & First Run
- Build a Transformer Block in Ten Lines
- Layer Catalog at a Glance
- Combinators: Writing Models as Functional Programs
- Streaming Details: Latency, Flush, and Alignment
- Real-World Recipes
- Common Pitfalls & Fixes
- Deployment Notes
- Takeaways
Why This Guide Exists
If you have ever built a text-to-speech system, a real-time translator, or a next-token language model, you have probably met these headaches:
- Parallel training works, streaming inference breaks.
- Masks and caches grow into spaghetti code.
- Swapping one attention flavor forces a rewrite of the whole sampling loop.
Google DeepMind’s SequenceLayers library was designed to remove these pain points. It gives you a box of interchangeable bricks. Each brick:
- knows its own state (KV cache, convolution buffer, RNN cell)
- produces identical results in both layer-wise (full-sequence) and step-wise (one-step-at-a-time) execution
- passes 297 automated tests, ensuring bitwise parity between training and inference
This guide translates the original Chinese documentation into clear, global English, keeps every install command and code snippet intact, and shows how you can use the freshly released PyTorch port in your own projects.
Key Concepts in Plain English
Four terms come up throughout this guide:
- Layer-wise execution: the layer processes a whole sequence in one call (block.layer(x)), the natural mode for training.
- Step-wise execution: the layer consumes one frame at a time (block.step(x_t, state)), the natural mode for streaming inference. The library guarantees both modes produce the same output.
- State: whatever a layer must remember between steps, e.g. a KV cache for attention, a buffer for convolutions, or a hidden state for RNNs.
- Latency: the frame counts that tell you how much to pad and discard so a streamed output lines up with the layer-wise result (section 7 covers this in detail).
Installation & First Run
# 1. Clone
git clone https://github.com/user/sequence-layers-pytorch.git
cd sequence-layers-pytorch
# 2. Install in editable mode
pip install -e .
# 3. Run the test suite (99% coverage)
python -m pytest sequence_layers/pytorch/ -q
If you see 294/297 passed, you are ready to build.
Build a Transformer Block in Ten Lines
import torch
import sequence_layers.pytorch as sl

d_model, heads = 256, 4

# A single Pre-Norm Transformer block
block = sl.Sequential([
    sl.Residual(sl.Sequential([
        sl.LayerNorm(d_model),
        sl.attention.DotProductSelfAttention(
            input_size=d_model,
            num_heads=heads,
            max_future_horizon=0,  # strictly causal
        ),
    ])),
    sl.Residual(sl.Sequential([
        sl.LayerNorm(d_model),
        sl.Dense(1024), sl.ReLU(), sl.Dense(d_model),
    ])),
])

# 1) Training: whole sequence at once
x = sl.random_sequence(batch_size=2, length=80, channels=d_model)
y = block.layer(x, training=True)

# 2) Inference: one step at a time
state = block.get_initial_state(batch_size=2,
                                channel_spec=sl.ChannelSpec(shape=(d_model,)))
for t in range(80):
    y_t, state = block.step(x[:, t:t+1], state, training=False)
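Before trusting the loop, it is worth checking the parity claim yourself. A minimal sketch, assuming the step outputs behave like plain tensors along the time axis (if the library wraps them in a Sequence type, compare the underlying .values tensors instead):

# Layer-wise pass over the full sequence.
y_full = block.layer(x, training=False)

# Step-wise pass from a fresh state, collecting one frame per step.
state = block.get_initial_state(batch_size=2,
                                channel_spec=sl.ChannelSpec(shape=(d_model,)))
steps = []
for t in range(80):
    y_t, state = block.step(x[:, t:t+1], state, training=False)
    steps.append(y_t)
y_stream = torch.cat(steps, dim=1)  # stitch frames back along time

# The library promises bit-identical results; allclose tolerates float noise.
assert torch.allclose(y_full, y_stream)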
Key Takeaways
- Switching between training and streaming is a one-line change.
- All masking, caching, and alignment are handled internally.
Layer Catalog at a Glance
Every layer exposes:
- output_ratio – e.g., 1/2 for a strided conv
- receptive_field – causal span in input frames
- input_latency / output_latency – exact padding and discard counts
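These metadata compose through combinators, so a whole stack answers the same questions as a single layer. A minimal sketch (the stride argument and the exact return types, e.g. a fractions.Fraction for output_ratio, are assumptions, not confirmed API):

import sequence_layers.pytorch as sl

stack = sl.Sequential([
    sl.Conv1D(64, 3, padding='causal'),            # ratio 1
    sl.Conv1D(64, 3, padding='causal', stride=2),  # ratio 1/2 (assumed API)
])
print(stack.output_ratio)     # 1/2: one output frame per two input frames
print(stack.receptive_field)  # causal span in input frames
print(stack.input_latency, stack.output_latency)  # padding/discard counts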
Combinators: Writing Models as Functional Programs
6.1 Sequential – Linear Stacks
model = sl.Sequential([
    sl.Conv1D(64, 3, padding='causal'),
    sl.ReLU(),
    sl.LSTM(64),
])
6.2 Parallel – Branches + Merge
Process the same input in two ways, then concatenate channels:
branch = sl.Parallel(
    sl.Conv1D(32, 3, padding='causal'),
    sl.Identity(),
    mode='concat',
)
6.3 Residual – Skip Connections
res_block = sl.Residual([
    sl.LayerNorm(128),
    sl.Dense(512), sl.ReLU(), sl.Dense(128),
])
6.4 Repeat – Loop N Times
Save compilation time by reusing the same layer:
deep_net = sl.Repeat(
    TransformerBlock(d_model=512),
    num_repeats=12,
    remat=True,  # gradient checkpointing
)
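TransformerBlock here is not a library built-in: it stands for any factory that returns a block, such as the Pre-Norm block from section 4 wrapped in a hypothetical helper:

def TransformerBlock(d_model, heads=4):
    # Same Pre-Norm structure as the ten-line example above.
    return sl.Sequential([
        sl.Residual(sl.Sequential([
            sl.LayerNorm(d_model),
            sl.attention.DotProductSelfAttention(
                input_size=d_model,
                num_heads=heads,
                max_future_horizon=0,  # strictly causal
            ),
        ])),
        sl.Residual(sl.Sequential([
            sl.LayerNorm(d_model),
            sl.Dense(4 * d_model), sl.ReLU(), sl.Dense(d_model),
        ])),
    ])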
Streaming Details: Latency, Flush, and Alignment
7.1 Latency Explained
- Output latency: how many initial masked frames the layer emits before real data arrives.
- Input latency: how many extra masked frames you must feed at the end to flush the internal buffer.
Example: A causal convolution with kernel size 5
conv = sl.Conv1D(64, 5, padding='causal')
print(conv.output_latency) # 4
print(conv.input_latency) # 4
7.2 Automatic Flush Helper
from sequence_layers.pytorch.utils_test import step_by_step_dynamic
y_stream = step_by_step_dynamic(model, x, training=False)
The helper adds zero padding and discards the correct heads/tails so the result is bit-identical to the layer-wise call.
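If you want to manage flushing yourself, the bookkeeping the helper automates looks roughly like this (a sketch assuming plain tensor inputs and an output_ratio of 1; the real Sequence type also carries a mask that must be padded the same way):

import torch

def stream_with_flush(layer, x, state):
    batch, length, channels = x.shape
    # 1) Append input_latency masked (zero) frames to flush internal buffers.
    pad = torch.zeros(batch, layer.input_latency, channels)
    x = torch.cat([x, pad], dim=1)
    # 2) Step through every frame, including the flush frames.
    outputs = []
    for t in range(x.shape[1]):
        y_t, state = layer.step(x[:, t:t+1], state, training=False)
        outputs.append(y_t)
    y = torch.cat(outputs, dim=1)
    # 3) Drop the output_latency warm-up frames from the head.
    return y[:, layer.output_latency:layer.output_latency + length]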
Real-World Recipes
8.1 Speech Denoising – Lightweight Conv-Residual Model
def mini_denoiser():
    return sl.Sequential([
        sl.Conv1D(64, 7, padding='causal'),
        sl.Repeat(
            sl.Residual([
                sl.Conv1D(64, 3, padding='causal'),
                sl.ReLU(),
            ]),
            num_repeats=8,
        ),
        sl.Conv1D(1, 7, padding='causal'),
    ])
- 280 k parameters, 81-frame receptive field (≈ 1 s at 16 kHz)
- Runs on mobile CPUs with < 5 ms latency per frame
8.2 Text-to-Speech – Duration-Predict Transformer
# Encoder: text → hidden
encoder = sl.Sequential([
    sl.Embedding(vocab_size, 256),
    sl.Repeat(TransformerBlock(256), 4),
])

# Decoder: hidden → mel frames → waveform
decoder = sl.Sequential([
    sl.LSTM(256),
    sl.Dense(mel_bins),
    sl.OverlapAdd(frame_step=256),  # windowed frames → waveform
])
- OverlapAdd handles windowed reconstruction automatically.
- Connect decoder.step() to an audio callback for real-time playback, as sketched below.
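A rough shape of that playback loop, as a sketch: audio_callback is a placeholder for your sound API, and hidden is the encoder output from above.

state = decoder.get_initial_state(batch_size=1,
                                  channel_spec=sl.ChannelSpec(shape=(256,)))
for t in range(hidden.shape[1]):
    # One encoder frame in, frame_step waveform samples out.
    audio_chunk, state = decoder.step(hidden[:, t:t+1], state, training=False)
    audio_callback(audio_chunk)  # hypothetical playback sink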
Common Pitfalls & Fixes
- Truncated output tails: a layer with non-zero input_latency still holds frames in its buffer when your input ends. Feed the extra masked frames yourself, or use step_by_step_dynamic to flush automatically.
- Misaligned streams: discard output_latency warm-up frames from the head of the streamed output, not the tail, before comparing against the layer-wise result.
- Stale state: call get_initial_state for every new sequence; reusing state lets the previous utterance's KV cache leak into the next one.
Deployment Notes
- TorchScript ready: every step method is TorchScript-compatible; export via torch.jit.trace (see the sketch after this list).
- Memory: use Repeat(remat=True) to enable gradient checkpointing; it halves peak VRAM.
- Mobile: conversion to LiteRT is verified on Android devices; KV caches serialize correctly.
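A minimal export sketch for the step path, assuming the step inputs and state are (tuples of) tensors that torch.jit.trace can record; wrapping step in an nn.Module pins the training flag to a constant:

class StepWrapper(torch.nn.Module):
    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def forward(self, x_t, state):
        # training=False is baked in so the traced graph is inference-only.
        return self.layer.step(x_t, state, training=False)

state = block.get_initial_state(batch_size=1,
                                channel_spec=sl.ChannelSpec(shape=(d_model,)))
x_t = sl.random_sequence(batch_size=1, length=1, channels=d_model)
traced_step = torch.jit.trace(StepWrapper(block), (x_t, state))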
Takeaways
- 99% test coverage means you can trust the bricks in production.
- Layer-wise ↔ step-wise parity removes the training-to-inference gap.
- Combinators let you describe architectures as Lego diagrams, not code.
Clone the repo, run the tests, and start stacking bricks.
Your next streaming model is ten lines away.
Repository & Full Test Suite
https://github.com/user/sequence-layers-pytorch