SequenceLayers in PyTorch: Build Streaming Neural Networks Like Lego Bricks

A practical, 3,000-word guide to Google DeepMind’s industrial-grade sequence library, now available as a PyTorch port whose test suite passes at 99 % (294 of 297 tests).


Table of Contents

  1. Why This Guide Exists
  2. Key Concepts in Plain English
  3. Installation & First Run
  4. Build a Transformer Block in Ten Lines
  5. Layer Catalog at a Glance
  6. Combinators: Writing Models as Functional Programs
  7. Streaming Details: Latency, Flush, and Alignment
  8. Real-World Recipes
  9. Common Pitfalls & Fixes
  10. Deployment Notes
  11. Takeaways

Why This Guide Exists

If you have ever built a text-to-speech system, a real-time translator, or a next-token language model, you have probably met these headaches:

  • Parallel training works, streaming inference breaks.
  • Masks and caches grow into spaghetti code.
  • Swapping one attention flavor forces a rewrite of the whole sampling loop.

Google DeepMind’s SequenceLayers library was designed to remove these pain points. It gives you a box of interchangeable bricks. Each brick:

  • knows its own state (KV cache, convolution buffer, RNN cell)
  • produces identical results in both layer-wise (full-sequence) and step-wise (one-step-at-a-time) execution
  • is backed by 297 automated tests, 294 of which currently pass, that check bitwise parity between training and inference

This guide translates the original Chinese documentation into clear, global English, keeps every install command and code snippet intact, and shows how you can use the freshly released PyTorch port in your own projects.


Key Concepts in Plain English

| Term | What It Means | Why You Care |
|---|---|---|
| Sequence object | A small wrapper that pairs data [batch, time, dim] with a boolean mask [batch, time] | Invalid timesteps are automatically zeroed; no more manual masked_fill. |
| layer-wise call | y = model.layer(x) | Classic, parallel, fast for training. |
| step-wise call | y_t, state = model.step(x_t, state) | Exact same math, but one frame at a time for streaming. |
| Combinator | Sequential, Parallel, Residual, Repeat | Compose bricks without writing forward passes. |
| Latency | output_latency = how many initial outputs to discard; input_latency = how many padding frames to add at the end | Guarantees identical results between modes. |
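
To make the Sequence object entry concrete, here is a minimal pure-Python sketch of the idea. `MaskedSequence` and `mask_invalid` are hypothetical names chosen for illustration, not the library's classes, and nested lists stand in for the tensors the real object would wrap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MaskedSequence:
    """Hypothetical stand-in: values [batch][time][dim] plus mask [batch][time]."""
    values: List[List[List[float]]]
    mask: List[List[bool]]

    def mask_invalid(self) -> "MaskedSequence":
        """Return a new object in which every timestep with mask == False is zeroed."""
        zeroed = [
            [list(frame) if valid else [0.0] * len(frame)
             for frame, valid in zip(seq, seq_mask)]
            for seq, seq_mask in zip(self.values, self.mask)
        ]
        return MaskedSequence(zeroed, self.mask)


# One batch entry, three timesteps, two channels; the last timestep is padding.
seq = MaskedSequence(
    values=[[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]],
    mask=[[True, True, False]],
)
clean = seq.mask_invalid()
print(clean.values)
```

Carrying the mask alongside the data is what lets each layer zero out padding on its own, so the caller never has to thread a separate mask argument through the model.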

Installation & First Run

# 1. Clone
git clone https://github.com/user/sequence-layers-pytorch.git
cd sequence-layers-pytorch

# 2. Install in editable mode
pip install -e .

# 3. Run the test suite (about 99 % of the tests are expected to pass)
python -m pytest sequence_layers/pytorch/ -q

If you see 294/297 passed, you are ready to build.


Build a Transformer Block in Ten Lines

import torch
import sequence_layers.pytorch as sl

d_model, heads = 256, 4

# A single Pre-Norm Transformer block
block = sl.Sequential([
    sl.Residual(sl.Sequential([
        sl.LayerNorm(d_model),
        sl.attention.DotProductSelfAttention(
            input_size=d_model,
            num_heads=heads,
            max_future_horizon=0   # strictly causal
        ),
    ])),
    sl.Residual(sl.Sequential([
        sl.LayerNorm(d_model),
        sl.Dense(1024), sl.ReLU(), sl.Dense(d_model)
    ]))
])

# 1) Training: whole sequence at once
x = sl.random_sequence(batch_size=2, length=80, channels=d_model)
y = block.layer(x, training=True)

# 2) Inference: one step at a time
state = block.get_initial_state(batch_size=2,
                                channel_spec=sl.ChannelSpec(shape=(d_model,)))
for t in range(80):
    y_t, state = block.step(x[:, t:t+1], state, training=False)

Key Takeaways

  • Switching between training and streaming is a one-line change.
  • All masking, caching, and alignment are handled internally.
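
The layer/step contract is easier to see on a toy layer that needs no library at all. The sketch below is an illustration only: `CausalMovingSum` is a hypothetical class that borrows the method names `layer`, `step`, and `get_initial_state` from the example above, with plain Python lists in place of Sequence objects.

```python
from typing import List, Tuple


class CausalMovingSum:
    """Toy stateful layer: y[t] is the sum of the last `width` inputs up to t."""

    def __init__(self, width: int) -> None:
        self.width = width

    def layer(self, xs: List[float]) -> List[float]:
        # Full-sequence mode: left-pad with zeros, then slide the window.
        padded = [0.0] * (self.width - 1) + list(xs)
        return [sum(padded[t:t + self.width]) for t in range(len(xs))]

    def get_initial_state(self) -> List[float]:
        # The state is a buffer holding the previous (width - 1) inputs.
        return [0.0] * (self.width - 1)

    def step(self, x_t: float, state: List[float]) -> Tuple[float, List[float]]:
        # Streaming mode: combine the buffer with the new frame, then shift.
        window = state + [x_t]
        return sum(window), window[1:]


toy = CausalMovingSum(width=3)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]

y_layer = toy.layer(xs)

state = toy.get_initial_state()
y_step = []
for x_t in xs:
    y_t, state = toy.step(x_t, state)
    y_step.append(y_t)

print(y_layer == y_step)
```

The state returned by each `step` call is exactly the information the full-sequence pass gets from its left padding, which is why the two modes agree frame for frame.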

Layer Catalog at a Glance

| Category | Examples | Typical Use |
|---|---|---|
| Dense & Linear | Dense, Embedding, EinsumDense | Word embeddings, feed-forward nets |
| Convolution | Conv1D, Conv2D, DepthwiseConv1D, ConvTranspose | Audio spectrograms, image time-series |
| Recurrent | LSTM, GRU, VanillaRNN | Long-range contexts with small memory |
| Attention | DotProductSelfAttention, StreamingDotProductAttention | Causal or cross-attention, KV cache included |
| Normalization | LayerNorm, BatchNorm, RMSNorm | Stabilize training |
| DSP | STFT, OverlapAdd, Frame, Window | Speech analysis, filtering, reconstruction |
| Pooling | MaxPooling1D, GlobalAveragePooling | Downsampling, global features |
| Combinators | Sequential, Parallel, Residual, Repeat, Blockwise | Compose layers without boilerplate |

Every layer exposes:

  • output_ratio – e.g., 1/2 for a strided conv
  • receptive_field – causal span in input frames
  • input_latency / output_latency – exact padding and discard counts
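
Properties like these compose across a stack, which is what lets a combinator report them for a whole model. As a rough illustration, the sketch below derives the overall output ratio and receptive field of a chain of convolutions by hand. `stack_properties` is a hypothetical helper, and describing each layer as a (kernel_size, stride) pair is a simplification that assumes plain convolutions with no dilation; it is not the library's API.

```python
from fractions import Fraction
from typing import List, Tuple


def stack_properties(layers: List[Tuple[int, int]]) -> Tuple[Fraction, int]:
    """Combine (kernel_size, stride) pairs, listed from the input side first."""
    ratio = Fraction(1)
    receptive_field = 1
    jump = 1  # distance in input frames between neighbouring outputs so far
    for kernel_size, stride in layers:
        # Each extra kernel tap reaches `jump` input frames further back.
        receptive_field += (kernel_size - 1) * jump
        jump *= stride
        ratio *= Fraction(1, stride)
    return ratio, receptive_field


# A kernel-3 conv, a kernel-4 conv with stride 2, then another kernel-3 conv.
ratio, rf = stack_properties([(3, 1), (4, 2), (3, 1)])
print(ratio, rf)
```

Using `Fraction` keeps the ratio exact, so stacking several strided layers never accumulates floating-point error in the frame bookkeeping.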

Combinators: Writing Models as Functional Programs

6.1 Sequential – Linear Stacks

model = sl.Sequential([
    sl.Conv1D(64, 3, padding='causal'),
    sl.ReLU(),
    sl.LSTM(64)
])

6.2 Parallel – Branches + Merge

Process the same input in two ways, then concatenate channels:

branch = sl.Parallel(
    sl.Conv1D(32, 3, padding='causal'),
    sl.Identity(),
    mode='concat'
)

6.3 Residual – Skip Connections

res_block = sl.Residual([
    sl.LayerNorm(128),
    sl.Dense(512), sl.ReLU(), sl.Dense(128)
])

6.4 Repeat – Loop N Times

Save compilation time by reusing the same layer:

deep_net = sl.Repeat(
    TransformerBlock(d_model=512),   # user-defined helper, e.g. a function returning the block built earlier
    num_repeats=12,
    remat=True   # gradient checkpointing
)
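
Conceptually, each combinator is a higher-order function over layers. The following sketch mimics that idea with plain Python closures; the names `sequential`, `residual`, `parallel_concat`, and `repeat` are hypothetical stand-ins rather than the library's implementations, and a "layer" here is just a function on one frame of channel values. Note that this toy `repeat` reuses one stateless function, whereas a real Repeat would typically hold separate parameters for each repetition.

```python
from typing import Callable, List

Frame = List[float]
Layer = Callable[[Frame], Frame]


def sequential(*layers: Layer) -> Layer:
    """Feed the output of each layer into the next."""
    def run(x: Frame) -> Frame:
        for layer in layers:
            x = layer(x)
        return x
    return run


def residual(inner: Layer) -> Layer:
    """Add the input back onto the inner layer's output."""
    return lambda x: [a + b for a, b in zip(x, inner(x))]


def parallel_concat(*branches: Layer) -> Layer:
    """Run every branch on the same input and concatenate the channels."""
    return lambda x: [v for branch in branches for v in branch(x)]


def repeat(layer: Layer, num_repeats: int) -> Layer:
    """Apply the same layer num_repeats times in a row."""
    return sequential(*([layer] * num_repeats))


double: Layer = lambda x: [2.0 * v for v in x]
identity: Layer = lambda x: list(x)

model = sequential(
    parallel_concat(double, identity),  # [1, 2] -> [2, 4, 1, 2]
    repeat(residual(double), 2),        # x + 2x, applied twice: 9x
)
out = model([1.0, 2.0])
print(out)
```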

Streaming Details: Latency, Flush, and Alignment


7.1 Latency Explained

  • Output latency: how many initial masked frames the layer emits before real data arrives.
  • Input latency: how many extra masked frames you must feed at the end to flush the internal buffer.

Example: A causal convolution with kernel size 5

conv = sl.Conv1D(64, 5, padding='causal')
print(conv.output_latency)  # 4
print(conv.input_latency)   # 4

7.2 Automatic Flush Helper

from sequence_layers.pytorch.utils_test import step_by_step_dynamic

y_stream = step_by_step_dynamic(model, x, training=False)

The helper adds zero padding and discards the correct heads/tails so the result is bit-identical to the layer-wise call.
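
The pad-then-discard logic can be sketched without the library. The code below is not the implementation of `step_by_step_dynamic`; it is a hypothetical `step_with_flush` applied to a toy layer, `ForwardDifference`, that needs one frame of lookahead and therefore has a latency of one frame in each direction.

```python
from typing import List, Tuple


class ForwardDifference:
    """Toy layer with one frame of lookahead: y[t] = x[t + 1] - x[t]."""

    output_latency = 1  # the first step-wise output is a placeholder
    input_latency = 1   # one padding frame is needed to flush the last output

    def layer(self, xs: List[float]) -> List[float]:
        # Full-sequence mode: the frame after the end counts as zero.
        padded = list(xs) + [0.0]
        return [padded[t + 1] - padded[t] for t in range(len(xs))]

    def get_initial_state(self) -> float:
        return 0.0  # the previous input frame

    def step(self, x_t: float, prev: float) -> Tuple[float, float]:
        # At time t only the output for time t - 1 can be completed.
        return x_t - prev, x_t


def step_with_flush(layer: ForwardDifference, xs: List[float]) -> List[float]:
    """Pad the tail, step through every frame, then drop the warm-up outputs."""
    padded = list(xs) + [0.0] * layer.input_latency
    state = layer.get_initial_state()
    outputs = []
    for x_t in padded:
        y_t, state = layer.step(x_t, state)
        outputs.append(y_t)
    return outputs[layer.output_latency:]


diff = ForwardDifference()
xs = [1.0, 4.0, 9.0, 16.0]
y_layer = diff.layer(xs)
y_stream = step_with_flush(diff, xs)
print(y_layer == y_stream)
```

Without the padding frame the last output would never be emitted, and without the discard every output would be shifted one frame late.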


Real-World Recipes

8.1 Speech Denoising – Lightweight Conv-Residual Model

def mini_denoiser():
    return sl.Sequential([
        sl.Conv1D(64, 7, padding='causal'),
        sl.Repeat(
            sl.Residual([
                sl.Conv1D(64, 3, padding='causal'),
                sl.ReLU()
            ]),
            num_repeats=8
        ),
        sl.Conv1D(1, 7, padding='causal')
    ])
  • Roughly 100 k parameters and a 29-frame receptive field for the stack as written, assuming single-channel input, stride 1, and no dilation
  • Runs on mobile CPUs with < 5 ms latency per frame
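
Figures like these are easy to sanity-check by hand. The sketch below counts parameters and receptive field for the stack exactly as written in `mini_denoiser`, under several assumptions that the snippet itself does not state: single-channel input, stride 1, no dilation, one bias per output channel, and separate weights for each of the eight repeats. A different configuration would give different numbers.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus one bias per output channel."""
    return in_channels * out_channels * kernel_size + out_channels


# (in_channels, out_channels, kernel_size) for each convolution, input side first.
# The single input channel is an assumption.
stack = [(1, 64, 7)] + [(64, 64, 3)] * 8 + [(64, 1, 7)]

total_params = sum(conv1d_params(*spec) for spec in stack)

# With stride 1 and no dilation, each layer widens the span by kernel_size - 1.
receptive_field = 1 + sum(kernel - 1 for _, _, kernel in stack)

print(total_params, receptive_field)
```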

8.2 Text-to-Speech – Duration-Predict Transformer

# Encoder: text → hidden
encoder = sl.Sequential([
    sl.Embedding(vocab_size, 256),
    sl.Repeat(TransformerBlock(256), 4)
])

# Decoder: hidden → mel-spectrogram
decoder = sl.Sequential([
    sl.LSTM(256),
    sl.Dense(mel_bins),
    sl.OverlapAdd(frame_step=256)   # overlap-add the frames into one output signal
])
  • OverlapAdd handles windowed reconstruction automatically.
  • Connect decoder.step() to an audio callback for real-time playback.
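
For readers unfamiliar with the operation, overlap-add places each frame at a multiple of the frame step and sums wherever frames overlap. The function below is a generic pure-Python sketch of that technique, not the library's OverlapAdd layer; it omits windowing and uses a tiny frame step so the arithmetic is easy to follow.

```python
from typing import List


def overlap_add(frames: List[List[float]], frame_step: int) -> List[float]:
    """Sum frames into one signal, placing frame i at offset i * frame_step."""
    if not frames:
        return []
    frame_length = len(frames[0])
    total = (len(frames) - 1) * frame_step + frame_length
    signal = [0.0] * total
    for i, frame in enumerate(frames):
        start = i * frame_step
        for j, value in enumerate(frame):
            signal[start + j] += value
    return signal


# Three frames of length 4 with a hop of 2, so neighbouring frames overlap by 2.
frames = [[1.0] * 4, [2.0] * 4, [3.0] * 4]
signal = overlap_add(frames, frame_step=2)
print(signal)
```

The final `frame_length - frame_step` samples are only complete once the last frame has arrived, which is why a streaming pipeline needs to know when the sequence ends.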

Common Pitfalls & Fixes

| Symptom | Root Cause | Fix |
|---|---|---|
| step outputs all zeros | Forgot to discard output_latency frames | Slice y_step[:, output_latency:] |
| Streaming differs from training | Enabled future attention in inference | Set max_future_horizon=0 |
| OverlapAdd missing tail | Stream doesn’t know sequence end | Use layer-wise post-processing or manual flush |
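
The second row is easier to reason about once the attention window is written out. The sketch below is an illustration of the concept, not the library's masking code: `visible_keys` is a hypothetical helper that assumes an unlimited past horizon and lists which key positions a query may attend to for a given future horizon.

```python
from typing import List


def visible_keys(query_t: int, length: int, max_future_horizon: int) -> List[int]:
    """Key positions that a query at time query_t may attend to."""
    last = min(length - 1, query_t + max_future_horizon)
    return list(range(last + 1))


# Strictly causal: the query at t = 2 sees only frames 0..2.
causal = visible_keys(2, length=6, max_future_horizon=0)

# A horizon of 2 also needs frames 3 and 4, which a stream has not delivered yet.
lookahead = visible_keys(2, length=6, max_future_horizon=2)

print(causal, lookahead)
```

With any horizon above zero, a step-wise call at time t depends on frames that arrive later, so its output cannot match the full-sequence result unless the layer delays its output accordingly.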

Deployment Notes

  • TorchScript Ready: Every step method is TorchScript-compatible; export via torch.jit.trace.
  • Memory: Use Repeat(remat=True) to enable gradient checkpointing, which lowers peak VRAM (it can roughly halve it) at the cost of recomputing activations in the backward pass.
  • Mobile: Verified conversion to LiteRT on Android devices; KV caches serialize correctly.

Takeaways

  • A 99 % test pass rate (294 of 297) gives a solid basis for trusting the bricks in production.
  • Layer-wise ↔ step-wise parity removes the training-to-inference gap.
  • Combinators let you describe architectures as Lego diagrams rather than hand-written forward passes.

Clone the repo, run the tests, and start stacking bricks.
Your next streaming model is ten lines away.


Repository & Full Test Suite
https://github.com/user/sequence-layers-pytorch