Exploring RegressLM: A Practical Guide to Text-to-Text Regression

Have you ever wondered how to predict numerical outcomes from messy, unstructured text data without getting bogged down in complicated feature engineering? That’s where RegressLM comes in. This library makes it straightforward to handle text-to-text regression tasks, turning strings into floating-point predictions. It’s especially useful for scenarios like simulating performance metrics in large systems, where data comes in forms like logs or configuration files.

In this article, we’ll walk through what RegressLM is, how to set it up, and ways to use it effectively. I’ll address common questions as we go, drawing from the library’s documentation and related research to keep things clear and actionable. Let’s start with the basics.

What Is RegressLM and Why Might You Need It?

Imagine you’re dealing with data from a massive compute cluster, like Google’s Borg system. You have logs full of details, from timestamps to hardware distributions to job profiles, and you need to predict an efficiency metric such as millions of instructions per second per Google Compute Unit (MIPS per GCU). Traditional methods might require flattening all that into tables, which can lose important context or be impractical for varying data structures.

RegressLM offers a different approach: text-to-text regression. It treats your input as a string and outputs a numerical value through a language model-like process. This means you can feed in any text representation without rigid formatting. The library supports pretraining on large datasets and fine-tuning for specific tasks, making it flexible for multi-task learning.

From the research perspective, this method has shown promise in predicting performance for large systems. For instance, a 60M-parameter model trained from scratch achieved near-perfect rank correlations (up to 0.99) on Borg data, with up to 100x lower mean squared error than tabular methods. It also adapts quickly to new tasks with just a few examples.

RegressLM decoding a numerical performance metric from text.

This image illustrates how RegressLM decodes a performance metric from textual system states, highlighting its application in real-world scenarios.

How Does Text-to-Text Regression Work in RegressLM?

You might be asking, “How does RegressLM actually turn text into numbers?” It’s built on regression language models (RLMs), which use an encoder-decoder architecture to process input strings and generate tokenized numerical outputs. The model learns by minimizing cross-entropy loss over tokens representing the target value, rather than direct error metrics like MSE.

Key elements include:

  • Input Representation: Any string, such as YAML-formatted logs from a compute cluster.
  • Output Tokenization: Numbers are broken into sign, mantissa, and exponent tokens (e.g., 72.5 might be encoded as <+><7><2><5><E1>, representing +7.25 × 10^1).
  • Training Process: Next-token prediction on pairs of input text and target values.
  • Inference: Sample multiple outputs and aggregate them for a prediction.

This setup avoids issues with fixed-length tensors in traditional regression, allowing variable-length inputs and handling nested data naturally.
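
To make this concrete, here is a toy sketch of sign/mantissa/exponent tokenization. The token format and the float_to_tokens helper are illustrative assumptions for this article, not the library’s actual tokenizer:

def float_to_tokens(y: float, num_digits: int = 3) -> list[str]:
    # Toy sign/mantissa/exponent encoding; NOT RegressLM's actual tokenizer.
    sign = '+' if y >= 0 else '-'
    y = abs(y)
    exponent = 0
    while y >= 10:  # Normalize to d.dd... x 10^exponent form.
        y /= 10
        exponent += 1
    while 0 < y < 1:
        y *= 10
        exponent -= 1
    digits = f'{round(y * 10 ** (num_digits - 1)):0{num_digits}d}'
    return [f'<{sign}>'] + [f'<{d}>' for d in digits] + [f'<E{exponent}>']

print(float_to_tokens(72.5))    # ['<+>', '<7>', '<2>', '<5>', '<E1>']
print(float_to_tokens(-0.003))  # ['<->', '<3>', '<0>', '<0>', '<E-3>']

At inference time, the decoder samples such tokens autoregressively, which is what lets the model represent a full predictive distribution rather than a single point estimate.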

In practice, for performance prediction in systems like Borg, inputs include:

  • Cluster names and locations.
  • Time windows for data collection.
  • Scheduler hyperparameters.
  • Machine distributions and job profiles.

These features can span thousands of characters, as shown in the table below:

| Feature Type | Average Character Count |
| --- | --- |
| Cell Name | 3 |
| Physical Location | 8 |
| Time Window | 86 |
| Scheduler Hyperparameters | 1,082 |
| Machine Distribution | 461 |
| Job-on-machine Performance | 268,157 |

The distribution of total input string lengths often follows a pattern where most are under 1M characters, but some extend further, emphasizing the need for models that handle long contexts.

Figure: distribution of total input string lengths, a histogram spanning roughly 1K to 10M characters, with frequency decreasing as length increases.

Setting Up RegressLM: Step-by-Step Guide

If you’re ready to try it out, here’s how to get started. The installation is simple and uses standard Python tools.

How to Install RegressLM?

  1. Clone the repository from GitHub: `git clone https://github.com/google-deepmind/regress-lm.git`
  2. Navigate into the directory: `cd regress-lm`
  3. Install the core libraries: `pip install -e .`
  4. For advanced models like T5Gemma, install the extras: `pip install ".[extras]"`

This setup gives you access to the main classes and models. Note that it’s not an officially supported Google product, so it’s best for research or experimental use.

Once installed, you can import and use it in your Python scripts.
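
A quick sanity check that the installation worked, using the same imports the examples below rely on:

# These imports should succeed after installation.
from regress_lm import core
from regress_lm import rlm

print(core.Example(x='test', y=1.0))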

Basic Usage: Inference and Pretraining

Now, let’s dive into how to use RegressLM. There are two primary stages: inference for predictions and pretraining for better initial models.

How Do I Perform Inference with RegressLM?

Inference involves creating a model, fine-tuning on examples if needed, and sampling predictions.

Here’s a code example:

from regress_lm import core
from regress_lm import rlm

# Create a RegressLM instance with a maximum input token length.
reg_lm = rlm.RegressLM.from_default(max_input_len=2048)

# Prepare example pairs for fine-tuning.
examples = [core.Example(x='hello', y=0.3), core.Example(x='world', y=-0.3)]
reg_lm.fine_tune(examples)

# Define query inputs.
query1 = core.ExampleInput(x='hi')
query2 = core.ExampleInput(x='bye')

# Sample predictions.
samples1, samples2 = reg_lm.sample([query1, query2], num_samples=128)

This generates 128 samples per query, which you can average or analyze for distributions. It’s efficient for quick predictions on new inputs.
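
For example, you might collapse each query’s samples into a point estimate and a spread with NumPy; the median plus standard deviation used here is just one reasonable aggregation choice:

import numpy as np

# Reduce 128 samples per query to a point prediction plus an uncertainty estimate.
pred1, spread1 = np.median(samples1), np.std(samples1)
pred2, spread2 = np.median(samples2), np.std(samples2)
print(f'hi:  {pred1:.3f} +/- {spread1:.3f}')
print(f'bye: {pred2:.3f} +/- {spread2:.3f}')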

What About Pretraining?

Pretraining helps when you have large datasets. Use PyTorch for this:

from torch import optim

from regress_lm import core
from regress_lm.models.pytorch import model as torch_model_lib

# Initialize the model.
model = torch_model_lib.PyTorchModel(...)

# Set up the optimizer.
optimizer = optim.Adafactor(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4
)

# Training loop.
for _ in range(...):
    examples = [core.Example(x=..., y=...), ...]  # Your data here.
    tensor_examples = model.convert(examples)
    optimizer.zero_grad()
    loss, _ = model.compute_loss_and_metrics(tensor_examples)
    loss.backward()
    optimizer.step()

This loop updates the model over your data, improving its starting point for fine-tuning.
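
Because the ablations below recommend early stopping, it’s worth tracking a held-out loss between updates. A minimal sketch, assuming PyTorchModel behaves like a standard torch nn.Module (it exposes parameters() above) and that validation_examples is a list of core.Example objects you set aside:

import torch

# Evaluate on held-out data without updating weights.
model.eval()
with torch.no_grad():
    val_tensors = model.convert(validation_examples)
    val_loss, _ = model.compute_loss_and_metrics(val_tensors)
model.train()
print(f'validation loss: {val_loss.item():.4f}')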

Extended Usage: Boosting Performance

You might wonder, “How can I make RegressLM perform better?” The library offers several ways to enhance accuracy and handle more complex scenarios.

Training a Custom Vocabulary

If your data has unique patterns, train a vocabulary on your corpus:

encoder_vocab = SentencePieceVocab.from_corpus(corpus_path='mydata.txt', vocab_size=1024)

This tailors the tokenization to your text, potentially improving efficiency.

Using Larger Models

Scale up for better results, though it increases compute needs:

model = PyTorchModel(num_encoder_layers=12, num_decoder_layers=12)

Larger sizes can capture more nuances in data like Borg logs.

Handling Multi-Objective Regression

For predicting multiple values:

reg_lm = rlm.RegressLM.from_default(max_num_objs=2)

# Examples with variable lengths.
examples = [core.Example(x='hello', y=[0.2]), core.Example(x='world', y=[-0.2, 0.3])]
reg_lm.fine_tune(examples)

# Sample multi-dimensional outputs.
samples = reg_lm.sample([core.ExampleInput(x='hi')], num_samples=128)[0]  # Shape: (128, 2)

This is useful for metrics with multiple components.
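
As in the single-objective case, you can then aggregate per objective. A small sketch, assuming samples is array-like with shape (128, 2) as noted in the comment above:

import numpy as np

# One point estimate per objective, aggregated across the 128 samples.
samples = np.asarray(samples)
per_objective_preds = np.median(samples, axis=0)  # Shape: (2,)
print(per_objective_preds)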

Integrating Pretrained Models

Use third-party models like T5Gemma:

from regress_lm.models.pytorch import t5gemma_model
model = t5gemma_model.T5GemmaModel('google/t5gemma-s-s-prefixlm')

It leverages existing checkpoints for faster starts.

Supporting Long Contexts

For inputs over 100K tokens, try alternative encoders:

# Mamba encoder.
model = PyTorchModel(encoder_type='mamba', additional_encoder_kwargs={'d_state': 128})

# Performer encoder.
model = PyTorchModel(encoder_type='performer', additional_encoder_kwargs={'num_features': 256})

These handle extended sequences, crucial for detailed system logs.

Real-World Application: Predicting Performance in Large Systems

Let’s talk about a practical example from research on Google’s Borg compute cluster. Borg manages job scheduling across machines, and predicting efficiency metrics like MIPS per GCU is key to optimization.

What Makes Prediction Challenging in Systems Like Borg?

Features are complex and nested:

  • Job requests with resources and replicas.
  • Machine states with available resources.
  • Profiling data from 10-second windows.

Traditional tabular methods struggle with this, often requiring expert feature engineering. Text-to-text regression sidesteps that by using full string representations.

An example input might look like this (anonymized):

cell: cell_a
2024/06/02 17:00:00 PDT, Day:Fri Week:21
search_space:
  {'JOB/data_pipeline/PRODUCTION_WORKLOAD':
   ['machineE', 'machineA', 'none_selected']}
assignments: {"JOB/data_pipeline/PRODUCTION_WORKLOAD": "machineA"}
distributions:
- platform: {machineA}
  num_machines: 1.239e+03
  low_level_zones: 5.200e+01
  mid_level_zones: 5.200e+01
  high_level_zones: 4.300e+01
  resources: 5.481e+05
job_profiles:
- job: {user: data_pipeline, group_name: data_pipeline_workers}
  platform_profiles:
    machineD: {mean_mips_per_resource_usage: 8.165e+02}
    machineA: {mean_mips_per_resource_usage: 9.590e+02}
    machineF: {mean_mips_per_resource_usage: 8.321e+02}
    machineC: {mean_mips_per_resource_usage: 7.098e+02}
limits:
  job_requested_resource_limit: 1.217e+04
  job_requested_num_vms: 1087

Predicting MIPS per GCU from this can take hours in simulation, but RLMs do it in seconds with high accuracy.

Figure: overview of using an RLM for performance prediction; system logs feed into an encoder-decoder RLM, which outputs a predictive distribution over tokenized numbers.

Key Insights from Research

  • High Accuracy: On Borg data, RLMs achieved up to 0.99 rank correlation fleet-wide, and roughly 0.9 on average.
  • Few-Shot Adaptation: Fine-tune on 500 examples for new clusters.
  • Uncertainty Handling: Models naturally quantify aleatoric and epistemic uncertainty through sampling.

The bias-variance decomposition shows that observing more features reduces epistemic uncertainty, bounding MSE by total variance over equivalence classes.

For a dataset, total variance is calculated as:

TotalVariance = (1/K) · Σ_{k=1..K} Var(y | x ∈ X_k)

where X_k is the k-th of K groups of inputs sharing identical observed features.
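
Here’s a toy computation of that quantity, grouping (x, y) pairs by identical input strings; the data is made up purely for illustration:

from collections import defaultdict
import numpy as np

# Toy dataset: repeated x strings form the equivalence classes X_k.
data = [('config_a', 1.0), ('config_a', 1.4),
        ('config_b', 2.0), ('config_b', 2.2), ('config_b', 1.8)]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# Total variance: mean of the within-group variances of y.
total_variance = np.mean([np.var(ys) for ys in groups.values()])
print(total_variance)  # ~0.033, the irreducible floor on MSE given only x.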

Ablations and Design Choices: What Works and Why?

You might ask, “Why these specific designs in RegressLM?” Research ablations highlight:

  • Encoders Are Essential: Decoder-only models underperform on complex inputs; encoders process long strings better.
  • No Language Pretraining Needed: Start from random initialization—regression focuses on correlations, not semantics.
  • Decoding Over Value Heads: Cross-entropy stabilizes training across varying scales.
  • Sequence Length Matters: Longer contexts (e.g., 2048 tokens) improve accuracy by capturing more of each input.
  • Model Size: Larger models (e.g., 12 layers) boost performance but increase costs.
  • Learning Rate and Stopping: Use 1e-4 with Adafactor; early stopping prevents overfitting.

These choices make RLMs robust for tasks like Borg prediction, where noise from stochastic loads adds challenge.

Contributors and How to Cite

The library was developed by Xingyou Song, Yash Akhauri, Dara Bahri, Michal Lukasik, Arissa Wongpanich, Adrian N. Reyes, and Bryan Lewandowski.

If using this in your work, cite:

  • Performance Prediction for Large Systems via Text-to-Text Regression (Akhauri et al., 2025, arXiv:2506.21718).
  • OmniPred: Language Models as Universal Regressors (Song et al., 2024, TMLR).
  • Decoding-based Regression (Song and Bahri, 2025, TMLR).

FAQ: Answering Common Questions About RegressLM

Here, I’ll address questions you might have, based on typical user curiosities.

What is text-to-text regression?

It’s a method where models predict numbers from text inputs using language model techniques, ideal for unstructured data.

How does RegressLM handle uncertainty?

By sampling multiple outputs, it estimates distributions, capturing both irreducible noise (aleatoric) and feature incompleteness (epistemic).

Can RegressLM predict multiple metrics at once?

Yes, via multi-objective support: set `max_num_objs` and pass lists for `y`.

Is RegressLM suitable for long inputs?

Absolutely, with options like Mamba or Performer encoders for 100K+ tokens.

How does it compare to traditional regression?

It outperforms on complex data, e.g., 100x lower MSE on Borg, by avoiding feature flattening.

What if my data has new features?

Fine-tune on new examples; no need to restart from scratch.

How do I evaluate performance?

Use metrics like MSE, rank correlation, or log-likelihood on held-out data.
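
For instance, given arrays of held-out targets y_true and aggregated predictions y_pred, a sketch with NumPy and SciPy:

import numpy as np
from scipy.stats import spearmanr

y_true = np.array([0.9, 1.7, 3.2, 0.1])  # Held-out ground truth.
y_pred = np.array([1.0, 1.5, 3.0, 0.2])  # Aggregated model predictions.

mse = np.mean((y_true - y_pred) ** 2)
rank_corr, _ = spearmanr(y_true, y_pred)
print(f'MSE: {mse:.4f}, Spearman rank correlation: {rank_corr:.3f}')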

Can I use it without PyTorch?

Core usage is Python-based, but pretraining examples use PyTorch.

How-To: Fine-Tuning for a New Task

  1. Load your data as `core.Example` objects: `x` as a string, `y` as a float or list of floats.
  2. Create the model: `reg_lm = rlm.RegressLM.from_default()`
  3. Fine-tune: `reg_lm.fine_tune(examples)`
  4. Sample: `samples = reg_lm.sample(queries, num_samples=100)`
  5. Aggregate: average the samples for point predictions, or analyze their variance.

This process enables quick adaptation, like shifting to a new compute cluster.
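
Putting the steps together, a minimal end-to-end sketch; the example strings and values here are stand-ins for your own data:

import numpy as np
from regress_lm import core, rlm

# Steps 1-2: load data as Example objects and create the model.
examples = [core.Example(x='new_cluster_log_1', y=0.7),
            core.Example(x='new_cluster_log_2', y=1.3)]
reg_lm = rlm.RegressLM.from_default()

# Steps 3-4: fine-tune, then sample predictions for a new input.
reg_lm.fine_tune(examples)
samples = reg_lm.sample([core.ExampleInput(x='new_cluster_log_3')], num_samples=100)

# Step 5: aggregate samples into a point prediction and a variance estimate.
pred, variance = np.mean(samples[0]), np.var(samples[0])
print(pred, variance)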

Wrapping Up: The Value of RegressLM

RegressLM simplifies handling regression from text, opening doors to applications in system simulation and beyond. By leveraging full data contexts, it achieves high accuracy where traditional methods falter. Whether you’re predicting efficiencies or exploring multi-task learning, it’s a tool worth trying.