Protenix-v1: Exploring an Open-Source Approach to Biomolecular Structure Prediction

Have you ever wondered how scientists predict the 3D shapes of proteins, DNA, RNA, and other molecules that make up life? It’s a fascinating field, and recently, there’s been an exciting development with Protenix-v1 from ByteDance. This model aims to match the accuracy of advanced tools like AlphaFold3, but with everything open-source. If you’re a grad student or someone with a background in biology or computer science, you might be curious about how it works, how to use it, and what it means for research. Let’s dive in step by step, and I’ll answer some common questions along the way.

What Is Protenix-v1 and Why Does It Matter?

Imagine you’re trying to understand how a complex machine works by looking at its blueprint. In biology, the “blueprint” is the 3D structure of biomolecules like proteins, nucleic acids, and ligands. Predicting these structures accurately can speed up drug discovery, protein design, and more. Protenix-v1 is a foundation model designed for just that—high-accuracy prediction of all-atom 3D structures for complexes involving proteins, DNA, RNA, and small-molecule ligands.

It’s described as “Protein + X,” where X stands for those additional elements like nucleic acids and ligands. The team behind it calls it a comprehensive reproduction of AlphaFold3’s architecture, but fully open and extensible. That means you get the code, model weights, data pipelines, and even a web server for interactive use, all under the Apache 2.0 license. This openness is a big deal because it lets researchers and developers build on it without restrictions.

You might ask: “How does Protenix-v1 compare to AlphaFold3?” The key is that it targets the same level of performance while sticking to matched constraints. For instance, it uses the same training data cutoff date of September 30, 2021, has a similar model scale with 368 million parameters, and operates under comparable inference budgets. This setup allows for fair comparisons, and according to the benchmarks, Protenix-v1 often outperforms AlphaFold3 on diverse sets.

[Figure: Protenix predictions. Example predicted structures, giving a visual sense of the complexes the model can handle.]

Breaking Down the Core Features of Protenix-v1

Let’s get into the details. Protenix-v1 reimplements a diffusion-based architecture similar to AlphaFold3, which is great for predicting structures at the atomic level. It supports multiple sequence alignments (MSA) for proteins and RNA, as well as templates, making it versatile.

Key Components Released

The full stack includes:

  • Training and Inference Code: You can train your own models or run predictions easily.
  • Pre-trained Model Weights: Ready-to-use parameters for quick starts.
  • Data and MSA Pipelines: Tools to prepare your data, including multiple sequence alignment for better accuracy.
  • Protenix Web Server: A browser-based tool for trying it out interactively without installing anything.

If you’re wondering what kinds of structures it can predict, it handles complexes with:

  • Proteins
  • Nucleic acids (DNA and RNA)
  • Small-molecule ligands

This makes it useful for a wide range of biological questions, from understanding protein-ligand interactions to modeling RNA-protein complexes.

Performance Under Constraints

One common question is: “How close can an open model get to top-tier accuracy?” Protenix-v1 shows it’s possible to reach or exceed AlphaFold3 levels when keeping things fair. For example, on challenging tasks like antigen-antibody complexes, increasing the number of sampled candidates—from a few to hundreds—leads to steady improvements in accuracy. This log-linear scaling means you can trade more computation for better results, a trade-off the team documents with explicit scaling curves.
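The intuition behind this sample-scaling behavior can be illustrated with a toy simulation (my own sketch, not Protenix itself): if each sampled candidate’s quality is an independent draw and you keep the best of N, the expected best score keeps climbing as N grows, with steadily diminishing but nonzero gains.

```python
import math
import random

random.seed(0)

def best_of_n(n, trials=2000):
    """Average best score over `trials` runs of drawing n candidates.

    Candidate quality is modeled as a standard Gaussian draw; this is a
    toy stand-in for per-sample prediction accuracy, not a Protenix metric.
    """
    total = 0.0
    for _ in range(trials):
        total += max(random.gauss(0.0, 1.0) for _ in range(n))
    return total / trials

for n in [1, 5, 25, 125]:
    print(f"N={n:4d}  best-of-N ~ {best_of_n(n):+.3f}  log N = {math.log(n):.2f}")
```

Plotting the first column against the second shows the same qualitative picture as the reported curves: more samples keep helping, but each extra order of magnitude buys a smaller increment.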

Here’s a quick look at the constraints that make comparisons fair:

  • Training Data Cutoff: September 30, 2021 (matches AlphaFold3’s PDB cutoff).
  • Model Scale: 368 million parameters.
  • Inference Budget: Similar sampling and runtime limits.

[Figure: Inference time vs. ntoken, showing how runtime scales with the size of the input complex.]

Introducing PXMeter: The Benchmarking Toolkit

To back up these claims, the team released PXMeter v1.0.0, an evaluation toolkit. You might wonder: “How do I know if Protenix-v1 is really better?” PXMeter provides transparent benchmarking on over 6,000 complexes, with subsets split by time and domains (like antibody-antigen or protein-RNA).

What Does PXMeter Offer?

  • Curated Dataset: Manually reviewed to remove artifacts and problematic entries.
  • Subsets: Time-split for realistic evaluations and domain-specific for targeted analysis.
  • Unified Metrics: Computes things like complex LDDT (local distance difference test) and DockQ for consistent comparisons.
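To make “complex LDDT” concrete, here is a minimal sketch of the lDDT idea in plain Python. This is my own simplified illustration, not PXMeter’s implementation: for every atom pair within an inclusion radius in the reference, check what fraction of their distances is preserved in the prediction, averaged over tolerances of 0.5, 1, 2, and 4 Å.

```python
import math

def lddt(ref_coords, pred_coords, inclusion_radius=15.0):
    """Simplified lDDT: fraction of reference distances preserved in the
    prediction, averaged over tolerance thresholds 0.5/1/2/4 Angstroms.

    ref_coords / pred_coords: lists of (x, y, z) tuples in the same atom order.
    This toy version skips the same-residue exclusion used in the full metric.
    """
    thresholds = (0.5, 1.0, 2.0, 4.0)
    hits, total = 0, 0
    n = len(ref_coords)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref_coords[i], ref_coords[j])
            if d_ref >= inclusion_radius:
                continue  # only pairs close in the reference are scored
            diff = abs(d_ref - math.dist(pred_coords[i], pred_coords[j]))
            for t in thresholds:
                total += 1
                if diff <= t:
                    hits += 1
    return hits / total if total else 0.0

ref = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
print(lddt(ref, ref))  # a perfect prediction scores 1.0
print(lddt(ref, [(0, 0, 0), (9, 0, 0), (0, 4, 0)]))  # a distorted one scores lower
```

A “complex” LDDT extends this idea across all chains of an assembly, so inter-chain contacts count too, which is why it is a sensible headline metric for multi-chain predictions.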

There’s even a related paper that evaluates Protenix alongside AlphaFold3, Boltz-1, and Chai-1, highlighting how dataset choices affect rankings. This toolkit ensures reproducibility, which is crucial in research.

For instance, benchmarks show Protenix-v1 excelling in various categories. Here’s a table summarizing some key metrics from the release:

| Benchmark Set | Protenix-v1 Performance | Comparison to AlphaFold3 |
| --- | --- | --- |
| Antigen-antibody complexes | Log-linear accuracy gains with more samples | Often outperforms on curated tasks |
| Protein-RNA complexes | High complex LDDT scores | Matches or exceeds under the same constraints |
| Ligand complexes | Strong DockQ metrics | Competitive with closed models |

These results come from PXMeter evaluations, emphasizing fair play.
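When reading DockQ numbers from tables like the one above, the standard interpretation bands are worth knowing. The thresholds below come from the widely used DockQ/CAPRI quality categories, not from Protenix itself:

```python
def dockq_quality(score):
    """Map a DockQ score in [0, 1] to the standard CAPRI-style quality band.

    Bands follow the usual DockQ convention:
    < 0.23 incorrect, [0.23, 0.49) acceptable, [0.49, 0.80) medium, >= 0.80 high.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ is defined on [0, 1]")
    if score < 0.23:
        return "incorrect"
    if score < 0.49:
        return "acceptable"
    if score < 0.80:
        return "medium"
    return "high"

print(dockq_quality(0.31))  # acceptable
print(dockq_quality(0.85))  # high
```

So “strong DockQ metrics” roughly means a large share of predictions landing in the medium and high bands.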

How Protenix Fits into a Broader Ecosystem

Protenix isn’t standalone—it’s part of an ecosystem. If you’re asking, “What else can I do with this?” Check out these related projects:

  • PXDesign: A binder design suite built on Protenix. It achieves 20–73% experimental hit rates, 2–6 times higher than methods like AlphaProteo and RFdiffusion. Accessible via the Protenix Server.
  • Protenix-Dock: A classical docking framework using empirical scoring for protein-ligand tasks, without deep nets.
  • Protenix-Mini and Protenix-Mini+: Lightweight versions that cut inference costs while keeping accuracy close to the full model.

These tools share interfaces, making it easy to integrate them into workflows for structure prediction, docking, and design.

Latest Updates and Improvements

Keeping up with developments? The team has been active:

  • On February 5, 2026, Protenix-v1 was released with support for template/RNA MSA features, improved training, and inference enhancements.
  • Earlier, in November 2025, v0.7.0 added diffusion optimizations like caching and kernel fusion.
  • In July 2025, Protenix-Mini introduced cost reductions and constraints for better predictions.
  • January 2025 brought pipeline enhancements for data and MSA.

These updates show ongoing commitment to making the tool better.

How to Get Started with Protenix-v1

Ready to try it? Installation is straightforward. You might think, “Do I need special hardware?” It runs on standard setups with PyTorch, but for best performance, use GPUs.

Step-by-Step Installation

  1. Install via Pip: Run this command in your terminal:

    pip install protenix
    

    This gets you the core package.

  2. Prepare Your Environment: Ensure you have Python and necessary dependencies. The model uses PyTorch for training and inference.

  3. Download Models: Models like protenix_base_default_v1.0.0 are available. Refer to the supported models list for options.

Running Predictions: A How-To Guide

Want to predict a structure? Here’s how:

  1. Prepare Input: Create a JSON file with your sequence data. For example, input.json might include protein sequences, ligands, etc.

  2. Run the Command: Use the CLI for quick predictions:

    protenix pred -i examples/input.json -o ./output -n protenix_base_default_v1.0.0
    
    • -i: Input JSON file.
    • -o: Output directory.
    • -n: Model name.
  3. Understand Options: For advanced use, check the inference script. You can adjust sampling budgets for better accuracy.

  4. Interpret Outputs: You’ll get PDB files or similar for visualizing structures in tools like PyMOL.

For full details, look at the inference demo script provided.
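The steps above can be sketched end to end in a few lines of Python. The JSON fields here are illustrative, modeled on the AF3-server-style inputs the Protenix examples use; check examples/input.json in the repo for the authoritative schema before running this for real:

```python
import json

# Illustrative input; field names (proteinChain, ligand, count) are a sketch
# of the server-style schema -- verify against examples/input.json in the repo.
job = [
    {
        "name": "demo_protein_ligand",
        "sequences": [
            {"proteinChain": {"sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
                              "count": 1}},
            {"ligand": {"ligand": "CCD_ATP", "count": 1}},
        ],
    }
]

with open("input.json", "w") as fh:
    json.dump(job, fh, indent=2)

# The prediction command from the guide above, assembled programmatically:
cmd = ["protenix", "pred", "-i", "input.json", "-o", "./output",
       "-n", "protenix_base_default_v1.0.0"]
print(" ".join(cmd))
```

From there you would run the printed command in a shell (or via subprocess) and pick up the predicted structures from ./output.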

Supported Models Table

Here’s a breakdown of key models:

| Model Name | MSA Support | RNA MSA | Template Support | Parameters | Training Data Cutoff | Release Date |
| --- | --- | --- | --- | --- | --- | --- |
| protenix_base_default_v1.0.0 | Yes | Yes | Yes | 368M | 2021-09-30 | 2026-02-05 |
| protenix_base_20250630_v1.0.0 | Yes | Yes | Yes | 368M | 2025-06-30 | 2026-02-05 |
| protenix_base_default_v0.5.0 | Yes | No | No | 368M | 2021-09-30 | 2025-05-30 |

  • The default v1.0.0 is recommended for benchmarks.
  • The 2025 cutoff version is for practical applications that benefit from more recent training data.
  • The older v0.5.0 is kept for compatibility.

[Figure: Protenix-v1 benchmark metrics for v1.0.0.]

[Figure: Additional performance metrics across datasets.]

Benchmark Results in Detail

Diving deeper into benchmarks: Protenix-v1 is presented as the first open-source model to outperform AlphaFold3 under matched constraints. On PXMeter’s 6,000+ complexes, it posts strong results on metrics like complex LDDT and DockQ.

For antigen-antibody tasks, accuracy improves predictably as you sample more candidates, so you can trade compute for quality rather than being locked into a single fixed evaluation point.

If you’re evaluating models, use PXMeter for its curated, reproducible setup. The associated study revisits benchmarks, showing how choices in data affect outcomes.

Key Takeaways from Protenix-v1

To wrap up the main points:

  • It’s an AF3-style predictor for biomolecules, fully open under Apache 2.0.
  • Matches AF3 on data cutoff, scale, and budget for fair claims.
  • PXMeter enables transparent evaluations on diverse subsets.
  • Inference scaling provides trade-offs between speed and accuracy.

This model opens doors for extensible research.

FAQ: Answering Common Questions About Protenix-v1

What exactly is biomolecular structure prediction?

It’s about using computational models to figure out the 3D arrangement of atoms in biological molecules like proteins or DNA. Protenix-v1 does this for complexes involving multiple types.

How do I install and run Protenix if I’m new to coding?

Start with the pip install command above. Then, use the prediction CLI with a sample JSON. If stuck, check the docs for MSA pipelines.

Does Protenix-v1 support RNA or DNA structures?

Yes, it predicts structures for nucleic acids alongside proteins and ligands.

What’s the difference between Protenix-v1 and earlier versions?

v1.0.0 adds RNA MSA and templates, improving accuracy over v0.5.0.

Can I use Protenix for protein design?

Through PXDesign, yes—it builds on Protenix for binder design with high success rates.

How does PXMeter help with evaluations?

It provides curated datasets and metrics for comparing models like Protenix, AlphaFold3, and others fairly.

Is there a web version I can try without installing?

Yes, the Protenix Web Server lets you run predictions in your browser.

What hardware do I need for inference?

A GPU helps for speed, especially with large complexes, but CPU works for small tests.

How can I contribute to Protenix?

Follow the contributing guide: Install pre-commit hooks, then submit issues or pull requests.

What’s the license, and can I use it commercially?

Apache 2.0—free for academic and commercial use.

Who developed Protenix, and are there job opportunities?

The ByteDance AML AI4Science Team. They’re hiring in Beijing and Seattle for roles in ML and computational biology.

Contributing and Community

If you want to help improve Protenix, install pre-commit:

pip install pre-commit
pre-commit install

Then, open issues for bugs or features. The community includes Slack, WeChat, and Twitter channels.

Citing and Acknowledgements

When using Protenix in research, cite the technical report and related works. It builds on inspirations from projects like OpenFold and ColabFold, with specific implementations referenced.

In summary, Protenix-v1 represents a step forward in open-source tools for structure prediction. Whether you’re predicting complexes or designing binders, it offers a solid, extensible foundation. If you have more questions, feel free to explore the repo or join the community discussions. Happy predicting!
