Tongyi DeepResearch: The Intelligent Agent Model Ushering in a New Era of Deep Information Retrieval
In today’s rapidly evolving artificial intelligence landscape, Large Language Models (LLMs) are fundamentally changing how we access and process information. However, when faced with complex, open-ended tasks that require multi-step reasoning and deep information seeking, traditional models often fall short. To address this challenge, Tongyi Lab has developed and released Tongyi DeepResearch, an agentic large language model with 30 billion total parameters, of which only 3 billion are activated per token. It is engineered specifically for long-horizon, deep information-seeking tasks and has demonstrated state-of-the-art performance across a range of authoritative agent-based search benchmarks.
Core Features of the Model
Fully Automated Synthetic Data Generation Pipeline
High-quality model training is inseparable from high-quality data. Behind Tongyi DeepResearch lies a highly scalable, fully automated data synthesis pipeline. This pipeline can autonomously generate high-quality data for agent pre-training, supervised fine-tuning, and reinforcement learning, laying a solid foundation for the model’s exceptional capabilities.
Large-Scale Continual Pre-Training on Agentic Data
To continuously expand the model’s capabilities and maintain the timeliness of its knowledge, the development team utilized diverse, high-quality agent interaction data for large-scale continual pre-training. This process not only enhanced the model’s reasoning performance but also ensured its ability to cope with rapidly changing information environments.
End-to-End Reinforcement Learning Framework
The model’s training employs a strict on-policy reinforcement learning method, based on a customized Group Relative Policy Optimization (GRPO) framework. This framework incorporates token-level policy gradients, leave-one-out advantage estimation, and selective filtering of negative samples, effectively stabilizing the training process in a non-stationary environment.
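As a concrete illustration of the leave-one-out advantage estimation mentioned above, here is a minimal Python sketch. It is an illustrative reconstruction, not the project's actual training code: for each rollout in a group, the baseline is the mean reward of the other rollouts in the same group.

```python
# Illustrative sketch of leave-one-out advantage estimation as used in
# GRPO-style training: each rollout's baseline is the mean reward of the
# other rollouts in its group. Not the project's actual training code.

def leave_one_out_advantages(rewards):
    """Return r_i minus the mean of the other rewards, for each i."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rollouts per group")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

# Four rollouts of the same query with binary rewards:
advs = leave_one_out_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advs])  # -> [0.667, -0.667, -0.667, 0.667]
```

Note that the advantages of a group always sum to zero, which is what makes the group-relative baseline a variance-reduction device rather than a bias on the policy gradient.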
Compatibility with Multiple Agent Inference Paradigms
In practical use (inference), Tongyi DeepResearch is compatible with two modes:

- ReAct Mode: Used to rigorously evaluate the model’s core intrinsic abilities, emphasizing its step-by-step reasoning and action capabilities.
- Iterative-Research-Based “Heavy” Mode: Employs a test-time scaling strategy aimed at fully unleashing the model’s performance ceiling on extremely complex search tasks.
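Conceptually, the ReAct mode is a loop of thought, action, and observation. The following hypothetical Python stand-in sketches that loop; the toy "model" and the tool set are invented for illustration and are not part of the released code:

```python
# Hypothetical sketch of a ReAct-style agent loop: the model alternates
# thought -> action -> observation until it emits a terminal "finish" action.
# The toy model and tool set below are invented for illustration.

def run_react(question, model, tools, max_steps=8):
    history = [("question", question)]
    for _ in range(max_steps):
        thought, action, arg = model(history)   # model proposes the next step
        history.append(("thought", thought))
        if action == "finish":                  # terminal action: return answer
            return arg
        observation = tools[action](arg)        # execute the tool call
        history.append(("observation", observation))
    return None                                 # step budget exhausted

def toy_model(history):
    """Toy policy: search once, then answer with the last observation."""
    if history[-1][0] == "observation":
        return ("I have what I need.", "finish", history[-1][1])
    return ("I should search first.", "search", history[0][1])

tools = {"search": lambda q: f"search result for: {q}"}
print(run_react("capital of France?", toy_model, tools))
# -> search result for: capital of France?
```

The "Heavy" mode differs mainly in test-time scaling: rather than a single pass of such a loop, it spends more inference-time compute on iterative research before committing to an answer.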
Model Download and Access
The Tongyi-DeepResearch-30B-A3B model is now officially released and available for download through the official release channels.
Comprehensive Quick Start Guide
Here is a detailed step-by-step guide on how to quickly set up your environment and run the model inference scripts.
1. Environment Setup
It is recommended to use Python version 3.10.0 to avoid potential dependency conflicts. It is highly advised to use Conda or Virtualenv to create an isolated virtual environment.
# Create an environment using Conda
conda create -n react_infer_env python=3.10.0
conda activate react_infer_env
2. Installing Dependencies
Within the activated environment, install all required dependency packages for the project to run.
pip install -r requirements.txt
3. Preparing Evaluation Data
Model inference requires data in a specific format.
- Create a folder named eval_data/ in the project’s root directory.
- Place your question-answering data file, in JSONL format, into this directory, for example eval_data/example.jsonl.
- Each line of this file must be a JSON object containing both the question and answer keys: {"question": "Your question text here", "answer": "The corresponding reference answer here"}
- A sample file is provided in the project’s eval_data folder for your reference.
- Special Note: If you plan to use the file parser tool, prepend the filename to the question field and place the referenced file in the eval_data/file_corpus/ directory.
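The data format above can be produced with a few lines of Python. This minimal script (paths taken from the guide; the example questions are invented) writes and validates an eval_data/example.jsonl:

```python
# Write a minimal eval_data/example.jsonl in the format described above:
# one JSON object per line with "question" and "answer" keys.
import json
import os

os.makedirs("eval_data", exist_ok=True)
examples = [
    {"question": "Your question text here", "answer": "The corresponding reference answer here"},
    {"question": "What year was the transistor invented?", "answer": "1947"},
]
with open("eval_data/example.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Sanity check: every line parses and carries the required keys.
with open("eval_data/example.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
assert all({"question", "answer"} <= set(r) for r in rows)
print(f"wrote {len(rows)} examples")
```

Using ensure_ascii=False keeps non-ASCII text (for example Chinese questions) human-readable in the file rather than escaped.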
4. Configuring the Inference Script
Next, you need to configure the run script. Open the run_react_infer.sh file and modify the following key variables according to the instructions in the comments within the script:

- MODEL_PATH: The local or remote path to the model weights.
- DATASET: The filename of the evaluation dataset (without path or extension), e.g. example.
- OUTPUT_PATH: The save path for the model’s prediction results, e.g. ./outputs.

Furthermore, depending on the tools you wish to enable (such as web search, calculator, or file parser), you may need to provide corresponding API keys or access credentials (e.g., API_KEY, BASE_URL). Explanations for these configuration items are provided in comments within the script.
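As an illustration only, a tool wrapper might read these credentials from the environment as follows. The variable names API_KEY and BASE_URL come from the guide; the validation logic and demo values are hypothetical:

```python
# Hypothetical sketch: read the credentials mentioned above from the
# environment and fail fast if any are missing. Illustrative only.
import os

def load_tool_config():
    cfg = {
        "api_key": os.environ.get("API_KEY", ""),
        "base_url": os.environ.get("BASE_URL", ""),
    }
    missing = [k for k, v in cfg.items() if not v]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return cfg

# Demo values for illustration; in real use, export these in your shell
# or set them in run_react_infer.sh as the script comments describe.
os.environ.setdefault("API_KEY", "sk-demo")
os.environ.setdefault("BASE_URL", "https://example.com/v1")
print(load_tool_config()["base_url"])
```

Failing fast on missing credentials surfaces configuration mistakes before a long inference run starts, rather than midway through a tool call.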
5. Running the Inference
After completing all configurations, you can run the script to start the inference process.
bash run_react_infer.sh
By following the steps above, you can complete the environment preparation, data configuration, and run the model for inference. For more details, please refer to the comments within the scripts or consult the project documentation.
Exceptional Performance Benchmarks
Tongyi DeepResearch has achieved leading results in multiple challenging agent-based search benchmarks, including:
- Humanity’s Last Exam
- BrowseComp and BrowseComp-ZH
- WebWalkerQA
- xbench-DeepSearch
- FRAMES
- SimpleQA
Its comprehensive performance, shown in the figure below, fully demonstrates its powerful capabilities in complex information-seeking tasks.
The Deep Research Agent Family
Tongyi DeepResearch is not an isolated project but part of a vast and continually evolving deep research agent family. This family includes multiple cutting-edge research projects focused on different directions, collectively advancing the technology of intelligent agents:
- WebWalker: Focuses on benchmarking LLMs on web traversal tasks.
- WebDancer: Dedicated to achieving autonomous information-seeking agent capabilities.
- WebSailor: Explores navigation mechanisms for super-human reasoning in web agents.
- WebShaper: Agentically synthesizes data through information-seeking formalization.
- WebWatcher: Breaks new ground in vision-language deep research agents.
- WebResearcher: Investigates unleashing unbounded reasoning capability in long-horizon agents.
- ReSum: Unlocks long-horizon search intelligence through context summarization.
- WebWeaver: Structures web-scale evidence with dynamic outlines for open-ended deep research.
- WebSailor-V2: Bridges the gap to proprietary agents via synthetic data and scalable reinforcement learning.
- AgentFounder: Scales agent foundational capabilities through continual pre-training.
- AgentScaler: Moves towards general agentic intelligence via environment scaling.
Together, these interconnected research efforts form Tongyi’s technical blueprint in the field of general-purpose agents.
Frequently Asked Questions (FAQ)
Q1: How is Tongyi DeepResearch different from ordinary language models (like ChatGPT)?
A: Ordinary language models excel at conversation and text generation, while Tongyi DeepResearch is an agent model purpose-built for deep information seeking. It acts more like a digital research assistant that can autonomously plan, execute multi-step operations (such as searching, calculating, and reading files), and perform complex reasoning. It is specifically designed to solve complex problems that require prolonged thinking and multi-dimensional information integration.
Q2: What does the 30B-A3B parameter size mean?
A: This means the model has a total of 30 billion parameters but uses an advanced Mixture of Experts (MoE) architecture. When processing any given problem, it only dynamically activates and uses 3 billion of these parameters. The benefit of this approach is that it maintains extremely strong capabilities while significantly improving computational efficiency and reducing inference costs.
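The arithmetic, and the sparse routing idea behind it, can be shown with a toy sketch. The top-k gate below is a generic illustration of MoE routing, not the model's actual architecture:

```python
# Toy illustration of Mixture-of-Experts activation: only a small subset of
# expert parameters runs per token. Numbers and the router are illustrative,
# not the model's actual architecture.

def active_fraction(total_params, active_params):
    """Fraction of weights that participate in a single token's forward pass."""
    return active_params / total_params

# 30B total, ~3B active per token -> only ~10% of weights run per token.
frac = active_fraction(30e9, 3e9)
print(f"{frac:.0%} of parameters active per token")  # -> 10%

def top_k_experts(gate_scores, k=2):
    """A sparse router: pick the k experts with the highest gate scores."""
    ranked = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    return sorted(ranked[:k])

print(top_k_experts([0.1, 0.5, 0.2, 0.9], k=2))  # -> [1, 3]
```

Because the inactive experts contribute no compute for a given token, inference cost scales with the active parameter count rather than the total, which is the efficiency argument made above.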
Q3: Do I need very specialized programming knowledge to use it?
A: Basic use does not require it. We provide detailed scripts and configuration instructions. Users familiar with basic command-line operations can successfully run the examples by following the “Quick Guide” steps. Of course, deeper customization or integration into your own systems would require more technical development knowledge.
Q4: Does it support Chinese?
A: Yes! As evidenced by its exceptional performance on BrowseComp-ZH (the Chinese-language browsing benchmark), the model possesses equally powerful understanding and processing capabilities for Chinese, enabling it to effectively complete deep search tasks in Chinese environments.
Q5: What context length does the model support?
A: The model supports a context window of up to 128K tokens. This means it can read and understand very long documents (like reports hundreds of pages long) in one go and maintain coherent reasoning throughout the entire long context, which is crucial for deep research tasks.
Conclusion
Tongyi DeepResearch represents a significant milestone in the journey of large language models towards general-purpose agents. Through innovative model architecture, large-scale high-quality data training, and advanced reinforcement learning techniques, it sets a new performance benchmark for long-horizon, deep-layer information-seeking tasks. For both academic researchers and developers, it provides a powerful tool to explore and build the next generation of AI applications capable of autonomous understanding, reasoning, and interaction.
You can visit its GitHub project homepage to access the latest code and detailed documentation and experience the powerful capabilities of the deep research agent firsthand.