Agent Data Protocol (ADP): The Revolutionary Solution Unifying AI Agent Training Data
Core Question This Article Addresses
How can we solve the fundamental problem of fragmented, inconsistently formatted AI agent training data? How does the ADP protocol integrate scattered training data from different formats into scalable training resources through a standardized representation language?
The Data Dilemma in Complex Tasks
In the AI large language model era, the pre-training phase benefits from abundant internet-scale data, but the post-training phase faces entirely different challenges. High-quality task-specific data requires careful curation, and agent applications are particularly demanding because models must execute sequences of actions and interact with environments over multiple turns.
Imagine when you want to train an AI agent capable of browsing websites, writing code, and debugging programs—you discover a frustrating reality: although many related datasets exist online, they come in different formats, use different interfaces, and follow different standards. Each dataset requires dedicated engineering work to process and integrate, much like trying to force different types of plugs into the same socket.
Agent Data Protocol (ADP): The Unified Standard Solution
ADP’s core idea is to decompose complex agent interactions into standardized sequences of actions and observations. Just as human-computer interaction always follows a “request-response” pattern, ADP unifies all agent behaviors into alternating sequences of Actions and Observations.
ADP’s Three Core Design Principles
Simplicity: ADP maintains a simple, intuitive structure, providing a straightforward framework that eliminates the need for specialized per-dataset engineering.
Standardization: ADP defines a single unified representation into which agent training datasets of heterogeneous formats can be converted.
Expressiveness: ADP ensures that complex agent trajectories can be expressed accurately without losing critical information, making previously incompatible datasets straightforward to analyze and compare.
Technical Architecture: How Data Unification Works
ADP’s technical implementation is based on Pydantic schemas, where each standardized agent trajectory is represented as a Trajectory object containing:
✦ id: trajectory identifier
✦ content: alternating sequence of actions and observations representing agent interaction with users/environment
✦ details: flexible metadata dictionary for dataset-specific information
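A minimal Pydantic sketch of this schema, based on the field descriptions above (the released ADP definitions may differ in detail):

```python
from typing import Union
from pydantic import BaseModel

class Action(BaseModel):
    """Base class for the three ADP action types (API, code, message)."""

class Observation(BaseModel):
    """Base class for ADP observations (text, web, ...)."""

class Trajectory(BaseModel):
    id: str                                    # trajectory identifier
    content: list[Union[Action, Observation]]  # alternating actions/observations
    details: dict = {}                         # dataset-specific metadata
```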
Action Type Classification
API Actions: Function calls with structured parameters and outputs capturing tool usage
✦ Include function (tool call name), kwargs (function argument dictionary), description (optional reasoning explanation)
✦ Example: goto(url="https://www.google.com") is represented as APIAction(function="goto", kwargs={"url": "https://www.google.com"})
Code Actions: Code generation and execution across programming languages
✦ Specify language (programming language), content (code to execute), description (optional reasoning explanation)
✦ Example: the Python code block print("Hello World") is represented as CodeAction(language="python", content="print(\"Hello World\")")
Message Actions: Natural language communication between agents and users
✦ Include content field documenting agent explanations, clarifications, and responses
✦ Example: MessageAction(content="How can I help you?")
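Continuing the sketch above, the three action types might look like this in Pydantic (field names follow the descriptions above; the optional defaults are assumptions):

```python
from typing import Optional

class APIAction(Action):
    function: str                      # tool call name, e.g. "goto"
    kwargs: dict                       # function argument dictionary
    description: Optional[str] = None  # optional reasoning explanation

class CodeAction(Action):
    language: str                      # programming language, e.g. "python"
    content: str                       # code to execute
    description: Optional[str] = None  # optional reasoning explanation

class MessageAction(Action):
    content: str                       # natural-language message to the user
```

With these definitions, the examples above construct directly, e.g. APIAction(function="goto", kwargs={"url": "https://www.google.com"}).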
Observation Type Classification
Text Observations: Capture text information from various sources, including user instructions and environmental feedback
✦ Include source (observation origin: “user” or “environment”), content (observed text)
✦ Example: the Python execution output “Execution result: Hello World” converts to TextObservation(content="Hello World", source="environment")
Web Observations: Represent webpage state and content
✦ Include html (raw HTML content), axtree (webpage accessibility tree), url (current page URL), viewport size (browser viewport dimensions), image observation (optional screenshot data)
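And the two observation types, continuing the same sketch (viewport_size and screenshot are illustrative names for the “viewport size” and “image observation” fields):

```python
class TextObservation(Observation):
    source: str   # "user" or "environment"
    content: str  # the observed text

class WebObservation(Observation):
    html: Optional[str] = None             # raw HTML content
    axtree: Optional[str] = None           # webpage accessibility tree
    url: Optional[str] = None              # current page URL
    viewport_size: Optional[tuple] = None  # browser viewport dimensions
    screenshot: Optional[bytes] = None     # optional screenshot data
```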
Practical Application: Unification Practice Across 13 Datasets
To verify ADP’s practical effectiveness, the research team implemented converters from 13 existing agent training datasets to ADP, covering programming, software engineering, API/tool use, and web browsing domains.
Dataset Diversity Analysis
Analysis of ADP-standardized datasets reveals significant diversity in trajectory lengths, action distributions, and reasoning patterns across different task domains:
Trajectory Length: Average trajectory length varies dramatically across datasets, from 1 to 26.8 turns, with an overall average of 10.1 turns. SWE datasets consistently exhibit longer trajectories, reflecting the inherent complexity of multi-step repository-level programming tasks.
Action Distribution Patterns: Web datasets (Mind2Web, NNetNav, Synatra) heavily favor API actions (80-100%) with minimal code execution, reflecting their focus on interface interaction. Conversely, coding datasets (Code-Feedback, CodeActInstruct) show high code usage (~60% code) with no API usage, emphasizing direct programming activities.
Function Reasoning Analysis: A striking finding is high thought coverage (the fraction of actions paired with a natural-language explanation) across most datasets, with most achieving ≥90% coverage. These training datasets consistently explain their actions, a property that is particularly valuable for interpretability and for training agents with reasoning abilities. Once data is standardized, statistics like these reduce to a single loop, as sketched below.
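A sketch of such an analysis pass, reusing the Trajectory and Action classes from earlier (the function itself is illustrative, not part of the ADP release):

```python
from collections import Counter

def action_profile(trajectories: list[Trajectory]):
    """Action-type distribution and mean actions per trajectory for a corpus."""
    counts, lengths = Counter(), []
    for traj in trajectories:
        actions = [s for s in traj.content if isinstance(s, Action)]
        lengths.append(len(actions))
        counts.update(type(a).__name__ for a in actions)
    total = sum(counts.values()) or 1
    distribution = {name: n / total for name, n in counts.items()}
    return distribution, sum(lengths) / max(len(lengths), 1)
```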
Conversion Cost Analysis
ADP’s greatest practical value lies in significantly reducing data conversion costs. Traditional methods require a custom converter for every dataset-framework pair, an O(D×A) workload for D datasets and A agent frameworks. ADP linearizes the cost to O(D+A): one Raw→ADP converter per dataset plus one ADP→SFT converter per framework.
Without ADP Scenario: converters for the 13 datasets total 4,892 lines of code; replicating that effort for 100 agent frameworks would cost approximately 489,200 lines of code.
With ADP Scenario: the same coverage costs approximately 12,592 lines in total, reducing the workload by over 97%.
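The arithmetic behind these totals is easy to check; in the sketch below, the ~77-line average per framework adapter is inferred from the reported totals rather than stated in the source:

```python
# Cost model: D datasets, A agent frameworks.
D, A = 13, 100
raw_to_adp_loc = 4_892  # total lines for all Raw->ADP converters (~376/dataset)

# Without ADP: a custom converter for every (dataset, framework) pair -> O(D*A).
without_adp = raw_to_adp_loc * A  # 489,200 lines

# With ADP: one converter per dataset plus one ADP->SFT script per framework
# -> O(D+A). The ~77 lines/framework is inferred from the reported 12,592 total.
with_adp = raw_to_adp_loc + A * 77  # 12,592 lines

print(f"reduction: {1 - with_adp / without_adp:.1%}")  # ~97.4%
```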
Experimental Results: Quantitative Proof of Performance Improvements
Significant Improvements Across Model Scales
ADP fine-tuning consistently improves performance across models, benchmarks, and agent frameworks. On SWE-Bench Verified, ADP training delivers remarkable improvements:
✦ 7B Models: Qwen-2.5-7B-Coder-Instruct improves from 0.4% to 20.2% (+19.8 points)
✦ 14B Models: Achieves 34.4% (+32.4 points)
✦ 32B Models: Reaches 40.3% (+38.1 points), matching or exceeding Claude 3.5 Sonnet’s 33.6% performance
Effectiveness of Cross-Task Transfer
ADP not only performs well on individual tasks; more importantly, it avoids the negative transfer that single-domain tuning inflicts on other tasks:
✦ SWE-Bench: Training on ADP data achieves 10.4%, while training on SWE-smith alone achieves only 1.0%
✦ WebArena: ADP-trained Qwen-2.5-7B-Instruct reaches 20.1%, compared to 16.0% for training on Go-Browse alone
✦ GAIA: Training on AgentInstruct alone yields 0.6% accuracy, while ADP improves it to 9.1%
These results clearly demonstrate that mixed ADP training yields better in-domain accuracy and stronger cross-task generalization than single-domain tuning.
Practical Application Value: Bridge from Research to Industry
Lowering Technical Barriers
ADP’s greatest value lies in democratizing access to agent training data. Previously, only organizations with substantial engineering resources could integrate multiple datasets; now any researcher can easily utilize standardized ADP data.
Reflection and Insights: In AI development, we often see technological innovation hindered by infrastructure complexity. ADP’s greatness lies not just in being a technical solution, but in being community infrastructure. It integrates scattered efforts into collective wisdom, allowing each contributor’s work to be reused by the entire community.
Accelerating Research Iteration
Traditional agent training requires writing specialized converters for each new dataset and framework, which is time-consuming and error-prone. ADP simplifies this process to “convert once, use everywhere.”
Lessons Learned: In technical standardization processes, simplicity is more important than functionality. ADP’s success lies not in how many complex concepts it can express, but in how simply it can express core concepts. This design philosophy is worth learning from for all projects involving data standardization.
Cross-Framework Adaptation: Seamless Integration with Multiple Agent Architectures
ADP’s key advantage is its framework-agnostic nature. The research team demonstrated how ADP data easily converts to three different agent architectures:
OpenHands Framework Adaptation
OpenHands is an open platform for building generalist AI agents that operate like software developers: writing code, using command lines, and browsing the web. ADP to OpenHands conversion involves:
✦ Mapping API actions to OpenHands tool calling interfaces
✦ Converting code actions to IPython execution environments
✦ Transforming web observations to OpenHands browser interface formats
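In spirit, an ADP→SFT conversion for a tool-calling framework flattens each trajectory into role-tagged chat messages. The sketch below is illustrative only and greatly simplified relative to the real OpenHands converter:

```python
def adp_to_chat(traj: Trajectory) -> list[dict]:
    """Flatten an ADP trajectory into role-tagged chat messages for SFT."""
    messages = []
    for step in traj.content:
        if isinstance(step, MessageAction):
            messages.append({"role": "assistant", "content": step.content})
        elif isinstance(step, APIAction):
            args = ", ".join(f"{k}={v!r}" for k, v in step.kwargs.items())
            messages.append({"role": "assistant", "content": f"{step.function}({args})"})
        elif isinstance(step, CodeAction):
            messages.append({"role": "assistant", "content": f"[{step.language}]\n{step.content}"})
        elif isinstance(step, TextObservation):
            role = "user" if step.source == "user" else "tool"
            messages.append({"role": role, "content": step.content})
    return messages
```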
SWE-Agent Framework Adaptation
SWE-Agent introduces a custom Agent-Computer Interface (ACI) enabling language model agents to autonomously perform software engineering tasks. ADP conversion process includes:
✦ Converting actions to structured bash commands and file operations
✦ Formatting observations into filesystem states that SWE-Agent understands
✦ Integrating test execution and result interpretation mechanisms
AgentLab Framework Adaptation
AgentLab is an open-source framework for developing, testing, and benchmarking web agents across diverse tasks. The conversion process focuses on:
✦ DOM-based web interaction mapping
✦ Standardized representation of accessibility tree structures
✦ Consistent interfaces across evaluation benchmarks
Data Quality Assurance: Ensuring Reliability After Standardization
ADP not only focuses on data format unification but also emphasizes data quality. The research team implemented multi-layered quality assurance mechanisms:
Automated Validation
✦ Tool Call Format Verification: Ensuring all API actions conform to expected formats
✦ Reasoning Coverage Checks: Verifying at least 80% of tool calls are paired with English thought processes
✦ Conversation Structure Integrity: Checking conversations end properly without dangling actions or observations
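Using the schema classes sketched earlier, these checks could look roughly like the following (the function and thresholds mirror the list above; the exact implementation in the ADP tooling may differ):

```python
def validate_trajectory(traj: Trajectory) -> list[str]:
    """Return quality-assurance issues found in one trajectory."""
    issues = []

    # Reasoning coverage: at least 80% of tool calls should carry a thought.
    api_calls = [s for s in traj.content if isinstance(s, APIAction)]
    if api_calls:
        explained = sum(1 for a in api_calls if a.description)
        if explained / len(api_calls) < 0.8:
            issues.append("thought coverage below 80%")

    # Structure: a conversation should not end on a dangling tool/code action
    # that never received its observation.
    last = traj.content[-1] if traj.content else None
    if isinstance(last, (APIAction, CodeAction)):
        issues.append("conversation ends with a dangling action")

    return issues
```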
Domain-Specific Filtering
Based on different agent framework characteristics, ADP implements intelligent data filtering strategies:
OpenHands and SWE-Agent: These frameworks train on the non-web portion of the ADP corpus: datasets focused on code generation, software engineering, general agent instruction following, and API/tool use. Web browsing datasets are excluded to avoid interference from web-specific interaction patterns that are incompatible with command-line and programming environments.
AgentLab: This framework trains specifically on the web portion of the ADP corpus: datasets focused on web navigation, browser-based task completion, and web-specific agent instruction following. This ensures models are optimized for web browsing patterns and UI element interaction. A sketch of such a corpus split appears below.
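One plausible way to implement the split, assuming each trajectory’s details dictionary carries a domain tag (the tag names here are hypothetical):

```python
WEB_DOMAINS = {"web_navigation", "browser_tasks"}  # hypothetical domain tags

def split_corpus(trajectories: list[Trajectory]) -> tuple[list, list]:
    """Partition ADP trajectories into web and non-web training corpora."""
    web, non_web = [], []
    for traj in trajectories:
        domain = traj.details.get("domain", "")  # read from the metadata dict
        (web if domain in WEB_DOMAINS else non_web).append(traj)
    return web, non_web
```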
Community Impact: Advancing Agent Training Democratization
Open Source Ecosystem Building
The ADP project not only released technical solutions but more importantly established a complete open-source ecosystem:
✦ ADP Schema Definitions: Clear data structure specifications
✦ Converter Code: Complete conversion implementations for 13 datasets
✦ Training Scripts: SFT conversion tools for different frameworks
✦ Evaluation Benchmarks: Standardized performance evaluation protocols
Community Contribution Mechanisms
ADP designed modular contribution processes:
✦ Dataset Contributors: Only need to write one Raw→ADP converter per dataset
✦ Framework Developers: Only need to maintain one ADP→SFT conversion script
✦ Researchers: Can immediately access all standardized data
This design creates a positive feedback loop: more dataset contributions → richer training resources → better agent performance → more framework support → more contributors.
Future Prospects: Multimodal and Standardized Evaluation
Multimodal Extensions
The ADP team plans to extend the protocol beyond text to include:
✦ Image Understanding: How agents process and respond to visual information
✦ Screen Recordings: Recording and replaying complex GUI interactions
✦ Audio Processing: Voice interaction and audio tool use
✦ Sensor Data: Agent perception in IoT environments
Standardized Evaluation
Drawing from ADP’s success in data standardization, the team proposes extending standardization concepts to evaluation and environment settings:
✦ Unified Evaluation Metrics: Consistent performance measures across different task domains
✦ Standardized Testing Environments: Reproducible agent evaluation infrastructure
✦ Benchmark Datasets: Standard test suites covering diverse agent capabilities
Cross-Dataset Analysis: Revealing Patterns and Trends
Trajectory Complexity Patterns
The standardized ADP format enables systematic analysis across previously incompatible datasets. Key findings include:
Task Domain Specialization: Clear domain-specific preferences emerge from action distributions after standardization. Beyond the web-versus-coding split noted earlier (web datasets at 80-100% API actions, coding datasets at ~60% code actions), software engineering datasets demonstrate mixed patterns, relying on API actions like file writes while also using code actions for generation and execution.
Reasoning and Length Consistency: The cross-dataset view confirms the earlier diversity analysis: most datasets pair ≥90% of their actions with explanations, and trajectory lengths span 1 to 26.8 turns (averaging 10.1), with SWE datasets consistently the longest.
Performance Scaling: Model Size and Capability Relationships
Consistent Gains Across Scales
The experimental results demonstrate clear monotonic gains with model size and consistent improvements from ADP training across agents and tasks. ADP-trained models outperform their base counterparts at every scale:
7B Models: Show substantial improvements across all benchmarks, with gains ranging from 16.5% to 23.6% depending on the specific task and agent framework.
14B Models: Demonstrate even greater improvements, with some benchmarks showing gains over 30%, indicating that ADP’s standardized approach becomes more effective with larger model capacities.
32B Models: Achieve state-of-the-art or near-state-of-the-art performance across multiple domains, with some results matching or exceeding proprietary models like Claude 3.5 Sonnet.
Cross-Task Generalization Benefits
Perhaps most importantly, ADP training enables strong cross-task generalization that single-domain tuning cannot achieve. This is crucial for building versatile agents that can handle diverse real-world scenarios without requiring specialized training for each task type.
Technical Implementation Insights: Lessons from Real-World Deployment
Conversion Pipeline Efficiency
The three-stage conversion pipeline (Raw to Standardized, Standardized to SFT, Quality Assurance) has proven highly effective in practice:
Stage 1 – Raw to Standardized: This stage unifies original dataset formats into the ADP standardized schema. Each dataset is extracted in its raw format and converted to the ADP schema by mapping dataset-specific actions and observations to ADP’s standardized action and observation space.
Stage 2 – Standardized to SFT: This stage converts ADP standardized trajectories into supervised fine-tuning formats suitable for training language models. Different agent frameworks operate with distinct action spaces and observation formats, requiring framework-specific conversion scripts.
Stage 3 – Quality Assurance: This stage ensures data correctness and consistency through automated validation, including tool call format verification, reasoning coverage checks, and conversation structure integrity validation.
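Composed end to end, the pipeline is conceptually a three-step function. A sketch, reusing validate_trajectory from earlier (the converter arguments are placeholders for the dataset- and framework-specific scripts):

```python
def build_sft_corpus(raw_records, to_adp, to_sft):
    """Run each record through standardization, QA filtering, and SFT formatting."""
    examples = []
    for record in raw_records:
        traj = to_adp(record)          # stage 1: Raw -> ADP (dataset-specific)
        if validate_trajectory(traj):  # stage 3 checks: skip trajectories with issues
            continue
        examples.append(to_sft(traj))  # stage 2: ADP -> SFT (framework-specific)
    return examples
```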
Engineering Best Practices
From the implementation experience, several best practices emerge:
Modular Design: Keeping conversion logic modular allows for easy maintenance and extension. Each dataset converter and each framework adapter can be developed and tested independently.
Automated Testing: Comprehensive automated testing of conversion pipelines ensures data quality and prevents regression as the system evolves.
Documentation Standards: Clear documentation of ADP schemas and conversion processes enables community participation and reduces the learning curve for new contributors.
Economic Impact: Quantifying the Value Proposition
Development Cost Reduction
The economic impact of ADP becomes clear when considering development costs:
Before ADP: Organizations needed to hire specialized engineers to build custom converters for each dataset-framework combination. This created a significant barrier to entry for smaller organizations and research groups.
After ADP: The same organizations can leverage community-contributed converters and focus their engineering resources on core agent capabilities rather than data integration infrastructure.
Research Acceleration
ADP accelerates research by enabling:
✦ Faster Experimentation: Researchers can quickly test ideas across multiple datasets without extensive preprocessing
✦ Better Reproducibility: Standardized formats make it easier to reproduce and build upon existing work
✦ Increased Collaboration: Shared data formats facilitate collaboration between research groups
Commercial Applications
For commercial applications, ADP enables:
✦ Rapid Prototyping: Companies can quickly prototype new agent capabilities using standardized data
✦ Cost Reduction: Reduced infrastructure costs for data processing and management
✦ Scalability: Easier scaling of agent training as new datasets and frameworks become available
Quality Control Mechanisms: Ensuring Data Integrity
Validation Framework
ADP implements a comprehensive validation framework to ensure data quality:
Schema Validation: All ADP-converted data must conform to the defined Pydantic schemas, ensuring structural consistency.
Content Validation: Automated checks verify that actions and observations are semantically consistent and follow expected patterns.
Cross-Reference Validation: Ensures that tool calls have corresponding observations and that conversation flows are logical.
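Schema validation comes essentially for free from Pydantic. A sketch, assuming Pydantic v2 and the Trajectory class from earlier:

```python
from pydantic import ValidationError

def passes_schema(raw_json: str) -> bool:
    """Schema validation: reject records that do not parse as a Trajectory."""
    try:
        Trajectory.model_validate_json(raw_json)  # Pydantic v2 API
        return True
    except ValidationError:
        return False
```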
Human-in-the-Loop Quality Assurance
While automation handles most quality checks, human oversight remains important for:
✦ Edge Case Handling: Unusual scenarios that automated systems might miss
✦ Domain Expertise: Ensuring technical accuracy in specialized domains
✦ Continuous Improvement: Identifying patterns that suggest systematic issues in conversion processes
Community Adoption: Building a Sustainable Ecosystem
Open Source Strategy
The decision to release ADP as open source has been crucial for community adoption:
✦ Transparency: Researchers can inspect and understand the standardization process
✦ Extensibility: The community can extend ADP to new domains and use cases
✦ Trust: Open development builds trust and encourages participation
Contribution Guidelines
Clear contribution guidelines help maintain quality while encouraging participation:
✦ Coding Standards: Consistent code style and documentation requirements
✦ Testing Requirements: All contributions must include appropriate tests
✦ Review Process: Peer review ensures quality and catches issues early
Documentation and Support
Comprehensive documentation and community support are essential for adoption:
✦ Getting Started Guides: Step-by-step tutorials for new users
✦ API Documentation: Detailed documentation of ADP schemas and conversion processes
✦ Community Forums: Platforms for questions, discussions, and collaboration
Real-World Case Studies: ADP in Action
Case Study 1: Academic Research Acceleration
A university research group used ADP to combine datasets from web browsing, code generation, and tool use research. Before ADP, integrating these datasets required months of engineering work. With ADP, they achieved integration in days and could focus their efforts on novel research questions rather than data preprocessing.
Case Study 2: Startup Product Development
A startup building AI agents for customer service used ADP to rapidly prototype and test different agent capabilities. The standardized data format allowed them to quickly iterate on different training approaches and compare results across multiple benchmarks.
Case Study 3: Enterprise AI Integration
A large enterprise used ADP to integrate agent training data from multiple departments and use cases. The standardization enabled them to build more capable agents that could handle diverse tasks while maintaining consistent performance standards.
Technical Challenges and Solutions
Challenge 1: Handling Diverse Data Sources
Problem: Different datasets used vastly different formats, conventions, and structures.
Solution: ADP’s flexible schema design allows for mapping diverse data types to standardized representations while preserving semantic meaning.
Challenge 2: Maintaining Data Quality During Conversion
Problem: Converting between formats risks introducing errors or losing information.
Solution: Multi-layered validation and automated testing ensure conversion quality while preserving all relevant information.
Challenge 3: Balancing Standardization with Flexibility
Problem: Over-standardization might lose important domain-specific information.
Solution: ADP’s design includes extensibility mechanisms that allow for domain-specific extensions while maintaining core standardization.
Future Research Directions
Automated Conversion
Future work will focus on developing more automated conversion tools that can handle new datasets with minimal human intervention, further reducing the barrier to entry for new contributors.
Quality Metrics
Developing better automated quality metrics will help ensure that converted data maintains high standards while enabling faster validation processes.
Performance Optimization
Optimizing conversion pipelines for performance will enable real-time or near-real-time data processing, opening up new use cases for dynamic agent training.
Practical Summary and Action Checklist
ADP Implementation Steps
1. Assess Existing Data: Identify agent training datasets that need standardization
2. Design Converters: Develop Raw→ADP conversion scripts for each dataset
3. Quality Validation: Run automated validation to ensure conversion quality
4. Framework Adaptation: Develop ADP→SFT converters for target agent frameworks
5. Training Integration: Integrate standardized data into training pipelines
Key Success Factors
✦ Maintain Simplicity: Avoid over-complicating ADP schemas
✦ Automate Validation: Implement comprehensive quality check mechanisms
✦ Encourage Community Collaboration: Promote open source contributions and standardization practices
✦ Iterate Continuously: Improve protocols based on usage feedback
Getting Started Guide
1. Explore Existing Converters: Review community-contributed converters for similar datasets
2. Start Small: Begin with one dataset and one target framework
3. Validate Thoroughly: Use automated tools to verify conversion quality
4. Contribute Back: Share your converters with the community
5. Scale Gradually: Expand to additional datasets and frameworks as you gain experience
Frequently Asked Questions (FAQ)
Q1: How does ADP differ from other data standardization solutions?
ADP focuses on the specific needs of agent training data, particularly the unified representation of action-observation sequences, rather than just general data format standardization.
Q2: What is the cost of converting existing datasets to ADP?
Based on practical experience, converting 13 datasets totals approximately 4,892 lines of code, averaging about 376 lines per dataset, reducing workload by over 97% compared to traditional many-to-many conversion approaches.
Q3: Does ADP support real-time data conversion?
Current ADP primarily targets offline data standardization, but the design architecture supports real-time conversion scenarios and can be extended based on specific requirements.
Q4: How is data quality ensured after ADP conversion?
ADP implements multi-layered quality assurance mechanisms, including format validation, reasoning coverage checks, and conversation structure integrity validation through automated checks.
Q5: How does ADP handle code actions for different programming languages?
ADP’s CodeAction schema supports all mainstream programming languages, specifying language type through the language field and storing specific code content in the content field.
Q6: How can one participate in ADP community contributions?
Community contributions can be made through open source repositories by submitting new dataset converters, reporting issues, improving documentation, or extending ADP schema definitions.
Q7: What is ADP’s value in commercial applications?
ADP significantly reduces technical barriers and costs for agent training, enabling more organizations to utilize standardized training data and improve agent development efficiency.
Q8: What are ADP’s future development directions?
Main development directions include multimodal support, standardized evaluation system extensions, stronger automated validation mechanisms, and integration with more agent frameworks.
Author’s Reflection: In the history of AI development, we have witnessed multiple revolutionary changes brought about by standardization. From HTTP protocol unifying internet communication to JSON format simplifying data exchange, standardization has always been a key factor in promoting technology popularization and innovation. ADP is likely to become the “HTTP protocol” of the agent era. It not only solves current technical problems but more importantly lays the foundation for future agent ecosystems. The value of such foundational work often takes time to fully manifest, but its impact will be profound.
