Getting AI to Execute Smooth Combos: Coding, Deployment, Self-Testing, and Bug Fixing
In the increasingly popular field of AI-assisted programming, many developers have noticed an interesting phenomenon: AI can generate code rapidly, but this code often contains various minor issues that require repeated manual inspection and modification. This is akin to an intern who writes extremely fast but never self-reviews, consistently submitting work full of flaws. We refer to this as the “last mile” problem in AI programming.
The Dilemma of AI Programming: Why is Generated Code Never Perfect?
Imagine this scenario: You describe a functional requirement to an AI, and it quickly provides the code implementation. While initially pleased, you begin testing the code only to discover problem after problem. At this point, you face two choices:
Option One: Manual Modification Mode
- Manually inspect each line of AI-generated code
- Personally locate the source of each problem
- Manually fix each bug
- Test repeatedly until everything passes
Option Two: Conversational Repair Mode
- “There’s a bug here, please fix it.”
- “That’s still not right, it should use the XXX method.”
- “Try again, the logic is off.”
- After a dozen rounds of dialogue, complete exhaustion sets in.
The fundamental issue with both these modes is the lack of an automated acceptance and iteration mechanism. Recall our human development process: Code → Deploy → Self-Test → Fix Bugs → Re-Test. This is a complete quality assurance cycle. However, in current AI programming practice, we often only complete the first step before letting the AI “clock off.”
Breaking Through the Dilemma: A Test-Driven AI Programming Workflow
Based on these observations, we designed a test-driven, closed-loop AI programming workflow. The core idea is simple: use clear test cases as acceptance criteria, enabling the AI to independently judge task completion quality and automatically iterate and fix issues when expectations are not met.
Overall Architecture Design
This workflow’s tech stack includes:
- Core Tool: iFlow CLI
- AI Model: qwen3-coder-plus
- Deployment Component: java-dev-project-deploy Agent
- Testing Tool: HSF Debugging Tool
Detailed Breakdown of Core Components
1. Deployment Agent: Automated Environment Deployment
The core task of the Deployment Agent is to allow the AI to autonomously complete project environment deployment and perceive the deployment status in real-time through a polling mechanism.
Deployment Process Steps:
1. Environment Information Acquisition
   - Read the project environment ID from the configuration file .iflow/dev/progressInfo.json
   - If it does not exist, prompt the user to supply the missing information
2. Application Environment Identification
   - Call the group_env_apres_list tool
   - Obtain the application environment ID
   - Update the corresponding field in the configuration file
3. Deployment Execution
   - Call the apre_deploy tool to initiate the deployment process
   - Record the deployment start time and metadata
4. Status Monitoring (a minimal polling sketch follows these steps)
   - Check the deployment status every 50 seconds
   - Monitor changes in the selfStatus field
   - A transition from DEPLOYING to RUNNING indicates success
   - Implement a 10-minute timeout protection mechanism
5. Result Logging
   - Regardless of success or failure, record the deployment information in a log file
   - Include the timestamp, environment information, branch version, and final result
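At its core, the status-monitoring step is just a bounded polling loop. The Java sketch below illustrates that loop under stated assumptions: `DeployStatusClient` and its `querySelfStatus` method are hypothetical stand-ins for however the agent actually reads the `selfStatus` field, while the 50-second interval and 10-minute timeout mirror the values listed above.

```java
import java.time.Duration;
import java.time.Instant;

/** Minimal sketch: wait for a deployment to reach RUNNING within a timeout. */
public class DeployStatusPoller {

    /** Hypothetical client; the real agent reads selfStatus through the deployment platform's API. */
    public interface DeployStatusClient {
        String querySelfStatus(String apreEnvId) throws Exception; // e.g. "DEPLOYING" or "RUNNING"
    }

    private static final Duration POLL_INTERVAL = Duration.ofSeconds(50);
    private static final Duration TIMEOUT = Duration.ofMinutes(10);

    public static boolean waitForRunning(DeployStatusClient client, String apreEnvId) throws Exception {
        Instant deadline = Instant.now().plus(TIMEOUT);
        while (Instant.now().isBefore(deadline)) {
            String status = client.querySelfStatus(apreEnvId);
            if ("RUNNING".equals(status)) {
                return true;                            // DEPLOYING -> RUNNING: deployment succeeded
            }
            if (!"DEPLOYING".equals(status)) {
                return false;                           // any other status is treated as a failure
            }
            Thread.sleep(POLL_INTERVAL.toMillis());     // check again in 50 seconds
        }
        return false;                                   // 10-minute timeout protection triggered
    }
}
```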
2. HSF Debugging Tool: Standardized Testing Interface
The HSF debugging tool is packaged as an MCP tool (hsf-invoke), enabling standardized testing through HSF generic calls.
Standard Call Parameter Format:
```json
{
  "serviceName": "com.taobao.mercury.services.FavoriteCountService",
  "methodName": "getFavoriteCount",
  "paramTypes": ["long"],
  "paramValues": [88888888],
  "targetIp": "33.4.XX.XX"
}
```
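As a reference point, the sketch below shows how a caller might assemble this payload programmatically before handing it to the hsf-invoke tool. The `HsfInvokePayload` class is illustrative only and assumes Jackson is on the classpath; what actually matters to the MCP tool is the JSON shape shown above.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;
import java.util.Map;

/** Illustrative builder for the hsf-invoke request payload shown above. */
public class HsfInvokePayload {

    public static String build(String serviceName, String methodName,
                               List<String> paramTypes, List<?> paramValues,
                               String targetIp) throws Exception {
        // Key names must match what the hsf-invoke MCP tool expects.
        Map<String, Object> payload = Map.of(
                "serviceName", serviceName,
                "methodName", methodName,
                "paramTypes", paramTypes,
                "paramValues", paramValues,
                "targetIp", targetIp);
        return new ObjectMapper().writeValueAsString(payload);
    }

    public static void main(String[] args) throws Exception {
        // Reproduces the example request from this section.
        System.out.println(build(
                "com.taobao.mercury.services.FavoriteCountService",
                "getFavoriteCount",
                List.of("long"),
                List.of(88888888L),
                "33.4.XX.XX"));
    }
}
```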
3. Automated Debugging Command: Intelligent Problem Location and Repair
Automated debugging is the core of the entire workflow, endowing the AI with self-diagnosis and repair capabilities.
Automated Debugging Execution Steps:
1. Document Verification
   - Check the requirement document (prd.md) at the specified path
   - Verify the technical solution document (techDoc.md)
   - Confirm the test case document (testCase.md) exists and is complete
2. Test Execution
   - Parse test scenarios from the test case document
   - Use the hsf-invoke tool to call HSF interfaces
   - Record execution results in the debug log file
3. Result Analysis
   - Compare actual results with expected results
   - Calculate differences and identify test cases that do not meet expectations
   - Analyze the problem by combining the requirement document and the technical solution
4. Code Repair
   - Locate the problematic code segment
   - Fix the logic errors in the code (shortcuts such as mocks are prohibited)
   - Ensure the fix aligns with the original requirements
5. Code Submission
   - Verify that the code compiles successfully
   - Submit the code to the version management system
   - Adhere to the standardized commit message format
6. Automatic Deployment
   - Invoke the java-dev-project-deploy Agent
   - Deploy the fixed code to the project environment
   - Monitor the deployment process until completion
7. Verification Iteration (a sketch of this loop follows the list)
   - Re-execute the test cases after successful deployment
   - Record verification results in the debug log
   - If results still do not meet expectations, repeat the repair process
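To make the control flow concrete, here is a minimal Java sketch of the verify-repair-redeploy cycle described above. The `TestRunner`, `CodeRepairer`, and `Deployer` interfaces and the `MAX_ITERATIONS` bound are hypothetical placeholders; in practice, iFlow CLI drives these steps through the agents and tools described earlier rather than through compiled code like this.

```java
import java.util.List;

/** High-level sketch of the auto-debugging loop; all collaborator interfaces are hypothetical. */
public class AutoDebugLoop {

    interface TestRunner   { List<TestResult> runAll(); }                        // steps 2-3: call hsf-invoke, compare results
    interface CodeRepairer { void repairAndCommit(List<TestResult> failures); }  // steps 4-5: fix logic, commit
    interface Deployer     { boolean deploy(); }                                 // step 6: java-dev-project-deploy Agent

    record TestResult(String name, boolean passed, String actual, String expected) {}

    private static final int MAX_ITERATIONS = 5; // safeguard: hand over to a human after this many cycles

    static boolean run(TestRunner tests, CodeRepairer repairer, Deployer deployer) {
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            List<TestResult> failures = tests.runAll().stream()
                    .filter(r -> !r.passed())
                    .toList();
            if (failures.isEmpty()) {
                return true;                       // all test cases meet expectations
            }
            repairer.repairAndCommit(failures);    // locate the defect, fix the logic, commit
            if (!deployer.deploy()) {
                return false;                      // deployment failure needs human attention
            }
            // loop back: re-execute the test cases against the freshly deployed code
        }
        return false;                              // iteration budget exhausted; escalate to a developer
    }
}
```

The bounded iteration count reflects the safeguard discussed in the FAQ below: after a fixed number of unsuccessful cycles, the problem is handed back to a human developer along with the accumulated debug logs.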
Practical Exercise: Favorites Feature Auto-Repair Case Study
To validate the practical effectiveness of this workflow, we designed a relatively simple yet representative test scenario: repair the favorites item count statistics feature to ensure Fliggy items are correctly excluded.
Test Environment Preparation
Requirement Document (prd.md) Content:
```
Requirement: Count the number of items in favorites, excluding Fliggy items.
```
Technical Solution (techDoc.md) Content:
```
In the `com.taobao.mercury3.hsfprovider.hsf.HsfFavoriteCountService.getFavoriteCount` interface, remove the logic related to Fliggy item statistics.
```
Test Case (testCase.md) Content:
```markdown
# Test Cases

## Test Case 1

### Test Steps
1. Call HSF service: `com.taobao.mercury.services.FavoriteCountService`
2. Call HSF interface: `getFavoriteCount`
3. Target IP: `33.4.XX.XX`
4. Input Parameter Type: Primitive data type `long`
5. Input Parameter Value: `888888`
6. Expected Return Result: `3951`
```
Environment Configuration Information:
```json
{
  "groupEnvId": "4355970",
  "apreEnvId": ""
}
```
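Before walking through the full run, it helps to see what "comparing actual results with expected results" boils down to for this case. The snippet below is a simplified, hypothetical illustration that extracts the expected value from testCase.md and compares it with the number returned by hsf-invoke; the real workflow performs this comparison inside the auto-debugging command rather than in a standalone class.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative check: read the expected value from testCase.md and compare it with the actual return. */
public class ExpectationCheck {

    // Matches lines like: "6. Expected Return Result: `3951`" (format taken from the test case above).
    private static final Pattern EXPECTED = Pattern.compile("Expected Return Result:\\s*`(\\d+)`");

    static long readExpected(Path testCaseMd) throws Exception {
        for (String line : Files.readAllLines(testCaseMd)) {
            Matcher m = EXPECTED.matcher(line);
            if (m.find()) {
                return Long.parseLong(m.group(1));
            }
        }
        throw new IllegalStateException("testCase.md has no expected return result");
    }

    static boolean meetsExpectation(long actualFromHsfInvoke, Path testCaseMd) throws Exception {
        return actualFromHsfInvoke == readExpected(testCaseMd);
    }
}
```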
Full Automated Repair Process
Execute the command in iFlow: `/auto-debugging .iflow/dev/requirements/FavoritesItemCountExcludeFliggy`. The entire repair process then runs fully automatically.
Step One: Problem Discovery and Location
The AI first verifies the integrity of all documents, then executes the test cases. It calls the HSF interface to get the actual return result and compares it with the expected value, identifying inconsistencies.
The test results show a difference between the actual return value and the expected value. The AI automatically identifies this as a problem requiring repair. The system begins analyzing the root cause, conducting a comprehensive diagnosis combining the requirement document, technical solution, and code logic.
Step Two: Code Analysis and Repair
The AI locates the problematic code segment, finding that the Fliggy item statistics logic was not correctly removed from the getFavoriteCount interface. Based on the technical solution requirements, the AI makes precise modifications to the code, deleting the relevant Fliggy item statistics code.
After modification, the AI ensures the code compiles normally and submits it according to standards. The commit message clearly records the fix content and reason for future tracking and understanding.
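The article does not show the actual mercury3 source, so the snippet below is a purely hypothetical before/after sketch of what "removing the Fliggy item statistics logic" might look like; the class, DAO, and service names are invented for illustration.

```java
/** Hypothetical before/after sketch of the fix; not the actual mercury3 code. */
public class GetFavoriteCountFixSketch {

    interface FavoriteCountDao        { long countItems(long userId); }
    interface FliggyStatisticsService { long countFliggyItems(long userId); }

    private final FavoriteCountDao favoriteCountDao;
    private final FliggyStatisticsService fliggyStatisticsService;

    GetFavoriteCountFixSketch(FavoriteCountDao dao, FliggyStatisticsService fliggy) {
        this.favoriteCountDao = dao;
        this.fliggyStatisticsService = fliggy;
    }

    /** Before the fix: the count still includes a separately computed Fliggy item total. */
    long getFavoriteCountBefore(long userId) {
        long count = favoriteCountDao.countItems(userId);
        count += fliggyStatisticsService.countFliggyItems(userId); // Fliggy statistics logic to be removed
        return count;
    }

    /** After the fix: the Fliggy statistics logic is deleted, so Fliggy items are excluded. */
    long getFavoriteCountAfter(long userId) {
        return favoriteCountDao.countItems(userId);
    }
}
```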
Step Three: Automatic Deployment and Verification
After code submission, the AI automatically invokes the Deployment Agent to deploy the fixed code to the project environment. The deployment process includes environment validation, application deployment, and status monitoring, all without human intervention.
After successful deployment, the AI immediately re-executes the test cases to verify the fix. This marks the beginning of a new cycle, ensuring the problem is thoroughly resolved.
Step Four: Result Confirmation and Cyclic Verification
The second test run shows the actual return value perfectly matches the expected value, confirming a successful repair. If discrepancies remained, the system would automatically initiate a new repair cycle until all test cases pass.
Workflow Value and Key Insights
This relatively simple experimental scenario successfully validated the feasibility of the test-driven AI programming workflow. The core value lies in demonstrating that with clear acceptance criteria and feedback mechanisms, AI can indeed possess self-acceptance and iteration capabilities.
Key Elements for Success
Clear Acceptance Criteria
Test cases translate abstract requirements into specific, verifiable standards. They act as both a “translator” for the AI to understand requirements and a “measure” for accepting results.
Complete Feedback Loop
A full closed loop is designed, from code generation to test execution, and from problem diagnosis to repair verification. This cycle mimics the human developer’s working mode but achieves full automation.
Standardized Workflow
Human development experience is solidified into repeatable, automated processes. Each component has clear responsibility boundaries and interaction protocols, ensuring system reliability and maintainability.
Practical Application Effects
In this case study, the AI system demonstrated impressive capabilities:
- Autonomously understanding business requirements and technical solutions
- Accurately executing test cases and identifying problems
- Precisely locating code defects and performing repairs
- Fully automated deployment and verification throughout the process
- Achieving a successful fix in a single iteration
Future Optimization Directions and Development Prospects
While the current workflow performs well in simple scenarios, enhancing it to handle more complex real-world development situations requires improvements in multiple areas.
Enhanced Testing Capabilities
- Automated Test Case Generation: Generate test cases from requirement documents, covering both normal and edge scenarios.
- Complex Parameter Handling: Support testing with complex inputs such as structured data and collection types.
- Experimental Environment Adaptation: Handle special testing environment needs such as experiment whitelisting and traffic scheduling.
- Integration of Professional Testing Tools: Incorporate professional toolchains from testing teams to improve test coverage and accuracy.
Strengthened Problem Diagnosis Capabilities
- Multi-dimensional Log Analysis: Conduct comprehensive problem analysis combining diagnostic logs and SLS logs.
- Real-time Traffic Capture: Use network packet capture tools to obtain real-time data for problem reproduction.
- Intelligent Root Cause Analysis: Perform deeper error diagnosis based on technical solutions and requirement documents.
- Optimized Repair Strategies: Build a library of repair patterns for common issues to improve repair efficiency.
Task Decomposition and Planning
- Complex Requirement Breakdown: Decompose large requirements into logically clear, well-bounded sub-tasks.
- Dependency Management: Identify and manage dependencies between tasks, optimizing execution order.
- Dynamic Priority Adjustment: Adjust repair priorities based on problem severity and impact scope.
- Progress Visualization: Provide visual displays of task execution progress for easier monitoring and understanding.
Engineering Efficiency Improvement
- Hot Deployment Support: Integrate hot deployment APIs to reduce deployment wait times.
- Build Process Optimization: Automatically diagnose and fix build errors by obtaining build logs via MCP.
- Quality Gate Integration: Integrate quality checkpoints such as code review and performance testing into the workflow.
- Resource Utilization Monitoring: Monitor system resource usage to optimize performance.
Quality Assurance System
- Code Review Agent: Automatically perform code standard checks and design pattern validation.
- Performance Optimization Agent: Analyze code performance bottlenecks and provide optimization suggestions.
- Security Detection Agent: Identify potential security vulnerabilities and risks.
- Compatibility Verification Agent: Ensure code compatibility across different environments.
Frequently Asked Questions (FAQ)
What kind of development scenarios is this workflow suitable for?
This test-driven AI programming workflow is particularly suitable for development tasks with clearly defined inputs and outputs, such as API interface development, business logic implementation, and bug fixing. For tasks with clear requirements and writable test cases, the workflow can deliver maximum benefits.
What information needs to be prepared for the AI to work properly?
To enable the AI to successfully complete automated programming tasks, three core documents need to be provided:
- Requirement Document: Clearly describes the business needs and goals.
- Technical Solution: Explains the technical implementation approach and architectural design.
- Test Cases: Define specific test steps and expected results.
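Based on the paths used earlier in this article, a prepared requirement presumably looks something like the layout below (inferred for illustration, not prescribed by the tool):

```
.iflow/
└── dev/
    ├── progressInfo.json                      # environment IDs (groupEnvId, apreEnvId)
    └── requirements/
        └── FavoritesItemCountExcludeFliggy/
            ├── prd.md                         # requirement document
            ├── techDoc.md                     # technical solution
            └── testCase.md                    # test cases with expected results
```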
What happens if the AI cannot fix the problem automatically?
The workflow designs multiple safeguard mechanisms. If the AI cannot resolve the issue within a specified number of cycles, the system records detailed diagnostic information and notifies a human developer to intervene. Meanwhile, all repair attempts are fully logged, providing sufficient context for human intervention.
Will this workflow add extra maintenance costs?
Initial time investment is required to establish standardized workflows and toolchains. However, once operational, it can significantly reduce maintenance costs. Automated problem discovery and repair reduce manual debugging time, and standardized processes lower communication overhead.
How is the quality of AI-repaired code ensured?
The system ensures code quality through multiple stages: test case verification for functional correctness, compilation checks for syntactic validity, and deployment verification for runtime environment compatibility. Code review and performance testing quality gates will also be integrated in the future.
Is this solution only applicable to Java development?
The current implementation is based on a Java technology stack, but the core workflow design and concepts can be migrated to other programming languages. Different languages require adaptation of corresponding deployment tools, testing frameworks, and debugging methods.
Conclusion
The test-driven AI programming workflow represents a new mode of human-machine collaboration. It no longer simply uses AI as a code generation tool but elevates it to a development partner capable of self-verification and iteration. This model solves the “last mile” problem in AI programming, enabling AI to truly assume full-process responsibility for development tasks.
Although the current implementation has much room for optimization, the direction is clear: through sound engineering design and standardized processes, we can unlock AI’s greater potential in the software development field. Future software development may increasingly feature this collaborative mode where humans set goals and AI autonomously implements them, allowing developers to focus more on creative architectural design and business innovation.
This article summarizes practical work by Jiexiang, Moye, and Changji from the Taotian Group – User Message and Social Team. Our team focuses on building user message and social experiences within the Taobao ecosystem and continuously explores and practices the application of AI technology in R&D processes.
