Rogue in Production: Stress-Test AI Agents with A2A Red Teaming

1 months ago 高效码农

t’s 2 a.m. Slack is screaming. Your customer-support agent just gave a 15-year-old a vape-discount code, the legal team is drafting headlines, and your unit tests are still green. Sound familiar? Traditional QA wasn’t built for conversational, policy-bound, stochastically creative creatures. That’s exactly why Qualifire open-sourced Rogue—an A2A-native red-team that turns written policies into CI/CD gates. Below is the full field manual: install it, abuse it, ship with confidence. 1. The Gap No One Talks About What classic tests check What agents actually break Single-turn intent accuracy Multi-turn memory loss Static prompt answers Policy circumvention Scalar “LLM-as-Judge” score Audit-trail vacuum …