Stanford’s MedAgentBench: The Real-World Test Lab for Healthcare AI Assistants

5 hours ago 高效码农

For years, the conversation around artificial intelligence in medicine has centered on one question: “Can it pass the test?” Large language models (LLMs) like GPT and Claude have dazzled us by acing the United States Medical Licensing Examination (USMLE), showing that they can recall an encyclopedic range of medical facts. But passing a written exam is only the first hurdle. The true, and far more critical, challenge is this: Can AI reliably do the job? Imagine an AI not just telling you the treatment for pneumonia, but actually logging into a hospital’s electronic health record (EHR) system, checking the patient’s specific allergies and …