From GPT-4 to GPT-5: Advancements and Challenges in Medical AI

Introduction

The rapid evolution of artificial intelligence (AI) has transformed healthcare, with large language models (LLMs) like GPT playing a pivotal role. A recent 2025 report by Stanford’s CRFM introduces MedHELM, a benchmark designed to evaluate AI’s medical capabilities. This article breaks down the key findings of GPT-5’s performance, highlighting its strengths, limitations, and implications for clinical practice.

What is MedHELM?

MedHELM is a comprehensive testing framework that evaluates AI models across eight critical medical tasks:

| Task | Purpose | Example |
| --- | --- | --- |
| MedCalc-Bench | Numerical calculations | Drug dosage, lab value analysis |
| Medec | Error detection | … |