A new framework aims to improve the assessment of medical AI benchmarks, addressing key shortcomings.