3 results for: evaluation framework

A new framework aims to improve the assessment of medical AI benchmarks, addressing key shortcomings.
New LLM Framework Enhances Mathematical Reasoning Evaluation
A novel LLM-based framework provides flexible evaluation of mathematical reasoning, addressing limitations of symbolic methods.
RepIt Framework Enables Concept-Specific Refusal in Language Models
A new framework, RepIt, isolates concept-specific refusal directions in language models, exposing vulnerabilities in safety evaluations.