A novel LLM-based framework provides flexible evaluation of mathematical reasoning, addressing limitations of symbolic methods.
RepIt Framework Enables Concept-Specific Refusal in Language Models
A new framework exposes vulnerabilities in language model safety evaluations through concept-specific manipulations.