ATBench unveils two domain-specific benchmarks, ATBench-Claw and ATBench-Codex, extending its trajectory safety evaluation.
AI Models Show Risks for Biological Misuse Amid Evolving Safeguards
Recent benchmarks indicate that AI models may enable biological weaponization by users with little expertise, raising urgent policy concerns.