A new framework exposes vulnerabilities in language model safety evaluations through concept-specific manipulations.
Firefox 150 Fixes 271 Vulnerabilities Found Using Claude Mythos Preview
Mozilla patched 271 vulnerabilities after an initial security evaluation that used an early version of Claude Mythos Preview, carried out in collaboration with Anthropic.