A new framework exposes vulnerabilities in language model safety evaluations through concept-specific manipulations.
6 results for: safety
Anthropic updates Claude Opus 4.7 system prompt with new tools and tighter safety guidance
Anthropic revised the Claude Opus 4.7 system prompt to add a PowerPoint agent, expand child-safety rules, and change interaction guidance.
Commission Marks One-Year Milestone of AI Continent Action Plan with Two Reports
The European Commission published two reports on AI adoption and policymaking to mark the first year of its AI Continent Action Plan, signalling sustained regulatory attention.
Making AI operational in constrained public sector environments
Second GPAI Signatory Taskforce meeting - Copyright Chapter
OpenAI Expands Agents SDK to Help Enterprises Build Safer Agents
OpenAI updated its Agents SDK to broaden enterprise agent-building capabilities as interest in agentic systems grows.