A new arXiv paper introduces DeepTrap, a black-box framework that finds execution-context attacks against OpenClaw and publishes a 42-case benchmark and code.
Google Releases Gemini-SQL2; Gemini 3.1 Pro Scores 80.04% on BIRD
Google Research announced Gemini-SQL2, a Gemini 3.1 Pro-powered text-to-SQL capability that posted 80.04% execution accuracy on the BIRD single-model leaderboard.
Africa CDC and WHO launch $518M continental Ebola response plan
A six-month 'One Response' plan targets the Bundibugyo Ebola outbreak with unified coordination, surveillance, clinical care and community engagement across affected countries.
DKPS method cuts model-evaluation queries using cached responses
An arXiv paper introduces a DKPS-based approach that uses cached model outputs to predict benchmark scores while substantially reducing the number of queries.
MPMMine standardizes benchmarks for constraint-acquisition research
An arXiv preprint introduces MPMMine, a benchmark suite built to supply the domain artifacts and structured data constraint-acquisition methods need for reproducible evaluation.
OpenAI, Thrive and Crete Build Self-Improving Tax Agent Using Codex
OpenAI and partners built a Codex-powered tax agent they say automates filings, improves accuracy, and accelerates tax workflows for developers and operators.
New Study Reveals Limits of Model-Level Evaluations in Alignment Assessments
A recent paper argues that alignment evaluation cannot rely solely on model-level assessments.
NATO Calls for Governance Standards on AI-Enhanced Geospatial Intelligence
NATO emphasizes the need for policies governing how AI-powered geospatial intelligence is shared, with the aim of enhancing allied operations.
OpenAI-Microsoft AGI Clause Ends, Changing IP Landscape
The unique AGI clause between OpenAI and Microsoft has been redefined, impacting IP rights.
Sierra Acquires YC-Backed AI Startup Fragment to Enhance Customer Service
Sierra, founded by Bret Taylor, has acquired French AI startup Fragment, bolstering its customer service capabilities.
AI's Growth Demands Robust Data Fabric for Business Impact
As AI technologies proliferate in enterprises, a strong data fabric becomes crucial.
Gas-Powered Data Centers May Emit More GHG Than Nations
Emerging gas-powered data centers linked to major tech firms could release over 129 million tons of greenhouse gases annually.
Xiaomi Launches MiMo-V2.5-Pro and MiMo-V2.5 at Lower Costs
Xiaomi's new MiMo models post frontier-level benchmark results while significantly reducing token costs.
Qwen 3.6-27B Model Surpasses Previous Coding Benchmarks
The new Qwen 3.6-27B model delivers superior coding performance at a significantly smaller size.
RARE Introduces Framework for Evaluating High-Similarity Document Retrieval
The RARE framework addresses evaluation flaws in redundancy-heavy document retrieval, particularly in legal and financial sectors.
OpenAI Adds Codex-Powered Workspace Agents to ChatGPT
OpenAI introduced workspace agents in ChatGPT: Codex-powered cloud agents designed to automate complex workflows and scale team work across tools securely.
NVIDIA releases NVbandwidth to profile GPU interconnect and memory throughput
NVIDIA published NVbandwidth, a developer tool for measuring data-transfer and memory performance in CUDA-powered single- and multi-GPU systems.
Merge GNN Predictions with LLM Reasoning in GLOW for Open-World QA
GLOW pairs a pre-trained GNN with an LLM to answer questions over incomplete knowledge graphs and ships GLOW-BENCH, a 1,000-question evaluation.