Anthropic's new marketplace allows AI agents to facilitate real transactions between buyers and sellers.
7 results for: Test
Test-Time Matching Enhances Compositional Reasoning in Multimodal Models
A new test-time matching method improves compositional reasoning in AI models, achieving state-of-the-art results.
OpenAI Introduces Parameter Golf in Model Craft Initiative
OpenAI's latest initiative, Parameter Golf, aims to refine model performance metrics.
CSET Director Helen Toner Calls for Enhanced IP Protections in Senate Testimony
Helen Toner urged lawmakers to strengthen U.S. intellectual property protections against foreign theft.
Datasette 1.0a28 fixes alpha breakages, adds shutdown and test-cleanup APIs
Release 1.0a28 repairs compatibility regressions from 1.0a27, adds datasette.close and database.close behavior, and ships a pytest plugin to avoid fd leaks.
Qwen3.6-35B-A3B bests Claude Opus 4.7 on Willison's pelican test
Simon Willison reports that a local, quantized Qwen3.6-35B-A3B run produced better pelican and flamingo illustrations than Anthropic's Claude Opus 4.7.
Microsoft Tests OpenClaw-Style Agents for Copilot
Microsoft is experimenting with OpenClaw-like local agents inside Copilot to enable more autonomous, around-the-clock task execution for Microsoft 365.