AI Quick Take
- Willison's quick comparison favors Alibaba's Qwen3.6-35B-A3B over Anthropic's Claude Opus 4.7 on two whimsical SVG illustration prompts.
- The Qwen run used a 20.9GB Unsloth GGUF quantization, executed locally on a MacBook Pro M5 via LM Studio.
Simon Willison reports that Qwen3.6-35B-A3B produced illustrations he preferred over those from Anthropic's Claude Opus 4.7, both on his informal 'pelican riding a bicycle' test and on a separate flamingo-on-a-unicycle SVG prompt. The comparison rests on direct prompt outputs and visual judgement rather than formal metrics.
The Qwen result came from a 20.9GB GGUF quantization by Unsloth, run locally on a MacBook Pro M5 through LM Studio and the llm-lmstudio plugin. Willison also ran Opus 4.7 and retried it with thinking_level set to max; the retry did not close the gap on these creative examples.
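As a rough sketch of how a similar side-by-side run could be scripted with the llm Python library that underpins Willison's CLI, assuming the llm-lmstudio and llm-anthropic plugins are installed: the model IDs and the thinking_level option below are assumptions, not confirmed identifiers, so check `llm models` for what your setup actually registers.

```python
# Minimal sketch of an informal side-by-side run using Simon Willison's
# `llm` Python library (pip install llm llm-lmstudio llm-anthropic).
# The model IDs and the thinking_level option are assumptions; run
# `llm models` to see the identifiers your plugins actually expose.
import llm

PROMPT = "Generate an SVG of a pelican riding a bicycle"

# Local quantized model served by LM Studio via the llm-lmstudio plugin.
local_model = llm.get_model("qwen3.6-35b-a3b")  # assumed model ID
local_svg = local_model.prompt(PROMPT).text()

# Hosted Anthropic model via the llm-anthropic plugin.
opus = llm.get_model("claude-opus-4.7")  # assumed model ID
opus_svg = opus.prompt(PROMPT).text()

# Retry with extended thinking, mirroring Willison's follow-up run;
# passing thinking_level as a prompt option is assumed here.
opus_max_svg = opus.prompt(PROMPT, thinking_level="max").text()

# Write each result to disk for manual, eyeball-based comparison.
for name, svg in [("qwen", local_svg), ("opus", opus_svg), ("opus-max", opus_max_svg)]:
    with open(f"{name}.svg", "w") as f:
        f.write(svg)
```

Writing the raw SVG to disk deliberately leaves the judging step manual, which matches the eyeball-based nature of the original test.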
Willison emphasizes that the pelican benchmark is intentionally absurd and not a robust evaluation, though he notes a past informal correlation between pelican quality and broader model usefulness. He also remains skeptical that labs train specifically for this benchmark, even if results like this one feed that suspicion.
For practitioners, the post is a single narrow datapoint: it suggests quantized local inference of a 35B model can yield strong creative outputs, but it is no substitute for comprehensive benchmarks or controlled comparisons. Wait for repeatable, standardized tests over larger sample sets before letting this anecdote steer deployment or procurement decisions.