Sunday, June 7, 2026

CurrentLens.com

Insight Today. Impact Tomorrow.

Qwen3.6-35B-A3B bests Claude Opus 4.7 on Willison's pelican test

Posted on Apr 16, 2026 by CurrentLens in Models

Photo by Andrey Matveev on Unsplash

AI Quick Take

  • Willison's quick comparison favors Alibaba's Qwen3.6-35B-A3B over Anthropic's Claude Opus 4.7 on two whimsical SVG-drawing prompts.
  • The Qwen run used a 20.9GB quantized GGUF build from Unsloth, running locally on a MacBook Pro M5.

Simon Willison reports that Qwen3.6-35B-A3B produced better illustrations than Anthropic's Claude Opus 4.7 on his informal "pelican riding a bicycle" SVG test and on a separate prompt asking for an SVG of a flamingo riding a unicycle. The comparison rests on the raw output of each prompt, judged by eye, rather than on formal metrics.

The Qwen result came from a 20.9GB GGUF quantization of the model published by Unsloth, run locally on a MacBook Pro M5 through LM Studio and the llm-lmstudio plugin. Willison also ran Opus 4.7, then retried it with thinking_level set to max; the retry did not close the gap on these creative prompts.
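As a rough sanity check on what a 20.9GB file means for a 35B-parameter model, the sketch below divides file size by parameter count to get average bits per weight. It assumes "GB" means 10^9 bytes, takes the parameter count from the "35B" in the model name, and ignores GGUF metadata and tokenizer overhead. None of these assumptions come from Willison's post, and Unsloth's actual quantization mix may differ from layer to layer.

```python
# Rough bits-per-weight estimate for a quantized model file.
# Assumptions (illustrative, not from the source post):
#   - "20.9GB" means 20.9 * 10**9 bytes
#   - the file holds ~35 * 10**9 parameters (read off the "35B" in the name)
#   - metadata/tokenizer overhead in the GGUF file is negligible

def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average storage bits per parameter, ignoring file overhead."""
    return file_size_bytes * 8 / n_params

size_bytes = 20.9e9   # reported GGUF file size
params = 35e9         # nominal total parameter count from the model name

bpw = bits_per_weight(size_bytes, params)
fp16_bytes = params * 2  # the same weights stored at 16 bits (2 bytes) each

print(f"~{bpw:.2f} bits per weight")                  # ~4.78 bits per weight
print(f"FP16 would need ~{fp16_bytes / 1e9:.0f} GB")  # FP16 would need ~70 GB
```

At roughly 4.8 bits per weight on average, the file is about 30% of the size an unquantized 16-bit copy would need, which goes a long way toward explaining how the model can run on a laptop at all.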

Willison stresses that the pelican benchmark is intentionally absurd and not a robust evaluation, though he notes that, informally, pelican quality has tended to track a model's broader usefulness. He also remains skeptical that labs train specifically for this benchmark, while conceding that this result gives that suspicion a little more weight.

For practitioners, the post is a narrow data point: it suggests that quantized local inference of a 35B model can produce strong creative output, but it is no substitute for comprehensive benchmarks or controlled comparisons. Look for repeatable, standardized tests over larger sample sets before changing deployment or procurement decisions on the strength of this anecdote.
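To put "narrow data point" in numbers: if each prompt is treated as an independent head-to-head judgement and neither model were actually better, two wins out of two would still happen a quarter of the time. The sketch below is a standard one-sided sign test. The independence and no-ties assumptions, and the ten-prompt comparison, are illustrative and not part of Willison's post.

```python
from math import comb

def sign_test_p(wins: int, n: int) -> float:
    """One-sided sign-test p-value: P(X >= wins) for X ~ Binomial(n, 0.5).

    Treats each prompt as an independent head-to-head judgement with no ties,
    under the null hypothesis that neither model is preferred.
    """
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n

# Two prompts (pelican, flamingo), Qwen preferred on both:
print(sign_test_p(2, 2))    # 0.25
# Hypothetical: the same 100% win rate over ten prompts
print(sign_test_p(10, 10))  # 0.0009765625
```

Even a clean sweep on two prompts is weak evidence on its own; the same sweep across ten independent prompts would be far harder to attribute to chance, which is the case for larger sample sets.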

Posted in Models & Launches | Tags: qwen, anthropic, model-release, benchmark, quantization, inference, llm, Claude
© 2026 CurrentLens.com. All rights reserved.