A livestream claim that the new model's leap matches GPT‑3 → GPT‑5 was put to a hands‑on comparison against gpt-image-1, Gemini's Nano Banana, and Claude.
AI Quick Take
- Sam Altman framed gpt-image-2 as a large generational step on the launch livestream.
- Simon Willison tested the model with a Where’s‑Waldo style prompt and compared outputs from gpt-image-1, Gemini and Claude; results varied.
OpenAI released ChatGPT Images 2.0 and promoted it on a livestream where Sam Altman described the jump from gpt-image-1 as comparable to a GPT‑3 → GPT‑5 generational shift. Simon Willison ran a practical comparison using a Where's‑Waldo style prompt ("where is the raccoon holding a ham radio") and shared results from gpt-image-1, Gemini's Nano Banana variants, and Anthropic's Claude Opus 4.7.
Willison noted the OpenAI Python client hadn’t been updated to include gpt-image-2, but the client doesn’t validate model IDs, so he invoked the new model by passing its identifier directly. In his informal trial, Nano Banana 2 produced a clearly findable raccoon, Nano Banana Pro produced the weakest result, Claude acknowledged a raccoon was present but had trouble locating it, and the gpt-image-1 baseline left the creature hard to spot. Willison published the returned images and his observations rather than a formal benchmark.
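The workaround works because the model identifier is just a string forwarded in the request body; nothing in the client enumerates valid IDs. A minimal sketch of that request shape, assuming the standard `POST /v1/images/generations` body with `model` and `prompt` fields (the helper name here is hypothetical, not part of any SDK):

```python
import json

# Hedged sketch of why a lagging SDK isn't a blocker: the images endpoint
# takes the model as a free-form string, so any identifier can be passed
# through. build_image_request is an illustrative helper, not OpenAI code.
def build_image_request(model_id: str, prompt: str) -> dict:
    # Minimal body for POST /v1/images/generations. Nothing here validates
    # the model ID, which is what let Willison target gpt-image-2 before
    # the Python client listed it.
    return {"model": model_id, "prompt": prompt}

payload = build_image_request(
    "gpt-image-2", "where is the raccoon holding a ham radio"
)
print(json.dumps(payload, sort_keys=True))
```

The same pattern applies to the official client: passing the new identifier to its image-generation call is enough, since the string reaches the API unchanged.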
The episode matters operationally on two counts: it shows practitioners can probe new image models quickly even when SDKs lag, and it underlines that a single‑prompt demonstration doesn't establish broad superiority. Teams evaluating image generation should expect follow‑up: OpenAI documentation and client updates, independent benchmarks, and more diverse prompt testing to validate the livestream's generational claim and to pin down where gpt-image-2 actually changes production behavior.