Google Releases Gemini-SQL2; Gemini 3.1 Pro Scores 80.04% on BIRD

Posted on Jun 13, 2026 by CurrentLens in Models

AI Quick Take

Gemini-SQL2, powered by Gemini 3.1 Pro, achieved 80.04% execution accuracy on the BIRD single-model leaderboard.
Google has not published full implementation or evaluation details; readers should watch for released configs, APIs, or multi-model comparisons.

Google Research announced Gemini-SQL2 on June 12, 2026: a text-to-SQL capability powered by Gemini 3.1 Pro that posted 80.04% execution accuracy on the BIRD single-model leaderboard. The announcement presents execution accuracy as its core performance metric and frames Gemini-SQL2 as a schema-grounded text-to-SQL approach intended for practical database query generation.

The 80.04% figure is Gemini 3.1 Pro’s execution-accuracy result on BIRD’s single-model leaderboard, and the announcement explains what that metric measures and how leaderboard placement is assessed. Execution accuracy counts a prediction as correct when running the generated SQL returns the same results as running the reference query, so it rewards queries that produce the right answer rather than queries that match the reference text. Google has not yet published the implementation or evaluation details needed to reproduce the result or to integrate the capability into production systems. The release does discuss use cases and a schema-grounded implementation pattern, which hints at the intended integration approach despite the limited technical disclosure.
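To make the execution-accuracy metric concrete, here is a minimal Python sketch of how such a score can be computed against a SQLite database. It is an illustration only, not BIRD's official evaluator or Google's evaluation setup: the function names, the toy `orders` table, and the example queries are all hypothetical, and comparing results as unordered sets of rows is a simplifying assumption.

```python
import sqlite3


def run_query(conn, sql):
    """Return the result rows as a set, or None if the query fails to run."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        # A prediction that errors out (None) never counts as correct.
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'ada', 40.0), (2, 'bob', 15.5), (3, 'ada', 9.5);
    """
)

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT customer FROM orders WHERE total >= 40",
     "SELECT customer FROM orders WHERE total > 20"),
    # Runs fine but returns a different count: counted as wrong.
    ("SELECT COUNT(*) FROM orders",
     "SELECT COUNT(*) FROM orders WHERE customer = 'ada'"),
    # References a column that does not exist: counted as wrong.
    ("SELECT nope FROM orders",
     "SELECT id FROM orders"),
    # Same rows in a different order: counted as correct under set comparison.
    ("SELECT id FROM orders WHERE customer = 'ada' ORDER BY id DESC",
     "SELECT id FROM orders WHERE customer = 'ada'"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

Two of the four hypothetical predictions return the same rows as their reference queries, so the sketch reports 50.00%. A leaderboard figure such as 80.04% is the same kind of ratio computed over the benchmark's full evaluation set.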

For engineers and product teams, the immediate takeaway is that Gemini-SQL2 gives Gemini 3.1 Pro a public performance point on a recognized text-to-SQL benchmark, but adoption decisions will depend on details that are not yet available. Watch for follow-up releases from Google with evaluation configurations, prompts or templates, APIs or SDKs, and multi-model comparisons that would enable reproducible testing and practical deployment planning.
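Until Google publishes its prompts or templates, the schema-grounded pattern mentioned in the release can only be approximated. The Python sketch below assumes that "schema-grounded" means placing the database's table definitions in the prompt so the model is steered toward real tables and columns. The prompt wording, the function names, and the toy schema are hypothetical and are not Google's implementation. The model call itself is left out because no API details have been published.

```python
import sqlite3


def describe_schema(conn):
    """Collect the CREATE TABLE statements SQLite stores for user tables."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Skip SQLite's internal bookkeeping tables (names start with "sqlite_").
    return "\n\n".join(
        sql for name, sql in rows if sql and not name.startswith("sqlite_")
    )


def build_prompt(conn, question):
    """Assemble a prompt that grounds the question in the live schema."""
    schema = describe_schema(conn)
    return (
        "You are given the following SQLite schema.\n\n"
        f"{schema}\n\n"
        "Write one SQL query that answers the question. "
        "Use only the tables and columns shown above.\n\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical toy schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    """
)

prompt = build_prompt(conn, "Which customer has the highest total spend?")
print(prompt)
```

Reading the schema from the database at request time, rather than hard-coding it, keeps the prompt in step with schema changes. The returned SQL could then be scored with an execution-accuracy check against reference queries.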

