Search: Reasoning | CurrentLens.com

Open Source & Research

Experts Assess LLM Performance on Japanese Bar Exam's Open-Ended Tasks

CurrentLens
Apr 29, 2026

A new study evaluates LLMs' legal reasoning using the Japanese bar exam's writing component.

Science & Healthcare

New LLM Framework Enhances Mathematical Reasoning Evaluation

CurrentLens
Apr 28, 2026

A novel LLM-based framework provides flexible evaluation of mathematical reasoning, addressing limitations of symbolic methods.

Models & Launches

Test-Time Matching Enhances Compositional Reasoning in Multimodal Models

CurrentLens
Apr 27, 2026

A new test-time matching method improves compositional reasoning in AI models, achieving state-of-the-art results.

Open Source & Research

Evaluates LLMs on Vietnamese legal text with a dual-aspect framework

CurrentLens
Apr 21, 2026

An arXiv paper introduces a quantitative-plus-error-analysis benchmark for Vietnamese legal text, comparing GPT-4o, Claude 3 Opus, Gemini 1.5 Pro and Grok-1.

Agents & Automation

OpenAI Launches GPT-Rosalind to Accelerate Life‑Sciences Research

CurrentLens
Apr 17, 2026

OpenAI introduced GPT‑Rosalind, a frontier reasoning model aimed at speeding drug discovery, genomics, protein reasoning, and scientific workflows.

Open Source & Research

Merge GNN Predictions with LLM Reasoning in GLOW for Open-World QA

CurrentLens
Apr 16, 2026

GLOW pairs a pre-trained GNN with an LLM to answer questions over incomplete knowledge graphs and ships GLOW-BENCH, a 1,000-question evaluation.

Models & Launches

DeepMind Ships Gemini Robotics‑ER 1.6 for Physical Robot Reasoning

CurrentLens
Apr 15, 2026

Gemini Robotics‑ER 1.6 adds instrument-reading plus improved visual, spatial and planning skills to DeepMind's embodied-reasoning model for robots.

Latest
Trending

Science & Healthcare

Africa CDC and WHO launch $518M continental Ebola response plan

CurrentLens
Jun 6, 2026

A six-month 'One Response' plan targets the Bundibugyo Ebola outbreak with unified coordination, surveillance, clinical care and community engagement across affected countries.

Policy & Safety

HASC adds right-to-repair language to FY27 defense policy bill

CurrentLens
Jun 6, 2026

The House Armed Services Committee inserted right-to-repair provisions into its FY27 defense policy draft, aiming to ease barriers that limit troops' ability to fix equipment.

AI Creative

Startups Pull Users Off Phones With In-Person Games and DIY Cyberdecks

CurrentLens
Jun 6, 2026

TechCrunch highlights founders building physical social products: Board raised funding for in-person games, and cyberdeck DIYs are going viral.

Agents & Automation

MicroPython WASM Sandbox Enables Safer Datasette Plugin Execution

CurrentLens
Jun 6, 2026

Simon Willison published an alpha MicroPython-in-WASM sandbox (micropython-wasm) and a Datasette plugin (datasette-agent-micropython) to run plugin code with constrained access.

Models & Launches

DKPS method cuts model-evaluation queries using cached responses

CurrentLens
Jun 6, 2026

An arXiv paper introduces a DKPS-based approach that uses cached model outputs to predict benchmark scores while substantially reducing the number of queries.