A new study evaluates LLMs' legal reasoning using the Japanese bar exam's writing component.
7 results for: Reasoning
New LLM Framework Enhances Mathematical Reasoning Evaluation
A novel LLM-based framework provides flexible evaluation of mathematical reasoning, addressing limitations of symbolic methods.
Test-Time Matching Enhances Compositional Reasoning in Multimodal Models
A new test-time matching method improves compositional reasoning in AI models, achieving state-of-the-art results.
Dual-Aspect Framework Evaluates LLMs on Vietnamese Legal Text
An arXiv paper introduces a benchmark for Vietnamese legal text that combines quantitative evaluation with error analysis, comparing GPT-4o, Claude 3 Opus, Gemini 1.5 Pro and Grok-1.
OpenAI Launches GPT-Rosalind to Accelerate Life‑Sciences Research
OpenAI introduced GPT‑Rosalind, a frontier reasoning model aimed at speeding drug discovery, genomics, protein reasoning, and scientific workflows.
GLOW Merges GNN Predictions with LLM Reasoning for Open-World QA
GLOW pairs a pre-trained GNN with an LLM to answer questions over incomplete knowledge graphs and ships GLOW-BENCH, a 1,000-question evaluation.
DeepMind Ships Gemini Robotics‑ER 1.6 for Physical Robot Reasoning
Gemini Robotics‑ER 1.6 adds instrument-reading plus improved visual, spatial and planning skills to DeepMind's embodied-reasoning model for robots.