Search: arxiv | CurrentLens.com

Models & Launches

DKPS method cuts model-evaluation queries using cached responses

CurrentLens
Jun 6, 2026

An arXiv paper introduces a DKPS-based approach that uses cached model outputs to predict benchmark scores while substantially reducing the number of queries.

Models & Launches

PIGMENT extends quantitative diffusion MRI to sparse, multi-site and low-field scans

CurrentLens
Jun 2, 2026

A physics-informed foundation model called PIGMENT learns a universal microstructure prior and adapts zero-shot to individual diffusion MRI scans, enabling reliable maps from sparse and heterogeneous data.

Open Source & Research

MPMMine standardizes benchmarks for constraint-acquisition research

CurrentLens
May 27, 2026

An arXiv preprint introduces MPMMine, a benchmark suite built to supply the domain artifacts and structured data constraint-acquisition methods need for reproducible evaluation.

Open Source & Research

Paper Proposes Three-Step Framework for Knowledge-Work Benchmarks

CurrentLens
May 25, 2026

An arXiv paper argues that LLM evaluation still mirrors traditional NLP tasks and offers a three-step method to align benchmarks with real workplace activity.

Models & Launches

Authors Release OpenEval and Demand Item-Level Benchmark Standards

CurrentLens
May 25, 2026

A position paper argues AI evaluation must publish item-level benchmark responses and ships OpenEval - 10M model responses across 155k items - to prove the point.

Open Source & Research

Evaluates LLMs on Vietnamese legal text with a dual-aspect framework

CurrentLens
Apr 21, 2026

An arXiv paper introduces a quantitative-plus-error-analysis benchmark for Vietnamese legal text, comparing GPT-4o, Claude 3 Opus, Gemini 1.5 Pro and Grok-1.

Models & Launches

Full fine-tuning concentrates LLM attribution in code-compliance models

CurrentLens
Apr 21, 2026

An arXiv study uses perturbation-based attribution to compare FFT, LoRA, and quantized LoRA across model sizes and finds FFT yields more focused interpretive patterns.

Open Source & Research

Merge GNN Predictions with LLM Reasoning in GLOW for Open-World QA

CurrentLens
Apr 16, 2026

GLOW pairs a pre-trained GNN with an LLM to answer questions over incomplete knowledge graphs and ships GLOW-BENCH, a 1,000-question evaluation.

Latest
Trending

Science & Healthcare

Africa CDC and WHO launch $518M continental Ebola response plan

CurrentLens
Jun 6, 2026

A six-month 'One Response' plan targets the Bundibugyo Ebola outbreak with unified coordination, surveillance, clinical care and community engagement across affected countries.

Policy & Safety

HASC adds right-to-repair language to FY27 defense policy bill

CurrentLens
Jun 6, 2026

The House Armed Services Committee inserted right-to-repair provisions into its FY27 defense policy draft, aiming to ease barriers that limit troops' ability to fix equipment.

AI Creative

Startups Pull Users Off Phones With In-Person Games and DIY Cyberdecks

CurrentLens
Jun 6, 2026

TechCrunch highlights founders building physical social products: Board raised funding for in-person games, and cyberdeck DIYs are going viral.

Agents & Automation

MicroPython WASM Sandbox Enables Safer Datasette Plugin Execution

CurrentLens
Jun 6, 2026

Simon Willison published an alpha MicroPython-in-WASM sandbox (micropython-wasm) and a Datasette plugin (datasette-agent-micropython) to run plugin code with constrained access.

Models & Launches

DKPS method cuts model-evaluation queries using cached responses

CurrentLens
Jun 6, 2026

An arXiv paper introduces a DKPS-based approach that uses cached model outputs to predict benchmark scores while substantially reducing the number of queries.