Search: dataset | CurrentLens.com

Agents & Automation

MicroPython WASM Sandbox Enables Safer Datasette Plugin Execution

CurrentLens
Jun 6, 2026

Simon Willison published an alpha MicroPython-in-WASM sandbox (micropython-wasm) and a Datasette plugin (datasette-agent-micropython) to run plugin code with constrained access.

Open Source & Research

MPMMine standardizes benchmarks for constraint-acquisition research

CurrentLens
May 27, 2026

An arXiv preprint introduces MPMMine, a benchmark suite built to supply the domain artifacts and structured data constraint-acquisition methods need for reproducible evaluation.

Open Source & Research

Paper Proposes Three-Step Framework for Knowledge-Work Benchmarks

CurrentLens
May 25, 2026

An arXiv paper argues that LLM evaluation still mirrors traditional NLP tasks and offers a three-step method to align benchmarks with real workplace activity.

Models & Launches

Authors Release OpenEval and Demand Item-Level Benchmark Standards

CurrentLens
May 25, 2026

A position paper argues that AI evaluations must publish item-level benchmark responses, and ships OpenEval (10M model responses across 155k items) to prove the point.

Open Source & Research

OpenClassGen Provides Extensive Python Classes for LLM Research

CurrentLens
May 3, 2026

OpenClassGen introduces a comprehensive dataset of Python classes, enhancing LLM evaluation.

Open Source & Research

Experts Assess LLM Performance on Japanese Bar Exam's Open-Ended Tasks

CurrentLens
Apr 29, 2026

A new study evaluates LLMs' legal reasoning using the Japanese bar exam's writing component.

Chips & Infrastructure

AI and GPUs Accelerate Cosmic Data Analysis This Spring Astronomy Day

CurrentLens
Apr 24, 2026

AI technologies and GPUs are streamlining the analysis of vast cosmic datasets for astronomers.

Open Source & Research

Hugging Face Releases ml-intern to Automate LLM Post‑Training Workflows

CurrentLens
Apr 23, 2026

ml-intern is an open-source agent that automates literature review, dataset discovery, training script runs, and iterative evaluation for LLM post-training work.

AI in Coding

Datasette 1.0a28 fixes alpha breakages, adds shutdown and test-cleanup APIs

CurrentLens
Apr 17, 2026

Release 1.0a28 repairs compatibility regressions from 1.0a27, adds datasette.close and database.close behavior, and ships a pytest plugin to avoid fd leaks.

Science & Healthcare

EVE Releases Open-Source 24B Earth-Intelligence LLM and Benchmarks

CurrentLens
Apr 16, 2026

EVE publishes EVE-Instruct, a 24B Mistral-based model and a suite of Earth-science datasets, benchmarks, and tooling for domain-specific LLM deployment.

Latest
Trending

Agents & Automation

OpenAI Launches Three Academy Courses on Agents and Workflows

CurrentLens
Jun 13, 2026

OpenAI released three Academy courses focused on practical AI skills, building repeatable workflows, and applying agents in everyday work.

Models & Launches

Google Releases Gemini-SQL2; Gemini 3.1 Pro Scores 80.04% on BIRD

CurrentLens
Jun 13, 2026

Google Research announced Gemini-SQL2, a Gemini 3.1 Pro-powered text-to-SQL capability that posted 80.04% execution accuracy on the BIRD single-model leaderboard.

Science & Healthcare

Africa CDC and WHO launch $518M continental Ebola response plan

CurrentLens
Jun 6, 2026

A six-month 'One Response' plan targets the Bundibugyo Ebola outbreak with unified coordination, surveillance, clinical care and community engagement across affected countries.

Policy & Safety

HASC adds right-to-repair language to FY27 defense policy bill

CurrentLens
Jun 6, 2026

The House Armed Services Committee inserted right-to-repair provisions into its FY27 defense policy draft, aiming to ease barriers that limit troops' ability to fix equipment.

AI Creative

Startups Pull Users Off Phones With In-Person Games and DIY Cyberdecks

CurrentLens
Jun 6, 2026

TechCrunch highlights founders building physical social products: Board raised funding for in-person games, and cyberdeck DIYs are going viral.

Models & Launches

DKPS method cuts model-evaluation queries using cached responses

CurrentLens
Jun 6, 2026

An arXiv paper introduces a DKPS-based approach that uses cached model outputs to predict benchmark scores while substantially reducing the number of queries.

11 results for: dataset

MicroPython WASM Sandbox Enables Safer Datasette Plugin Execution

MPMMine standardizes benchmarks for constraint-acquisition research

Paper Proposes Three-Step Framework for Knowledge-Work Benchmarks

Datasette Adds Extensible 'Jump to' Menu in 1.0a30

Authors Release OpenEval and Demand Item-Level Benchmark Standards

OpenClassGen Provides Extensive Python Classes for LLM Research

Experts Assess LLM Performance on Japanese Bar Exam's Open-Ended Tasks

AI and GPUs Accelerate Cosmic Data Analysis This Spring Astronomy Day

Hugging Face Releases ml-intern to Automate LLM Post‑Training Workflows

Datasette 1.0a28 fixes alpha breakages, adds shutdown and test-cleanup APIs

EVE Releases Open-Source 24B Earth-Intelligence LLM and Benchmarks

OpenAI Launches Three Academy Courses on Agents and Workflows

Google Releases Gemini-SQL2; Gemini 3.1 Pro Scores 80.04% on BIRD

Africa CDC and WHO launch $518M continental Ebola response plan

HASC adds right-to-repair language to FY27 defense policy bill

Startups Pull Users Off Phones With In-Person Games and DIY Cyberdecks

DKPS method cuts model-evaluation queries using cached responses

MiniMax Open-Sources M2.7, Its First Self-Evolving Agent

OpenAI pushes to lock users and expand enterprise in internal memo

NVIDIA Launches Ising AI Models to Tackle Noisy Qubits

Microsoft Tests OpenClaw-Style Agents for Copilot

Anthropic Briefed Trump Administration on Mythos, Co‑Founder Confirms