Inaugural ACM CAIS 2026 Workshop

AI Agents for Discovery in the Wild

Methods, evaluation, and deployment for AI agents used in real-world discovery.

May 26, 2026 · San Jose, CA · Full-day workshop

Key Dates

Submission Deadline May 1, 2026
Notification May 13, 2026
Camera-Ready May 24, 2026
Workshop Day May 26, 2026

Overview

AI agents are increasingly used to search over code, experiments, and designs to produce candidate discoveries. This workshop focuses on agents that operate beyond benchmarks — under expensive evaluations, noisy measurements, and real deployment constraints.

We invite work on discovery agents in science, engineering, and infrastructure, where validation is hard and human oversight matters.

Discovery

How can AI agents search, hypothesize, and optimize across science, engineering, and infrastructure?

Evaluation

How do you validate agent-driven discoveries when ground truth is limited and experiments are expensive?

Deployment

What breaks when discovery agents move from benchmarks to real-world settings?

Topics of Interest

We welcome submissions on building, evaluating, and deploying AI agents that operate under real-world constraints. These areas are representative, not exhaustive.

  1. Search and Optimization with LLM Agents

    Agents that explore design spaces, generate candidate solutions, and optimize over code, configurations, or experimental parameters — including evolutionary, iterative, and tool-augmented approaches.

  2. Agents for Systems Infrastructure

    AI agents applied to scheduling, capacity planning, database tuning, network architecture, cluster management, performance debugging, and other infrastructure operations.

  3. Agents for Scientific Discovery

    Agents that assist with hypothesis generation, experiment design, proof construction, data analysis, or literature synthesis — across domains such as mathematics, natural sciences, and engineering.

  4. Evaluation Under Real-World Constraints

    Methods for validating agent behavior when ground truth is limited, experiments are expensive, feedback is noisy, or reliability requirements are strict.

  5. Deployment Reports and Failure Analysis

    Case studies from applied settings, including adoption challenges, failure modes, operational lessons, and what didn't work.

  6. Human-Agent Collaboration and Oversight

    Workflows where domain experts direct, audit, or intervene in agent-driven processes — including trust calibration, delegation boundaries, and expert-in-the-loop design.

Call for Papers

We invite submissions on AI agents for discovery in real-world settings. Deployment reports, case studies, and lessons from applied systems — including failures — are especially welcome.

Submission

4-page short papers and 9-page long papers, using the official paper format.

OpenReview link coming soon · Paper Template

Important dates · All deadlines follow the Anywhere on Earth (AoE) timezone.

Submission deadline May 1, 2026
Notification May 13, 2026
Camera-ready May 24, 2026
Workshop day May 26, 2026

Submission details

  • Formats: 4-page short papers and 9-page long papers.
  • File format: Single PDF, up to 50 MB.
  • Template: Official paper template.
  • Length: References and appendices do not count toward the page limit.
  • Review: Double blind, with at least two reviews per submission.
  • Criteria: Relevance, technical quality, novelty, and discussion potential.

Presentation and policies

  • Presentation: All accepted papers will be presented as posters, and a small number will be selected for oral presentations.
  • Award: The workshop will present a Best Paper Award.
  • Anonymization: Submitted PDFs must not reveal author identity, including in linked or supplementary material.
  • Non-archival: Concurrent submissions and recently accepted work are welcome if venue policies permit.
  • Visibility: Submissions and reviews will not be public; only accepted papers will be made public.

Program Committee

Lakshya Agrawal · UC Berkeley
Batu El · Stanford
Mert Cemri · UC Berkeley
Alex Dimakis · UC Berkeley / Bespoke Labs
Shu Liu · UC Berkeley

Tentative Schedule

May 26, 2026 · San Jose, CA · subject to change

8:30 Opening Remarks
8:45 Invited Talk 1
9:15 Invited Talk 2
9:45 Paper Session
10:15 Coffee Break
10:30 Invited Talk 3
11:00 Poster Session 1
11:45 Lunch
13:30 Invited Talk 4
14:00 Invited Talk 5
14:30 Invited Talk 6
15:00 Panel Discussion
15:45 Coffee Break
16:00 Poster Session 2
16:45 Open Discussion
17:15 Closing Remarks

Invited Speakers

Mohammad Alizadeh · MIT / Glia AI

Joseph E. Gonzalez · UC Berkeley

Azalia Mirhoseini · Stanford / Ricursive Intelligence

Graham Neubig · CMU / OpenHands

Organizers

Shubham Agarwal · UC Berkeley

Lakshya Agrawal · UC Berkeley

Mert Cemri · UC Berkeley

Alex Dimakis · UC Berkeley / Bespoke Labs

Batu El · Stanford

Alex Krentsel · UC Berkeley

Eric Liang · Databricks

Shu Liu · UC Berkeley

Rui Meng · Google

Sylvia Ratnasamy · UC Berkeley

Ion Stoica · UC Berkeley

Matei Zaharia · UC Berkeley / Databricks

Sponsors