Agent Runbook Auditor: A BYOK Launch Review Tool for Agent Workflows

A build log for Agent Runbook Auditor, a BYOK OpenAI demo that reviews agent runbooks for launch risk, trace design, eval cases, guardrails, and rollout readiness.

Monday, June 22, 2026Omid Saffari

Agent Runbook Auditor is a small BYOK demo for teams shipping agents into real workflows. Paste a runbook, PRD, or rough automation idea and it returns the launch risks, trace fields, eval cases, guardrails, and rollout checklist the team should review before production.

What shipped

This release turns the omidsaffari/ai-app-starter template into a focused evals and observability tool. The app keeps the template's trust boundary intact: the visitor pastes their own OpenAI key in the browser, the key is held in session storage, and the key is sent only as a per-request header to the app's /api/run route. No author-owned runtime key exists in the project.

The capability uses the OpenAI Responses API with gpt-5.4-mini. The prompt treats the submitted runbook as untrusted design material and asks for a practical production-readiness audit instead of generic agent advice. The output format is deliberately operational: system summary, launch risk map, trace schema, eval suite, guardrails, and rollout checklist.

Why this demo

Most agent prototypes fail at the handoff between "it works on my prompt" and "we can observe, test, and control it in production." The missing artifacts are usually not more UI polish. They are boring but essential: what fields should traces capture, what approval points are mandatory, which tool calls are risky, and which eval cases should fail before a release is allowed.

Agent Runbook Auditor gives builders a fast second pass on those artifacts. It is useful for support agents, internal ops assistants, RAG workflows, sales research agents, and any tool-using automation where mistakes can leak data, duplicate work, or trigger an external side effect.

Build notes

The generated repo started private from omidsaffari/ai-app-starter, then only the allowed project surface changed: src/lib/capability.ts, src/lib/project.ts, src/app/page.tsx, and README.md, plus the required OpenAI SDK dependency and lockfile update. The frozen BYOK files, route proxy, tests, and CI workflow were left untouched.

The demo UI includes a sample B2B support triage runbook, a larger editing pane, streaming output, stop and clear controls, and a compact two-column audit workspace. It is intentionally a working tool on the first screen rather than a landing page.

Gates

The release passed the GitHub Actions gate on commit e337c9e. The workflow covered dependency install, Biome lint, production Next.js build, TypeScript typecheck, gitleaks secret scan, Playwright browser install, and the BYOK no-leak smoke test.

The demo was deployed to Vercel as its own project and the production alias returns HTTP 200. The repo is public after the green CI gate, and the demo remains BYOK only.

Next improvements

The next useful additions are Markdown export, a trace schema copy button, structured JSON output for review pipelines, and comparison mode for two competing agent designs. Those are follow-on features; the current release already exercises the full BYOK loop from key entry to streamed model output without storing or logging visitor credentials.

Last Updated

Jun 22, 2026

More from Evals & Observability

View all Evals & Observability articles
Newsletter

One letter, every week. Working systems — not hot takes.

Build logs, agentic engineering decisions, agent failures, evals, and what survives real users. Sent weekly, never more.

Weekly. No spam. Unsubscribe anytime.