Evals & Observability
What separates demos from deployed systems: Langfuse and LangSmith, OpenAI tracing, golden datasets, regression testing, and cost telemetry. The instrumentation every production AI system needs.
All Articles
Trace-to-Eval Builder Build Log
A build log for Trace-to-Eval Builder, a BYOK app that turns agent traces into replayable eval packs.
Agent Runbook Auditor: A BYOK Launch Review Tool for Agent Workflows
A build log for Agent Runbook Auditor, a BYOK OpenAI demo that reviews agent runbooks for launch risk, trace design, eval cases, guardrails, and rollout readiness.

OpenAI Agents SDK Tracing: What It Shows in Production
Use OpenAI Agents SDK tracing as run inspection, not full observability. Configure sensitive data, flushing, trace exports, evals, and approvals.

Langfuse vs LangSmith for Production Observability
Choose Langfuse for self-hosted, framework-neutral traces. Choose LangSmith for managed LangChain evals, review, alerting, and deployment.