OpenAI Agents SDK vs Pydantic AI for Production Agents

Choose OpenAI Agents SDK for OpenAI-native runs. Choose Pydantic AI when typed Python, provider flexibility, and durable approvals matter.

Tuesday, June 16, 2026

Omid Saffari

Choose OpenAI Agents SDK when your agent runtime should stay close to OpenAI's Responses stack, tracing, handoffs, guardrails, and approval flow. Choose Pydantic AI when your Python system needs typed business logic, provider portability, Logfire or OpenTelemetry observability, and durable approval workflows you can own.

The Verdict

OpenAI Agents SDK is the better first choice when OpenAI is your production runtime boundary. It gives you a built-in agent loop, function tools, guardrails, sessions, MCP server tool calling, handoffs, human-in-the-loop interruptions, sandbox agents, realtime agents, and tracing in one small Python package. If the agent's job is to run mostly on OpenAI models, use Responses features, inspect traces in OpenAI, and pause sensitive tool calls for approval, the SDK removes a lot of runtime assembly.

Pydantic AI is the better first choice when the agent is really a typed Python service that happens to call models. It is model-agnostic, built around Pydantic validation, dependency injection, typed outputs, Logfire or OpenTelemetry observability, evals, deferred tools, durable execution, and graph support. If the system must move between OpenAI, Anthropic, Gemini, Grok, Mistral, OpenRouter, Bedrock, or another provider without rewriting the app layer, Pydantic AI gives the cleaner ownership boundary.

The production decision is not "which framework has more features." The decision is where you want failure ownership to live. OpenAI Agents SDK pushes more of the loop into an OpenAI-native runtime. Pydantic AI keeps more of the loop in your application, where type checking, provider selection, and workflow durability are explicit.

If your unresolved decision is still inside OpenAI's own SDK surface, compare the language/runtime split in OpenAI Agents SDK TypeScript vs Python. If your alternative is graph orchestration rather than typed Python services, use OpenAI Agents SDK vs LangGraph for Production Agents.

The Axis That Separates Them

The real axis is runtime ownership: OpenAI Agents SDK owns more of the agent loop, while Pydantic AI makes the application own more of it. That sounds abstract until production traffic arrives.

With OpenAI Agents SDK, the package is designed around a small set of primitives: agents, agents as tools or handoffs, and guardrails. The SDK's built-in loop handles tool invocation, passes results back to the model, and continues until the task is complete. Tracing is enabled by default and captures LLM generations, tool calls, handoffs, guardrails, and custom events. Sensitive tool calls can pause as interruptions, then resume from RunState after approval or rejection.

That is useful when you want a managed loop. It is also the contract you must operate. If your provider path is not OpenAI-native, the docs explicitly push you to validate feature differences across structured outputs, multimodal input, hosted file search, hosted web search, and tools before production.

With Pydantic AI, the framework starts from typed Python. Agents are generic over dependencies and output types, tools receive typed context, and outputs are validated with Pydantic. Model classes provide a vendor-SDK-agnostic API, so swapping providers is an app-level decision instead of a framework rewrite. Deferred tools and durable execution make long-running work, approvals, and external results part of the application protocol.

The line is simple: if you want OpenAI's runtime to be the control surface, choose OpenAI Agents SDK. If you want your Python service to be the control surface, choose Pydantic AI.

Production Comparison

Use the table as a deployment decision, not a feature checklist. The winner depends on what your team needs to observe and recover when a run gets stuck, calls the wrong tool, hits a provider gap, or waits on approval.

Production axis	OpenAI Agents SDK	Pydantic AI	Choose OpenAI Agents SDK when	Choose Pydantic AI when	What to verify before launch
Runtime loop	Built-in agent loop manages tool calls, results, guardrails, handoffs, and sessions	App-owned Python agent with typed dependencies, tools, outputs, capabilities, and optional graph support	You want the SDK to run the loop	You want your service code to own the workflow	Retry rules, idempotency, session storage, and who can resume a failed run
Model/provider support	OpenAI Responses path by default, Chat Completions path, custom providers, per-agent models, and beta adapter routes	Model-agnostic providers with OpenAI, Anthropic, Gemini, Grok, Mistral, Bedrock, OpenRouter, Vercel, and more	OpenAI model behavior is the product boundary	Provider portability is a release requirement	Structured output, tool calling, usage metrics, and streaming support per provider
Typed outputs	Function tools use automatic schema generation and Pydantic-powered validation	Pydantic validation is the center of the agent API, including typed dependencies and typed outputs	Typed output is needed but not the app architecture	Typed domain objects define the app contract	Invalid output retries, schema drift, and typed error reporting
Observability	Built-in tracing, OpenAI Traces dashboard, custom trace processors, and `flush_traces()`	Logfire integration, OpenTelemetry-compatible backends, eval monitoring, behavior tracing, and cost tracking	OpenAI trace UI is enough for the first operating layer	You already operate OTel or Logfire across the service	Sensitive data capture, trace sampling, cost attribution, and export guarantees
Human approval	`needs_approval`, interruptions, `RunState`, approval/rejection, and resumable runs	Deferred tools, approval requests, `DeferredToolResults`, message history, and handler-based resolution	A paused SDK run is the right approval unit	Approval belongs in your app or UI protocol	Reviewer payload, timeout behavior, rejection messages, and audit logs
Durable execution	`RunState` persists paused approval state and SDK runtime metadata	Durable agents with Temporal, DBOS, Prefect, and Restate support	Pauses mostly come from approval gates	Runs must survive worker restarts and long async jobs	Versioning of pending work, schema migration, and replay safety
Failure ownership	Provider feature mismatch is mostly discovered when an SDK request shape meets a backend	Provider feature mismatch is surfaced through model/provider classes and app-owned workflow code	You accept OpenAI-native constraints	You need explicit fallback and portability logic	The exact model backend you plan to ship

OpenAI Agents SDK: Best When OpenAI Is the Runtime Boundary

OpenAI Agents SDK wins when the agent is mostly an OpenAI Responses workflow with tools, approvals, trace inspection, and handoffs. The SDK uses the Responses API by default for OpenAI models and adds the runtime layer around the model call.

That matters when the agent has to coordinate multiple OpenAI-native pieces. The overview lists function tools, MCP server tool calling, guardrails, sessions, sandbox agents, human-in-the-loop mechanisms, and tracing as first-class surfaces. In a support, research, coding, or operations agent, those are the surfaces that usually turn a prototype into a shipped system.

The strongest production feature is the approval and resume path. A tool can declare needs_approval=True, or decide per call whether approval is required. When the model emits a sensitive tool call, execution pauses and the run returns interruptions with the agent name, tool name, and arguments. The app converts the result to RunState, approves or rejects the pending item, and resumes the original top-level run. That same interruption flow applies across the current agent, handoffs, nested Agent.as_tool() calls, local MCP servers, hosted MCP tools, shell tools, and apply-patch tools according to each tool's support.

Python

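import asyncio

from agents import Agent, Runner, function_tool

# Every call to this tool pauses the run until a reviewer approves or rejects it.
@function_tool(needs_approval=True)
async def issue_refund(invoice_id: str) -> str:
    return f"Refund issued for {invoice_id}"

agent = Agent(
    name="Billing operations agent",
    instructions="Resolve billing requests. Ask for approval before refunds.",
    tools=[issue_refund],
)

async def main() -> None:
    result = await Runner.run(agent, "Refund invoice inv_123")

    # A sensitive tool call surfaces as an interruption instead of executing.
    if result.interruptions:
        state = result.to_state()
        for interruption in result.interruptions:
            state.reject(interruption, rejection_message="Refund requires finance approval.")
        # Resume the same run; the rejection flows back into the model-visible context.
        result = await Runner.run(agent, state)

    print(result.final_output)

asyncio.run(main())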

That is the right shape when the approval unit is the SDK run itself. The reviewer sees a pending tool call, the state is resumable, and rejection flows back into the model-visible context.

Tracing is the second reason to choose it. The SDK traces Runner.run, agent spans, LLM generations, function tool calls, guardrails, handoffs, and custom spans. In background workers, the default processor exports in batches and flushes on process exit, and flush_traces() can force delivery after a unit of work closes. For production, call it at the end of queue jobs where the trace must be visible before the worker acknowledges completion.
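The flush-before-acknowledge ordering can be sketched without any SDK at all. In the minimal sketch below, `run_job`, `handler`, `flush`, and `ack` are hypothetical names: in a real worker, `flush` would wrap the SDK's `flush_traces()` and `ack` would be your queue client's acknowledge call. It assumes a failed job should still export its trace but should not be acknowledged.

```python
from typing import Callable

def run_job(job_id: str,
            handler: Callable[[str], str],
            flush: Callable[[], None],
            ack: Callable[[str], None]) -> str:
    """Run one queue job, flush telemetry, then acknowledge it."""
    try:
        result = handler(job_id)
    finally:
        # Flush even when the handler raises, so the failing run stays visible.
        flush()
    # Acknowledge only after the handler succeeded and the flush completed.
    ack(job_id)
    return result

# Stand-in callables that record the order in which they were invoked.
events: list[str] = []
output = run_job(
    "job-1",
    handler=lambda job_id: events.append("handle") or f"done:{job_id}",
    flush=lambda: events.append("flush"),
    ack=lambda job_id: events.append("ack"),
)
```

The design choice is the `finally` block: the trace is exported on both success and failure, while the acknowledge call is reached only on success, so a crashed job is redelivered with its trace already available for review.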

The caveat is data policy and provider shape. Tracing is unavailable for organizations operating under a Zero Data Retention policy using OpenAI APIs. The default trace setting captures sensitive generation and function call inputs and outputs unless trace_include_sensitive_data is disabled. Non-OpenAI providers can be wired through custom clients, ModelProvider, per-agent model objects, or adapters, but the docs are direct about feature differences. If the provider does not support structured outputs or a specific tool surface, the agent can fail at the exact place production needs determinism.

Choose OpenAI Agents SDK when your near-term production question is: "Can we run this OpenAI-native agent with tool approvals, traces, sessions, and handoffs without building a custom runtime first?"

Pydantic AI: Best When Typed Python Owns the Workflow

Pydantic AI wins when the agent is part of a larger Python application whose contracts should be typed, tested, and portable across providers. It is not just "Pydantic for outputs." The framework is built around typed dependencies, typed outputs, tools, dynamic instructions, model/provider abstraction, Logfire, evals, deferred tools, durable execution, streamed outputs, and graph support.

Provider portability is the obvious advantage. Pydantic AI's docs list built-in or OpenAI-compatible support across OpenAI, Anthropic, Gemini, DeepSeek, Grok, Cohere, Mistral, Perplexity, Azure AI Foundry, Amazon Bedrock, Google Cloud, Ollama, LiteLLM, Groq, OpenRouter, Together AI, Fireworks AI, Cerebras, Hugging Face, GitHub, Heroku, Vercel, Nebius, OVHcloud, Alibaba Cloud, and SambaNova. The important part is not the length of the list. The important part is that model classes implement a vendor-SDK-agnostic API, so a single agent can move between providers by swapping the model object.

The deeper advantage is typed domain logic. A production agent usually needs account context, permissions, tool policies, retrieval filters, billing state, and output contracts. Pydantic AI passes those through typed dependencies and validates final output with Pydantic. That makes it easier to test a run without hitting a real model, and easier to fail closed when the output is malformed.

Python

from dataclasses import dataclass
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext

# Typed dependencies: application state the agent can read during a run.
@dataclass
class ApprovalContext:
    actor_id: str
    can_refund: bool

# Typed output: the contract the model's final answer must satisfy.
class BillingDecision(BaseModel):
    action: str = Field(description="The approved billing action")
    needs_review: bool
    reason: str

agent = Agent(
    "openai:gpt-5.2",
    deps_type=ApprovalContext,
    output_type=BillingDecision,
    instructions="Return a billing decision. Do not execute payment actions.",
)

# Dynamic instructions: computed per run from the typed dependencies.
@agent.instructions
async def add_policy(ctx: RunContext[ApprovalContext]) -> str:
    return "Refund permission is present." if ctx.deps.can_refund else "Refund permission is absent."

This shape is easier to fit into an existing service because the agent boundary looks like normal Python application code. The model call is one dependency. The domain contract is another.
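To make "fail closed" concrete, here is a minimal standard-library sketch of the behavior you want from output validation. It is a stand-in for illustration, not how Pydantic AI validates output: the dataclass mirrors the `BillingDecision` fields, and `ALLOWED_ACTIONS` and `parse_decision` are hypothetical names. The assumption is that any malformed model output should collapse into a decision that takes no action and requires review.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class BillingDecision:
    action: str
    needs_review: bool
    reason: str

ALLOWED_ACTIONS = {"refund", "credit", "none"}  # hypothetical action vocabulary

def parse_decision(raw: str) -> BillingDecision:
    """Parse model output; fall back to a review-required decision on any defect."""
    try:
        data = json.loads(raw)
        action = data["action"]
        needs_review = data["needs_review"]
        reason = data["reason"]
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if not isinstance(needs_review, bool) or not isinstance(reason, str):
            raise TypeError("wrong field types")
        return BillingDecision(action, needs_review, reason)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Fail closed: malformed output never becomes an executable action.
        return BillingDecision("none", True, f"malformed output: {type(exc).__name__}")

ok = parse_decision('{"action": "refund", "needs_review": false, "reason": "duplicate charge"}')
bad = parse_decision('{"action": "wire_money"}')
```

Because the fallback is a value rather than an exception, downstream code has one path to handle, and that path always routes to a human.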

Pydantic AI's approval model is also more app-owned. Deferred tools cover calls that need approval and calls executed externally. A tool can require approval at declaration time with requires_approval=True, or raise ApprovalRequired dynamically based on arguments and context. The run can resolve pending calls inline with HandleDeferredToolCalls, or end with DeferredToolRequests so a UI, queue, or service can collect approvals and resume with DeferredToolResults plus message history.

That is the right shape when approval does not belong inside a model runtime. A product UI can show the pending tool call, a policy service can attach reviewer metadata, a queue can wait for an external result, and the follow-up run can continue with the same message history.
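The app-side half of that protocol can be sketched in plain Python. This is a hedged illustration, not Pydantic AI's types: `PendingCall` and `ApprovalInbox` are hypothetical names standing in for whatever your service stores between the run that ends with deferred requests and the run that resumes with results. It assumes the follow-up run should start only once every pending call has a reviewer decision.

```python
from dataclasses import dataclass, field

@dataclass
class PendingCall:
    call_id: str
    tool_name: str
    args: dict

@dataclass
class ApprovalInbox:
    """Holds tool calls awaiting a reviewer; all names here are illustrative."""
    pending: dict[str, PendingCall] = field(default_factory=dict)
    decisions: dict[str, dict] = field(default_factory=dict)

    def submit(self, call: PendingCall) -> None:
        self.pending[call.call_id] = call

    def decide(self, call_id: str, reviewer: str, approved: bool, note: str = "") -> None:
        if call_id not in self.pending:
            raise KeyError(f"no pending call {call_id!r}")
        # Reviewer metadata is stored next to the decision for the audit log.
        self.decisions[call_id] = {"reviewer": reviewer, "approved": approved, "note": note}

    def ready_to_resume(self) -> bool:
        # Resume only when every pending call has a decision.
        return bool(self.pending) and self.pending.keys() == self.decisions.keys()

inbox = ApprovalInbox()
inbox.submit(PendingCall("call_1", "issue_refund", {"invoice_id": "inv_123"}))
inbox.submit(PendingCall("call_2", "close_account", {"account_id": "acct_9"}))
inbox.decide("call_1", reviewer="finance@example.com", approved=True)
partial = inbox.ready_to_resume()   # one decision is still missing
inbox.decide("call_2", reviewer="finance@example.com", approved=False, note="Needs manager sign-off")
complete = inbox.ready_to_resume()
```

Once `ready_to_resume()` is true, the recorded decisions are what the service would translate into the framework's result objects for the follow-up run.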

Durability is the other production advantage. Pydantic AI durable agents can preserve progress across transient API failures, application errors, restarts, long-running asynchronous workflows, and human-in-the-loop workflows. The official durable integrations are Temporal, DBOS, Prefect, and Restate. If your agent has work that may sit in a queue, wait for a reviewer, call a slow external system, or survive a deploy, that matters more than a neat local demo.
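The core idea behind durable execution is that completed steps are recorded so a restart skips them instead of repeating their side effects. The toy sketch below shows only that idea with a JSON checkpoint file. It is not the API of Temporal, DBOS, Prefect, or Restate, and `run_with_checkpoint` and `step` are hypothetical helpers; real engines also handle atomic writes, timers, retries, and versioning.

```python
import json
import os
import tempfile

def run_with_checkpoint(steps, path):
    """Run named steps in order, skipping any already recorded in the checkpoint file."""
    done = {}
    if os.path.exists(path):
        with open(path) as f:
            done = json.load(f)
    for name, fn in steps:
        if name in done:
            continue  # completed before the restart; do not repeat side effects
        done[name] = fn()
        with open(path, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return done

calls = []

def step(name, fail=False):
    """Build a step that records its execution, or simulates a crash."""
    def fn():
        if fail:
            raise RuntimeError("worker crashed")
        calls.append(name)
        return f"{name}-ok"
    return fn

path = os.path.join(tempfile.mkdtemp(), "run.json")
try:
    run_with_checkpoint([("fetch", step("fetch")), ("charge", step("charge", fail=True))], path)
except RuntimeError:
    pass  # simulated crash after "fetch" was checkpointed

# Second attempt, as if a new worker picked the run up after a deploy.
result = run_with_checkpoint([("fetch", step("fetch")), ("charge", step("charge"))], path)
```

After the second attempt, `fetch` has executed once rather than twice, which is the property that matters when a step charges a card or sends an email.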

Choose Pydantic AI when your production question is: "Can we make the agent a typed, testable, observable Python service that can swap providers and survive long-running work?"

The Production Implementation Rule

The safe implementation rule is to wire the framework around the operating loop before the first external user sees it. The minimum loop is run state, trace, eval, approval, cost, and fallback.

For OpenAI Agents SDK, start with an OpenAI-native path and keep it boring:

Pin the runtime boundary

Use the Responses model path unless you have a specific reason to use Chat Completions or a non-OpenAI provider. If you route through another provider, validate structured outputs, tools, usage metrics, and streaming with the exact backend before launch.

Set trace policy before real data

Tracing is enabled by default, and sensitive generation and function data are captured by default. Decide whether to disable sensitive data capture, replace the trace processor, or disable tracing for policy reasons before production prompts contain customer data.

Make approvals resumable

Use needs_approval on sensitive tools and store RunState in a database or queue when a run pauses. Version the agent definition next to the serialized state so a pending approval can resume against compatible code.

Flush traces for background work

For queue workers and background tasks, close the trace context and call flush_traces() before acknowledging the job when the trace must be visible for review.
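Storing a version tag next to the paused state can be sketched with `sqlite3`. This minimal sketch assumes the paused run state has already been serialized to a string by whatever mechanism your SDK version provides; that step is deliberately not shown, and the state is treated as opaque. The table name, the `AGENT_VERSION` tag, and the helper functions are hypothetical.

```python
import sqlite3

AGENT_VERSION = "billing-agent@3"  # hypothetical version tag for the agent definition

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paused_runs ("
        "run_id TEXT PRIMARY KEY, agent_version TEXT NOT NULL, state TEXT NOT NULL)"
    )
    return conn

def save_paused(conn, run_id, serialized_state, agent_version=AGENT_VERSION):
    """Persist an opaque serialized state together with the code version that produced it."""
    conn.execute(
        "INSERT OR REPLACE INTO paused_runs VALUES (?, ?, ?)",
        (run_id, agent_version, serialized_state),
    )
    conn.commit()

def load_for_resume(conn, run_id, current_version=AGENT_VERSION):
    row = conn.execute(
        "SELECT agent_version, state FROM paused_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    saved_version, state = row
    if saved_version != current_version:
        # Refuse to resume against incompatible code; route to a migration path instead.
        raise RuntimeError(f"state saved by {saved_version}, running {current_version}")
    return state

conn = open_store()
save_paused(conn, "run-42", '{"opaque": "serialized run state goes here"}')
state = load_for_resume(conn, "run-42")
```

Raising on a version mismatch is a conservative choice. A looser policy, such as accepting any state from the same major version, is equally valid as long as it is written down before the first approval sits in the queue across a deploy.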

For Pydantic AI, start with the app contract:

Define dependencies and outputs first

Make account context, permissions, retrieval filters, and final outputs typed. The model should fill an application contract, not invent the contract at runtime.

Use model abstraction deliberately

Pick the initial provider, then write a test path with TestModel or FunctionModel. If portability is part of the promise, test the second provider before launch, not after the first outage.

Route deferred work through the app

Use deferred tools for approvals and external work. Decide which calls resolve inline and which calls end the run for a UI, queue, or policy service to resolve.

Choose durability when waits can outlive a process

Use a durable execution integration when the run may wait on human approval, external work, deploys, retries, or worker restarts. Treat the durable workflow as production infrastructure, not a code sample.
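The "inline or end the run" decision is easier to review when it lives in one small policy function. The sketch below is one possible shape, assuming a policy where external jobs always leave the run and human-reviewed tools leave it only above an amount threshold. The tool names, the threshold, and the `route` function are all hypothetical and independent of any framework API.

```python
from enum import Enum

class Resolution(Enum):
    INLINE = "inline"    # resolve inside the current run
    END_RUN = "end_run"  # stop and hand off to a UI, queue, or policy service

# Hypothetical policy tables; in practice these come from configuration.
HUMAN_REVIEW_TOOLS = {"issue_refund", "close_account"}
EXTERNAL_TOOLS = {"generate_report"}

def route(tool_name: str, amount_cents: int = 0, auto_limit_cents: int = 5_000) -> Resolution:
    """Decide whether a deferred tool call resolves inline or ends the run."""
    if tool_name in EXTERNAL_TOOLS:
        return Resolution.END_RUN  # result arrives later from another system
    if tool_name in HUMAN_REVIEW_TOOLS and amount_cents > auto_limit_cents:
        return Resolution.END_RUN  # needs a human reviewer
    return Resolution.INLINE

small_refund = route("issue_refund", amount_cents=1_200)
large_refund = route("issue_refund", amount_cents=90_000)
report = route("generate_report")
```

Keeping the policy as data plus one pure function means it can be unit tested and audited without running a model.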

The implementation line that flips the decision is approvals. If approval is simply "pause this SDK run, show the tool call, resume the SDK run," OpenAI Agents SDK is cleaner. If approval is "send this to our product UI, attach policy metadata, wait for a reviewer, maybe execute externally, then continue with typed history," Pydantic AI is cleaner.

What Breaks First

OpenAI Agents SDK breaks first when teams assume OpenAI-native features travel cleanly to every backend. The docs support non-OpenAI paths, but they also warn about feature differences across structured outputs, multimodal input, hosted file search, hosted web search, and tools. A provider-compatible model is not the same thing as a provider-compatible agent run. Validate the full request shape, including tools, output schema, streaming, usage, trace export, retries, and failure semantics.
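One way to keep that validation honest is to record smoke-test results per backend and block launch on any gap. The sketch below assumes you already have such results from your own tests. The capability names, backend labels, and `missing_capabilities` helper are hypothetical, and an unrecorded capability is treated as unverified.

```python
REQUIRED = {"structured_output", "tool_calling", "streaming", "usage_metrics"}

def missing_capabilities(backend_results: dict[str, bool], required=REQUIRED) -> set[str]:
    """Return required capabilities that were not verified as working."""
    return {cap for cap in required if not backend_results.get(cap, False)}

# Hypothetical results from running smoke tests against two backends.
results = {
    "openai-responses": {"structured_output": True, "tool_calling": True,
                         "streaming": True, "usage_metrics": True},
    "other-provider": {"structured_output": False, "tool_calling": True,
                       "streaming": True},  # usage_metrics was never tested
}

report = {name: sorted(missing_capabilities(r)) for name, r in results.items()}
launchable = [name for name, gaps in report.items() if not gaps]
```

Treating "not tested" the same as "failed" is the deliberate part: a capability nobody exercised against the exact backend is the one most likely to fail in production.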

It also breaks when trace policy is decided late. Tracing is enabled by default, and sensitive data capture is enabled by default. That is a good developer default and a risky production default if prompts, tool arguments, or outputs include customer data. Under a Zero Data Retention policy, OpenAI tracing is unavailable, so the trace plan has to move to a custom processor or another observability system.

Pydantic AI breaks first when teams mistake typed code for an operating layer. Typed outputs reduce a class of runtime mistakes, but they do not approve tool calls, triage failures, attribute spend, or close the loop on bad answers by themselves. Deferred tools and durable execution are strong primitives only if the app supplies the reviewer experience, queue semantics, timeout policy, and audit log.

It also breaks when provider portability is treated as automatic. Pydantic AI can swap model objects, but provider behavior still varies. Tool schemas, structured output strictness, finish reasons, safety filters, latency, and usage reporting must be verified per provider. Use FallbackModel for resilience, but only after you define which failures are safe to replay.
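"Safe to replay" can be written down as a small predicate before any fallback is configured. The sketch below is one possible rule, assuming a failure is replayable when it is transient and either no side effect has started or the tool deduplicates by an idempotency key. The `Failure` type, the failure kinds, and `safe_to_replay` are hypothetical and do not come from either framework.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Failure:
    kind: str                  # e.g. "timeout", "rate_limit", "invalid_output"
    side_effect_started: bool  # did a non-idempotent tool call begin?
    idempotency_key: Optional[str] = None

RETRYABLE = {"timeout", "rate_limit", "server_error"}  # hypothetical transient kinds

def safe_to_replay(failure: Failure) -> bool:
    """Return True when re-running on a fallback model cannot duplicate a real action."""
    if failure.kind not in RETRYABLE:
        return False
    if not failure.side_effect_started:
        return True
    # A started side effect is replayable only if the tool deduplicates by key.
    return failure.idempotency_key is not None

before_tool = safe_to_replay(Failure("timeout", side_effect_started=False))
mid_refund = safe_to_replay(Failure("timeout", side_effect_started=True))
keyed_refund = safe_to_replay(Failure("timeout", True, idempotency_key="refund-inv_123"))
bad_output = safe_to_replay(Failure("invalid_output", side_effect_started=False))
```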

The production smell is the same in both frameworks: an agent can take a real action, but nobody can answer who approved it, what the model saw, what tool arguments were used, how much the run cost, why the output passed, and how to replay the state.
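Those six questions map directly onto a record you can require for every run that takes a real action. The sketch below is a hypothetical shape, not a schema from either framework: the field names and the `unanswered` helper are assumptions, and references such as `model_input_ref` stand for pointers into whatever trace or log store you operate.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class RunAudit:
    approved_by: Optional[str] = None       # who approved it
    model_input_ref: Optional[str] = None   # what the model saw
    tool_arguments: Optional[dict] = None   # what tool arguments were used
    cost_usd: Optional[float] = None        # how much the run cost
    output_check: Optional[str] = None      # why the output passed
    replay_state_ref: Optional[str] = None  # how to replay the state

def unanswered(audit: RunAudit) -> list[str]:
    """List the audit questions this run cannot answer yet, in field order."""
    return [f.name for f in fields(audit) if getattr(audit, f.name) is None]

audit = RunAudit(
    approved_by="finance@example.com",
    tool_arguments={"invoice_id": "inv_123"},
    cost_usd=0.0123,
)
gaps = unanswered(audit)
```

A non-empty `gaps` list on a run that issued a refund is the production smell made measurable.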

Decision Rule

Choose OpenAI Agents SDK if your system is OpenAI-first, your team wants a small runtime, and the most valuable built-ins are trace inspection, guardrails, handoffs, sessions, MCP tools, and resumable tool approval.

Choose Pydantic AI if your system is Python-first, provider-flexible, domain-typed, and likely to need deferred tools, app-owned approval UX, Logfire or OpenTelemetry, eval monitoring, and durable execution.

Use both only when each owns a clear layer. A sensible hybrid is Pydantic AI for typed domain services and provider portability, with OpenAI Agents SDK used for an OpenAI-native specialist workflow that benefits from its run loop and trace surface. Do not stack them because both say "agents." Stack them only when the operational boundary is obvious.
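As a rough encoding of the rule, and not a scoring model, the decision can be reduced to a few boolean signals. The function name, the parameters, and the "review" fallback below are hypothetical. The sketch assumes that mixed signals should trigger a per-layer review rather than a forced pick.

```python
def recommend(openai_first: bool,
              needs_provider_portability: bool,
              approval_in_app_ui: bool,
              waits_outlive_process: bool) -> str:
    """Map the article's decision signals onto a starting recommendation."""
    app_owned_signals = sum(
        [needs_provider_portability, approval_in_app_ui, waits_outlive_process]
    )
    if openai_first and app_owned_signals == 0:
        return "OpenAI Agents SDK"
    if not openai_first and app_owned_signals > 0:
        return "Pydantic AI"
    # Mixed or absent signals: decide which layer each framework would own.
    return "Review the operational boundary"

native = recommend(True, False, False, False)
portable = recommend(False, True, True, True)
mixed = recommend(True, False, True, False)
```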

Is Pydantic AI better than OpenAI Agents SDK?

Pydantic AI is better when typed Python control, provider portability, durable execution, and app-owned approval flows matter more than OpenAI-native runtime integration. OpenAI Agents SDK is better when the agent should stay close to the OpenAI Responses stack, tracing, handoffs, guardrails, and resumable SDK runs.

Can OpenAI Agents SDK use non-OpenAI models?

Yes. The docs describe set_default_openai_client, ModelProvider, per-agent Agent.model, and third-party adapters for non-OpenAI or mixed-provider setups. Production teams still need to validate structured outputs, tools, usage, streaming, and provider-specific request behavior before launch.

Does Pydantic AI support OpenAI models?

Yes. Pydantic AI supports OpenAI models and OpenAI-compatible providers, and it can also use providers such as Anthropic, Gemini, Grok, Mistral, Bedrock, OpenRouter, Vercel, and others through its model/provider system.

Which is better for human approval gates?

OpenAI Agents SDK is better when the approval unit is a paused SDK run with RunState. Pydantic AI is better when approvals belong in your product UI, policy service, external worker, or durable workflow and need to resume through DeferredToolResults and message history.


Last Updated

Jun 16, 2026
