OpenAI Agents SDK TypeScript vs Python for Production Agents

Choose TypeScript or Python for OpenAI Agents SDK by production ownership: product runtime, worker path, tracing, guardrails, handoffs, MCP, and evals.

Tuesday, June 9, 2026

Omid Saffari

OpenAI Agents SDK TypeScript vs Python for Production Agents

Choose the OpenAI Agents SDK language by where the agent has to live in production: TypeScript for customer-facing product surfaces and edge/API runtimes, Python for data-heavy workers, offline eval loops, and teams already running Python orchestration. The wrong choice is not a syntax problem. It is where tracing, guardrails, handoffs, MCP lifecycle, and deployment ownership drift away from the service that owns the user.

The Verdict

Use TypeScript when the agent is part of the product path. Use Python when the agent is part of the worker path. That line matters more than which language your team prefers.

The OpenAI Agents SDK exists in both ecosystems. The TypeScript SDK installs as @openai/agents with zod; the Python SDK installs as openai-agents. Both are designed around the same core production primitives: agents, handoffs, guardrails, function tools, MCP server tool calling, sessions, human-in-the-loop mechanisms, tracing, sandbox agents, and realtime agents.

The production difference is ownership. A customer-facing SaaS agent embedded in a Next.js, Node, edge, or API product should usually stay in TypeScript so the same service owns auth, request context, streaming, errors, tracing flushes, and user handoff. A data repair agent, eval worker, batch analyst, or internal automation loop should usually stay in Python so the same service owns datasets, replay jobs, scoring, notebooks, and worker queues.

If the bigger question is whether to use OpenAI Agents SDK or LangGraph at all, start with OpenAI Agents SDK vs LangGraph for Production Agents. This piece is narrower: you have chosen OpenAI's SDK, and now you need to decide which implementation owns production.

OpenAI Agents SDK TypeScript documentation — OpenAI Agents SDK TypeScript

Production Comparison

The SDKs are close enough on primitives that the winner is the runtime boundary, not feature parity.

Axis	TypeScript SDK	Python SDK	Production decision	What breaks first
Install path	`npm install @openai/agents zod`; Zod v4 is required	`pip install openai-agents`	Match the service runtime that owns the agent loop	Split dependencies create duplicated config and drift
Tool validation	TypeScript functions become tools with automatic schema generation and Zod-powered validation	Python functions become tools with automatic schema generation and Pydantic-powered validation	Use the validator your app already trusts at the boundary	Tool inputs pass in one runtime but fail in the other
Product path	Strong fit for web/API, streaming UI, edge handlers, and TypeScript product teams	Works, but may become a sidecar service for web products	Keep customer-facing handoff, auth, and streaming close to the product	User context gets copied across services without a clear owner
Worker path	Works for Node workers and API jobs	Strong fit for data pipelines, eval workers, notebooks, and Python orchestration	Put batch eval, replay, and repair near the data stack	Offline evals lag because the runtime is not near the data
Tracing	Built-in tracing records generations, tool calls, handoffs, guardrails, and custom events	Same tracing categories, with `Runner.run`, `Runner.run_sync`, and `Runner.run_streamed` wrapped by default	Ensure trace export behavior matches the deployment target	Missing traces make failures unreplayable
MCP	Hosted MCP, Streamable HTTP, and stdio are supported; legacy SSE exists but Streamable HTTP or stdio is preferred	Hosted MCP, Streamable HTTP, HTTP with SSE, and stdio are supported	Choose transport by where tool calls execute and who approves them	Tool discovery latency and name collisions appear under load
Human approval	Built-in human-in-the-loop mechanisms and hosted MCP approval callbacks	Built-in human-in-the-loop mechanisms and local MCP approval patterns	Put approval where the operator actually works	Approval becomes a Slack message with no durable audit trail

Both SDKs use the same OpenAI idea: the SDK manages the control loop when you want turns, tool execution, guardrails, handoffs, or sessions handled by a runtime. The Responses API is still the right choice when you want to own the loop, tool dispatch, and state yourself, or when the workflow is short-lived and mainly returns a model response.

The mistake is using the SDK as a second hidden application. If your TypeScript product calls a Python agent service for every user turn, decide whether that Python service owns auth, approvals, trace metadata, and user-visible errors. If it does not, the TypeScript product is still the real owner, and the agent loop should probably live there.

Use TypeScript When The Agent Is In The Product Path

TypeScript is the right default when the agent sits in front of users. It keeps the agent loop close to the product runtime that already owns identity, request metadata, rate limits, streaming UI, and support escalation.

The official TypeScript install path is:

Bash

npm install @openai/agents zod

The SDK requires Zod v4, which is a useful signal. The TypeScript path is built for teams that want the model's tool inputs to line up with the same schema discipline they already use for product APIs.

TypeScript

import { Agent, run } from "@openai/agents";

const agent = new Agent({
  name: "Support Triage",
  instructions:
    "Route billing, refund, and account questions to the right internal path.",
});

const result = await run(agent, "A customer needs help with a refund request.");
console.log(result.finalOutput);

That snippet is not production by itself. The production version wraps the run with user identity, request ID, org ID, product tier, tool allow-list, trace group ID, and an approval policy for any action that changes customer state.

TypeScript is especially clean when your agent needs to stream through the same API route as the rest of the product. The service can keep a single request context from HTTP ingress through model call, tool call, handoff, and UI response. When the agent fails, the same product logs and error reporting path sees the failure.

Tracing is the catch. The TypeScript SDK records generations, tool calls, handoffs, guardrails, and custom events, but deployment runtime affects export behavior. In supported server runtimes, traces export on a regular interval. In Cloudflare Workers, the automatic export loop is unavailable even though tracing is enabled, so the request lifecycle should call getGlobalTraceProvider().forceFlush() before the worker is torn down. Browser tracing is disabled by default.

Use TypeScript when the public agent needs product-native streaming, UI handoff, tenant-aware tool filtering, product auth, and Zod-shaped inputs. Skip TypeScript when the agent mostly reads datasets, grades offline outputs, repairs batch jobs, or runs inside Python-owned infrastructure.

Use Python When The Agent Is In The Worker Path

Python is the right default when the agent is closer to data, evals, and long-running work than to the browser. It keeps the loop near the systems that already own datasets, scoring scripts, notebooks, job queues, and model evaluation.

The official Python install path is:

Bash

pip install openai-agents

The Python SDK uses the Responses API by default for OpenAI models, but it adds the higher-level runtime around model calls. That matters when the agent should manage turns, tool execution, guardrails, handoffs, sessions, artifacts, coordinated steps, real workspaces, or resumable execution through Sandbox agents.

Python

from agents import Agent, Runner

agent = Agent(
    name="Evaluation Triage",
    instructions="Classify failed agent runs and assign the next repair action.",
)

result = Runner.run_sync(agent, "A tool call failed after a policy check.")
print(result.final_output)

Python is the better owner for an offline evaluator that replays traces, grades outputs, groups failures, and proposes prompt or tool-policy changes. It is also the better owner for a batch agent that writes artifacts, inspects files, or runs inside worker infrastructure already built around Python.

The Python tracing defaults fit that shape. Tracing is enabled by default and can be disabled globally with OPENAI_AGENTS_DISABLE_TRACING=1, in code with set_tracing_disabled(True), or per run with RunConfig.tracing_disabled=True. The default tracing wraps Runner.run, Runner.run_sync, and Runner.run_streamed, and creates spans for agent runs, generations, function tools, guardrails, handoffs, and voice-related spans. The default batch trace processor exports in the background every few seconds, exports sooner when the in-memory queue reaches its trigger, and flushes on process exit.

There is one hard compliance boundary: for organizations operating under a Zero Data Retention policy using OpenAI APIs, tracing is unavailable. If your production agent needs ZDR, your observability plan cannot depend on OpenAI-hosted traces. You need application-owned logs, redaction, replay IDs, and evaluator records outside the SDK trace dashboard.

Use Python when the operational loop is worker-owned: eval suites, replay jobs, data enrichment, internal code or document agents, and batch repair flows. Skip Python as the primary runtime when every user turn has to cross back into a TypeScript product service for auth, streaming, and approval.

OpenAI Agents SDK Python documentation — OpenAI Agents SDK Python

Guardrails Decide The Runtime Owner

Guardrails are where a language split becomes visible. The SDK gives you input guardrails, output guardrails, and tool guardrails, but they do not protect the same surface.

Input guardrails run on the initial user input. Output guardrails run on the final agent output. Tool guardrails run on every function-tool invocation, with input checks before execution and output checks after execution. In the TypeScript SDK, runInParallel: true is the input guardrail default. That minimizes latency, but the model may already consume tokens or run tools before the guardrail trips. runInParallel: false blocks the model call first, which is the safer setting when a rejected request must not spend tokens or touch tools.

The production rule is simple: guardrail placement follows risk.

Gate The First User Input
Use input guardrails for policy, abuse, tenant eligibility, and request-shape checks that should run before the agent begins work. If a blocked request must not spend model tokens or run tools, configure the guardrail to run before the model call.
Gate Every Dangerous Tool
Use tool guardrails for write operations, payment changes, account changes, data exports, and other actions where the input and output of the tool matter more than the final answer text.
Gate The Final User Response
Use output guardrails for final-answer policy, disclosure, formatting, citation, or structured-output checks. Do not rely on output guardrails to stop a tool that has already executed.

One boundary is easy to miss: tool guardrails apply to function tools defined with tool(), but they do not apply to handoff calls, hosted tools, built-in execution tools, or agent.asTool(). Handoffs are presented to the model as function-like tools, but they run through the SDK's handoff path rather than the normal function-tool pipeline.

That means a multi-agent workflow needs explicit handoff policy. Handoffs are represented as tools to the LLM, and a handoff to an agent named Refund Agent is named transfer_to_refund_agent by default. By default, a handoff receives the entire conversation history unless TypeScript inputFilter or Python input_filter changes what the receiving agent sees.

For a customer-facing product, this usually argues for TypeScript ownership: the product runtime already knows which user, tenant, plan, and approval state is attached to the request. For a worker agent, this usually argues for Python ownership: the worker already knows which dataset, job, trace, and evaluation policy applies.

MCP Is A Deployment Choice

MCP support is not just a checkbox. The transport you choose decides who calls the tool, where latency appears, how approval works, and whether the model sees too many tools.

The TypeScript SDK supports Hosted MCP server tools, Streamable HTTP MCP servers, and stdio MCP servers. It also includes legacy SSE support, but the docs prefer Streamable HTTP or stdio for new integrations because SSE has been deprecated by the MCP project. The Python SDK understands Hosted MCP server tools, Streamable HTTP, HTTP with SSE, and stdio.

Hosted MCP pushes the round trip into the model path: the OpenAI Responses API invokes the remote endpoint and streams the result back to the model. That can be clean for public, remote tools. It also means approval policy has to be deliberate. Hosted MCP supports requireApproval: "always" or a fine-grained object mapping tool names to never and always, plus onApproval for programmatic approval or rejection.

Local Streamable HTTP and stdio MCP are different. The agent runtime calls the MCP server, discovers tools, and handles failures locally. In TypeScript, each Agent run may call list_tools() for Streamable HTTP and stdio servers; cacheToolsList: true can cache the list when the tool set is stable. In Python, every agent run calls list_tools() on each MCP server, remote servers can add noticeable latency, and caching can reduce tool-list calls.

The operational failure mode is not mysterious. A team exposes every internal tool to an agent, skips server-prefixed names, and then discovers that tool discovery is slow and duplicate names make tool choice ambiguous. The fix is boring and necessary: expose only the needed tools, prefix local MCP tool names by server when collisions are possible, cache stable tool lists, and put approval on irreversible operations.

Use TypeScript MCP when the MCP tools are part of the product request path and the product service owns approval. Use Python MCP when the tools are worker, filesystem, data, or batch operations. Do not hide a production write behind hosted MCP without an approval policy and a durable audit record.

The Bridge Pattern

Many production teams should use both SDKs, but not inside the same control loop. The bridge pattern keeps the public agent and the evaluation worker separate.

Put the live customer path in TypeScript:

Accept the user request.
Attach tenant, user, trace group, and product metadata.
Run the agent with product-owned tools and guardrails.
Require approval for state-changing tools.
Persist the trace ID, user-visible output, tool calls, and guardrail outcomes.

Put replay and evaluation in Python:

Pull stored traces, tool outcomes, user ratings, and support escalations.
Reconstruct failed runs from persisted metadata.
Score answers and tool choices against eval datasets.
Propose prompt, tool, or guardrail changes.
Feed approved changes back into the TypeScript runtime through versioned configuration.

This split keeps the user path fast and owned by the product team while giving the AI engineering team a Python-native lane for evals and batch repair. It also prevents the common failure where a Python sidecar owns the agent logic but the TypeScript product owns the user, so neither service has a complete picture when something breaks.

For teams building production agents, the broader agents writing lane should read like this: keep the control loop observable, keep approval close to risk, and keep evals close to the data that proves whether the agent worked.

What To Log Before Launch

The language decision is not done until the logs prove ownership. Log the same production record whether the SDK runtime is TypeScript or Python.

At minimum, persist:

agent_runtime: typescript or python
workflow_name: the logical workflow attached to the trace
trace_id and group_id: the run and conversation linkage
model_input_redaction_policy: whether sensitive data was captured, redacted, or excluded
guardrail_results: input, output, and tool guardrail outcomes
handoff_events: source agent, target agent, handoff reason, and filtered history policy
mcp_server: transport, server name, tool name, approval status, and tool-list cache policy
human_approval: approver, decision, reason, and resulting tool call state
final_output: user-visible answer or structured artifact reference

This is the point where the TypeScript versus Python answer becomes clear. If your product service can log that record cleanly without handing off half the state, TypeScript should own the live loop. If your worker service can log it cleanly and the user path is not involved, Python should own the loop.

FAQ

Is OpenAI Agents SDK TypeScript production-ready?

Yes, when it is wired as a product runtime instead of a demo script. The TypeScript SDK includes agent loops, sandbox execution, handoffs, guardrails, function tools, MCP server tool calling, sessions, human-in-the-loop mechanisms, tracing, and realtime agents, but production readiness depends on trace export, approval policy, tool filtering, and deployment ownership.

Is OpenAI Agents SDK Python better than TypeScript?

Python is better when the agent is owned by data, eval, worker, or batch infrastructure. TypeScript is better when the agent is owned by the product/API runtime. The SDK primitives overlap enough that runtime ownership should decide.

Can one production system use both SDKs?

Yes. Keep the live customer control loop in one runtime and put replay, eval, or repair work in the other. The bad version is two services sharing one live agent loop with no single owner for traces, approvals, and user-visible errors.

Should I use the Agents SDK or the Responses API directly?

Use the Responses API directly when you want to own the loop, tool dispatch, and state handling yourself, or when the workflow is short-lived and mainly returns a model response. Use the Agents SDK when the runtime should manage turns, tool execution, guardrails, handoffs, sessions, artifacts, or coordinated steps.

Does the OpenAI Agents SDK support MCP?

Yes. The TypeScript SDK supports Hosted MCP, Streamable HTTP, and stdio MCP server tools. The Python SDK supports Hosted MCP, Streamable HTTP, HTTP with SSE, and stdio. Treat MCP as a deployment and approval decision, not only a tool-loading feature.

Scope Your Agent Build

Design the production agent loop, runtime boundary, guardrails, MCP tools, approvals, tracing, and eval path before traffic hits it.

Last Updated

Jun 9, 2026

CategoryAgents

OpenAI Agents SDK TypeScript vs Python for Production Agents

The Verdict

Production Comparison

Use TypeScript When The Agent Is In The Product Path

Use Python When The Agent Is In The Worker Path

Guardrails Decide The Runtime Owner

Gate The First User Input

Gate Every Dangerous Tool

Gate The Final User Response

MCP Is A Deployment Choice

The Bridge Pattern

What To Log Before Launch

FAQ

Scope Your Agent Build

More from Agents

Context Engineering vs Prompt Engineering for Production Agents

Agent Memory for Production AI Systems

OpenAI Agents SDK vs Pydantic AI for Production Agents

Google ADK vs LangGraph for Production Agents

LangChain vs LangGraph for Production Agents

OpenAI Agents SDK vs LangGraph for Production Agents

One letter, every week. Working systems — not hot takes.