MCP vs Function Calling: The Production Decision Rule
Use function calling for app-local tools. Build MCP when a capability must be shared, discovered, approved, logged, and reused across agents.

Use function calling when the tool belongs to one application loop. Build an MCP server when the capability must be discovered, reused, governed, approved, and logged across agents, clients, or teams.
The Short Rule: Function Calling Is Invocation, MCP Is a Capability Boundary
Function calling is the right default for a narrow tool that only your application should execute. The model emits structured arguments, your code validates them, your application runs the function, and the result goes back into the conversation. That is a clean pattern for app-local behavior: quote a price, check a feature flag, classify an inbound request, or fetch a record that only this product flow needs.
MCP is the right boundary when the tool is no longer just a helper inside one app. A custom MCP server turns an internal capability into a discoverable service that can be called by compatible clients while keeping execution, credentials, authorization, and audit logic behind one server boundary. That boundary matters more than the protocol shape.
The mistake is treating MCP as a replacement for function calling. Function calling answers, "How does the model ask my application to do something?" MCP answers, "How do we expose a governed capability to multiple AI clients without copying the integration into every app?"

For a production team, the decision line is simple:
- Keep direct function calls for app-local tools with one owner, one runtime, and one approval path.
- Build MCP for shared capabilities that need discovery, scope-aware access, review, logging, or reuse across clients.
- Use both when an agent needs local workflow logic and shared internal tools.
That hybrid architecture is usually the durable one. Direct functions stay close to the product loop. MCP servers sit at the boundary of systems that need governance.
The Comparison That Matters in Production
The useful comparison is not syntax. It is where ownership, control, and failure handling live when the tool starts touching real systems.
This table is the decision. If the tool is a private implementation detail of one app, keep it direct. If it is a production capability that other agents, IDEs, workflows, or teams will consume, put it behind MCP and operate it like infrastructure.
What Function Calling Gives You
Function calling gives you the shortest controlled loop between a model and your application code. OpenAI defines function calling, also called tool calling, as a way to connect models to data and actions provided by your application. Function tools are defined by JSON Schema, and the model can return a tool call that your code executes application-side.

The production value is locality. Your application already has the user session, database transaction, feature flags, rate limit state, and product-specific validation. You can gate the function next to the code that understands the risk.
For example, an incident assistant inside your own ops dashboard can keep these as direct functions:
const tools = [
{
type: "function",
name: "create_incident_note",
description: "Append a note to the active incident timeline.",
parameters: {
type: "object",
additionalProperties: false,
properties: {
incident_id: { type: "string" },
note: { type: "string" },
visibility: { type: "string", enum: ["internal", "customer"] }
},
required: ["incident_id", "note", "visibility"]
},
strict: true
}
];That tool is not a platform capability. It is one product action with one local permission model. Keep it in the app, put strict: true on the schema, validate the user and incident state before execution, and log the result in the same event stream as every other incident change.
Function calling starts to strain when the callable set grows into an integration surface. OpenAI's docs state that callable function definitions count against the model context limit and are billed as input tokens. If you send a large list of tools on every request, the cost and latency tax becomes part of every run. If the same schemas are copied across products, each copy becomes a place for drift.
The local loop should still have production controls:
Validate before execution
Treat tool arguments as untrusted input. Schema adherence is necessary, but it is not authorization. Check tenant, actor, state, and side-effect policy before calling the function.
Record the attempted action
Log
run_id,model,tool_name,arguments_hash,actor_id,tenant_id,approval_state,latency_ms, anderror_type. Store redacted arguments only when the data policy allows it.Gate side effects
Use a local approval branch for actions that send messages, change records, cancel orders, trigger deploys, or touch customer-visible data.
Function calling is not less serious than MCP. It is just a smaller boundary. The risk is pretending a local helper is still local after three other agents start calling a copied version of it.
What MCP Gives You That Function Calling Does Not
MCP gives you a reusable interface for capabilities, not just a different way to describe a function. The current MCP specification defines an open protocol using JSON-RPC 2.0 messages between hosts, clients, and servers. Servers can expose tools, resources, and prompts. That is the key distinction: MCP is not only about executable actions.
Tools are functions the model can execute. MCP clients discover them with tools/list and invoke them with tools/call. Tool definitions include a name, description, inputSchema, optional outputSchema, annotations, and execution metadata. The inputSchema must be a valid JSON Schema object.
Resources are contextual data identified by URI. A server can expose files, database schemas, or application-specific information with resources/list and resources/read. Prompts are reusable structured messages and workflows that clients can discover with prompts/list and retrieve with prompts/get.
That broader surface changes the architecture. A "query customer data" capability can expose:
- a tool for
search_accounts - a resource for the current account schema
- a prompt for a standard account-risk review workflow
- an authorization boundary that scopes the user to the accounts they are allowed to inspect
- logs that show which client requested what and whether a human approved it
Direct function calling can implement the same backend behavior, but every app has to wire it. MCP makes the capability discoverable and reusable without embedding the integration in each agent loop.
OpenAI's remote MCP support shows the operational shape. The Responses API can use connectors and remote MCP servers through the mcp built-in tool type. A remote MCP server requires a server_url and may require OAuth authorization. The API can emit mcp_list_tools when it imports available tools and mcp_call when the model calls one. It also supports allowed_tools, which matters because OpenAI's docs warn that exposing many MCP tools can increase cost and latency.

The approval behavior is also a production clue. OpenAI requests approval by default before data is shared with a connector or remote MCP server, and recommends reviewing and optionally logging all data shared with a remote MCP server. That is the right posture. A remote tool boundary should make data movement visible.
MCP authorization is not magic. The spec says authorization is optional for implementations. When HTTP authorization is supported, the current spec defines OAuth 2.1 based flows, OAuth 2.0 Protected Resource Metadata, bearer token usage, and standard error behavior such as 401, 403, and 400. In practice, that means a production MCP server still needs real identity, scopes, tenant checks, token validation, audit logging, and a refusal path.
The payoff is a cleaner operating model:
- Clients discover only the tools they are allowed to see.
- Tool calls carry enough identity to enforce policy at the server.
- A shared capability can be updated once instead of copied into every app.
- Security review can focus on one server boundary and its allowed actions.
- Observability can show tool usage across agents, IDEs, workflows, and products.
Build MCP when that operating model is worth more than the extra runtime boundary.
The Architecture We Ship for Internal Tools
A custom MCP server should be treated like an internal API product for AI clients. The minimum useful design is not "wrap every endpoint and publish it." It is a scoped tool surface with explicit permissions, approvals, and logs.
For an internal engineering platform, the first server often exposes a small set of high-value capabilities:
- search runbooks
- read service ownership metadata
- query deploy status
- open an incident draft
- propose a rollback plan
The sensitive action is not the read. The sensitive action is the transition from "propose" to "execute." Keep that distinction in the tool design.
{
"server": "engineering-ops",
"tools": [
{
"name": "search_runbooks",
"risk": "read",
"approval": "none",
"scopes": ["runbooks:read"]
},
{
"name": "create_incident_draft",
"risk": "write",
"approval": "review",
"scopes": ["incidents:write"]
},
{
"name": "request_rollback",
"risk": "production_side_effect",
"approval": "required",
"scopes": ["deployments:rollback:request"]
}
]
}That policy object is not the MCP spec. It is the control layer your production system needs around the spec. The article belongs in the MCP lane because this is where most teams need help: the protocol is the connection standard, but the production work is selecting, constraining, approving, and observing the capabilities.
The server should emit one audit event for every attempted tool call, not just successful calls:
{
"event": "mcp_tool_call_attempted",
"run_id": "run_...",
"client": "incident-agent",
"server": "engineering-ops",
"tool": "request_rollback",
"actor_id": "user_...",
"tenant_id": "tenant_...",
"scope_result": "allowed",
"approval_state": "pending",
"arguments_hash": "sha256:...",
"latency_ms": null,
"error_type": null
}For side effects, store the approval state separately from the model message. OpenAI's Agents guidance frames human review as the approval path for tool calls: the run pauses, a person or policy approves or rejects the action, and execution resumes from state. Even if your agent runtime is not OpenAI's Agents SDK, the pattern holds. Approval is state, not a prompt instruction.
For private systems, do not expose an internal MCP server publicly just to make an AI product reach it. OpenAI's Secure MCP Tunnel is one pattern: it connects private MCP servers to supported OpenAI products without opening inbound firewall ports, using an outbound HTTPS path that pulls queued work, forwards requests locally, and returns responses through the same tunnel. The broader rule is vendor-independent: keep the server inside the trust boundary that already protects the underlying systems.
This is also where observability links back to the rest of the AI stack. If the MCP server becomes the capability boundary, traces and evals need to capture tool selection quality, approval outcomes, refusal rates, and error classes. For the observability tool decision, see the related comparison at /blog/langfuse-vs-langsmith-production-observability.
What Breaks First
Function calling breaks first through invisible growth. The local helper starts as a clean schema and a dispatch function. Then another agent copies it. Then a second provider needs a different wrapper. Then the schema has five subtly different versions. Then a sensitive action is guarded in one app and not another. The failure is not that function calling is weak. The failure is that the capability outgrew its boundary.
The concrete warning signs:
- the same tool schema appears in multiple repos
- credentials for different systems live in one agent runtime
- approval checks are implemented differently per app
- tool descriptions are tuned per model instead of per capability
- logs cannot answer which model asked for which action
- cost rises because every request carries tool definitions that are rarely used
MCP breaks first in a different way. It makes capabilities easier to expose, so teams expose too much. A server with a broad tool list becomes expensive and confusing for the model. A server without scopes becomes a soft permission bypass. A server without approval state turns "model-controlled" into "model-can-try-anything." A server without logs becomes another hidden integration layer.
The concrete warning signs:
- clients import broad tool lists instead of using
allowed_tools - tool annotations are trusted without server provenance
- OAuth scopes do not map to tool risk
- approval prompts exist in the UI but not in server-side enforcement
- resource reads are logged less carefully than tool calls
- network errors are retried without idempotency rules
- server inventory is missing, so nobody knows which agents can reach which tools
MCP's own spec is clear that tools represent arbitrary code execution and should be treated with caution. It also says hosts must obtain explicit user consent before invoking any tool, and implementors should build robust consent and authorization flows. That does not happen automatically when a server starts.
The operational answer is a small control plane:
Use direct calls when these controls are naturally local. Use MCP when centralizing them is the safer architecture.
Migration Path: Promote One Tool, Not the Whole System
The clean migration path is to promote the most reused or highest-risk capability first. Do not convert the entire agent system to MCP because the protocol is available. Move the tool that already behaves like shared infrastructure.
Find the boundary
Choose a tool that is copied across clients, uses sensitive credentials, or needs shared audit logs. Leave app-local behavior as function calling.
Define the minimum server surface
Expose the few tools, resources, and prompts that represent the capability. Do not mirror every backend endpoint.
Add policy before rollout
Map each tool to scopes, approval requirements, idempotency behavior, and redaction rules before connecting production clients.
Instrument discovery and calls
Log
tools/list, imported tools, refused tools, approvals, tool calls, errors, and latency. Tool discovery is part of the production surface.Keep direct functions for hot paths
If a tool is latency-sensitive, app-specific, and owned by one runtime, do not promote it just for symmetry.
The decision should reduce duplicated integration code or improve control. If MCP only adds a network hop and another deployment without solving ownership, discovery, or governance, keep the function direct.
The Final Decision Rule
Use function calling for product-local behavior. Build a custom MCP server for shared internal capabilities.
That rule holds across model providers and agent frameworks because it is an ownership rule. Function calling is a good application pattern. MCP is a good platform boundary. The best production systems use both, with direct calls for local behavior and MCP servers for capabilities that need to be discovered, governed, approved, reused, and observed.
For deeper MCP implementation patterns, the /writing/mcp lane covers custom server design, security boundaries, and production rollout decisions.
Is MCP just function calling with extra steps?
No. Function calling lets a model request a function from your application. MCP standardizes how capabilities are discovered, invoked, authorized, and reused across AI clients through a server boundary.
Does MCP replace function calling?
No. Keep direct function calls for app-local behavior. Use MCP for shared or governed capabilities. A mature production agent stack usually has both.
When should we build a custom MCP server?
Build one when a capability needs multiple clients, server-side auth, scoped access, resource discovery, approval controls, or audit logs. Those are platform concerns, not prompt concerns.
Does MCP add latency?
It can. Local stdio adds a process boundary and remote HTTP adds a network boundary. Keep latency-sensitive app-local tools direct unless the governance benefit is worth the hop.
Scope Your MCP Server
Design and ship a custom MCP server with scoped tools, approvals, audit logs, and production-ready client integration.
Jun 3, 2026




