MCP vs Function Calling: The Production Decision Rule

Use function calling for app-local tools. Build MCP when a capability must be shared, discovered, approved, logged, and reused across agents.

Wednesday, June 3, 2026 · Omid Saffari

Use function calling when the tool belongs to one application loop. Build an MCP server when the capability must be discovered, reused, governed, approved, and logged across agents, clients, or teams.

The Short Rule: Function Calling Is Invocation, MCP Is a Capability Boundary

Function calling is the right default for a narrow tool that only your application should execute. The model emits structured arguments, your code validates them, your application runs the function, and the result goes back into the conversation. That is a clean pattern for app-local behavior: quote a price, check a feature flag, classify an inbound request, or fetch a record that only this product flow needs.

MCP is the right boundary when the tool is no longer just a helper inside one app. A custom MCP server turns an internal capability into a discoverable service that can be called by compatible clients while keeping execution, credentials, authorization, and audit logic behind one server boundary. That boundary matters more than the protocol shape.

The mistake is treating MCP as a replacement for function calling. Function calling answers, "How does the model ask my application to do something?" MCP answers, "How do we expose a governed capability to multiple AI clients without copying the integration into every app?"

Model Context Protocol specification
The current MCP specification defines hosts, clients, servers, JSON-RPC 2.0 messaging, and server features such as tools, resources, and prompts.

For a production team, the decision line is simple:

  • Keep direct function calls for app-local tools with one owner, one runtime, and one approval path.
  • Build MCP for shared capabilities that need discovery, scope-aware access, review, logging, or reuse across clients.
  • Use both when an agent needs local workflow logic and shared internal tools.

That hybrid architecture is usually the durable one. Direct functions stay close to the product loop. MCP servers sit at the boundary of systems that need governance.
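The decision line above can be written down as a predicate over a single capability. This is a minimal sketch, assuming a hypothetical `CapabilityProfile` record that a team fills in per tool; the field names and thresholds are illustrative, not a standard. An agent that needs both kinds of tools simply ends up with some of each.

```typescript
// Hypothetical capability profile; the field names are illustrative, not a standard.
interface CapabilityProfile {
  name: string;
  owningTeams: number;          // teams that maintain the tool's behavior and credentials
  consumingClients: number;     // apps, agents, IDEs, or workflows that call it
  approvalPaths: number;        // distinct places where its side effects get reviewed
  needsServerSideAuth: boolean; // scope, tenant, or user identity must be enforced centrally
  needsSharedAudit: boolean;    // one audit trail must cover every caller
}

type Boundary = "function_calling" | "mcp";

// One owner, one runtime, one approval path stays a direct function call.
// Anything shared or centrally governed moves behind an MCP server.
function chooseBoundary(cap: CapabilityProfile): Boundary {
  const shared =
    cap.owningTeams > 1 || cap.consumingClients > 1 || cap.approvalPaths > 1;
  const governed = cap.needsServerSideAuth || cap.needsSharedAudit;
  return shared || governed ? "mcp" : "function_calling";
}

const quotePrice: CapabilityProfile = {
  name: "quote_price", owningTeams: 1, consumingClients: 1, approvalPaths: 1,
  needsServerSideAuth: false, needsSharedAudit: false,
};
const searchRunbooks: CapabilityProfile = {
  name: "search_runbooks", owningTeams: 1, consumingClients: 3, approvalPaths: 1,
  needsServerSideAuth: true, needsSharedAudit: true,
};

console.log(chooseBoundary(quotePrice));     // function_calling
console.log(chooseBoundary(searchRunbooks)); // mcp
```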

The Comparison That Matters in Production

The useful comparison is not syntax. It is where ownership, control, and failure handling live when the tool starts touching real systems.

Production axis | Function calling | MCP server | Use function calling when | Use MCP when
Ownership | Tool schema, dispatch, credentials, and retries live in the app | Tool schema, execution, auth, and logs live behind a server boundary | One product team owns the whole path | Several teams or clients need the same capability
Discovery | The app sends callable tools in the request | Clients can discover tools with tools/list | The tool set is stable for that request | Tools change or vary by user, role, or environment
Schema | Function tools are JSON Schema definitions in the API request | MCP tools expose inputSchema and optional outputSchema through the server | The schema is small and app-specific | The schema should be reused and versioned once
Context | Context is passed by the app in the conversation loop | MCP adds resources and prompts in addition to tools | The app already owns all needed context | Clients need files, database schemas, docs, or prompt templates from the same boundary
Approval | Your app implements review around the function | Clients and platforms can require MCP approval before data or actions flow | Approval is a simple local branch | Sensitive data or side effects need a review trail
Authorization | The app holds service credentials | HTTP MCP can use OAuth 2.1, protected resource metadata, bearer tokens, and scopes | One backend credential is enough | Scope, tenant, user, or client identity must be enforced server-side
Observability | You log inside each application | You can centralize tool usage at the MCP server boundary | One app log is enough | You need to answer which client called which tool with which approval state
Cost | Tool definitions count against context and are billed as input tokens | OpenAI remote MCP charges tokens for imported tool definitions and calls, with no extra per-tool-call fee | The schema is small and always needed | Tool lists are large enough to require filtering with allowed_tools
Failure mode | App bugs, schema drift, hidden credentials | Server sprawl, auth drift, tool list bloat, network failure | Simplicity is worth tight coupling | Central control is worth another process or network hop

This table is the decision. If the tool is a private implementation detail of one app, keep it direct. If it is a production capability that other agents, IDEs, workflows, or teams will consume, put it behind MCP and operate it like infrastructure.

What Function Calling Gives You

Function calling gives you the shortest controlled loop between a model and your application code. OpenAI defines function calling, also called tool calling, as a way to connect models to data and actions provided by your application. Function tools are defined by JSON Schema, and the model can return a tool call that your code executes application-side.

OpenAI function calling guide
Function calling is an application-owned loop: define tools, receive a tool call, execute code, return output, then let the model continue.

The production value is locality. Your application already has the user session, database transaction, feature flags, rate limit state, and product-specific validation. You can gate the function next to the code that understands the risk.

For example, an incident assistant inside your own ops dashboard can keep these as direct functions:

TypeScript
const tools = [
  {
    type: "function",
    name: "create_incident_note",
    description: "Append a note to the active incident timeline.",
    parameters: {
      type: "object",
      additionalProperties: false,
      properties: {
        incident_id: { type: "string" },
        note: { type: "string" },
        visibility: { type: "string", enum: ["internal", "customer"] }
      },
      required: ["incident_id", "note", "visibility"]
    },
    strict: true
  }
];

That tool is not a platform capability. It is one product action with one local permission model. Keep it in the app, put strict: true on the schema, validate the user and incident state before execution, and log the result in the same event stream as every other incident change.
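Here is a minimal sketch of the application side of that loop for `create_incident_note`. It assumes the Responses API item shapes (a `function_call` item with `call_id`, `name`, and JSON-text `arguments`, answered by a `function_call_output` item); only the fields used here are modeled. `IncidentContext`, `canEdit`, and `appendNote` are hypothetical stand-ins for your own incident service.

```typescript
// Subset of the Responses API function call item.
interface FunctionCallItem {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string; // JSON text produced by the model
}

// Item sent back to the model with the tool result.
interface FunctionCallOutputItem {
  type: "function_call_output";
  call_id: string;
  output: string;
}

interface IncidentNoteArgs {
  incident_id: string;
  note: string;
  visibility: "internal" | "customer";
}

// Hypothetical app-side dependencies.
interface IncidentContext {
  openIncidents: Set<string>;
  canEdit: (incidentId: string) => boolean;       // checks the signed-in user, not the model
  appendNote: (args: IncidentNoteArgs) => string; // returns the new note id
}

function handleIncidentNoteCall(call: FunctionCallItem, ctx: IncidentContext): FunctionCallOutputItem {
  const reply = (body: object): FunctionCallOutputItem => ({
    type: "function_call_output",
    call_id: call.call_id,
    output: JSON.stringify(body),
  });

  if (call.name !== "create_incident_note") return reply({ error: "unknown_tool" });

  let args: IncidentNoteArgs;
  try {
    args = JSON.parse(call.arguments) as IncidentNoteArgs;
  } catch {
    return reply({ error: "invalid_json" });
  }

  // strict: true constrains the shape; these checks decide whether the action is allowed.
  if (!ctx.openIncidents.has(args.incident_id)) return reply({ error: "incident_not_open" });
  if (!ctx.canEdit(args.incident_id)) return reply({ error: "forbidden" });

  const noteId = ctx.appendNote(args);
  return reply({ ok: true, note_id: noteId });
}
```

Errors go back to the model as tool output rather than exceptions, so the conversation can continue and explain the refusal.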

Function calling starts to strain when the callable set grows into an integration surface. OpenAI's docs state that callable function definitions count against the model context limit and are billed as input tokens. If you send a large list of tools on every request, the cost and latency tax becomes part of every run. If the same schemas are copied across products, each copy becomes a place for drift.
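One way to keep that tax bounded is to choose tools per request instead of sending the whole catalog. A minimal sketch, assuming the caller already knows which tools the current workflow step needs; the tool names are hypothetical, and serialized length is only a rough proxy, not a token count (use the provider's reported usage for real numbers).

```typescript
interface FunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  strict?: boolean;
}

// Return only the tools named for this request, preserving catalog order.
function selectTools(catalog: FunctionTool[], needed: readonly string[]): FunctionTool[] {
  const wanted = new Set(needed);
  return catalog.filter((tool) => wanted.has(tool.name));
}

// Rough size proxy for tool-definition overhead; NOT a token count.
function definitionChars(tools: FunctionTool[]): number {
  return JSON.stringify(tools).length;
}

const emptyParams = { type: "object", properties: {}, additionalProperties: false };
const catalog: FunctionTool[] = ["create_incident_note", "quote_price", "check_feature_flag"].map(
  (name): FunctionTool => ({
    type: "function",
    name,
    description: `Hypothetical tool ${name}.`,
    parameters: emptyParams,
  }),
);

const forThisStep = selectTools(catalog, ["create_incident_note"]);
console.log(forThisStep.length); // 1
console.log(definitionChars(forThisStep) < definitionChars(catalog)); // true
```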

The local loop should still have production controls:

  1. Validate before execution

    Treat tool arguments as untrusted input. Schema adherence is necessary, but it is not authorization. Check tenant, actor, state, and side-effect policy before calling the function.

  2. Record the attempted action

    Log run_id, model, tool_name, arguments_hash, actor_id, tenant_id, approval_state, latency_ms, and error_type. Store redacted arguments only when the data policy allows it.

  3. Gate side effects

    Use a local approval branch for actions that send messages, change records, cancel orders, trigger deploys, or touch customer-visible data.
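The three controls above fit in one wrapper around the local function. This is a sketch under assumptions: `LocalToolPolicy` and the approval value are hypothetical inputs your app supplies, the log sink is whatever event stream you already use, and arguments are hashed over a key-sorted JSON encoding so identical arguments always produce the same hash. Every path logs exactly one record, so refusals are visible alongside successes.

```typescript
import { createHash } from "node:crypto";

type ApprovalState = "not_required" | "pending" | "approved" | "rejected";

interface ToolRequest {
  runId: string;
  model: string;
  toolName: string;
  args: Record<string, unknown>;
  actorId: string;
  tenantId: string;
}

// Hypothetical per-tool policy owned by the app.
interface LocalToolPolicy {
  hasSideEffect: boolean;                   // sends, changes, cancels, deploys, or is customer-visible
  isAllowed: (req: ToolRequest) => boolean; // tenant, actor, and state checks
}

interface ToolLogRecord {
  run_id: string;
  model: string;
  tool_name: string;
  arguments_hash: string;
  actor_id: string;
  tenant_id: string;
  approval_state: ApprovalState;
  latency_ms: number | null;
  error_type: string | null;
}

// Key-sorted JSON so {a, b} and {b, a} hash identically.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

function guardedCall(
  req: ToolRequest,
  policy: LocalToolPolicy,
  approval: ApprovalState,
  run: (args: Record<string, unknown>) => unknown,
  log: (record: ToolLogRecord) => void,
): unknown {
  const base: ToolLogRecord = {
    run_id: req.runId,
    model: req.model,
    tool_name: req.toolName,
    arguments_hash: "sha256:" + createHash("sha256").update(canonicalJson(req.args)).digest("hex"),
    actor_id: req.actorId,
    tenant_id: req.tenantId,
    approval_state: policy.hasSideEffect ? approval : "not_required",
    latency_ms: null,
    error_type: null,
  };

  // Validate: schema adherence is not authorization.
  if (!policy.isAllowed(req)) {
    log({ ...base, error_type: "forbidden" });
    return { error: "forbidden" };
  }
  // Gate side effects: only approved actions execute.
  if (policy.hasSideEffect && approval !== "approved") {
    log({ ...base, error_type: "not_approved" });
    return { error: "approval_required" };
  }
  // Execute and record the outcome, including failures.
  const started = Date.now();
  try {
    const result = run(req.args);
    log({ ...base, latency_ms: Date.now() - started });
    return result;
  } catch (err) {
    log({ ...base, latency_ms: Date.now() - started, error_type: err instanceof Error ? err.name : "unknown" });
    throw err;
  }
}
```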

Function calling is not less serious than MCP. It is just a smaller boundary. The risk is pretending a local helper is still local after three other agents start calling a copied version of it.

What MCP Gives You That Function Calling Does Not

MCP gives you a reusable interface for capabilities, not just a different way to describe a function. The current MCP specification defines an open protocol using JSON-RPC 2.0 messages between hosts, clients, and servers. Servers can expose tools, resources, and prompts. That is the key distinction: MCP is not only about executable actions.

Tools are functions the model can execute. MCP clients discover them with tools/list and invoke them with tools/call. Tool definitions include a name, description, inputSchema, optional outputSchema, annotations, and execution metadata. The inputSchema must be a valid JSON Schema object.
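To make the tools/list and tools/call exchange concrete, here is a transport-free sketch of the server side handling already-parsed JSON-RPC 2.0 requests. The `search_runbooks` tool and its data are hypothetical, and a real server would normally be built with an MCP SDK over stdio or Streamable HTTP and would also handle initialization. Returning an unknown tool as JSON-RPC error `-32602` mirrors the spec's example; reporting a bad argument through `isError: true` is a choice made in this sketch so the model can read and correct it.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

// An MCP tool definition: name, description, and a JSON Schema object as inputSchema.
const searchRunbooksTool = {
  name: "search_runbooks",
  description: "Search engineering runbooks by keyword.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

// Hypothetical runbook titles standing in for a real search backend.
const RUNBOOKS = ["Database failover", "Cache flush", "Database restore"];

const reply = (id: number | string, result: unknown): JsonRpcResponse => ({ jsonrpc: "2.0", id, result });
const replyError = (id: number | string, code: number, message: string): JsonRpcResponse =>
  ({ jsonrpc: "2.0", id, error: { code, message } });
const toolResult = (value: string, isError: boolean) => ({ content: [{ type: "text", text: value }], isError });

function handleRequest(req: JsonRpcRequest): JsonRpcResponse {
  switch (req.method) {
    case "tools/list":
      return reply(req.id, { tools: [searchRunbooksTool] });
    case "tools/call": {
      const name = req.params?.name;
      const args = (req.params?.arguments ?? {}) as { query?: unknown };
      if (name !== "search_runbooks") {
        // Unknown tool: a JSON-RPC protocol error.
        return replyError(req.id, -32602, `Unknown tool: ${String(name)}`);
      }
      if (typeof args.query !== "string") {
        // Bad input reported as a tool result the model can see.
        return reply(req.id, toolResult("query must be a string", true));
      }
      const q = args.query.toLowerCase();
      const hits = RUNBOOKS.filter((title) => title.toLowerCase().includes(q));
      return reply(req.id, toolResult(hits.join("\n"), false));
    }
    default:
      return replyError(req.id, -32601, "Method not found");
  }
}
```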

Resources are contextual data identified by URI. A server can expose files, database schemas, or application-specific information with resources/list and resources/read. Prompts are reusable structured messages and workflows that clients can discover with prompts/list and retrieve with prompts/get.

That broader surface changes the architecture. A "query customer data" capability can expose:

  • a tool for search_accounts
  • a resource for the current account schema
  • a prompt for a standard account-risk review workflow
  • an authorization boundary that scopes the user to the accounts they are allowed to inspect
  • logs that show which client requested what and whether a human approved it
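The resource and prompt parts of that capability return small, well-defined result shapes. Below is a sketch of handlers for `resources/list`, `resources/read`, `prompts/list`, and `prompts/get`, assuming a hypothetical `schema://accounts/current` URI and a hypothetical `account_risk_review` prompt. The result fields follow the MCP specification; the JSON-RPC envelope, transport, and authorization checks are omitted, and errors are thrown here where a real server would map them to JSON-RPC errors.

```typescript
// Hypothetical resource URI and content for the account schema.
const ACCOUNT_SCHEMA_URI = "schema://accounts/current";
const ACCOUNT_SCHEMA = { id: "string", tier: "string", region: "string" };

// resources/list result: what a client may read.
function listResources() {
  return {
    resources: [{ uri: ACCOUNT_SCHEMA_URI, name: "Account schema", mimeType: "application/json" }],
  };
}

// resources/read result: the contents for one URI.
function readResource(uri: string) {
  if (uri !== ACCOUNT_SCHEMA_URI) throw new Error(`Unknown resource: ${uri}`);
  return {
    contents: [{ uri, mimeType: "application/json", text: JSON.stringify(ACCOUNT_SCHEMA) }],
  };
}

// prompts/list result: reusable workflows and the arguments they take.
function listPrompts() {
  return {
    prompts: [{
      name: "account_risk_review",
      description: "Standard account-risk review workflow.",
      arguments: [{ name: "account_id", description: "Account to review", required: true }],
    }],
  };
}

// prompts/get result: the messages a client inserts into the conversation.
function getPrompt(name: string, args: Record<string, string>) {
  if (name !== "account_risk_review") throw new Error(`Unknown prompt: ${name}`);
  const accountId = args["account_id"];
  if (!accountId) throw new Error("account_id is required");
  return {
    description: "Standard account-risk review workflow.",
    messages: [{
      role: "user",
      content: {
        type: "text",
        text: `Review account ${accountId} for risk. Read ${ACCOUNT_SCHEMA_URI} first and cite the fields you rely on.`,
      },
    }],
  };
}
```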

Direct function calling can implement the same backend behavior, but every app has to wire it. MCP makes the capability discoverable and reusable without embedding the integration in each agent loop.

OpenAI's remote MCP support shows the operational shape. The Responses API can use connectors and remote MCP servers through the mcp built-in tool type. A remote MCP server requires a server_url and may require OAuth authorization. The API can emit mcp_list_tools when it imports available tools and mcp_call when the model calls one. It also supports allowed_tools, which matters because OpenAI's docs warn that exposing many MCP tools can increase cost and latency.

OpenAI MCP and connectors guide
Remote MCP introduces discovery, filtering, approvals, and trace items such as mcp_list_tools and mcp_call.
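A request-side sketch of that shape follows. It assumes the Responses API `mcp` tool fields named in OpenAI's guide (`server_label`, `server_url`, `allowed_tools`, `require_approval`) and the `mcp_list_tools` and `mcp_call` output items; only a subset of fields is modeled, the label, URL, and tool names are hypothetical, and exact field names should be checked against the current API reference.

```typescript
// Subset of the Responses API `mcp` tool configuration.
interface McpToolConfig {
  type: "mcp";
  server_label: string;
  server_url: string;
  allowed_tools?: string[];
  require_approval?: "always" | "never";
}

// Subset of MCP-related output items; other item types are reduced to a placeholder.
interface McpListToolsItem {
  type: "mcp_list_tools";
  server_label: string;
  tools: { name: string }[];
}
interface McpCallItem {
  type: "mcp_call";
  server_label: string;
  name: string;
  arguments: string;
}
type OutputItem = McpListToolsItem | McpCallItem | { type: "message" };

function engineeringOpsTool(allowed: string[]): McpToolConfig {
  return {
    type: "mcp",
    server_label: "engineering_ops",                // hypothetical label
    server_url: "https://mcp.example.internal/mcp", // hypothetical URL
    allowed_tools: allowed,                         // import only these tools
    require_approval: "always",                     // keep review on data leaving the app
  };
}

// Turn trace items into "what was imported" and "what was called" for logs or evals.
function summarizeMcpTrace(output: OutputItem[]): { imported: string[]; calls: Record<string, number> } {
  const imported: string[] = [];
  const calls: Record<string, number> = {};
  for (const item of output) {
    if (item.type === "mcp_list_tools") {
      const label = item.server_label;
      imported.push(...item.tools.map((tool) => `${label}/${tool.name}`));
    } else if (item.type === "mcp_call") {
      const key = `${item.server_label}/${item.name}`;
      calls[key] = (calls[key] ?? 0) + 1;
    }
  }
  return { imported, calls };
}
```

The config object goes in the `tools` array of a Responses API request; the summary is one way to feed the imported-versus-called ratio into cost reviews.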

The approval behavior is also a production clue. OpenAI requests approval by default before data is shared with a connector or remote MCP server, and recommends reviewing and optionally logging all data shared with a remote MCP server. That is the right posture. A remote tool boundary should make data movement visible.
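When approval is required, the response carries an approval request and the next request answers it. A minimal sketch, assuming the `mcp_approval_request` and `mcp_approval_response` item shapes from OpenAI's MCP guide (subset of fields) and a hypothetical auto-approve list for read-only tools. Everything else is routed to a person rather than silently rejected, and the arguments are recorded before any decision so the data that would leave the boundary is visible.

```typescript
interface McpApprovalRequest {
  type: "mcp_approval_request";
  id: string;
  server_label: string;
  name: string;
  arguments: string; // JSON text the model wants to send to the server
}

interface McpApprovalResponse {
  type: "mcp_approval_response";
  approval_request_id: string;
  approve: boolean;
}

// Hypothetical policy: read-only tools on a known server may proceed without a person.
const AUTO_APPROVE: Record<string, readonly string[]> = {
  engineering_ops: ["search_runbooks"],
};

function triageApprovals(
  items: ReadonlyArray<{ type: string }>,
  audit: (line: string) => void,
): { responses: McpApprovalResponse[]; needsHuman: McpApprovalRequest[] } {
  const responses: McpApprovalResponse[] = [];
  const needsHuman: McpApprovalRequest[] = [];
  for (const item of items) {
    if (item.type !== "mcp_approval_request") continue;
    const req = item as McpApprovalRequest;
    // Record what would be shared before deciding; redact here if policy requires it.
    audit(`${req.server_label}/${req.name} args=${req.arguments}`);
    if (AUTO_APPROVE[req.server_label]?.includes(req.name)) {
      responses.push({ type: "mcp_approval_response", approval_request_id: req.id, approve: true });
    } else {
      needsHuman.push(req);
    }
  }
  return { responses, needsHuman };
}
```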

MCP authorization is not magic. The spec says authorization is optional for implementations. When HTTP authorization is supported, the current spec defines OAuth 2.1 based flows, OAuth 2.0 Protected Resource Metadata, bearer token usage, and standard error behavior such as 401, 403, and 400. In practice, that means a production MCP server still needs real identity, scopes, tenant checks, token validation, audit logging, and a refusal path.
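A sketch of that refusal path at the HTTP boundary, assuming bearer tokens and a hypothetical `verify` function that performs real validation (signature, issuer, audience, expiry). The status codes follow the mapping above; the `WWW-Authenticate` values use the `resource_metadata` parameter from OAuth 2.0 Protected Resource Metadata and the `invalid_request`, `invalid_token`, and `insufficient_scope` error codes from the OAuth bearer token specification. The metadata URL is hypothetical.

```typescript
interface AuthResult {
  status: 200 | 400 | 401 | 403;
  wwwAuthenticate?: string;
  subject?: string;
}

interface TokenClaims {
  subject: string;
  scopes: string[];
}

// Hypothetical verifier: a real one checks signature, issuer, audience (this server), and expiry.
type VerifyToken = (token: string) => TokenClaims | null;

// Hypothetical metadata location for this server.
const RESOURCE_METADATA = "https://mcp.example.internal/.well-known/oauth-protected-resource";

function authorizeToolCall(header: string | undefined, requiredScope: string, verify: VerifyToken): AuthResult {
  if (header === undefined) {
    // No credentials: point the client at the metadata that names the authorization server.
    return { status: 401, wwwAuthenticate: `Bearer resource_metadata="${RESOURCE_METADATA}"` };
  }
  const match = /^Bearer\s+(\S+)$/i.exec(header);
  if (!match) {
    return { status: 400, wwwAuthenticate: `Bearer error="invalid_request"` };
  }
  const claims = verify(match[1]);
  if (!claims) {
    return { status: 401, wwwAuthenticate: `Bearer error="invalid_token", resource_metadata="${RESOURCE_METADATA}"` };
  }
  if (!claims.scopes.includes(requiredScope)) {
    return { status: 403, wwwAuthenticate: `Bearer error="insufficient_scope", scope="${requiredScope}"` };
  }
  return { status: 200, subject: claims.subject };
}
```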

The payoff is a cleaner operating model:

  • Clients discover only the tools they are allowed to see.
  • Tool calls carry enough identity to enforce policy at the server.
  • A shared capability can be updated once instead of copied into every app.
  • Security review can focus on one server boundary and its allowed actions.
  • Observability can show tool usage across agents, IDEs, workflows, and products.

Build MCP when that operating model is worth more than the extra runtime boundary.

The Architecture We Ship for Internal Tools

A custom MCP server should be treated like an internal API product for AI clients. The minimum useful design is not "wrap every endpoint and publish it." It is a scoped tool surface with explicit permissions, approvals, and logs.

For an internal engineering platform, the first server often exposes a small set of high-value capabilities:

  • search runbooks
  • read service ownership metadata
  • query deploy status
  • open an incident draft
  • propose a rollback plan

The sensitive action is not the read. The sensitive action is the transition from "propose" to "execute." Keep that distinction in the tool design.

JSON
{
  "server": "engineering-ops",
  "tools": [
    {
      "name": "search_runbooks",
      "risk": "read",
      "approval": "none",
      "scopes": ["runbooks:read"]
    },
    {
      "name": "create_incident_draft",
      "risk": "write",
      "approval": "review",
      "scopes": ["incidents:write"]
    },
    {
      "name": "request_rollback",
      "risk": "production_side_effect",
      "approval": "required",
      "scopes": ["deployments:rollback:request"]
    }
  ]
}
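MCP does not define `risk`, `approval`, or `scopes` fields, so their meaning is yours to decide. A sketch of enforcing that policy at the server, under one assumed reading: `review` lets a draft be created and queues it for a person afterwards, while `required` blocks execution until an approval exists. Tools with no policy entry are never callable.

```typescript
type Risk = "read" | "write" | "production_side_effect";
type ApprovalMode = "none" | "review" | "required";

interface ToolPolicy {
  name: string;
  risk: Risk;
  approval: ApprovalMode;
  scopes: string[];
}

// Same hypothetical engineering-ops policy as the JSON above.
const ENGINEERING_OPS: ToolPolicy[] = [
  { name: "search_runbooks", risk: "read", approval: "none", scopes: ["runbooks:read"] },
  { name: "create_incident_draft", risk: "write", approval: "review", scopes: ["incidents:write"] },
  { name: "request_rollback", risk: "production_side_effect", approval: "required", scopes: ["deployments:rollback:request"] },
];

type Decision = "allow" | "allow_and_queue_review" | "await_approval" | "deny_scope" | "deny_unknown_tool";

function decide(toolName: string, grantedScopes: readonly string[], approved: boolean): Decision {
  const policy = ENGINEERING_OPS.find((p) => p.name === toolName);
  if (!policy) return "deny_unknown_tool"; // no policy entry, no call
  if (!policy.scopes.every((s) => grantedScopes.includes(s))) return "deny_scope";
  if (policy.approval === "required" && !approved) return "await_approval";
  if (policy.approval === "review") return "allow_and_queue_review"; // assumption: reviewed after creation
  return "allow";
}
```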

That policy object is not part of the MCP spec. It is the control layer your production system needs around the spec, and it is where most teams need help: the protocol is the connection standard, but the production work is selecting, constraining, approving, and observing the capabilities.

The server should emit one audit event for every attempted tool call, not just successful calls:

JSON
{
  "event": "mcp_tool_call_attempted",
  "run_id": "run_...",
  "client": "incident-agent",
  "server": "engineering-ops",
  "tool": "request_rollback",
  "actor_id": "user_...",
  "tenant_id": "tenant_...",
  "scope_result": "allowed",
  "approval_state": "pending",
  "arguments_hash": "sha256:...",
  "latency_ms": null,
  "error_type": null
}

For side effects, store the approval state separately from the model message. OpenAI's Agents guidance frames human review as the approval path for tool calls: the run pauses, a person or policy approves or rejects the action, and execution resumes from state. Even if your agent runtime is not OpenAI's Agents SDK, the pattern holds. Approval is state, not a prompt instruction.
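Treating approval as state can be as small as a record with an explicit status and one legal path from `pending` to `executed`. A sketch with a hypothetical in-memory `ApprovalStore`; in production the map would be a durable table so a paused run can resume after a restart. If `run` throws, the action stays `approved`, which is a deliberate simplification here.

```typescript
type ApprovalStatus = "pending" | "approved" | "rejected" | "executed";

interface PendingAction {
  id: string;
  tool: string;
  argumentsHash: string;
  status: ApprovalStatus;
  decidedBy?: string;
}

// In-memory stand-in for a durable approvals table.
class ApprovalStore {
  private actions = new Map<string, PendingAction>();

  // The run pauses here: the action exists but cannot execute yet.
  request(id: string, tool: string, argumentsHash: string): PendingAction {
    const action: PendingAction = { id, tool, argumentsHash, status: "pending" };
    this.actions.set(id, action);
    return action;
  }

  // A person or policy decides exactly once.
  decide(id: string, approve: boolean, reviewer: string): PendingAction {
    const action = this.get(id);
    if (action.status !== "pending") throw new Error(`Action ${id} is already ${action.status}`);
    action.status = approve ? "approved" : "rejected";
    action.decidedBy = reviewer;
    return action;
  }

  // Resume: execution is allowed only from the approved state, and only once.
  execute<T>(id: string, run: () => T): T {
    const action = this.get(id);
    if (action.status !== "approved") throw new Error(`Action ${id} is ${action.status}, not approved`);
    const result = run();
    action.status = "executed";
    return result;
  }

  get(id: string): PendingAction {
    const action = this.actions.get(id);
    if (!action) throw new Error(`Unknown action ${id}`);
    return action;
  }
}
```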

For private systems, do not expose an internal MCP server publicly just to make an AI product reach it. OpenAI's Secure MCP Tunnel is one pattern: it connects private MCP servers to supported OpenAI products without opening inbound firewall ports, using an outbound HTTPS path that pulls queued work, forwards requests locally, and returns responses through the same tunnel. The broader rule is vendor-independent: keep the server inside the trust boundary that already protects the underlying systems.

This is also where observability links back to the rest of the AI stack. If the MCP server becomes the capability boundary, traces and evals need to capture tool selection quality, approval outcomes, refusal rates, and error classes. For the observability tool decision, see the related comparison at /blog/langfuse-vs-langsmith-production-observability.

What Breaks First

Function calling breaks first through invisible growth. The local helper starts as a clean schema and a dispatch function. Then another agent copies it. Then a second provider needs a different wrapper. Then the schema has five subtly different versions. Then a sensitive action is guarded in one app and not another. The failure is not that function calling is weak. The failure is that the capability outgrew its boundary.

The concrete warning signs:

  • the same tool schema appears in multiple repos
  • credentials for different systems live in one agent runtime
  • approval checks are implemented differently per app
  • tool descriptions are tuned per model instead of per capability
  • logs cannot answer which model asked for which action
  • cost rises because every request carries tool definitions that are rarely used

MCP breaks first in a different way. It makes capabilities easier to expose, so teams expose too much. A server with a broad tool list becomes expensive and confusing for the model. A server without scopes becomes a soft permission bypass. A server without approval state turns "model-controlled" into "model-can-try-anything." A server without logs becomes another hidden integration layer.

The concrete warning signs:

  • clients import broad tool lists instead of using allowed_tools
  • tool annotations are trusted without server provenance
  • OAuth scopes do not map to tool risk
  • approval prompts exist in the UI but not in server-side enforcement
  • resource reads are logged less carefully than tool calls
  • network errors are retried without idempotency rules
  • server inventory is missing, so nobody knows which agents can reach which tools
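For the retry warning above, a minimal idempotency sketch: the server stores the first result per key and replays it on retries, and the client reuses one key across attempts. MCP does not define an idempotency key for `tools/call`; passing one as a tool argument or deriving it from the run and call ids is an assumption here. The in-memory map ignores concurrent duplicates and key expiry.

```typescript
// Server side: the first result for a key is stored and replayed on retries.
class IdempotentExecutor {
  private results = new Map<string, unknown>();

  run<T>(key: string, action: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T; // retry: replay, do not execute again
    }
    const result = action();
    this.results.set(key, result);
    return result;
  }
}

// Client side: every attempt reuses the same key, so a lost response cannot cause a second side effect.
function retryWithKey<T>(key: string, attempts: number, call: (key: string) => T): T {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < attempts; i++) {
    try {
      return call(key);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

The failure this guards against is a rollback that ran on the server while the response was lost in transit: without the key, the retry would request a second rollback.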

MCP's own spec is clear that tools represent arbitrary code execution and should be treated with caution. It also says hosts must obtain explicit user consent before invoking any tool, and implementors should build robust consent and authorization flows. That does not happen automatically when a server starts.

The operational answer is a small control plane:

Control | Function calling implementation | MCP implementation
Tool allowlist | Select tools per request | Filter imported server tools with allowed_tools and server-side policy
Approval | Local workflow branch before execution | Approval object checked by server before sensitive tool execution
Identity | App session and service credentials | Actor, client, tenant, server, scopes, and token validation
Logging | App event log around function execution | Central audit event for list, call, refusal, approval, and error
Cost | Count repeated schema tokens | Count imported tool definitions, tool calls, and filtered tool lists
Reliability | Local retries and validation | Transport errors, protocol errors, tool errors, idempotency keys, and circuit breakers
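The reliability row separates three error classes that need different handling. A sketch, assuming a hypothetical `McpCallOutcome` wrapper around one call: no response at all is a transport error, a JSON-RPC `error` object is a protocol error, and a result with `isError: true` is a tool error the model can read. The circuit breaker is a deliberately simple variant that resets fully after the cooldown instead of probing in a half-open state.

```typescript
type ErrorClass = "none" | "transport" | "protocol" | "tool";

// Hypothetical wrapper around one tools/call attempt.
interface McpCallOutcome {
  transportError?: Error; // timeout, connection reset, HTTP 5xx: no JSON-RPC response
  response?: {
    error?: { code: number; message: string }; // JSON-RPC error object
    result?: { isError?: boolean };             // tool result
  };
}

function classify(outcome: McpCallOutcome): ErrorClass {
  if (outcome.transportError || !outcome.response) return "transport";
  if (outcome.response.error) return "protocol";
  if (outcome.response.result?.isError) return "tool";
  return "none";
}

// Retry only transport failures, and only when the call carries an idempotency key.
function shouldRetry(cls: ErrorClass, hasIdempotencyKey: boolean): boolean {
  return cls === "transport" && hasIdempotencyKey;
}

// Opens after `threshold` consecutive transport failures; fully resets after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canCall(now: number): boolean {
    if (this.openedAt === null) return true;
    if (now - this.openedAt < this.cooldownMs) return false;
    this.openedAt = null;
    this.failures = 0;
    return true;
  }

  record(cls: ErrorClass, now: number): void {
    if (cls !== "transport") {
      this.failures = 0; // the server answered, even if the tool failed
      return;
    }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```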

Use direct calls when these controls are naturally local. Use MCP when centralizing them is the safer architecture.

Migration Path: Promote One Tool, Not the Whole System

The clean migration path is to promote the most reused or highest-risk capability first. Do not convert the entire agent system to MCP because the protocol is available. Move the tool that already behaves like shared infrastructure.

  1. Find the boundary

    Choose a tool that is copied across clients, uses sensitive credentials, or needs shared audit logs. Leave app-local behavior as function calling.

  2. Define the minimum server surface

    Expose the few tools, resources, and prompts that represent the capability. Do not mirror every backend endpoint.

  3. Add policy before rollout

    Map each tool to scopes, approval requirements, idempotency behavior, and redaction rules before connecting production clients.

  4. Instrument discovery and calls

    Log tools/list, imported tools, refused tools, approvals, tool calls, errors, and latency. Tool discovery is part of the production surface.

  5. Keep direct functions for hot paths

    If a tool is latency-sensitive, app-specific, and owned by one runtime, do not promote it just for symmetry.
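Step 3 can be enforced mechanically by comparing what the server lists against the policy table before any production client connects. A sketch, assuming a hypothetical `RolloutPolicy` record per tool and tool names taken from a `tools/list` result; it reports gaps rather than fixing them.

```typescript
interface ListedTool {
  name: string; // as returned by tools/list
}

// Hypothetical per-tool policy record; every field must be set before rollout.
interface RolloutPolicy {
  scopes?: string[];
  approval?: "none" | "review" | "required";
  idempotent?: boolean;
  redactFields?: string[]; // an empty list is an explicit "nothing to redact"
}

function rolloutGaps(listed: readonly ListedTool[], policies: ReadonlyMap<string, RolloutPolicy>): string[] {
  const gaps: string[] = [];
  for (const tool of listed) {
    const policy = policies.get(tool.name);
    if (!policy) {
      gaps.push(`${tool.name}: no policy entry`);
      continue;
    }
    if (!policy.scopes || policy.scopes.length === 0) gaps.push(`${tool.name}: no scopes`);
    if (policy.approval === undefined) gaps.push(`${tool.name}: approval not set`);
    if (policy.idempotent === undefined) gaps.push(`${tool.name}: idempotency not set`);
    if (policy.redactFields === undefined) gaps.push(`${tool.name}: redaction not set`);
  }
  return gaps;
}
```

An empty result is the gate for connecting production clients; any gap blocks the rollout.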

The decision should reduce duplicated integration code or improve control. If MCP only adds a network hop and another deployment without solving ownership, discovery, or governance, keep the function direct.

The Final Decision Rule

Use function calling for product-local behavior. Build a custom MCP server for shared internal capabilities.

That rule holds across model providers and agent frameworks because it is an ownership rule. Function calling is a good application pattern. MCP is a good platform boundary. The best production systems use both, with direct calls for local behavior and MCP servers for capabilities that need to be discovered, governed, approved, reused, and observed.

For deeper MCP implementation patterns, the /writing/mcp lane covers custom server design, security boundaries, and production rollout decisions.

Is MCP just function calling with extra steps?

No. Function calling lets a model request a function from your application. MCP standardizes how capabilities are discovered, invoked, authorized, and reused across AI clients through a server boundary.

Does MCP replace function calling?

No. Keep direct function calls for app-local behavior. Use MCP for shared or governed capabilities. A mature production agent stack usually has both.

When should we build a custom MCP server?

Build one when a capability needs multiple clients, server-side auth, scoped access, resource discovery, approval controls, or audit logs. Those are platform concerns, not prompt concerns.

Does MCP add latency?

It can. Local stdio adds a process boundary and remote HTTP adds a network boundary. Keep latency-sensitive app-local tools direct unless the governance benefit is worth the hop.

Last updated: Jun 3, 2026

Category: MCP