How to Build an MCP Server for Production Internal APIs

Build an MCP server with typed tools, HTTP transport, OAuth, approval gates, logs, and a release checklist for internal APIs.

Wednesday, July 1, 2026

Omid Saffari

Build the MCP server like a production API boundary for agents: narrow typed tools, read-only resources, explicit prompts, local stdio for development, and Streamable HTTP behind OAuth, approvals, logs, and release gates for anything remote.

The Production Shape

An MCP server is worth building when an internal capability needs to be shared safely across AI hosts, not when one app needs one private function call. If the tool only belongs inside one backend, keep it as local tool calling. If the same capability should work from Claude Desktop, Claude Code, ChatGPT Apps, Codex, an internal agent console, or a future host, put it behind MCP and treat it like a productized interface.

MCP has three moving parts. The host is the AI application. The client is the connection the host opens to one MCP server. The server is the program that exposes context and actions. The official architecture docs describe local stdio servers as typically serving one MCP client, while remote Streamable HTTP servers typically serve many clients.

That distinction should drive the build:

Decision	Use this	Why
Local prototype	stdio	Fastest path, one host, no network auth surface
Developer workstation tool	stdio	Keeps secrets in the local environment and avoids exposing an HTTP endpoint
Shared internal API	Streamable HTTP	Many clients need the same server, so you need auth, sessions, rate limits, and logs
Sensitive mutation	MCP tool plus approval	The model can request the action, but the user or policy gate can deny it
Read-only context	MCP resource	The application chooses context without letting the model execute an action

The production rule is simple: MCP is not the agent. MCP is the contract between the agent surface and the system it wants to use. The contract needs names, schemas, auth, approval rules, trace IDs, and rollback. The deeper build-vs-local-tool line is covered in the MCP versus function calling decision rule; this guide starts after that decision is made.

Choose The Primitive Before You Write Code

The first implementation decision is whether each capability is a tool, a resource, or a prompt. Most weak MCP servers expose everything as tools. That gives the model too much agency and gives the operator too little structure.

Primitive	Production use	Control owner	Internal API example
Tool	Execute an operation	Model requests, client approves	`lookup_order`, `refund_payment`, `create_ticket`
Resource	Provide read-only context	Application selects and injects	`orders://schema`, `runbook://refund-policy`, `account://profile/{id}`
Prompt	Start a reusable workflow	User explicitly chooses	`investigate_failed_refund`, `draft_escalation`

Use tools for narrow actions with typed inputs and outputs. A tool should do one thing, accept one explicit schema, and return a compact result that the model can reason about. Do not expose a generic query_database tool to a broad production host. Expose lookup_customer_invoice, list_recent_failed_payments, or open_support_ticket with exact argument names, policy checks, and output schemas.

Use resources for context the application can read without asking the model to act. A database schema, a permissions matrix, an account profile, or a runbook is usually a resource before it is a tool. That separation matters because a resource is application-driven context, while a tool is model-invoked execution.

Use prompts for workflows that should begin with human intent. If the operation has a named business process, make it a prompt that guides the user and model through known resources and tools. For example, investigate_failed_refund can read a refund-policy resource, call a read-only transaction lookup tool, and stop before a mutation until approval is granted.
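
One way to apply the primitive table is to classify each capability in an inventory before writing any handler. The sketch below is illustrative only: the `Capability` shape and the `choosePrimitive` function are hypothetical names, not part of any MCP SDK, and the rule is a simplified reading of the "Control owner" column.

```typescript
// Hypothetical inventory entry for one internal capability.
type Capability = {
  label: string;
  hasSideEffects: boolean; // writes, sends, or triggers something
  startedBy: "model" | "application" | "user";
};

type Primitive = "tool" | "resource" | "prompt";

// Simplified rule: whoever controls the capability decides the primitive.
function choosePrimitive(cap: Capability): Primitive {
  if (cap.startedBy === "user") return "prompt";
  if (cap.startedBy === "application" && !cap.hasSideEffects) return "resource";
  return "tool";
}

const inventory: Capability[] = [
  { label: "lookup_order", hasSideEffects: false, startedBy: "model" },
  { label: "orders://schema", hasSideEffects: false, startedBy: "application" },
  { label: "investigate_failed_refund", hasSideEffects: false, startedBy: "user" },
];

// One line per capability, for example "lookup_order -> tool".
const plan = inventory.map((cap) => `${cap.label} -> ${choosePrimitive(cap)}`);
```

If most rows in a real inventory come out as tools, that is a signal to revisit which of them are really read-only context or user-started workflows.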

Build The First Server Around One Internal API

Start with one read-only or low-risk API wrapper, not the whole internal platform. A good first server exposes one domain, such as orders, incidents, invoices, feature flags, or customer records. The goal is to prove the protocol boundary, not to mirror every route from the backing service.

The official TypeScript build path uses McpServer, Zod schemas, and stdio for the starter server. This minimal example wraps an internal order lookup as a typed tool and returns structured content for the model and client:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

type Order = {
  id: string;
  status: "pending" | "paid" | "fulfilled" | "refunded";
  totalCents: number;
  customerTier: "standard" | "priority";
};

const demoOrders = new Map<string, Order>([
  [
    "ord_10001",
    {
      id: "ord_10001",
      status: "paid",
      totalCents: 12900,
      customerTier: "priority",
    },
  ],
]);

async function findOrder(orderId: string): Promise<Order | null> {
  return demoOrders.get(orderId) ?? null;
}

const server = new McpServer({
  name: "orders-mcp",
  version: "1.0.0",
});

server.registerTool(
  "lookup_order",
  {
    title: "Lookup order",
    description: "Return a compact order status by order id.",
    inputSchema: {
      orderId: z.string().min(8).describe("Internal order id, for example ord_10001"),
    },
    outputSchema: {
      orderId: z.string(),
      status: z.enum(["pending", "paid", "fulfilled", "refunded"]),
      totalCents: z.number().int().nonnegative(),
      customerTier: z.enum(["standard", "priority"]),
    },
  },
  async ({ orderId }) => {
    const order = await findOrder(orderId);

    if (!order) {
      return {
        isError: true,
        content: [{ type: "text", text: `Order ${orderId} was not found.` }],
      };
    }

    return {
      structuredContent: {
        orderId: order.id,
        status: order.status,
        totalCents: order.totalCents,
        customerTier: order.customerTier,
      },
      content: [
        {
          type: "text",
          text: `Order ${order.id} is ${order.status}.`,
        },
      ],
    };
  },
);

const transport = new StdioServerTransport();
await server.connect(transport);
console.error("orders-mcp running on stdio");
```

This is intentionally small. The tool name is stable, the argument is typed, the output is structured, and missing data is a tool execution error with isError: true, not a protocol crash. The latest tools spec says clients should provide tool execution errors to language models so they can self-correct, which is exactly what you want when a user pastes the wrong order ID.

Do not write ordinary logs to stdout in a stdio server. stdout carries JSON-RPC messages, and the transport spec says the server must not write anything there that is not a valid MCP message. Use stderr locally, and move to structured application logs when the server becomes HTTP.
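
A small helper makes the stdout rule hard to break by accident. This is a minimal sketch, assuming a Node.js runtime where `console.error` writes to stderr; `formatLogLine` and `logToStderr` are hypothetical helper names, not SDK functions.

```typescript
type LogLevel = "info" | "warn" | "error";
type LogFields = Record<string, string | number | boolean>;

// Build one JSON log line. Kept pure so it is easy to test.
function formatLogLine(level: LogLevel, message: string, fields: LogFields = {}): string {
  return JSON.stringify({ level, message, ...fields });
}

// console.error goes to stderr, so stdout stays reserved for JSON-RPC messages.
function logToStderr(level: LogLevel, message: string, fields: LogFields = {}): void {
  console.error(formatLogLine(level, message, fields));
}

logToStderr("info", "tool call finished", { tool: "lookup_order", latencyMs: 12 });
```

Routing every log call through one function also gives a single place to swap in structured application logging when the server moves to HTTP.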

Wrap one internal API

Pick a read-only route, define the smallest useful input schema, and return the smallest structured output the model needs. Do not expose your backing API shape directly if it includes secrets, internal IDs, or fields the model should not see.

Add policy before mutation

For write tools, put the policy check inside the tool handler and return a denial as a tool execution error. A model should never learn that a write succeeded when the backing policy rejected it.

Test with a real host

Connect the local stdio server to an MCP host or the MCP Inspector, call the tool with valid and invalid input, and confirm the model sees useful correction data without seeing secrets.

Freeze the contract

Keep the tool name, input schema, and output schema stable once another host depends on them. Add new tools or new optional fields instead of changing the meaning of existing fields.
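
The "Add policy before mutation" step can be sketched without the SDK. Everything here is an assumption for illustration: the `Caller` shape, the `support_agent` role, the in-memory `tickets` array standing in for the backing API, and the handler being synchronous. A real handler would be async and registered through the SDK.

```typescript
type Caller = { subjectId: string; tenantId: string; roles: string[] };

type ToolResult = {
  isError?: boolean;
  content: { type: "text"; text: string }[];
  structuredContent?: Record<string, unknown>;
};

type PolicyDecision = { allow: true } | { allow: false; reason: string };

// Hypothetical policy: stay inside the caller's tenant and require a support role.
function canOpenTicket(caller: Caller, targetTenantId: string): PolicyDecision {
  if (caller.tenantId !== targetTenantId) return { allow: false, reason: "cross_tenant" };
  if (!caller.roles.includes("support_agent")) return { allow: false, reason: "missing_role" };
  return { allow: true };
}

// In-memory stand-in for the backing ticket API.
const tickets: { id: string; tenantId: string; summary: string }[] = [];

// Denial reasons are recorded here instead of being shown to the model.
const denialLog: string[] = [];

function openSupportTicket(caller: Caller, args: { tenantId: string; summary: string }): ToolResult {
  const decision = canOpenTicket(caller, args.tenantId);
  if (!decision.allow) {
    // Denied: the backing API is never called and the model sees a generic message.
    denialLog.push(`${caller.subjectId}:${decision.reason}`);
    return {
      isError: true,
      content: [{ type: "text", text: "This request is not permitted for the current user." }],
    };
  }
  const ticket = { id: `tkt_${tickets.length + 1}`, tenantId: args.tenantId, summary: args.summary };
  tickets.push(ticket);
  return {
    structuredContent: { ticketId: ticket.id, status: "open" },
    content: [{ type: "text", text: `Ticket ${ticket.id} is open.` }],
  };
}
```

The check runs before the write, so a denial can never be followed by a partial mutation, and the specific reason stays in the server's own records.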

Use The Right Transport For The Trust Boundary

Use stdio until the server needs to serve more than one client or live outside the developer machine. Stdio is the right local development transport because it avoids a network listener and keeps credentials in the local environment. It is also brittle in one specific way: stdout is protocol traffic, so any accidental console.log can corrupt the session.

Use Streamable HTTP when the MCP server becomes a shared internal service. In the current transport spec, a Streamable HTTP server runs as an independent process that can handle multiple client connections and accepts JSON-RPC messages at a single MCP endpoint. The transport supports remote communication, streaming, session management, and standard HTTP authentication methods.

The remote shape should look like this:

```text
AI host
  -> MCP client
  -> HTTPS MCP endpoint
  -> auth middleware
  -> tool router
  -> policy check
  -> internal API client
  -> structuredContent + content + logs
```

For HTTP, the transport spec adds operational rules that should be release blockers, not nice-to-haves. Validate the Origin header to reduce DNS rebinding risk. Bind local servers to 127.0.0.1 rather than 0.0.0.0. Require authentication. After initialization, clients include the MCP-Protocol-Version header on subsequent HTTP requests, and the server should reject a version it does not support with 400 Bad Request rather than guessing.
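
The Origin and protocol-version rules fit in one pre-router check. This is a sketch under stated assumptions: the allowed origin and the version list are example values, `checkRequestHeaders` is a hypothetical helper, and the 403 and 400 status codes follow one reading of the transport spec that should be confirmed against the revision you target.

```typescript
type HeaderCheck =
  | { ok: true; protocolVersion: string }
  | { ok: false; status: 400 | 403; reason: string };

// Assumed deployment values for illustration.
const ALLOWED_ORIGINS = new Set(["https://agents.internal.example"]);
const SUPPORTED_VERSIONS = new Set(["2025-06-18", "2025-11-25"]);

// `headers` uses lower-cased names, as most Node.js HTTP frameworks provide them.
// `negotiatedVersion` is the version agreed during initialization for this session.
function checkRequestHeaders(
  headers: Record<string, string | undefined>,
  negotiatedVersion: string,
): HeaderCheck {
  const requestOrigin = headers["origin"];
  // A present but unknown Origin is treated as a possible DNS rebinding attempt.
  if (requestOrigin !== undefined && !ALLOWED_ORIGINS.has(requestOrigin)) {
    return { ok: false, status: 403, reason: "origin_not_allowed" };
  }
  // Fall back to the negotiated version when the header is absent.
  const version = headers["mcp-protocol-version"] ?? negotiatedVersion;
  if (!SUPPORTED_VERSIONS.has(version)) {
    return { ok: false, status: 400, reason: "unsupported_protocol_version" };
  }
  return { ok: true, protocolVersion: version };
}
```

Running this before authentication and routing keeps rejected requests cheap and gives each rejection a stable reason string for logs.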

Add Authorization Before You Expose The Endpoint

Remote MCP authorization is OAuth infrastructure, not a shared API key in a config file. The current authorization spec says HTTP-based transports should conform to the MCP authorization rules, while stdio transports should not follow that flow and should retrieve credentials from the environment.

For a protected remote server, the roles are familiar:

Role	MCP meaning
Resource server	The protected MCP server accepting access tokens
OAuth client	The MCP client making requests on behalf of the resource owner
Authorization server	The identity system that authenticates the user and issues access tokens

The MCP server must implement OAuth 2.0 Protected Resource Metadata so clients can discover the authorization server. Access tokens belong in the Authorization header, not the URI query string. The server must validate that the token was issued specifically for that MCP server as the intended audience.

That audience check is the line between "the user is logged in somewhere" and "this client is allowed to call this server." If the same bearer token can be replayed against multiple internal MCP servers, the boundary is weak. The deeper authorization flow belongs in the MCP authorization production guide, but the build rule is straightforward: no remote endpoint ships until auth discovery, token validation, audience checks, tenant checks, and denial logs are working.
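
The audience check itself is small. The sketch below assumes the token's signature has already been verified by a JWT library upstream, and that the claims follow the usual `sub`, `aud`, and `exp` names. `THIS_SERVER` and `checkClaims` are hypothetical names, and the URL is an example value.

```typescript
type TokenClaims = {
  sub: string;
  aud: string | string[];
  exp: number; // seconds since the Unix epoch
};

type TokenCheck =
  | { ok: true; subject: string }
  | { ok: false; reason: "wrong_audience" | "expired" };

// Canonical URI of this MCP server (example value).
const THIS_SERVER = "https://mcp.internal.example/orders";

// Runs after signature verification, which is assumed to happen elsewhere.
function checkClaims(claims: TokenClaims, nowSeconds: number): TokenCheck {
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  // A token minted for another internal server must not be accepted here.
  if (!audiences.includes(THIS_SERVER)) return { ok: false, reason: "wrong_audience" };
  if (claims.exp <= nowSeconds) return { ok: false, reason: "expired" };
  return { ok: true, subject: claims.sub };
}
```

A token issued for a different internal MCP server fails with `wrong_audience` even when it is otherwise valid, which is the replay case described above.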

For internal APIs, put authorization in three places:

At the transport edge: reject unauthenticated HTTP requests before they reach the MCP router.
At the tool boundary: check user, tenant, role, resource, and action before calling the backing API.
At the backing service: keep the original API authorization in place so MCP is not the only gate.

The server should log every authorization decision with a trace ID, tool name, subject, tenant, resource, action, allow or deny result, and policy version. The log does not need raw payloads. It needs enough structure to answer: who asked, what did the model request, what did the server permit, what did the backing API do, and where did a human approve.

Ship The Release Gate

The release gate is what turns an MCP tutorial into a production server. Before launch, every tool needs a contract, a threat check, an approval rule, an eval, and an operational owner.

Use this gate for each exposed tool:

Gate	Pass condition
Name	One user intent, no generic admin verbs
Input schema	Valid JSON Schema, typed, narrow, tenant-safe
Output schema	Compact `structuredContent` with no secrets
Authz	Subject, tenant, resource, action, and audience checks
Approval	Sensitive operations require human confirmation or a policy hold
Idempotency	Retried calls do not create duplicate writes
Logging	Trace ID ties host request, tool call, policy decision, backing API call, and result
Evals	Golden prompts cover success, denial, bad input, ambiguous input, and retry
Rollback	Tool can be disabled without redeploying every host
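
The idempotency gate is the one most often skipped, so here is a minimal sketch of it. The key format, the `createRefundOnce` name, and the in-memory `Map` are assumptions for illustration. A production server would keep the key in a durable store with a unique constraint so that retries across processes are also covered.

```typescript
type RefundWrite = { requestId: string; replayed: boolean };

// Idempotency key -> request id. In-memory only for the sketch.
const writesByKey = new Map<string, string>();

// Hypothetical key: one refund request per tenant and order.
function refundKey(tenantId: string, orderId: string): string {
  return `${tenantId}:${orderId}:refund`;
}

function createRefundOnce(tenantId: string, orderId: string): RefundWrite {
  const key = refundKey(tenantId, orderId);
  const existing = writesByKey.get(key);
  if (existing !== undefined) {
    // A retry after a timeout lands here and gets the original request back.
    return { requestId: existing, replayed: true };
  }
  const requestId = `req_${writesByKey.size + 1}`;
  writesByKey.set(key, requestId);
  return { requestId, replayed: false };
}
```

Returning the original request ID on a retry lets the model report a consistent result instead of creating a second write.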

The latest tools spec says there should always be a human in the loop with the ability to deny tool invocations, and clients should prompt for confirmation on sensitive operations. Treat that as the minimum. For high-risk actions, add a server-side approval state too. A client prompt protects the user interface. A server approval gate protects the system.

For example, do not expose a refund_payment tool that calls the payment API directly. Expose a request_refund tool that only creates an approval request:

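```typescript
// Continues the earlier example: `server` and `z` are already in scope.
// createRefundApprovalRequest is assumed to be an internal helper (not shown)
// that writes to your approval queue and resolves to an object with an `id`.
server.registerTool(
  "request_refund",
  {
    title: "Request refund",
    description: "Create a refund request for human approval.",
    inputSchema: {
      orderId: z.string().min(8),
      reason: z.string().min(10).max(500),
    },
    outputSchema: {
      requestId: z.string(),
      status: z.enum(["pending_approval"]),
    },
  },
  async ({ orderId, reason }) => {
    // Record the intent only; no money moves in this handler.
    const request = await createRefundApprovalRequest({ orderId, reason });

    return {
      structuredContent: {
        requestId: request.id,
        status: "pending_approval",
      },
      content: [
        {
          type: "text",
          text: `Refund request ${request.id} is pending approval.`,
        },
      ],
    };
  },
);
```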

That pattern gives the model a useful next step without handing it direct money movement. The model can explain what happened, the user can approve or deny in a trusted surface, and the audit trail can show exactly which request moved from suggestion to action.
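
The server-side approval state behind that pattern can be sketched as a small state machine. The state names beyond `pending_approval`, the function names, and the in-memory store are assumptions for illustration. The payment API call itself is left as a comment.

```typescript
type ApprovalState = "pending_approval" | "approved" | "denied" | "executed";

type RefundRequest = {
  id: string;
  orderId: string;
  state: ApprovalState;
  decidedBy?: string;
};

// In-memory store for the sketch; production would use a durable table.
const refundRequests = new Map<string, RefundRequest>();

// What the MCP tool is allowed to do: record intent, nothing more.
function createRefundRequest(id: string, orderId: string): RefundRequest {
  const request: RefundRequest = { id, orderId, state: "pending_approval" };
  refundRequests.set(id, request);
  return request;
}

function mustGet(id: string): RefundRequest {
  const request = refundRequests.get(id);
  if (request === undefined) throw new Error(`Unknown refund request ${id}`);
  return request;
}

// Called from the trusted approval surface, never from an MCP tool handler.
function decideRefund(id: string, approver: string, approve: boolean): RefundRequest {
  const request = mustGet(id);
  if (request.state !== "pending_approval") {
    throw new Error(`Request ${id} is already ${request.state}`);
  }
  request.state = approve ? "approved" : "denied";
  request.decidedBy = approver;
  return request;
}

// The only path to the payment API; it refuses anything that is not approved.
function executeRefund(id: string): RefundRequest {
  const request = mustGet(id);
  if (request.state !== "approved") {
    throw new Error(`Request ${id} is ${request.state}, not approved`);
  }
  // The real payment API call would go here.
  request.state = "executed";
  return request;
}
```

Because `executeRefund` moves the request to `executed`, a second call fails the same check, so one approval cannot be spent twice.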

The same gate should catch prompt injection. On the client side, treat tool annotations as untrusted unless they come from a trusted server. On the server side, treat instruction-like text inside arguments as data, never as commands, and validate inputs even when the client already validated them. Return clear execution errors for invalid requests, but keep internal policy details out of the model-visible text.

What To Log And Evaluate

Log the lifecycle of the tool call, not just the HTTP request. An MCP server sits between a probabilistic caller and deterministic internal systems, so the useful trace starts before the backing API call and ends after the model receives the result.

Minimum log fields:

trace_id
host
mcp_client_id
protocol_version
tool_name
tool_version
subject_id
tenant_id
resource_id
policy_version
approval_state
authz_result
input_schema_version
output_schema_version
backing_api_status
is_error
latency_ms

Keep raw secrets, access tokens, payment details, and unnecessary customer content out of logs. If the support or security team needs payload inspection, store a redacted payload with a retention policy and a break-glass path.
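
A typed log record and a redaction pass cover both halves of that advice. This is a sketch only: `ToolCallLog` uses a subset of the fields listed above, the `SENSITIVE_KEYS` deny-list is a hypothetical starting point, and the redaction is shallow, so nested payloads would need a recursive version.

```typescript
// A subset of the log fields, typed so a missing field fails at compile time.
type ToolCallLog = {
  trace_id: string;
  tool_name: string;
  subject_id: string;
  tenant_id: string;
  policy_version: string;
  authz_result: "allow" | "deny";
  is_error: boolean;
  latency_ms: number;
};

// Hypothetical deny-list; extend it to match your own payload shapes.
const SENSITIVE_KEYS = new Set(["access_token", "authorization", "card_number", "password"]);

// Shallow redaction for payloads stored under a retention policy.
function redactPayload(payload: Record<string, unknown>): Record<string, unknown> {
  const redacted: Record<string, unknown> = {};
  for (const key of Object.keys(payload)) {
    redacted[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : payload[key];
  }
  return redacted;
}

// One JSON object per line keeps the record easy to index and query.
function toLogLine(entry: ToolCallLog): string {
  return JSON.stringify(entry);
}
```

Keeping the record type separate from the payload makes it harder for raw arguments to leak into the main log stream by accident.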

The eval set should be small and sharp at launch:

Eval case	Expected result
Valid read	Tool returns structured content and concise text
Unknown resource	Tool returns `isError: true` with a model-correctable message
Cross-tenant request	Server denies before backing API call
Mutation without approval	Server creates approval request or denies
Prompt injection in argument	Server ignores instruction-like content and validates the field
Retry after timeout	Tool is idempotent or reports pending state
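
Three of those eval rows can be expressed as a table-driven check. The sketch below runs against a fake tool rather than a real MCP host. `lookupForEval`, `launchEvals`, and the tenant and order values are all hypothetical. The fake reports how many backing calls it made so the cross-tenant case can assert that the denial came first.

```typescript
type LookupArgs = { tenantId: string; orderId: string };
type Outcome = { isError: boolean; backingCalls: number };
type EvalCase = { label: string; args: LookupArgs; expected: Outcome };

const CALLER_TENANT = "tenant_a";
// Order id -> owning tenant; stands in for the backing API.
const orderOwners = new Map<string, string>([["ord_10001", "tenant_a"]]);

// Fake tool under test. Denies cross-tenant requests before any backing lookup.
function lookupForEval(args: LookupArgs): Outcome {
  if (args.tenantId !== CALLER_TENANT) return { isError: true, backingCalls: 0 };
  const owner = orderOwners.get(args.orderId);
  return { isError: owner === undefined, backingCalls: 1 };
}

const launchEvals: EvalCase[] = [
  {
    label: "valid read",
    args: { tenantId: "tenant_a", orderId: "ord_10001" },
    expected: { isError: false, backingCalls: 1 },
  },
  {
    label: "unknown resource",
    args: { tenantId: "tenant_a", orderId: "ord_99999" },
    expected: { isError: true, backingCalls: 1 },
  },
  {
    label: "cross-tenant request",
    args: { tenantId: "tenant_b", orderId: "ord_10001" },
    expected: { isError: true, backingCalls: 0 },
  },
];

// Returns the labels of cases whose outcome differs from the expectation.
function failingEvals(cases: EvalCase[]): string[] {
  return cases
    .filter((c) => {
      const got = lookupForEval(c.args);
      return got.isError !== c.expected.isError || got.backingCalls !== c.expected.backingCalls;
    })
    .map((c) => c.label);
}
```

An empty result from `failingEvals(launchEvals)` is the pass condition. The same table shape extends to the approval, injection, and retry rows once those handlers exist.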

This is also where cost control belongs. An MCP server usually is not expensive because of the SDK wrapper. It gets expensive when broad tools create long model loops, retries, large structured payloads, or manual review queues with no ownership. Compact outputs, narrow tools, pagination, rate limits, and clear errors reduce both model cost and operational cost.
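
Pagination is the simplest of those controls to show. This is a minimal sketch, assuming an offset encoded as a string cursor. The `Page` type and `paginate` function are hypothetical, and a production cursor would normally be opaque to the caller.

```typescript
type Page<T> = { items: T[]; nextCursor?: string };

// Return at most `limit` items starting at the cursor, plus a cursor for the next page.
function paginate<T>(all: T[], cursor: string | undefined, limit: number): Page<T> {
  const start = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  if (!Number.isInteger(start) || start < 0) throw new Error("invalid cursor");
  const items = all.slice(start, start + limit);
  const next = start + items.length;
  // Omit nextCursor on the last page so the caller knows to stop.
  return next < all.length ? { items, nextCursor: String(next) } : { items };
}
```

A tool that returns one bounded page and a cursor keeps each result small, and the model only pays for more rows when it actually asks for them.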

The Build Order

The clean production order is narrower than most teams expect:

Pick one internal API domain.
Split capabilities into tools, resources, and prompts.
Build a local stdio server with one read-only tool.
Add structured outputs and execution errors.
Add policy checks inside the handler.
Test the server through a real MCP host or Inspector.
Move to Streamable HTTP only when more than one client needs it.
Add OAuth discovery, audience validation, tenant checks, and denial logs.
Add approval gates for every sensitive write.
Add evals, dashboards, rate limits, and a kill switch.
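
The kill switch in the last step can be as small as a flag lookup in front of each handler. The sketch below is illustrative: `toolFlags` stands in for a config service or database read at call time, and `withKillSwitch` is a hypothetical wrapper, not an SDK feature.

```typescript
type ToolError = { isError: true; content: { type: "text"; text: string }[] };

// Hypothetical flag store. Reading it at call time means flipping a flag
// needs no deploy on the server or on any host.
const toolFlags = new Map<string, boolean>([
  ["lookup_order", true],
  ["request_refund", true],
]);

// Unknown tools count as disabled, so a missing flag fails closed.
function isToolEnabled(toolName: string): boolean {
  return toolFlags.get(toolName) === true;
}

// Wrap a handler so a disabled tool returns an execution error instead of running.
function withKillSwitch<T>(toolName: string, handler: () => T): T | ToolError {
  if (!isToolEnabled(toolName)) {
    return {
      isError: true,
      content: [{ type: "text", text: `Tool ${toolName} is temporarily disabled.` }],
    };
  }
  return handler();
}
```

Returning a tool execution error rather than removing the tool keeps the contract stable for hosts while the incident is handled.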

Stop before adding a generic admin tool. The best MCP servers are boring in exactly the right way: obvious tool names, small schemas, predictable results, explicit approvals, and logs that make incidents reconstructable.

Can I build my own MCP server?

Yes. Start with one internal capability, expose it as a typed MCP tool, and test it locally over stdio before creating a remote endpoint. The production work is not the first handler. It is auth, schemas, approvals, logging, evals, and operations.

How do I start an MCP server?

Start with the official SDK for your stack, create an McpServer instance, register one tool with an input schema, connect it over stdio, and call it from an MCP host or the MCP Inspector. Move to Streamable HTTP only after the local contract works.

Can I run an MCP server locally?

Yes. Stdio is the normal local transport. Keep ordinary logs on stderr, not stdout, because stdout carries MCP JSON-RPC messages.

How much does it cost to run an MCP server?

The wrapper code is usually not the cost center. The cost comes from hosting, identity, observability, review queues, retries, broad outputs, and maintenance. Narrow tools and compact structured results keep both model and operations cost under control.


Last Updated

Jul 1, 2026
