How to Build an MCP Server for Production Internal APIs
Build an MCP server with typed tools, HTTP transport, OAuth, approval gates, logs, and a release checklist for internal APIs.

Build the MCP server like a production API boundary for agents: narrow typed tools, read-only resources, explicit prompts, local stdio for development, and Streamable HTTP behind OAuth, approvals, logs, and release gates for anything remote.
The Production Shape
An MCP server is worth building when an internal capability needs to be shared safely across AI hosts, not when one app needs one private function call. If the tool only belongs inside one backend, keep it as local tool calling. If the same capability should work from Claude Desktop, Claude Code, ChatGPT Apps, Codex, an internal agent console, or a future host, put it behind MCP and treat it like a productized interface.
MCP has three moving parts. The host is the AI application. The client is the connection the host opens to one MCP server. The server is the program that exposes context and actions. The official architecture docs describe local stdio servers as typically serving one MCP client, while remote Streamable HTTP servers typically serve many clients.
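That distinction should drive the build. A stdio server runs on the developer's machine and inherits its local credentials. A Streamable HTTP server is a shared network service that needs authentication, authorization, logging, and an operational owner.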
The production rule is simple: MCP is not the agent. MCP is the contract between the agent surface and the system it wants to use. The contract needs names, schemas, auth, approval rules, trace IDs, and rollback. The deeper build-vs-local-tool line is covered in the MCP versus function calling decision rule; this guide starts after that decision is made.
Choose The Primitive Before You Write Code
The first implementation decision is whether each capability is a tool, a resource, or a prompt. Most weak MCP servers expose everything as tools. That gives the model too much agency and gives the operator too little structure.
Use tools for narrow actions with typed inputs and outputs. A tool should do one thing, accept one explicit schema, and return a compact result that the model can reason about. Do not expose a generic query_database tool to a broad production host. Expose lookup_customer_invoice, list_recent_failed_payments, or open_support_ticket with exact argument names, policy checks, and output schemas.
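Here is a dependency-free sketch of argument parsing for `lookup_customer_invoice` that shows what "one explicit schema" means at runtime. The argument names `customerId` and `invoiceId` and their `cus_`/`inv_` formats are assumptions for illustration. In the TypeScript SDK the input schema is usually a Zod shape. This hand-rolled version makes the rules visible, including rejecting unknown keys instead of forwarding them to the backing API.

```typescript
// Hypothetical argument shape for the lookup_customer_invoice tool.
type LookupInvoiceArgs = { customerId: string; invoiceId: string };

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

const ALLOWED_KEYS = new Set(["customerId", "invoiceId"]);

// Accept exactly one explicit schema: known keys only, typed values only.
function parseLookupInvoiceArgs(raw: unknown): ParseResult<LookupInvoiceArgs> {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, error: "Arguments must be an object." };
  }
  const record = raw as Record<string, unknown>;
  const unknownKeys = Object.keys(record).filter((key) => !ALLOWED_KEYS.has(key));
  if (unknownKeys.length > 0) {
    return { ok: false, error: `Unknown arguments: ${unknownKeys.join(", ")}` };
  }
  const { customerId, invoiceId } = record;
  // Assumed ID formats: cus_ and inv_ prefixes followed by digits.
  if (typeof customerId !== "string" || !/^cus_\d+$/.test(customerId)) {
    return { ok: false, error: "customerId must look like cus_123." };
  }
  if (typeof invoiceId !== "string" || !/^inv_\d+$/.test(invoiceId)) {
    return { ok: false, error: "invoiceId must look like inv_456." };
  }
  return { ok: true, value: { customerId, invoiceId } };
}
```

The error strings are written for the model. They say what shape was expected, so the model can correct its next call without seeing anything about the backing API.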
Use resources for context the application can read without asking the model to act. A database schema, a permissions matrix, an account profile, or a runbook is usually a resource before it is a tool. That separation matters because a resource is application-driven context, while a tool is model-invoked execution.
Use prompts for workflows that should begin with human intent. If the operation has a named business process, make it a prompt that guides the user and model through known resources and tools. For example, investigate_failed_refund can read a refund-policy resource, call a read-only transaction lookup tool, and stop before a mutation until approval is granted.
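You can make the tool/resource/prompt split reviewable with a small manifest that the server and its reviewers share. This is not an MCP SDK type. The `Capability` union, the `mutates` and `requiresApproval` flags, the `policy://refunds` URI, and every name except `investigate_failed_refund` are hypothetical. The check enforces two rules from this section: a prompt may only reference declared capabilities, and a tool that mutates state must require approval.

```typescript
// Hypothetical manifest entries; not part of the MCP SDK.
type Capability =
  | { kind: "resource"; name: string; uri: string }
  | { kind: "tool"; name: string; mutates: boolean; requiresApproval: boolean }
  | { kind: "prompt"; name: string; uses: string[] };

// Returns human-readable problems; an empty list means the manifest passes.
function validateManifest(capabilities: Capability[]): string[] {
  const problems: string[] = [];
  const declared = new Set(capabilities.map((c) => c.name));
  for (const cap of capabilities) {
    if (cap.kind === "tool" && cap.mutates && !cap.requiresApproval) {
      problems.push(`${cap.name}: mutating tool must require approval`);
    }
    if (cap.kind === "prompt") {
      for (const dep of cap.uses) {
        if (!declared.has(dep)) problems.push(`${cap.name}: unknown dependency ${dep}`);
      }
    }
  }
  return problems;
}

// Hypothetical manifest for the failed-refund workflow.
const refundsManifest: Capability[] = [
  { kind: "resource", name: "refund_policy", uri: "policy://refunds" },
  { kind: "tool", name: "lookup_transaction", mutates: false, requiresApproval: false },
  { kind: "tool", name: "issue_refund", mutates: true, requiresApproval: true },
  {
    kind: "prompt",
    name: "investigate_failed_refund",
    uses: ["refund_policy", "lookup_transaction", "issue_refund"],
  },
];
```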
Build The First Server Around One Internal API
Start with one read-only or low-risk API wrapper, not the whole internal platform. A good first server exposes one domain, such as orders, incidents, invoices, feature flags, or customer records. The goal is to prove the protocol boundary, not to mirror every route from the backing service.
The official TypeScript build path uses McpServer, Zod schemas, and stdio for the starter server. This minimal example wraps an internal order lookup as a typed tool and returns structured content for the model and client:
```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

type Order = {
  id: string;
  status: "pending" | "paid" | "fulfilled" | "refunded";
  totalCents: number;
  customerTier: "standard" | "priority";
};

// Demo data standing in for the internal orders API.
const demoOrders = new Map<string, Order>([
  [
    "ord_10001",
    {
      id: "ord_10001",
      status: "paid",
      totalCents: 12900,
      customerTier: "priority",
    },
  ],
]);

// Replace with a call to the real internal API client.
async function findOrder(orderId: string): Promise<Order | null> {
  return demoOrders.get(orderId) ?? null;
}

const server = new McpServer({
  name: "orders-mcp",
  version: "1.0.0",
});

server.registerTool(
  "lookup_order",
  {
    title: "Lookup order",
    description: "Return a compact order status by order id.",
    inputSchema: {
      orderId: z.string().min(8).describe("Internal order id, for example ord_10001"),
    },
    outputSchema: {
      orderId: z.string(),
      status: z.enum(["pending", "paid", "fulfilled", "refunded"]),
      totalCents: z.number().int().nonnegative(),
      customerTier: z.enum(["standard", "priority"]),
    },
  },
  async ({ orderId }) => {
    const order = await findOrder(orderId);
    if (!order) {
      // Tool execution error: the model sees it and can retry with a corrected id.
      return {
        isError: true,
        content: [{ type: "text", text: `Order ${orderId} was not found.` }],
      };
    }
    return {
      // Machine-readable result, checked against outputSchema.
      structuredContent: {
        orderId: order.id,
        status: order.status,
        totalCents: order.totalCents,
        customerTier: order.customerTier,
      },
      // Text summary for hosts that only render content blocks.
      content: [
        {
          type: "text",
          text: `Order ${order.id} is ${order.status}.`,
        },
      ],
    };
  },
);

// stdout is reserved for JSON-RPC messages, so the startup log goes to stderr.
const transport = new StdioServerTransport();
await server.connect(transport);
console.error("orders-mcp running on stdio");
```

This example is intentionally small. The tool name is stable, the argument is typed, and the output is structured. Missing data comes back as a tool execution error with isError: true rather than a protocol crash. The latest tools spec says clients should pass tool execution errors to language models so they can self-correct, which is exactly what you want when a user pastes the wrong order ID.
Do not write ordinary logs to stdout in a stdio server. stdout carries JSON-RPC messages, and the transport spec says the server must not write anything there that is not a valid MCP message. Use stderr locally, and move to structured application logs when the server becomes HTTP.
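Here is a minimal stderr logger for a stdio server, assuming Node.js, where `console.error` writes to stderr. It emits one JSON object per line, which keeps the output easy to ship to structured logging later. The field names and the `logStderr` helper are illustrative, not an SDK API.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Build one single-line JSON record; kept separate from I/O so it is easy to test.
// Fixed keys are written last so caller fields cannot overwrite them.
function formatLogLine(
  level: LogLevel,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({ ...fields, ts: new Date().toISOString(), level, message });
}

// In a stdio server stdout carries MCP messages, so logs always go to stderr.
function logStderr(level: LogLevel, message: string, fields?: Record<string, unknown>): void {
  console.error(formatLogLine(level, message, fields));
}

logStderr("info", "orders-mcp running on stdio", { transport: "stdio" });
```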
Wrap one internal API
Pick a read-only route, define the smallest useful input schema, and return the smallest structured output the model needs. Do not expose your backing API shape directly if it includes secrets, internal IDs, or fields the model should not see.
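Here is a sketch of that projection. The `BackingOrderRecord` shape, including `paymentToken`, `internalNotes`, and `warehouseId`, is a hypothetical internal API response. The output mirrors the `lookup_order` output schema. The sketch copies fields explicitly instead of spreading the record and deleting a few, so any field the backing API adds later stays hidden until someone chooses to expose it.

```typescript
// Hypothetical response from the internal orders API.
type BackingOrderRecord = {
  id: string;
  status: "pending" | "paid" | "fulfilled" | "refunded";
  totalCents: number;
  customerTier: "standard" | "priority";
  paymentToken: string;
  internalNotes: string;
  warehouseId: string;
};

// The compact shape the model sees, matching lookup_order's output schema.
type ModelOrder = {
  orderId: string;
  status: BackingOrderRecord["status"];
  totalCents: number;
  customerTier: BackingOrderRecord["customerTier"];
};

// Allow-list projection: only named fields cross the MCP boundary.
function toModelOrder(record: BackingOrderRecord): ModelOrder {
  return {
    orderId: record.id,
    status: record.status,
    totalCents: record.totalCents,
    customerTier: record.customerTier,
  };
}
```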
Add policy before mutation
For write tools, put the policy check inside the tool handler and return a denial as a tool execution error. A model should never learn that a write succeeded when the backing policy rejected it.
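Here is a dependency-free sketch of that rule. The `cancel_order` scenario, the `orders:cancel` role, and the `cancelInBackingApi` callback are hypothetical, and the sketch assumes the order has already been loaded. The result uses the same `content`, `isError`, and `structuredContent` fields as the `lookup_order` example. This sketch answers cross-tenant requests with "not found" so the model cannot probe for records in other tenants.

```typescript
type TextContent = { type: "text"; text: string };
type ToolResult = {
  content: TextContent[];
  isError?: boolean;
  structuredContent?: Record<string, unknown>;
};
type Caller = { subjectId: string; tenantId: string; roles: string[] };
type OrderRecord = {
  id: string;
  tenantId: string;
  status: "pending" | "paid" | "fulfilled" | "refunded";
};

// Returns a model-safe denial message, or null when the write is allowed.
function checkCancelPolicy(caller: Caller, order: OrderRecord): string | null {
  if (order.tenantId !== caller.tenantId) return `Order ${order.id} was not found.`;
  if (!caller.roles.includes("orders:cancel")) return "You do not have permission to cancel orders.";
  if (order.status !== "pending") {
    return `Only pending orders can be cancelled; ${order.id} is ${order.status}.`;
  }
  return null;
}

async function handleCancelOrder(
  caller: Caller,
  order: OrderRecord,
  cancelInBackingApi: (orderId: string) => Promise<void>,
): Promise<ToolResult> {
  const denial = checkCancelPolicy(caller, order);
  if (denial !== null) {
    // Denied writes never reach the backing API and never report success.
    return { isError: true, content: [{ type: "text", text: denial }] };
  }
  await cancelInBackingApi(order.id);
  return {
    structuredContent: { orderId: order.id, status: "cancelled" },
    content: [{ type: "text", text: `Order ${order.id} was cancelled.` }],
  };
}
```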
Test with a real host
Connect the local stdio server to an MCP host or the MCP Inspector, call the tool with valid and invalid input, and confirm the model sees useful correction data without seeing secrets.
Freeze the contract
Keep the tool name, input schema, and output schema stable once another host depends on them. Add new tools or new optional fields instead of changing the meaning of existing fields.
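Here is a rough compatibility check that could run in CI before a release. It compares a simplified field map (name to type name plus an `optional` flag) rather than real JSON Schema, so the `FieldSpec` format is an assumption. A new version passes only if every existing field keeps its type and optionality and every added field is optional.

```typescript
// Simplified stand-in for a JSON Schema: field name -> type name and optionality.
type FieldSpec = Record<string, { type: string; optional: boolean }>;

// Lists every change that could break a host that depends on `previous`.
function breakingChanges(previous: FieldSpec, next: FieldSpec): string[] {
  const problems: string[] = [];
  for (const [field, spec] of Object.entries(previous)) {
    const updated = next[field];
    if (!updated) {
      problems.push(`removed field ${field}`);
    } else if (updated.type !== spec.type || updated.optional !== spec.optional) {
      problems.push(`changed field ${field}`);
    }
  }
  for (const [field, spec] of Object.entries(next)) {
    if (!(field in previous) && !spec.optional) {
      problems.push(`new required field ${field}`);
    }
  }
  return problems;
}
```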
Use The Right Transport For The Trust Boundary
Use stdio until the server needs to serve more than one client or run outside the developer's machine. Stdio is the right local development transport because it avoids a network listener and keeps credentials in the local environment. It is also brittle in one specific way: stdout is protocol traffic, so any accidental console.log can corrupt the session.
Use Streamable HTTP when the MCP server becomes a shared internal service. The current transport spec defines Streamable HTTP as an independent process that accepts JSON-RPC messages at a single MCP endpoint. It supports remote communication, streaming, session management, and standard HTTP authentication methods.
The remote shape should look like this:
```
AI host
-> MCP client
-> HTTPS MCP endpoint
-> auth middleware
-> tool router
-> policy check
-> internal API client
-> structuredContent + content + logs
```

For HTTP, the transport spec adds operational rules that should be release blockers, not nice-to-haves. Validate the Origin header to reduce DNS rebinding risk. Bind local servers to 127.0.0.1 rather than 0.0.0.0. Require authentication. After initialization, clients send the MCP-Protocol-Version header on every subsequent HTTP request, and the server should reject an invalid or unsupported version with 400 Bad Request rather than guessing.
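You can express the Origin and protocol-version rules as a plain request check that runs before the MCP router. The allowed origin and the set of supported versions are assumptions to replace with your own. The sketch follows the transport spec's backwards-compatibility rule of assuming `2025-03-26` when the header is absent, as written in the 2025-06-18 revision. Returning 403 for a disallowed origin is a choice made in this sketch.

```typescript
// Assumed values: replace with your deployment's allowed origins and the
// protocol revisions your SDK version actually negotiates.
const ALLOWED_ORIGINS = new Set(["https://agents.internal.example.com"]);
const SUPPORTED_VERSIONS = new Set(["2025-03-26", "2025-06-18"]);
// Fallback when the MCP-Protocol-Version header is absent.
const DEFAULT_VERSION = "2025-03-26";

// Header names are expected in lower case, as Node's http module provides them.
type RequestHeaders = Record<string, string | undefined>;

type EdgeCheck =
  | { ok: true; protocolVersion: string }
  | { ok: false; status: 400 | 403; reason: string };

// Runs on every HTTP request, before auth and the MCP router.
function checkMcpRequest(headers: RequestHeaders): EdgeCheck {
  const requestOrigin = headers["origin"];
  // DNS rebinding defense: a present Origin must be explicitly allowed.
  if (requestOrigin !== undefined && !ALLOWED_ORIGINS.has(requestOrigin)) {
    return { ok: false, status: 403, reason: "origin not allowed" };
  }
  const version = headers["mcp-protocol-version"] ?? DEFAULT_VERSION;
  if (!SUPPORTED_VERSIONS.has(version)) {
    return { ok: false, status: 400, reason: `unsupported protocol version ${version}` };
  }
  return { ok: true, protocolVersion: version };
}
```

Requests without an Origin header, typical of non-browser MCP clients, pass this check and still face authentication. Binding to 127.0.0.1 is a listen-time setting rather than a per-request check. With Node's `http` module it looks like `httpServer.listen(port, "127.0.0.1")`.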
Add Authorization Before You Expose The Endpoint
Remote MCP authorization is OAuth infrastructure, not a shared API key in a config file. The current authorization spec says HTTP-based transports should conform to the MCP authorization rules, while stdio transports should not follow that flow and should retrieve credentials from the environment.
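For a protected remote server, the roles are the familiar OAuth ones. The MCP server acts as the resource server. The MCP client acts as the OAuth client on the user's behalf. A separate authorization server authenticates the user and issues access tokens.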
The MCP server must implement OAuth 2.0 Protected Resource Metadata so clients can discover the authorization server. Access tokens belong in the Authorization header, not the URI query string. The server must validate that the token was issued specifically for that MCP server as the intended audience.
That audience check is the line between "the user is logged in somewhere" and "this client is allowed to call this server." If the same bearer token can be replayed against multiple internal MCP servers, the boundary is weak. The deeper authorization flow belongs in the MCP authorization production guide, but the build rule is straightforward: no remote endpoint ships until auth discovery, token validation, audience checks, tenant checks, and denial logs are working.
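Here is a sketch of the edge checks. It assumes a JWT or token-introspection library has already verified the token's signature, expiry, and issuer, leaving only decoded claims. The server URI, metadata URL, and `tenant_id` claim name are placeholders. The audience is assumed to be the server's canonical URI. A 401 carries a `WWW-Authenticate` header with the `resource_metadata` parameter from RFC 9728, which is how clients find the Protected Resource Metadata document.

```typescript
// Placeholder identifiers for this MCP server.
const SERVER_URI = "https://mcp.internal.example.com/orders";
const RESOURCE_METADATA_URL =
  "https://mcp.internal.example.com/.well-known/oauth-protected-resource";

// Claims left after signature, expiry, and issuer checks done elsewhere.
// tenant_id is a hypothetical claim name.
type VerifiedClaims = { sub: string; aud: string | string[]; tenant_id?: string };

type AuthOutcome =
  | { ok: true; subject: string; tenantId: string }
  | { ok: false; status: 401 | 403; wwwAuthenticate?: string };

// Only the Authorization header is read; a token in the query string is never consulted.
function extractBearer(authorization: string | undefined): string | null {
  const match = /^Bearer\s+(\S+)$/i.exec(authorization ?? "");
  return match?.[1] ?? null;
}

function authorizeRequest(
  authorization: string | undefined,
  decode: (token: string) => VerifiedClaims | null,
): AuthOutcome {
  // A 401 points clients at the Protected Resource Metadata document.
  const challenge = `Bearer resource_metadata="${RESOURCE_METADATA_URL}"`;
  const token = extractBearer(authorization);
  const claims = token ? decode(token) : null;
  if (!claims) return { ok: false, status: 401, wwwAuthenticate: challenge };
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  // A token minted for another server is rejected even if it is otherwise valid.
  if (!audiences.includes(SERVER_URI)) {
    return { ok: false, status: 401, wwwAuthenticate: challenge };
  }
  if (!claims.tenant_id) return { ok: false, status: 403 };
  return { ok: true, subject: claims.sub, tenantId: claims.tenant_id };
}
```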
For internal APIs, put authorization in three places:
- At the transport edge: reject unauthenticated HTTP requests before they reach the MCP router.
- At the tool boundary: check user, tenant, role, resource, and action before calling the backing API.
- At the backing service: keep the original API authorization in place so MCP is not the only gate.
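The tool-boundary layer can be a small deny-by-default function. The `AccessRequest` and `Grant` shapes, the role and action names, and the policy version label are hypothetical. A real server would load grants from the policy system the backing service already trusts. Each decision carries the policy version, which makes it easy to log.

```typescript
// Hypothetical request and grant shapes for the tool-boundary check.
type AccessRequest = {
  subjectId: string;
  tenantId: string;
  roles: string[];
  resourceTenantId: string;
  action: string;
};
type Grant = { role: string; actions: string[] };
type Decision = { allow: boolean; reason: string; policyVersion: string };

const POLICY_VERSION = "orders-policy-v1"; // hypothetical version label
const GRANTS: Grant[] = [
  { role: "support_agent", actions: ["order:read", "refund:request"] },
  { role: "finance_approver", actions: ["order:read", "refund:approve"] },
];

// Deny by default; allow only same-tenant access through an explicit grant.
function authorizeToolCall(req: AccessRequest): Decision {
  if (req.tenantId !== req.resourceTenantId) {
    return { allow: false, reason: "cross_tenant", policyVersion: POLICY_VERSION };
  }
  const granted = GRANTS.some(
    (g) => req.roles.includes(g.role) && g.actions.includes(req.action),
  );
  return {
    allow: granted,
    reason: granted ? "granted" : "no_matching_grant",
    policyVersion: POLICY_VERSION,
  };
}
```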
The server should log every authorization decision with a trace ID, tool name, subject, tenant, resource, action, allow or deny result, and policy version. The log does not need raw payloads. It needs enough structure to answer: who asked, what did the model request, what did the server permit, what did the backing API do, and where did a human approve.
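You could record one authorization decision as a flat, typed entry like this. The snake_case names, `approval_id`, and `backing_api_status` are assumptions about how to answer "what did the backing API do" and "where did a human approve". The record carries identifiers and outcomes only, no payloads.

```typescript
// Hypothetical shape for one authorization decision record.
type AuthzDecisionLog = {
  trace_id: string;
  tool_name: string;
  subject_id: string;
  tenant_id: string;
  resource_id: string;
  action: string;
  result: "allow" | "deny";
  policy_version: string;
  backing_api_status?: number; // set after the backing call, if one happened
  approval_id?: string; // set when a human approved the action
  logged_at: string;
};

// Stamps the entry; the clock is injectable for tests.
function authzLogEntry(
  fields: Omit<AuthzDecisionLog, "logged_at">,
  now: Date = new Date(),
): AuthzDecisionLog {
  return { ...fields, logged_at: now.toISOString() };
}
```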
Ship The Release Gate
The release gate is what turns an MCP tutorial into a production server. Before launch, every tool needs a contract, a threat check, an approval rule, an eval, and an operational owner.
Check each exposed tool against all five items, and treat any missing item as a release blocker.
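You can encode the five gate items as data so a CI job fails whenever a tool is missing one. The `ToolGate` field names and example values are hypothetical. Only the five items themselves (contract, threat check, approval rule, eval, and owner) come from the gate described here.

```typescript
// Hypothetical per-tool release record; optional fields are what the gate checks.
type ToolGate = {
  tool: string;
  contract?: { inputSchemaVersion: string; outputSchemaVersion: string };
  threatCheck?: { reviewedBy: string; reviewedOn: string };
  approvalRule?: "none" | "client_confirm" | "server_approval";
  evalSuite?: string;
  owner?: string;
};

// Returns one message per missing item; an empty list means the tool may ship.
function gateFailures(gate: ToolGate): string[] {
  const required: (keyof ToolGate)[] = ["contract", "threatCheck", "approvalRule", "evalSuite", "owner"];
  return required
    .filter((key) => gate[key] === undefined)
    .map((key) => `${gate.tool}: missing ${key}`);
}
```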
The latest tools spec says there should always be a human in the loop with the ability to deny tool invocations, and clients should prompt for confirmation on sensitive operations. Treat that as the minimum. For high-risk actions, add a server-side approval state too. A client prompt protects the user interface. A server approval gate protects the system.
For example, do not expose a refund_payment tool that calls the payment API directly. Expose a request_refund tool that only creates an approval request:
```typescript
// Registered on the same `server` as lookup_order. createRefundApprovalRequest is an
// application helper (not an SDK function) that records the request in an approval queue.
server.registerTool(
  "request_refund",
  {
    title: "Request refund",
    description: "Create a refund request for human approval.",
    inputSchema: {
      orderId: z.string().min(8),
      reason: z.string().min(10).max(500),
    },
    outputSchema: {
      requestId: z.string(),
      status: z.enum(["pending_approval"]),
    },
  },
  async ({ orderId, reason }) => {
    // No money moves here; the tool only records intent for a human to review.
    const request = await createRefundApprovalRequest({ orderId, reason });
    return {
      structuredContent: {
        requestId: request.id,
        status: "pending_approval",
      },
      content: [
        {
          type: "text",
          text: `Refund request ${request.id} is pending approval.`,
        },
      ],
    };
  },
);
```

This pattern gives the model a useful next step without handing it direct money movement. The model can explain what happened, the user can approve or deny in a trusted surface, and the audit trail can show exactly which request moved from suggestion to action.
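The server-side half of that pattern is a small state machine. This in-memory sketch supplies a hypothetical `createRefundApprovalRequest`, the helper that `request_refund` calls but this guide does not define. It also adds hypothetical `transition` and `executeRefund` functions and an injected `issueRefund` callback. Money can move only on the approved-to-executed transition, and every transition is appended to the request's history for the audit trail.

```typescript
type ApprovalState = "pending_approval" | "approved" | "denied" | "executed";
type RefundRequest = {
  id: string;
  orderId: string;
  reason: string;
  state: ApprovalState;
  history: { state: ApprovalState; actor: string; at: string }[];
};

// Hypothetical in-memory store; production would use a durable table.
const refundRequests = new Map<string, RefundRequest>();
let nextRequestId = 1;

async function createRefundApprovalRequest(input: { orderId: string; reason: string }): Promise<RefundRequest> {
  const request: RefundRequest = {
    id: `rr_${nextRequestId++}`,
    ...input,
    state: "pending_approval",
    history: [{ state: "pending_approval", actor: "request_refund", at: new Date().toISOString() }],
  };
  refundRequests.set(request.id, request);
  return request;
}

// Legal transitions; anything else is rejected.
const TRANSITIONS: Record<ApprovalState, ApprovalState[]> = {
  pending_approval: ["approved", "denied"],
  approved: ["executed"],
  denied: [],
  executed: [],
};

function transition(id: string, to: ApprovalState, actor: string): RefundRequest {
  const request = refundRequests.get(id);
  if (!request) throw new Error(`Unknown request ${id}`);
  if (!TRANSITIONS[request.state].includes(to)) {
    throw new Error(`Cannot move ${id} from ${request.state} to ${to}`);
  }
  request.state = to;
  request.history.push({ state: to, actor, at: new Date().toISOString() });
  return request;
}

// Only an approved request can move money.
async function executeRefund(
  id: string,
  issueRefund: (orderId: string) => Promise<void>,
): Promise<RefundRequest> {
  const request = refundRequests.get(id);
  if (!request || request.state !== "approved") throw new Error(`Refund ${id} is not approved`);
  await issueRefund(request.orderId);
  return transition(id, "executed", "system");
}
```

In this design, a trusted approval UI makes the `approved` transition. No MCP tool that the model can call is able to make it.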
The same gate should catch prompt injection. Treat tool annotations as untrusted unless they come from a trusted server, and treat user-provided instructions as untrusted input. Validate inputs server-side even when the client has already validated them. Return clear execution errors for invalid requests, but keep internal policy details out of the model-visible text.
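Here is a sketch of splitting one denial into two audiences. The full reason goes to the server log with the trace ID, and the model gets a short, actionable message. The reason codes, messages, and the `denialResult` helper are hypothetical. Mapping a cross-tenant denial to "not found" is a design choice in this sketch, not a spec rule.

```typescript
// Hypothetical denial reasons produced by server-side checks.
type DenialReason = "cross_tenant" | "no_matching_grant" | "amount_over_limit" | "invalid_input";

// What the model may see. Policy names, limits, and tenant facts stay server-side.
const MODEL_MESSAGES: Record<DenialReason, string> = {
  cross_tenant: "That record was not found.",
  no_matching_grant: "You do not have permission for this action.",
  amount_over_limit: "This request needs human review. Create an approval request instead.",
  invalid_input: "The request was invalid. Check the argument formats and try again.",
};

function denialResult(reason: DenialReason, traceId: string, detail: string) {
  // Full detail goes to stderr/structured logs only.
  console.error(JSON.stringify({ trace_id: traceId, denial_reason: reason, detail }));
  return {
    isError: true as const,
    content: [{ type: "text" as const, text: `${MODEL_MESSAGES[reason]} (trace ${traceId})` }],
  };
}
```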
What To Log And Evaluate
Log the lifecycle of the tool call, not just the HTTP request. An MCP server sits between a probabilistic caller and deterministic internal systems, so the useful trace starts before the backing API call and ends after the model receives the result.
Minimum log fields:
- trace_id
- host
- mcp_client_id
- protocol_version
- tool_name
- tool_version
- subject_id
- tenant_id
- resource_id
- policy_version
- approval_state
- authz_result
- input_schema_version
- output_schema_version
- backing_api_status
- is_error
- latency_ms
Keep raw secrets, access tokens, payment details, and unnecessary customer content out of logs. If the support or security team needs payload inspection, store a redacted payload with a retention policy and a break-glass path.
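Here is a redaction pass that could run before any payload is stored for break-glass inspection. The set of sensitive keys is an assumption, so extend it for your own data. The function walks nested objects and arrays, so it also masks a token buried in a nested `headers` object.

```typescript
// Assumed set of keys whose values never reach logs; matching is case-insensitive.
const SENSITIVE_KEYS = new Set([
  "authorization",
  "access_token",
  "refresh_token",
  "password",
  "card_number",
  "cvv",
  "payment_token",
]);

// Returns a deep copy with sensitive values replaced; the input is not modified.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```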
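The eval set should be small and sharp at launch. For every tool it should cover valid and invalid input, authorization denials, prompt-injection attempts, and writes that must stop at an approval gate.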
This is also where cost control belongs. The SDK wrapper is rarely what makes an MCP server expensive. Costs grow when broad tools create long model loops, retries, large structured payloads, or manual review queues with no owner. Compact outputs, narrow tools, pagination, rate limits, and clear errors reduce both model cost and operational cost.
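Of those controls, rate limiting is the easiest to sketch on its own. This is a per-key token bucket with an injectable clock. The capacity and refill values are placeholders. The in-process `Map` works only for a single instance, so a shared Streamable HTTP deployment would need shared state instead. Keying by subject and tool name, for example `u_1:lookup_order`, stops one runaway model loop without throttling everyone else.

```typescript
type Bucket = { tokens: number; updatedMs: number };

class ToolRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number; // burst size
  private readonly refillPerSecond: number; // sustained rate
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
  }

  // Returns true if the call may proceed, consuming one token for this key.
  tryAcquire(key: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedMs: t };
    const elapsedSeconds = (t - bucket.updatedMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedMs = t;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```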
The Build Order
The clean production order is narrower than most teams expect:
- Pick one internal API domain.
- Split capabilities into tools, resources, and prompts.
- Build a local stdio server with one read-only tool.
- Add structured outputs and execution errors.
- Add policy checks inside the handler.
- Test the server through a real MCP host or Inspector.
- Move to Streamable HTTP only when more than one client needs it.
- Add OAuth discovery, audience validation, tenant checks, and denial logs.
- Add approval gates for every sensitive write.
- Add evals, dashboards, rate limits, and a kill switch.
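The last item, a kill switch, can be as simple as a flag checked before every handler runs. Here an in-memory set stands in for whatever config or feature-flag service you already operate. The `withKillSwitch` wrapper, the `"*"` wildcard, and the message text are hypothetical. A disabled tool still returns a tool execution error, so the model gets a clear answer instead of a hung call.

```typescript
type TextResult = { content: { type: "text"; text: string }[]; isError?: boolean };
type Handler<A> = (args: A) => Promise<TextResult>;

// Stand-in for a config or feature-flag lookup; "*" disables every tool.
const disabledTools = new Set<string>();

// Wraps a tool handler so operators can switch it off without a redeploy.
function withKillSwitch<A>(toolName: string, handler: Handler<A>): Handler<A> {
  return async (args: A) => {
    if (disabledTools.has(toolName) || disabledTools.has("*")) {
      return { isError: true, content: [{ type: "text", text: `${toolName} is temporarily disabled.` }] };
    }
    return handler(args);
  };
}
```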
Stop before adding a generic admin tool. The best MCP servers are boring in exactly the right way: obvious tool names, small schemas, predictable results, explicit approvals, and logs that make incidents reconstructable.
Can I build my own MCP server?
Yes. Start with one internal capability, expose it as a typed MCP tool, and test it locally over stdio before creating a remote endpoint. The production work is not the first handler. It is auth, schemas, approvals, logging, evals, and operations.
How do I start an MCP server?
Start with the official SDK for your stack, define McpServer, register one tool with an input schema, connect it over stdio, and call it from an MCP host or Inspector. Move to Streamable HTTP only after the local contract works.
Can I run an MCP server locally?
Yes. Stdio is the normal local transport. Keep ordinary logs on stderr, not stdout, because stdout carries MCP JSON-RPC messages.
How much does it cost to run an MCP server?
The wrapper code is usually not the cost center. The cost comes from hosting, identity, observability, review queues, retries, broad outputs, and maintenance. Narrow tools and compact structured results keep both model and operations cost under control.
Jul 1, 2026





