A wire-level look at the Model Context Protocol — capability negotiation, tool discovery, transport tradeoffs, and the context-budget mistakes that quietly degrade agent reliability.
Every "AI agent calls a tool" demo hides the same uncomfortable detail: the model doesn't call anything. It emits text that looks like a function call, and a runtime somewhere else turns that text into an actual side effect. The Model Context Protocol (MCP) is Anthropic's attempt to standardize that "somewhere else" — the boundary between a language model's intent and a program's execution. It shipped in late 2024 and by mid-2026 has become the closest thing the industry has to a common wire format for agent tool use, with implementations from Anthropic, OpenAI-compatible clients, and a long tail of third-party servers.
Most write-ups on MCP stop at "it's like a USB-C port for AI apps." That metaphor is fine for a slide, but it tells you nothing about why a well-behaved MCP server behaves differently from a sloppy one, why your agent's context window fills up before it does anything useful, or why two servers that both "support MCP" can produce wildly different agent reliability. This post goes one layer down: the actual JSON-RPC messages, the capability negotiation handshake, and the design decisions that determine whether an MCP integration is a delight or a liability.
A REST API assumes both sides already know the contract — you read the OpenAPI spec once, write your client, and you're done. MCP can't assume that, because the client is often an LLM host (Claude Code, Claude Desktop, a custom agent runtime) that has never seen this particular server before and needs to discover, at runtime, what the server can do and how expensive it is to ask.
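That changes the shape of the protocol. Instead of one request type, MCP defines three server-side primitives: tools (functions the model can invoke), resources (data the client can read into context), and prompts (reusable templates the user can select).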
All three ride on JSON-RPC 2.0, which is the part people skim past but which actually matters: JSON-RPC gives you request/response correlation via id, a distinction between notifications (no response expected) and requests (response required), and a standard error shape. MCP needs all of that because a single session involves dozens of back-and-forth messages, not one request-response pair.
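To make those three JSON-RPC properties concrete, here is a minimal sketch in Python. The `classify` helper and the `PendingRequests` class are hypothetical names invented for illustration, not part of any MCP SDK; the message shapes and the `-32601` code come from JSON-RPC 2.0 itself.

```python
from typing import Any


def classify(message: dict[str, Any]) -> str:
    """Classify a JSON-RPC 2.0 message by its shape."""
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in message:
        # Requests carry an id and expect a reply; notifications carry none.
        return "request" if "id" in message else "notification"
    if "error" in message:
        return "error"
    if "result" in message:
        return "result"
    raise ValueError("unrecognized message shape")


class PendingRequests:
    """Correlates responses with the requests that caused them, by id."""

    def __init__(self) -> None:
        self._pending: dict[int, str] = {}

    def sent(self, request: dict[str, Any]) -> None:
        self._pending[request["id"]] = request["method"]

    def received(self, response: dict[str, Any]) -> str:
        # Returns the method this response answers; KeyError if the id is unknown.
        return self._pending.pop(response["id"])


request = {"jsonrpc": "2.0", "id": 7, "method": "tools/list"}
notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}
failure = {"jsonrpc": "2.0", "id": 7,
           "error": {"code": -32601, "message": "Method not found"}}

pending = PendingRequests()
pending.sent(request)
print(classify(request), classify(notification), classify(failure))
print(pending.received(failure))
```

With dozens of messages in flight per session, the id lookup is what lets a client tell which `tools/list` or `tools/call` a given response belongs to.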
Before any tool is called, client and server perform capability negotiation. This is the part that's easy to skip when you're just wiring up a demo, and the part that breaks production integrations when you don't.
```json
// Client → Server
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "roots": { "listChanged": true }, "sampling": {} },
    "clientInfo": { "name": "claude-code", "version": "2.1.0" }
  }
}
// Server → Client
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "tools": { "listChanged": true }, "resources": {} },
    "serverInfo": { "name": "utilix-mcp", "version": "1.4.0" }
  }
}
```
Two things happen here that determine everything downstream. First, protocol version negotiation: the client proposes the version it wants to speak, and a server that doesn't support it answers with one it does. If the client can't speak that version either, it disconnects cleanly instead of sending malformed requests. Second, capability advertisement: the server tells the client which of the optional protocol features it actually implements (does it support notifying the client when its tool list changes? does it expose resources?), and the client does the same in the other direction (here, that it can supply "roots" describing which directories are in scope, and that it supports sampling).
Only after this exchange does the client send tools/list to enumerate what's callable. This two-step process — negotiate, then discover — is why MCP servers can evolve their capabilities over time without breaking older clients, and why a well-implemented client should never hard-code assumptions about what a server exposes.
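The negotiate-then-discover logic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than SDK code: `server_pick_version`, `client_accepts`, and `has_capability` are hypothetical helper names, and the version strings are MCP spec revision dates.

```python
def server_pick_version(requested: str, server_supported: list[str]) -> str:
    """Server side: echo the requested version if supported, else offer our latest."""
    if requested in server_supported:
        return requested
    # Revision identifiers are ISO dates, so string comparison orders them correctly.
    return max(server_supported)


def client_accepts(offered: str, client_supported: list[str]) -> bool:
    """Client side: continue only if we can speak what the server offered."""
    return offered in client_supported


def has_capability(capabilities: dict, *path: str) -> bool:
    """Check an advertised capability instead of assuming it exists."""
    node = capabilities
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False


# An older server talking to a newer client.
offered = server_pick_version("2025-06-18", ["2024-11-05", "2025-03-26"])
if not client_accepts(offered, ["2025-03-26", "2025-06-18"]):
    raise SystemExit("no common protocol version; disconnecting")

server_caps = {"tools": {"listChanged": True}, "resources": {}}
if has_capability(server_caps, "tools"):
    print("safe to send tools/list on protocol", offered)
```

The point of `has_capability` is the last sentence above: the client asks what the server advertised rather than hard-coding what it expects.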
Here's the part that matters if you're an engineer building or consuming MCP servers rather than just following a tutorial: every tool you expose costs context window on every single turn, whether the model uses it or not.
When a client connects to N servers, it typically calls tools/list on each, concatenates the results, and injects the full set of tool definitions — name, description, JSON Schema for parameters — into the system context before the model generates a single token. A server with 40 tools, each with a verbose multi-paragraph description "for the model's benefit," can burn 15-20K tokens before the conversation starts. That's not a hypothetical; it's the single most common complaint from teams that connect many MCP servers to one agent and then wonder why quality drops.
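A rough way to see this cost before it bites is to measure the serialized tool definitions. The sketch below assumes a crude four-characters-per-token heuristic, which is not a real tokenizer; `estimate_tokens` and `fixed_cost` are hypothetical helpers, and the two tool definitions are made up for illustration.

```python
import json


def estimate_tokens(tool: dict, chars_per_token: float = 4.0) -> int:
    """Very rough estimate: serialized length divided by an assumed ratio."""
    return round(len(json.dumps(tool)) / chars_per_token)


def fixed_cost(tools: list[dict], chars_per_token: float = 4.0) -> int:
    """Approximate tokens spent on tool definitions before the model sees the user."""
    return sum(estimate_tokens(tool, chars_per_token) for tool in tools)


terse = {
    "name": "check_ssl_certificate",
    "description": "Check a host's TLS certificate.",
    "inputSchema": {
        "type": "object",
        "properties": {"hostname": {"type": "string"}},
        "required": ["hostname"],
    },
}
# The same tool with a padded, multi-paragraph style description.
verbose = dict(
    terse,
    description=terse["description"] + " " + "Extra guidance for the model. " * 40,
)

print("40 terse tools:  ", fixed_cost([terse] * 40))
print("40 verbose tools:", fixed_cost([verbose] * 40))
```

Swapping in the tokenizer of the model you actually deploy would give real numbers; the relative gap between the two lists is the part worth watching.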
This produces a genuine design tension:
| Design choice | Benefit | Cost |
|---|---|---|
| Many narrow tools (e.g. get_user, list_orders, refund_order) | Clear intent, easy for the model to pick correctly | Large tool list, high fixed context cost per turn |
| Few broad tools (e.g. one crm_query tool with a mode parameter) | Small context footprint | Model must learn an internal mini-DSL, more room for malformed calls |
| Rich per-tool descriptions with examples | Fewer wrong-tool selections, less retry overhead | Directly multiplies token cost by tool count |
| Terse descriptions, rely on naming | Cheap | Ambiguous tools get picked incorrectly or ignored |
| Dynamic tool lists (listChanged notifications, scoped by context) | Only relevant tools loaded per session | Requires server-side session state and client support |
There's no universally correct point on this table — it depends on how many servers a given agent connects to simultaneously. A single-purpose coding agent with one filesystem server can afford verbose tool descriptions. A general-purpose assistant connected to fifteen MCP servers (which is increasingly normal in 2026 agent stacks) cannot, and needs either aggressive tool curation or servers that support scoped/dynamic tool exposure.
The practical implication: when you write an MCP server, the tool description isn't documentation, it's a prompt. Every word is competing for the same attention budget as the user's actual request.
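The "dynamic tool lists" row of the table can be sketched as follows. This is one possible shape, assuming a server that groups tools into named scopes; `ScopedToolServer` and `TOOLS_BY_SCOPE` are hypothetical, while `notifications/tools/list_changed` is the notification a server sends when it has advertised `listChanged`.

```python
from typing import Iterable, Optional

# Hypothetical grouping of tools by the context in which they are useful.
TOOLS_BY_SCOPE: dict[str, list[dict]] = {
    "billing": [{"name": "refund_order", "description": "Refund an order by id."}],
    "support": [
        {"name": "get_user", "description": "Fetch a user by id."},
        {"name": "list_orders", "description": "List a user's orders."},
    ],
}


class ScopedToolServer:
    """Exposes only the tools whose scope is active for this session."""

    def __init__(self) -> None:
        self.active_scopes: set[str] = set()

    def set_scopes(self, scopes: Iterable[str]) -> Optional[dict]:
        """Update scopes; return a list_changed notification if the list changed."""
        new_scopes = {scope for scope in scopes if scope in TOOLS_BY_SCOPE}
        if new_scopes == self.active_scopes:
            return None
        self.active_scopes = new_scopes
        return {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}

    def list_tools(self) -> list[dict]:
        """What a tools/list call would return right now."""
        return [
            tool
            for scope in sorted(self.active_scopes)
            for tool in TOOLS_BY_SCOPE[scope]
        ]


server = ScopedToolServer()
notification = server.set_scopes(["support"])
print(notification)
print([tool["name"] for tool in server.list_tools()])
```

The cost named in the table is visible here: the server now holds per-session state, and the client has to react to the notification by calling tools/list again.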
Once negotiation and discovery are done, the actual call is almost anticlimactic — which is the point. Complexity should live in the handshake and schema, not in the hot path.
```json
// Client → Server
{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "check_ssl_certificate",
    "arguments": { "hostname": "example.com" }
  }
}
// Server → Client
{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [
      { "type": "text", "text": "Certificate valid until 2026-11-02, issued by Let's Encrypt, SAN matches." }
    ],
    "isError": false
  }
}
```
Note that isError lives inside a successful JSON-RPC response, not as a transport-level failure. This is deliberate: a tool that fails in an expected way (invalid hostname, upstream timeout) is a normal outcome the model should see and reason about, not an exception that crashes the session. Reserve actual JSON-RPC errors for protocol-level problems — unknown method, malformed params, server crashed — and use isError: true with a descriptive text content block for domain failures. Servers that conflate these two failure modes produce agents that either retry forever on unrecoverable errors or give up on transient ones.
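Here is a minimal sketch of a server-side handler that keeps the two failure modes apart. It is illustrative only: `ToolError`, `handle_tools_call`, and the stubbed `check_ssl_certificate` are hypothetical, and using `-32602` (invalid params) for an unknown tool or bad arguments is one reasonable choice, not the only one.

```python
class ToolError(Exception):
    """An expected, domain-level failure the model should see and reason about."""


def check_ssl_certificate(hostname: str) -> str:
    # Hypothetical stand-in: a real tool would open a TLS connection here.
    if "." not in hostname:
        raise ToolError(f"Invalid hostname: {hostname!r}")
    return f"Certificate for {hostname} looks valid (stubbed result)."


TOOLS = {"check_ssl_certificate": check_ssl_certificate}


def rpc_error(request_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle_tools_call(request: dict) -> dict:
    params = request.get("params", {})
    name = params.get("name")
    if name not in TOOLS:
        # Protocol-level problem: the call itself is malformed.
        return rpc_error(request["id"], -32602, f"Unknown tool: {name}")
    try:
        text, is_error = TOOLS[name](**params.get("arguments", {})), False
    except TypeError as exc:
        # Wrong or missing argument names: also a protocol-level problem.
        return rpc_error(request["id"], -32602, f"Invalid arguments: {exc}")
    except ToolError as exc:
        # Domain failure: a successful JSON-RPC response with isError set.
        text, is_error = str(exc), True
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": text}],
                       "isError": is_error}}


call = {"jsonrpc": "2.0", "id": 42, "method": "tools/call",
        "params": {"name": "check_ssl_certificate",
                   "arguments": {"hostname": "localhost"}}}
print(handle_tools_call(call))
```

The model receives the "Invalid hostname" text and can correct its argument, whereas an unknown tool name surfaces as a JSON-RPC error the client runtime handles.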
The following diagram shows the full lifecycle a client goes through with a single server, from connection to steady-state tool calling:
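The same lifecycle can be approximated in code as a small state machine. This is a simplified sketch: `Phase`, `TRANSITIONS`, and `ClientSession` are hypothetical names, responses are not modeled, and only the methods discussed in this post are included.

```python
from enum import Enum


class Phase(Enum):
    CONNECTED = "connected"        # transport open, nothing negotiated yet
    INITIALIZING = "initializing"  # initialize sent, handshake not yet confirmed
    READY = "ready"                # steady state: discovery and tool calls allowed


# method -> (phase required to send it, phase afterwards)
TRANSITIONS = {
    "initialize": (Phase.CONNECTED, Phase.INITIALIZING),
    "notifications/initialized": (Phase.INITIALIZING, Phase.READY),
    "tools/list": (Phase.READY, Phase.READY),
    "tools/call": (Phase.READY, Phase.READY),
}


class ClientSession:
    """Tracks which messages a client may send at each point in the session."""

    def __init__(self) -> None:
        self.phase = Phase.CONNECTED
        self.log: list[str] = []

    def send(self, method: str) -> None:
        required, after = TRANSITIONS[method]
        if self.phase is not required:
            raise RuntimeError(f"{method} not allowed while {self.phase.value}")
        self.log.append(method)
        self.phase = after


session = ClientSession()
for method in ("initialize", "notifications/initialized",
               "tools/list", "tools/call", "tools/call"):
    session.send(method)
print(" -> ".join(session.log))
```

Sending `tools/call` on a fresh session raises immediately, which is the ordering guarantee the handshake exists to provide.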
MCP originally shipped with two transports — stdio and HTTP+SSE — and the spec has since consolidated toward "Streamable HTTP" as the recommended remote transport, with stdio remaining the default for local processes. This isn't a minor implementation detail; it determines what kind of server you can build.
| Transport | Where it runs | Session model | Good for | Weak point |
|---|---|---|---|---|
| stdio | Local subprocess spawned by the client | One process per session, lifetime-bound | CLI tools, filesystem access, local dev servers | Can't be shared across users or scaled horizontally |
| HTTP + SSE (legacy) | Remote server, persistent connection | Long-lived SSE stream per session | Early remote MCP servers | Two endpoints to manage, proxies/load balancers often mishandle long-lived SSE |
| Streamable HTTP | Remote server, stateless-capable | Single POST endpoint, optional SSE upgrade for streaming | Multi-tenant remote servers, serverless deployments | Requires careful handling of resumability if you want stream recovery |
If you're building a remote MCP server today — say, one that exposes REST-style utilities like SSL checks, JSON formatting, or email validation as callable tools — Streamable HTTP is the right default: it degrades gracefully to plain request/response when streaming isn't needed, and it doesn't require pinning a client to a specific long-lived connection, which matters the moment you put a load balancer in front of it. A tool server like this behaves the same way any other MCP server does from the agent's perspective — it announces a handful of tools during tools/list, returns structured content blocks, and lives or dies by how precisely its schemas constrain bad input. There's nothing special about it architecturally; the interesting engineering is entirely in the negotiation and schema layer described above, not in the tool's own logic.
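To make "schemas that constrain bad input" concrete, here is a sketch of a tool definition whose input schema rejects anything that is not a bare hostname, plus a deliberately tiny validator. Both are assumptions for illustration: the regular expression is one plausible hostname pattern, and `validate_arguments` covers only the handful of JSON Schema keywords used here. A production server would normally rely on a full JSON Schema validation library instead.

```python
import re

LABEL = r"[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
HOSTNAME_PATTERN = rf"^{LABEL}(\.{LABEL})+$"  # anchored: no scheme, port, or path

CHECK_SSL_TOOL = {
    "name": "check_ssl_certificate",
    "description": "Check the TLS certificate for a bare hostname.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "hostname": {"type": "string", "pattern": HOSTNAME_PATTERN},
        },
        "required": ["hostname"],
        "additionalProperties": False,
    },
}


def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Hand-rolled check for required, additionalProperties, string type, pattern."""
    errors = []
    properties = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        if key not in properties:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument: {key}")
            continue
        spec = properties[key]
        if spec.get("type") == "string":
            if not isinstance(value, str):
                errors.append(f"{key} must be a string")
            elif "pattern" in spec and not re.search(spec["pattern"], value):
                errors.append(f"{key} does not match the required pattern")
    return errors


schema = CHECK_SSL_TOOL["inputSchema"]
print(validate_arguments(schema, {"hostname": "example.com"}))
print(validate_arguments(schema, {"hostname": "https://example.com/path"}))
```

Because the schema is also what the model sees during tools/list, the same pattern that guards the handler tells the model what a valid argument looks like.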
Three failure modes show up repeatedly once you've debugged enough MCP integrations:
1. Schemas that are too loose. A parameter that expects a bare hostname receives a full URL with https:// and a path, and the server chokes downstream. The fix isn't a better prompt; it's tightening the schema ("pattern", "format") so invalid input never reaches your handler.
2. Too many tools exposed at once. Tool definitions crowd the context window before the conversation starts. Curate the list, or use listChanged to swap in relevant subsets rather than exposing everything up front.
3. Treating isError as optional. Servers that throw raw exceptions instead of returning a structured error content block cause the client to surface a generic "tool failed" message with no detail, which the model can't reason about or recover from.

MCP's real contribution isn't the transport or the JSON-RPC envelope; those are unremarkable by design. It's that it forces a discovery-and-negotiation step in front of every tool call, which means the burden of "does this agent know what it can do" moves from static prompt engineering into a runtime protocol both sides can evolve independently. If you're building a server, spend your design budget on two things: schemas tight enough that malformed calls can't reach your code, and tool descriptions short enough that they don't quietly eat the context budget of every other server the agent has connected. Everything else in the spec exists to support those two constraints.