MCP · July 2, 2026 · 9 min read

What Actually Happens Inside an MCP Tool Call

A wire-level look at the Model Context Protocol — capability negotiation, tool discovery, transport tradeoffs, and the context-budget mistakes that quietly degrade agent reliability.

Every "AI agent calls a tool" demo hides the same uncomfortable detail: the model doesn't call anything. It emits text that looks like a function call, and a runtime somewhere else turns that text into an actual side effect. The Model Context Protocol (MCP) is Anthropic's attempt to standardize that "somewhere else" — the boundary between a language model's intent and a program's execution. It shipped in late 2024 and by mid-2026 has become the closest thing the industry has to a common wire format for agent tool use, with implementations from Anthropic, OpenAI-compatible clients, and a long tail of third-party servers.

Most write-ups on MCP stop at "it's like a USB-C port for AI apps." That metaphor is fine for a slide, but it tells you nothing about why a well-behaved MCP server behaves differently from a sloppy one, why your agent's context window fills up before it does anything useful, or why two servers that both "support MCP" can produce wildly different agent reliability. This post goes one layer down: the actual JSON-RPC messages, the capability negotiation handshake, and the design decisions that determine whether an MCP integration is a delight or a liability.

The core insight: MCP is not an API, it's a negotiation protocol

A REST API assumes both sides already know the contract — you read the OpenAPI spec once, write your client, and you're done. MCP can't assume that, because the client is often an LLM host (Claude Code, Claude Desktop, a custom agent runtime) that has never seen this particular server before and needs to discover, at runtime, what the server can do and how expensive it is to ask.

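That changes the shape of the protocol. Instead of a single kind of endpoint, an MCP server can expose three kinds of primitive:

  1. Tools: functions the model can choose to invoke (tools/list, tools/call).
  2. Resources: data the host application can read into context (resources/list, resources/read).
  3. Prompts: reusable prompt templates, typically surfaced to the user (prompts/list, prompts/get).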

All three ride on JSON-RPC 2.0, which is the part people skim past but which actually matters: JSON-RPC gives you request/response correlation via id, a distinction between notifications (no response expected) and requests (response required), and a standard error shape. MCP needs all of that because a single session involves dozens of back-and-forth messages, not one request-response pair.
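The sketch below makes the JSON-RPC mechanics concrete. It shows the bookkeeping a client layers on top of raw messages: it assigns ids to requests, sends notifications without one, and matches each incoming response back to the request that produced it. The JsonRpcSession class is hypothetical and not part of any MCP SDK, since real SDKs handle this for you.

```python
import json
from itertools import count

class JsonRpcSession:
    """Hypothetical helper (not from any MCP SDK): tracks request ids for one peer."""

    def __init__(self):
        self._ids = count(1)   # request ids: 1, 2, 3, ...
        self.pending = {}      # id -> method, for requests still awaiting a response

    def request(self, method, params=None):
        msg = {"jsonrpc": "2.0", "id": next(self._ids), "method": method}
        if params is not None:
            msg["params"] = params
        self.pending[msg["id"]] = method
        return json.dumps(msg)

    @staticmethod
    def notification(method, params=None):
        # No "id": the receiver must not send a response.
        msg = {"jsonrpc": "2.0", "method": method}
        if params is not None:
            msg["params"] = params
        return json.dumps(msg)

    def receive(self, raw):
        """Classify an incoming message as (kind, method, payload)."""
        msg = json.loads(raw)
        if "method" in msg:
            # Peer-initiated: a request if it carries an id, else a notification.
            kind = "request" if "id" in msg else "notification"
            return kind, msg["method"], msg.get("params")
        # Otherwise it's a response; correlate it with the request we sent.
        method = self.pending.pop(msg.get("id"), None)
        if method is None:
            raise ValueError(f"response with unknown id {msg.get('id')!r}")
        if "error" in msg:
            # Standard error shape: {"code": int, "message": str, "data"?: any}
            return "error", method, msg["error"]
        return "result", method, msg["result"]
```

For example, after session.request("tools/list") goes out with id 1, receiving {"jsonrpc": "2.0", "id": 1, "result": {"tools": []}} yields ("result", "tools/list", {"tools": []}), and the pending table is empty again.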

The handshake nobody talks about

Before any tool is called, client and server perform capability negotiation. This is the part that's easy to skip when you're just wiring up a demo, and the part that breaks production integrations when you don't.

// Client → Server
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "roots": { "listChanged": true }, "sampling": {} },
    "clientInfo": { "name": "claude-code", "version": "2.1.0" }
  }
}
// Server → Client
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "protocolVersion": "2025-06-18",
    "capabilities": { "tools": { "listChanged": true }, "resources": {} },
    "serverInfo": { "name": "utilix-mcp", "version": "1.4.0" }
  }
}

Two things happen here that determine everything downstream. The first is protocol version negotiation. The client proposes the newest spec revision it supports, and the server either echoes it or answers with a revision it does support. If the client can't speak the server's choice, it disconnects cleanly instead of sending requests the server will misparse. The second is capability advertisement: each side declares which optional protocol features it actually implements. The server says whether it offers tools or resources and whether it will notify the client when its tool list changes (tools.listChanged). The client says whether it can supply "roots" telling the server which directories are in scope, and whether it can service sampling requests.
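In the spec, version negotiation is a single round rather than a search for the highest common version. The client proposes, the server answers, and the client either accepts or disconnects. The minimal sketch below covers both sides. It assumes the revisions in play are 2024-11-05, 2025-03-26 and 2025-06-18, and that each side lists its versions newest first. The function names are hypothetical.

```python
# Spec revisions this hypothetical client understands, newest first.
CLIENT_VERSIONS = ["2025-06-18", "2025-03-26", "2024-11-05"]

def server_respond(requested, server_versions):
    """Server: echo the requested revision if supported, else offer its own newest.
    Assumes server_versions is ordered newest first."""
    return requested if requested in server_versions else server_versions[0]

def negotiate(client_versions, server_versions):
    """Client: propose the newest revision, then keep or reject the server's answer."""
    offered = server_respond(client_versions[0], server_versions)
    # None means the client should close the connection rather than guess.
    return offered if offered in client_versions else None
```

A server that only speaks 2025-03-26 settles on that revision. A server that answers with a revision the client has never heard of produces None, and the client disconnects.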

Only after this exchange, and after the client confirms it with a notifications/initialized notification, does the client send tools/list to enumerate what's callable. This two-step process (negotiate, then discover) is why MCP servers can evolve their capabilities over time without breaking older clients, and why a well-implemented client should never hard-code assumptions about what a server exposes.
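Put together, the discovery step looks roughly like the sketch below. It assumes two hypothetical callables supplied by whatever transport layer you use. call(method, params) performs one request/response round trip and returns the result object, and notify(method) sends a notification. The sketch checks the server's advertised capabilities before asking for tools. It also follows nextCursor, because tools/list results can be paginated.

```python
def discover_tools(init_result, call, notify):
    """
    init_result: the "result" object of the initialize response.
    call(method, params) -> result dict; notify(method) -> None.
    Both callables are hypothetical stand-ins for your transport layer.
    """
    # Tell the server initialization is complete before any normal requests.
    notify("notifications/initialized")
    caps = init_result.get("capabilities", {})
    if "tools" not in caps:
        return [], False  # server advertised no tools; don't ask for them
    tools, cursor = [], None
    while True:
        # tools/list is paginated: pass back nextCursor until it's absent.
        page = call("tools/list", {"cursor": cursor} if cursor else {})
        tools.extend(page.get("tools", []))
        cursor = page.get("nextCursor")
        if not cursor:
            break
    # Only worth listening for list_changed if the server promised to send it.
    watch_changes = bool(caps["tools"].get("listChanged"))
    return tools, watch_changes
```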

Where the token budget actually goes

Here's the part that matters if you're an engineer building or consuming MCP servers rather than just following a tutorial: every tool you expose costs context window on every single turn, whether the model uses it or not.

When a client connects to N servers, it typically calls tools/list on each, concatenates the results, and injects the full set of tool definitions — name, description, JSON Schema for parameters — into the system context before the model generates a single token. A server with 40 tools, each with a verbose multi-paragraph description "for the model's benefit," can burn 15-20K tokens before the conversation starts. That's not a hypothetical; it's the single most common complaint from teams that connect many MCP servers to one agent and then wonder why quality drops.
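One quick way to see this cost before it bites is to measure the tool definitions you are about to inject. The sketch below assumes a rough four-characters-per-token heuristic. That is only a ballpark for English text and JSON, so use your model's actual tokenizer for real numbers. The function names are hypothetical.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model and content

def estimate_tool_tokens(tool):
    """Approximate tokens for one tool definition as injected into context."""
    injected = {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "inputSchema": tool.get("inputSchema", {}),
    }
    return len(json.dumps(injected)) // CHARS_PER_TOKEN

def fixed_context_cost(servers):
    """servers: {server_name: [tool definitions from tools/list]}.
    Returns per-server estimates and the total paid on every turn."""
    per_server = {name: sum(estimate_tool_tokens(t) for t in tools)
                  for name, tools in servers.items()}
    return per_server, sum(per_server.values())
```

Sorting the per-server estimates is usually enough to spot the one server whose descriptions are doing most of the damage.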

This produces a genuine design tension:

| Design choice | Benefit | Cost |
| --- | --- | --- |
| Many narrow tools (e.g. get_user, list_orders, refund_order) | Clear intent, easy for the model to pick correctly | Large tool list, high fixed context cost per turn |
| Few broad tools (e.g. one crm_query tool with a mode parameter) | Small context footprint | Model must learn an internal mini-DSL, more room for malformed calls |
| Rich per-tool descriptions with examples | Fewer wrong-tool selections, less retry overhead | Directly multiplies token cost by tool count |
| Terse descriptions, rely on naming | Cheap | Ambiguous tools get picked incorrectly or ignored |
| Dynamic tool lists (listChanged notifications, scoped by context) | Only relevant tools loaded per session | Requires server-side session state and client support |

There's no universally correct point on this table — it depends on how many servers a given agent connects to simultaneously. A single-purpose coding agent with one filesystem server can afford verbose tool descriptions. A general-purpose assistant connected to fifteen MCP servers (which is increasingly normal in 2026 agent stacks) cannot, and needs either aggressive tool curation or servers that support scoped/dynamic tool exposure.
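Of the options in the table, dynamic tool lists are the least familiar. The minimal sketch below shows the server-side state they need, assuming the server advertised tools.listChanged during initialize. The session keeps an active scope and answers tools/list with only the tools in that scope. When the scope changes, it emits notifications/tools/list_changed so the client knows to re-fetch. The tool groups, the send callable, and how the scope gets chosen are all hypothetical. Real tool entries also need an inputSchema, which is omitted here for brevity.

```python
import json

# Hypothetical tool groups (inputSchema omitted for brevity; real entries need one).
TOOL_GROUPS = {
    "orders": [{"name": "list_orders"}, {"name": "refund_order"}],
    "users": [{"name": "get_user"}],
}

class ScopedToolSession:
    """Per-session server state for exposing only the currently relevant tools."""

    def __init__(self, send, groups=TOOL_GROUPS):
        self.send = send      # hypothetical: writes one message to this session's client
        self.groups = groups
        self.active = set()   # start with no tools exposed

    def set_scope(self, *group_names):
        new_scope = set(group_names)
        if new_scope != self.active:
            self.active = new_scope
            # Tell the client its cached tool list is stale so it calls tools/list again.
            self.send(json.dumps({"jsonrpc": "2.0",
                                  "method": "notifications/tools/list_changed"}))

    def tools_list(self):
        """Result payload for a tools/list request: active groups only."""
        return {"tools": [tool for group in sorted(self.active)
                          for tool in self.groups[group]]}
```

Sending the notification only when the scope actually changes keeps clients from re-fetching the list on every turn.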

The practical implication: when you write an MCP server, the tool description isn't documentation, it's a prompt. Every word is competing for the same attention budget as the user's actual request.

Anatomy of a tool call

Once negotiation and discovery are done, the actual call is almost anticlimactic — which is the point. Complexity should live in the handshake and schema, not in the hot path.

// Client → Server
{
  "jsonrpc": "2.0",
  "id": 42,
  "method": "tools/call",
  "params": {
    "name": "check_ssl_certificate",
    "arguments": { "hostname": "example.com" }
  }
}
// Server → Client
{
  "jsonrpc": "2.0",
  "id": 42,
  "result": {
    "content": [
      { "type": "text", "text": "Certificate valid until 2026-11-02, issued by Let's Encrypt, SAN matches." }
    ],
    "isError": false
  }
}

Note that isError lives inside a successful JSON-RPC response, not as a transport-level failure. This is deliberate: a tool that fails in an expected way (invalid hostname, upstream timeout) is a normal outcome the model should see and reason about, not an exception that crashes the session. Reserve actual JSON-RPC errors for protocol-level problems — unknown method, malformed params, server crashed — and use isError: true with a descriptive text content block for domain failures. Servers that conflate these two failure modes produce agents that either retry forever on unrecoverable errors or give up on transient ones.
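The minimal server-side sketch below shows that split. It assumes a single hypothetical check_ssl_certificate stub and a hypothetical ToolInputError exception for expected domain failures. An unknown tool gets a JSON-RPC error with code -32602 (invalid params), the code the MCP spec's own unknown-tool example uses. Unexpected failures get -32603 (internal error). Everything the model should reason about comes back as a normal result with isError set.

```python
class ToolInputError(Exception):
    """Hypothetical: an expected, domain-level failure the model should see."""

def check_ssl_certificate(hostname):
    # Stub for illustration only; a real tool would open a TLS connection.
    if "/" in hostname or ":" in hostname:
        raise ToolInputError(f"Expected a bare hostname like example.com, got {hostname!r}")
    return f"(stub) would check the certificate for {hostname}"

TOOLS = {"check_ssl_certificate": check_ssl_certificate}

def handle_tools_call(msg):
    """Turn one tools/call request into a JSON-RPC response dict."""
    req_id = msg.get("id")
    params = msg.get("params", {})
    name = params.get("name")
    if name not in TOOLS:
        # Protocol-level: the client asked for a tool that doesn't exist.
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    try:
        text = TOOLS[name](**params.get("arguments", {}))
        is_error = False
    except ToolInputError as exc:
        # Domain failure: still a successful JSON-RPC response, flagged with isError.
        text, is_error = str(exc), True
    except Exception as exc:
        # Anything else (a bug, or arguments a schema check should have caught)
        # surfaces as a JSON-RPC internal error.
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32603, "message": f"Internal error: {exc}"}}
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": is_error}}
```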

The following diagram shows the full lifecycle a client goes through with a single server, from connection to steady-state tool calling:

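Client (agent host) → MCP server: initialize (client capabilities)
MCP server → client: result (server capabilities)
Client → MCP server: tools/list
MCP server → client: [tool schemas]
Client: model reasons, decides to call a tool
Client → MCP server: tools/call { name, arguments }
MCP server: executes the tool
MCP server → client: result { content, isError }
Client: model incorporates the result and continues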

Transport is not a footnote

MCP originally shipped with two transports — stdio and HTTP+SSE — and the spec has since consolidated toward "Streamable HTTP" as the recommended remote transport, with stdio remaining the default for local processes. This isn't a minor implementation detail; it determines what kind of server you can build.

| Transport | Where it runs | Session model | Good for | Weak point |
| --- | --- | --- | --- | --- |
| stdio | Local subprocess spawned by the client | One process per session, lifetime-bound | CLI tools, filesystem access, local dev servers | Can't be shared across users or scaled horizontally |
| HTTP + SSE (legacy) | Remote server, persistent connection | Long-lived SSE stream per session | Early remote MCP servers | Two endpoints to manage; proxies and load balancers often mishandle long-lived SSE |
| Streamable HTTP | Remote server, stateless-capable | Single POST endpoint, optional SSE upgrade for streaming | Multi-tenant remote servers, serverless deployments | Requires careful handling of resumability if you want stream recovery |

If you're building a remote MCP server today — say, one that exposes REST-style utilities like SSL checks, JSON formatting, or email validation as callable tools — Streamable HTTP is the right default: it degrades gracefully to plain request/response when streaming isn't needed, and it doesn't require pinning a client to a specific long-lived connection, which matters the moment you put a load balancer in front of it. A tool server like this behaves the same way any other MCP server does from the agent's perspective — it announces a handful of tools during tools/list, returns structured content blocks, and lives or dies by how precisely its schemas constrain bad input. There's nothing special about it architecturally; the interesting engineering is entirely in the negotiation and schema layer described above, not in the tool's own logic.
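The "degrades gracefully" behavior is visible on the client side. Every message is a POST to one endpoint, and the server decides per request whether to answer with a single JSON body or an SSE stream. The minimal sketch below builds the request and parses either kind of response without opening a socket. The header names follow the 2025-06-18 revision of the spec (Mcp-Session-Id, MCP-Protocol-Version). The helper names are hypothetical, and the SSE parser handles only the data: field.

```python
import json

def build_post(message, session_id=None, protocol_version=None):
    """Headers and body for one client-to-server POST on a Streamable HTTP endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Accept both: the server chooses a single JSON body or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Echo the session id the server issued at initialize, if it issued one.
        headers["Mcp-Session-Id"] = session_id
    if protocol_version:
        # The negotiated revision, sent on requests after initialize.
        headers["MCP-Protocol-Version"] = protocol_version
    return headers, json.dumps(message).encode("utf-8")

def parse_response(content_type, body):
    """Extract JSON-RPC messages from a decoded response body of either kind."""
    if content_type.startswith("application/json"):
        return [json.loads(body)]
    if content_type.startswith("text/event-stream"):
        messages = []
        for event in body.replace("\r\n", "\n").split("\n\n"):
            # Join the event's data: lines, dropping one optional leading space each.
            data = [line[5:].removeprefix(" ") for line in event.split("\n")
                    if line.startswith("data:")]
            if data:
                messages.append(json.loads("\n".join(data)))
        return messages
    raise ValueError(f"unexpected Content-Type: {content_type}")
```

Because the server chooses the response format per request, a client written this way works unchanged whether the server streams or not. That is also what lets a plain stateless deployment sit behind an ordinary load balancer.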

Where this breaks in practice

Three failure modes show up repeatedly once you've debugged enough MCP integrations:

  1. Schema drift between description and validation. The tool description says "hostname, e.g. example.com", but the JSON Schema only declares a plain string. The model occasionally passes a full URL with https:// and a path, and the server chokes downstream. The fix isn't a better prompt. It's tightening the schema (a "pattern", or a "format" your validator actually enforces) so invalid input never reaches your handler.
  2. Silent tool list truncation. Some clients cap the number of tools they'll load across all connected servers. If you're the 6th server connected and you expose 30 tools, some may simply never appear to the model, with no error surfaced. Keep tool counts per server lean and rely on listChanged to swap in relevant subsets rather than exposing everything up front.
  3. Treating isError as optional. Servers that throw raw exceptions instead of returning a structured error content block cause the client to surface a generic "tool failed" message with no detail, which the model can't reason about or recover from.
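For the first failure mode, the fix lives in the inputSchema a server returns from tools/list. The sketch below assumes a hypothetical check_ssl_certificate tool. Its schema pins hostname to a bare-hostname pattern and forbids extra properties. validate_arguments is a deliberately tiny, hand-rolled subset of JSON Schema validation (required, additionalProperties, string type, pattern) so the example has no dependencies; a real server should use a full JSON Schema validator. The pattern is one reasonable choice, not a canonical one; for example, it rejects single-label names like localhost. It uses "pattern" rather than "format": "hostname" because recent JSON Schema drafts treat format as an annotation by default, so many validators skip it unless configured otherwise.

```python
import re

# inputSchema as a hypothetical server might advertise it in tools/list.
CHECK_SSL_SCHEMA = {
    "type": "object",
    "properties": {
        "hostname": {
            "type": "string",
            "description": "Bare hostname, e.g. example.com (no scheme, port, or path)",
            "pattern": r"^(?=.{1,253}$)([A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+[A-Za-z]{2,63}$",
        }
    },
    "required": ["hostname"],
    "additionalProperties": False,
}

def validate_arguments(args, schema=CHECK_SSL_SCHEMA):
    """Tiny subset of JSON Schema validation; returns a list of error strings."""
    errors = []
    props = schema["properties"]
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument {key!r}")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected argument {key!r}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key!r} must be a string")
        # The pattern is anchored, so fullmatch agrees with JSON Schema here
        # and also rejects a trailing newline.
        elif "pattern" in spec and not re.fullmatch(spec["pattern"], value):
            errors.append(f"{key!r} does not match the expected format")
    return errors
```

A handler that runs this check first can return any errors as an isError: true text block. The model then learns exactly what was wrong with its call, instead of the bad value reaching the certificate lookup.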

Takeaway

MCP's real contribution isn't the transport or the JSON-RPC envelope — those are unremarkable by design. It's that it forces a discovery-and-negotiation step in front of every tool call, which means the burden of "does this agent know what it can do" moves from static prompt engineering into a runtime protocol both sides can evolve independently. If you're building a server, spend your design budget on two things: schemas tight enough that malformed calls can't reach your code, and tool descriptions short enough that they don't quietly eat the context budget of every other server the agent has connected. Everything else in the spec exists to support those two constraints.

#mcp #model-context-protocol #ai-agents #tool-calling #json-rpc
