Blog

Deep-dive guides on AI agents, agent orchestration, models, and developer tooling.

The Trust Boundary Problem: Why Tool-Calling Agents Need to Treat Tool Output as Untrusted Input

Most agent security advice targets prompt injection at the wrong layer. The real fix is architectural: separate untrusted tool output from privileged context, scope tool capabilities narrowly, and gate side-effecting actions behind confirmation.

July 2, 2026 · 9 min read

Agent Evaluation

Why Your Agent Benchmark Score Doesn't Predict Production Reliability

Benchmark leaderboards measure task completion under lab conditions, not the compounding step failures that sink agents in production. Here is the math, and a blueprint for an eval harness that actually predicts reliability.

July 2, 2026 · 8 min read

Agent Memory

Agent Memory Isn't RAG: Why Vector Retrieval Falls Apart for Stateful Agents

Vector similarity search answers which text is topically related, but long-running agents need to know what is true right now. Conflating the two is why agents keep resurrecting overturned decisions.

July 2, 2026 · 8 min read

MCP

What Actually Happens Inside an MCP Tool Call

A wire-level look at the Model Context Protocol: capability negotiation, tool discovery, transport tradeoffs, and the context-budget mistakes that quietly degrade agent reliability.

July 2, 2026 · 9 min read