▸ Opinion

MCP is the real unlock. Most teams are using it wrong.

QuantAimLabs Nov 5, 2025 6 min read

For the past year, the conversation has been dominated by which agent framework to pick. We think that's the wrong fight. What actually matters is whether your tools are exposed properly, which is the problem the Model Context Protocol (MCP) addresses. And most teams are botching it.

The framework debate is a distraction

LangGraph, LlamaIndex, AutoGen, OpenAI Agents SDK, Anthropic's hosted agents — every quarter brings a new shiny thing. We've used most of them. They're all fine. None of them are the bottleneck for your AI roadmap.

The actual bottleneck: the model can't do anything useful unless it can call tools, and your tools have to describe themselves in a way the model can reliably reason about. That's what the Model Context Protocol does. It's a boring contract in the spirit of OpenAPI: each tool declares a name, a description, and a JSON Schema for its inputs. And boring is exactly what we needed.
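To make that contract concrete, here is a minimal sketch of what a tool declaration can look like, plus a deliberately tiny argument check. The field names (name, description, inputSchema) follow the shape of an MCP tool listing; the tool itself (search_issues) and the validate_arguments helper are hypothetical, and a real server would use a full JSON Schema validator rather than this hand-rolled check.

```python
# Hypothetical tool declaration: a name, a description the model reads,
# and a JSON Schema describing the accepted inputs.
search_issues_tool = {
    "name": "search_issues",
    "description": "Search issue-tracker tickets by free-text query. Read-only.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

# Only the primitive types this sketch understands.
_JSON_TYPES = {"string": str, "integer": int, "object": dict}


def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the call fits the schema.

    Illustration only: checks required keys, unknown keys, and primitive types.
    """
    schema = tool["inputSchema"]
    props = schema.get("properties", {})
    problems = []
    for key in schema.get("required", []):
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        if key not in props:
            problems.append(f"unknown argument: {key}")
            continue
        expected = _JSON_TYPES.get(props[key].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"wrong type for {key}")
    return problems


ok = validate_arguments(search_issues_tool, {"query": "login failures"})
bad = validate_arguments(search_issues_tool, {"limit": "5"})
```

The point is less the validator than the shape: everything the model knows about the tool lives in those three fields, so that is where the design effort belongs.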

Three patterns we keep seeing fail

1. Stuffing too many tools into one MCP server

We've seen MCP servers exposing 60+ tools across loosely related domains: "search jira", "deploy", "run-sql", "post-to-slack", "create-figma-comment". The model doesn't know which one to pick because the surface is too broad and the descriptions blur together.

Fix: One MCP server per logical domain. The agent connects to several. Tool selection becomes a routing problem the model is genuinely good at solving — when the choices are bounded.
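As a rough sketch of that split, the snippet below groups a flat tool list into one planned server per domain and flags any server that grows past a ceiling. The domain labels, the hyphenated tool names, and the ceiling of 10 are our assumptions for illustration, not something MCP prescribes.

```python
from collections import defaultdict

# Hypothetical flat inventory: (domain, tool name) pairs.
FLAT_TOOLS = [
    ("issues", "search-jira"),
    ("release", "deploy"),
    ("data", "run-sql"),
    ("chat", "post-to-slack"),
    ("design", "create-figma-comment"),
]

MAX_TOOLS_PER_SERVER = 10  # assumed ceiling, tune to taste


def plan_servers(tools):
    """Group tools into one planned MCP server per logical domain."""
    servers = defaultdict(list)
    for domain, name in tools:
        servers[domain].append(name)
    return dict(servers)


def oversized(servers, limit=MAX_TOOLS_PER_SERVER):
    """Return the domains whose server exposes more tools than the limit."""
    return sorted(d for d, names in servers.items() if len(names) > limit)


plan = plan_servers(FLAT_TOOLS)
too_big = oversized(plan)
```

Running a check like this in review makes "this server is getting too broad" a visible diff rather than a vague feeling.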

2. Treating MCP servers like internal scripts, not services

"It's just a tool wrapper" — and then it has no auth, no rate limiting, no observability, no idempotency. The first time someone hooks it up to a long-running agent, things get exciting.

Fix: Every MCP server is a production service. OAuth scopes per consumer. Per-call audit log. Per-tool rate limits. Idempotency keys for any side-effecting tool. We treat MCP servers exactly like we'd treat an internal API — because that's what they are.
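Two of those controls fit in a few lines. The sketch below wraps a side-effecting tool with an idempotency cache, a sliding-window rate limit, and an audit trail. ToolGuard and post_message are hypothetical names, and the state lives in memory only; a production server would back it with a shared store and real authentication.

```python
import time


class ToolGuard:
    """Wraps one side-effecting tool with idempotency, a rate limit, and an audit log."""

    def __init__(self, func, max_calls, per_seconds, clock=time.monotonic):
        self._func = func
        self._max_calls = max_calls
        self._per_seconds = per_seconds
        self._clock = clock
        self._recent = []    # timestamps of executed calls inside the window
        self._results = {}   # idempotency key -> stored result
        self.audit_log = []  # (timestamp, idempotency key, outcome)

    def call(self, idempotency_key, **arguments):
        now = self._clock()
        # A replayed key returns the stored result without repeating the side effect.
        if idempotency_key in self._results:
            self.audit_log.append((now, idempotency_key, "replayed"))
            return self._results[idempotency_key]
        # Sliding window: keep only the calls that are still inside the period.
        self._recent = [t for t in self._recent if now - t < self._per_seconds]
        if len(self._recent) >= self._max_calls:
            self.audit_log.append((now, idempotency_key, "rate_limited"))
            raise RuntimeError("rate limit exceeded")
        self._recent.append(now)
        result = self._func(**arguments)
        self._results[idempotency_key] = result
        self.audit_log.append((now, idempotency_key, "executed"))
        return result


sent = []


def post_message(channel, text):
    """Hypothetical side-effecting tool: records the message instead of sending it."""
    sent.append((channel, text))
    return {"ok": True}


guard = ToolGuard(post_message, max_calls=2, per_seconds=60)
guard.call("key-1", channel="#ops", text="deploy done")
guard.call("key-1", channel="#ops", text="deploy done")  # retry: no second message
```

A long-running agent that retries after a timeout is exactly the case the idempotency key covers: the retry is logged as a replay and the message goes out once.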

3. No eval coverage on the tool layer

Teams have evals on the model output. Almost nobody has evals on whether the agent picks the right tool given a request. That's where the regressions hide. A prompt change shifts tool-selection probabilities, downstream behavior changes, and your output evals don't catch it because the final answer still looks plausible.

Fix: Tool-selection evals. For each canonical request, assert which tool sequence the agent should call. Run them on every change. They're cheap, the assertions are exact matches rather than judgment calls, and they catch an entire class of silent regressions.
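A tool-selection eval can be as small as the sketch below. The agent here is a keyword-matching stand-in (fake_agent_tool_calls) so the example runs on its own; in practice you would replace it with a function that runs your real agent and returns the names of the tools it called, in order. The requests and tool names are made up for illustration.

```python
def fake_agent_tool_calls(request: str) -> list[str]:
    """Hypothetical stand-in for a real agent: returns the tool names it would call."""
    text = request.lower()
    calls = []
    if "ticket" in text:
        calls.append("search-jira")
    if "notify" in text:
        calls.append("post-to-slack")
    return calls


# Canonical requests paired with the exact tool sequence we expect.
CASES = [
    ("Find the open ticket about login failures", ["search-jira"]),
    ("Find the ticket for the outage and notify the channel",
     ["search-jira", "post-to-slack"]),
    ("What is the capital of France?", []),  # no tool should be called
]


def run_tool_selection_evals(agent, cases):
    """Return (request, expected, actual) for every case where the sequence differs."""
    failures = []
    for request, expected in cases:
        actual = agent(request)
        if actual != expected:
            failures.append((request, expected, actual))
    return failures


failures = run_tool_selection_evals(fake_agent_tool_calls, CASES)
```

Wire the failure list into CI so a non-empty result fails the build. Note the negative case: asserting that no tool is called is as useful as asserting the right one is.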

What good looks like

The teams getting the most out of MCP have boring, well-shaped server inventories:

Each server exposes 3–10 tools in a tight domain.
Each tool has a precise description that distinguishes it from siblings.
Servers run as standard services with auth, logs, metrics, and SLOs.
Tool calls are observable in traces alongside the LLM call that triggered them.
Both output evals and tool-selection evals run in CI.
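The description rule in that list is the easiest one to lint. As a rough sketch, the check below flags pairs of tools on one server whose descriptions are nearly the same text, using the standard library's difflib. The tool names, descriptions, and the 0.8 threshold are assumptions; text similarity is only a crude proxy for "the model will confuse these two".

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical inventory for one server: tool name -> description.
TOOLS = {
    "search-issues": "Search tickets by keyword.",
    "find-issues": "Search tickets by keyword",
    "run-sql": "Execute a read-only SQL query against the warehouse.",
}


def blurry_pairs(descriptions, threshold=0.8):
    """Return pairs of tool names whose descriptions are near-duplicates."""
    pairs = []
    for (a, text_a), (b, text_b) in combinations(sorted(descriptions.items()), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


flagged = blurry_pairs(TOOLS)  # [('find-issues', 'search-issues')]
```

A flagged pair usually means one of two things: the tools should be merged, or each description needs a sentence saying when to pick it over its sibling.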

That's it. None of it is exotic. All of it gets skipped in the rush to ship the demo.

Where this is heading

Our prediction: in 12 months, MCP server inventories will be a first-class engineering artifact, treated the way we treat OpenAPI specs today. They'll be versioned, reviewed, and capacity-planned. The teams that build that discipline now will have a portfolio of agentic capabilities to reuse. The teams that don't will keep starting over.

If you're standing up MCP servers in production and want a second pair of eyes on the design, we do that. Our MCP server scaffold is being open-sourced on GitHub — follow the org to catch the release.