An MCP server is a user interface for an AI agent. And most MCP servers are bad products, for the same reason most early websites were bad products: nobody thought about the user.
Table of Contents
- MCP servers need agent-first design
- Five rules for building MCP servers that work
- Stop converting your REST APIs to MCP
- Server composition for large APIs
- The protocol keeps evolving
Jeremiah Lowin, the creator of FastMCP, made this argument in a recent conference talk. FastMCP is downloaded close to a million times per day and, by Lowin’s count, some version of the library powers 70% of all MCP servers across all languages. The advice he shared boils down to one uncomfortable truth: if you auto-converted a REST API into an MCP server, you built a bad product.
The rest of this post distills his five rules for building MCP servers that actually work.
MCP servers need agent-first design
The instinct from years of building REST APIs is to expose atomic operations: get_user, list_orders, filter_by_status, check_delivery. Great API design. Terrible MCP design.
Agents are fundamentally different from humans on three dimensions:
Discovery is expensive. You open API docs, scan them once, and remember the three endpoints you need. An agent downloads every tool description on every single connection. 200 tools means 200 schemas in the context window before the agent reads a single word of the user’s request.
Iteration is slow. Your script chains three API calls in a loop and finishes in milliseconds. An agent makes three round trips through an LLM, each time sending the entire conversation history. Every extra call is expensive.
Context is limited. You have years of experience and external memory. The agent has its context window and nothing else. A 200K token window sounds big until tool schemas fill the entire space.
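The budget math is easy to check. A stdlib-only sketch; the 350-token average per tool schema is an illustrative assumption, not a measured figure:

```python
def schema_tokens(n_tools: int, avg_tokens_per_tool: int = 350) -> int:
    """Estimate context tokens consumed by tool schemas alone."""
    return n_tools * avg_tokens_per_tool

# 200 tools at ~350 tokens each: 70,000 tokens of schemas before
# the agent has read a single word of the user's request.
remaining = 200_000 - schema_tokens(200)
```

More than a third of the window is gone before any real work starts.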
Five rules for building MCP servers that work
1. Outcomes, not operations
Do not expose get_user, list_orders, and filter_by_status as three separate tools. Expose one: track_latest_order(email: str). The API calls still happen inside the tool; the complexity simply moves to where a deterministic program, rather than a probabilistic LLM, does the chaining.
One tool equals one agent story. Not a user story. An agent story. Something an autonomous program with limited context is trying to achieve. Design top-down from the workflow, not bottom-up from your API endpoints.
```python
@mcp.tool
def track_latest_order(email: str) -> str:
    """Get the current status of a customer's most recent order.

    Returns the order ID, item summary, and delivery status.
    """
    user = api.get_user(email)
    orders = api.list_orders(user.id, limit=1)
    status = api.get_order_status(orders[0].id)
    return f"Order {orders[0].id}: {status.state}, ETA: {status.eta}"
```
The three API calls happen, but the agent makes one tool call instead of three. Deterministic, fast, debuggable.
2. Flatten your arguments
Do not pass a configuration dictionary. Do not use nested objects. Use top-level primitives.
```python
from typing import Literal

# Bad: the LLM has to invent a complex object
@mcp.tool
def search_orders(config: dict) -> str:
    """Search orders using configuration."""
    ...

# Good: clear, flat, constrained
@mcp.tool
def search_orders(
    email: str,
    status: Literal["pending", "shipped", "delivered", "cancelled"],
    include_cancelled: bool = False,
    limit: int = 10,
) -> str:
    """Search for customer orders by email and status."""
    ...
```
Use Literal types or enums whenever you have a constrained set of choices. The constraint is expressed in the generated schema, so the agent cannot guess invalid values.
Avoid tightly coupled arguments, where the valid values for one argument depend on what was chosen for another. The schema cannot express that dependency, and coupled arguments confuse agents.
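A sketch of the difference, using hypothetical shipping parameters (the zone names and rates are invented for illustration):

```python
from typing import Literal

# Bad: valid `region` values depend on `country` -- the agent must
# know an implicit mapping that the schema cannot express.
def shipping_quote_coupled(country: str, region: str) -> float: ...

# Better: one flat, constrained argument that encodes both.
def shipping_quote(
    zone: Literal["us-east", "us-west", "eu-central", "apac"],
) -> float:
    """Return the flat-rate shipping cost for a zone."""
    rates = {"us-east": 5.0, "us-west": 7.5, "eu-central": 9.0, "apac": 12.0}
    return rates[zone]
```

Every valid input is now visible in the schema itself, so the agent never has to guess a combination.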
3. Instructions are context
If you do not document your MCP server, the agent will guess how to use the server. The agent will try. The agent will confuse itself, and those wrong guesses will pollute the conversation history.
Write detailed docstrings. Document every tool. Give examples, but be careful: examples are contracts. If your example shows a list with two items, the agent will almost always return two items. The agent treats the structure of your example as a template, not just a demonstration.
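For instance, a hypothetical tool whose docstring shows the exact output shape it promises, item count included:

```python
def list_recent_orders(email: str) -> str:
    """List a customer's three most recent orders.

    Example output (the agent treats this structure as a template,
    so show the real shape, including how many items to expect):

        - #1042: wireless keyboard (shipped)
        - #1038: USB-C hub (delivered)
        - #1031: laptop stand (delivered)
    """
    ...  # call the orders API here
```

If the real tool can return one item or fifty, say so in the docstring, or the agent will assume three.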
Errors are prompts. When a tool fails, the error message goes straight into the agent’s context as information. A cryptic ValueError or an empty error teaches the agent nothing. A helpful error message tells the agent exactly how to recover:
```python
raise ValueError(
    "Invalid status 'active'. Valid options: pending, shipped, delivered, cancelled"
)
```
4. Respect the token budget
Consider a real case from the talk: a company tries to expose 800 API endpoints as MCP tools. Divide a 200K token context window by 800 and each tool gets 250 tokens for its name, schema, and documentation combined. After the handshake, the agent has zero tokens left for actual work.
The practical limit is about 50 tools per agent. Not per server: per agent. If an agent connects to three servers with 20 tools each, that is 60 tools total, and performance degrades.
The GitHub MCP server has dozens of tools organized into toolsets. The GitHub team uses dynamic toolset discovery to make that work. Unless you are prepared to invest at the same level, keep your tool count low.
5. Curate ruthlessly
Start with what works, then tear the server down to the essentials.
The engineering instinct is to add and never remove: v2 is backwards compatible, so the old endpoints stay. That logic works for REST APIs. It does not work for MCP, because every tool you add costs context tokens on every single agent interaction.
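One way to keep that instinct in check is a guardrail in CI. A minimal sketch, assuming the roughly 50-tool budget from rule 4 (the function name and limit are my own, not from the talk):

```python
MAX_TOOLS = 50  # rough practical per-agent limit

def assert_tool_budget(tool_names: list[str], limit: int = MAX_TOOLS) -> None:
    """Fail the build when the server outgrows its tool budget."""
    if len(tool_names) > limit:
        raise ValueError(
            f"{len(tool_names)} tools exceeds the budget of {limit}; "
            "merge or remove tools before shipping."
        )
```

A failing build forces the conversation about which tools to merge or retire, instead of letting the count drift upward release by release.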
Stop converting your REST APIs to MCP
The most common violation of all five rules is also one of FastMCP’s most popular features: the OpenAPI-to-MCP converter. Jeremiah Lowin’s advice is simple: use the converter to bootstrap, not to ship. Mine is even simpler: never use the converter.
Auto-converting a REST API into an MCP server produces hundreds of atomic operations, complex nested arguments, no curation, and a bloated token budget. Auto-conversion is the fastest way to build a server that technically works but fails in practice.
Server composition for large APIs
If you genuinely need many tools, FastMCP supports server composition. Instead of one monolithic server, split the API into focused sub-servers:
```python
from fastmcp import FastMCP

orders_mcp = FastMCP("Orders")
users_mcp = FastMCP("Users")
analytics_mcp = FastMCP("Analytics")

main = FastMCP("Main")
main.mount("orders", orders_mcp)
main.mount("users", users_mcp)
main.mount("analytics", analytics_mcp)
```
Mounting creates namespaced prefixes. The agent sees tools like orders/track_latest_order and analytics/revenue_summary. Composition makes it possible to connect only the relevant sub-server to a specific agent workflow, keeping the tool count per agent low.
The protocol keeps evolving
FastMCP 3.0 introduced a new architecture built around three concepts: Providers (where components come from), Transforms (middleware for the component pipeline), and Components (tools, resources, prompts). The decorator API stays the same. @mcp.tool still works. Everything underneath is rebuilt for extensibility.
FastMCP 3.1 added Code Mode, where the agent writes code that calls multiple tools in sequence instead of calling tools one at a time. Code Mode is a promising approach, though running agent-generated code introduces sandboxing concerns.
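Conceptually, Code Mode collapses several LLM round trips into one. A stdlib-only sketch with invented stand-in tools (this is not the FastMCP Code Mode API):

```python
# Two stand-in tools, invented for illustration.
def get_user(email: str) -> dict:
    return {"id": 7, "email": email}

def list_orders(user_id: int, limit: int = 1) -> list[dict]:
    return [{"id": 1042, "status": "shipped"}]

# The agent submits one script instead of making two separate
# tool-call round trips through the LLM.
agent_script = """
user = get_user("ada@example.com")
orders = list_orders(user["id"], limit=1)
result = f"Order {orders[0]['id']}: {orders[0]['status']}"
"""
scope = {"get_user": get_user, "list_orders": list_orders}
exec(agent_script, scope)  # a real deployment needs a sandbox here
```

The intermediate values never pass through the model's context, which is exactly why the approach is attractive, and exactly why the sandboxing question matters.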
On the protocol side, the MCP spec is evolving: async background tasks, streamable HTTP transport (replacing SSE), OAuth support, and an official server registry. Anthropic donated MCP to the Agentic AI Foundation, a directed fund under the Linux Foundation, in December 2025. MCP is here to stay.
The biggest shift is conceptual. We need to stop thinking about MCP servers as infrastructure and start thinking about MCP servers as context products: interfaces designed for AI agents with the same care we put into designing interfaces for humans.
Go From AI Janitor to AI Architect
Stop debugging unpredictable AI systems. I can help you build, measure, and deploy reliable, production-grade AI applications that don't hallucinate.
Message me on LinkedIn