deeflect

Universal MCP Server: Two Tools, 56 APIs

How I built a universal MCP server that wraps 56 APIs into just two tools using the OpenAPI Code Mode pattern - cutting token costs 50x.

I had 56 APIs I needed my agent to talk to. The idea of maintaining 56 separate MCP servers made me want to close my laptop and never open it again.

So I built one server that handles all of them.

That’s the premise behind building a universal MCP server - and specifically, the pattern I implemented in Universal CodeMode: wrap any OpenAPI spec into exactly two tools, search and execute, and let the model figure out the rest. If you’re running agents that touch multiple external services, this is probably the architecture you actually want.

The problem with the current MCP ecosystem

The Model Context Protocol is genuinely useful. It’s the right abstraction for giving agents access to external tools and services. But the ecosystem has a fragmentation problem that nobody’s really talking about.

The current pattern is: one API = one MCP server. Want GitHub integration? Here’s a GitHub MCP server with 30 tools. Want Notion? Another server, another 20 tools. Weather API? Linear? Stripe? Each one is its own server, its own deployment, its own auth config, its own maintenance burden.

I run an agent called borb on my OpenClaw system. It needs to hit GitHub for repo management, search for research, weather for daily digests, and a dozen other services for various tasks. Following the standard pattern, I’d need 56+ separate MCP servers deployed and configured. That’s not a system, that’s a zoo.

The token cost is the other thing. A traditional MCP server for GitHub might expose 30 endpoints as 30 separate tools, each with its full parameter schema described in the context. That’s easily 50K tokens just to tell the model what’s available - before it’s even made a single API call. At scale, that’s insane.

Think about what that means in practice. You have an agent doing a simple task - create a GitHub issue, post a Slack message, look up a weather forecast. Three API calls. But before any of those happen, you’ve burned 150K tokens just describing the available tools across three MCP servers. That’s money out the window for zero productive work. My monthly API spend across the whole OpenClaw system sits around $40. That number would be unrecognizable if I were loading full tool schemas for every session.

What building a universal MCP server actually looks like

The insight I’m building on comes from Cloudflare’s “Code Mode” pattern. Instead of describing every possible tool upfront, you give the model two generic tools that work with any API:

  1. search - natural language query against the OpenAPI spec catalog, returns the relevant endpoint spec (~1000 tokens)
  2. execute - takes a spec chunk and parameters, makes the actual HTTP call

That’s it. Two tools. Any API.

The flow looks like this: agent wants to create a GitHub issue, calls search("create a github issue"), gets back the relevant spec chunk for POST /repos/{owner}/{repo}/issues, then calls execute with the right parameters. The whole thing uses roughly 1000 tokens instead of 50K. That’s a 50x reduction in token usage for a single API call sequence.
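To make the second half of that flow concrete, here’s a minimal TypeScript sketch of how an execute step might turn a spec chunk plus parameters into a concrete request. The `SpecChunk` shape and `buildRequest` helper are illustrative names, not the actual Universal CodeMode internals:

```typescript
// Hypothetical shapes - not the project's real types.
interface SpecChunk {
  method: string;
  path: string;     // may contain {placeholders} like /repos/{owner}/{repo}/issues
  baseUrl: string;
}

// Substitute path placeholders from params; anything left over
// becomes a query string parameter.
function buildRequest(spec: SpecChunk, params: Record<string, string>) {
  let path = spec.path;
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    const placeholder = `{${key}}`;
    if (path.includes(placeholder)) {
      path = path.replace(placeholder, encodeURIComponent(value));
    } else {
      query.set(key, value);
    }
  }
  const qs = query.toString();
  return { method: spec.method, url: spec.baseUrl + path + (qs ? `?${qs}` : "") };
}

const req = buildRequest(
  { method: "POST", path: "/repos/{owner}/{repo}/issues", baseUrl: "https://api.github.com" },
  { owner: "deeflect", repo: "universal-codemode" }
);
// req.url is now the fully resolved GitHub issues endpoint
```

The point of the sketch: once search has handed back a chunk with the method and path template, execute is mostly mechanical string assembly plus an HTTP call.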

The key insight is that the model doesn’t need to know every possible endpoint upfront. It just needs to know it can search for endpoints. The same way you don’t memorize every function in a library - you know how to search the docs.

This also means the catalog can grow without any impact on the model’s working memory. Whether you have 10 APIs or 500, the context overhead is identical: two tool schemas, a few hundred tokens. The model only loads the relevant spec chunk at the moment it needs it.

The stack

Universal CodeMode runs on Cloudflare Workers with R2 for spec storage and KV for caching. The core is TypeScript, using Hono for routing and the MCP TypeScript SDK for the protocol layer.

Here’s the high-level architecture:

// Two tools. That's the whole interface.
server.tool("search", SearchSchema, async ({ query, catalog }) => {
  // Natural language search against indexed OpenAPI specs
  // Returns relevant endpoint chunk, not the full spec
  const results = await searchCatalog(query, catalog);
  return { content: [{ type: "text", text: formatResults(results) }] };
});

server.tool("execute", ExecuteSchema, async ({ spec, params, auth }) => {
  // Takes the spec chunk from search, builds and fires the HTTP request
  const response = await executeRequest(spec, params, auth);
  return { content: [{ type: "text", text: JSON.stringify(response) }] };
});

R2 stores the raw OpenAPI specs. When a spec is ingested, it gets indexed so the search tool can find relevant endpoints by natural language. KV handles caching so repeated searches on the same endpoints don’t keep hitting the index.
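As a rough illustration of what the search side has to do, here’s a minimal keyword-overlap ranking over an in-memory catalog. This is a sketch under assumptions: the real index may use embeddings or fuzzier matching, and the `Endpoint` shape here is invented for the example:

```typescript
// Illustrative catalog entry - not the project's actual schema.
interface Endpoint {
  method: string;
  path: string;
  summary: string; // taken from the OpenAPI operation's `summary` field
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Rank endpoints by how many query tokens appear in their summary + path,
// dropping endpoints with no overlap at all.
function searchCatalog(query: string, catalog: Endpoint[]): Endpoint[] {
  const terms = tokenize(query);
  return catalog
    .map((ep) => {
      const haystack = new Set(tokenize(`${ep.summary} ${ep.path}`));
      return { ep, score: terms.filter((t) => haystack.has(t)).length };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.ep);
}

const catalog: Endpoint[] = [
  { method: "POST", path: "/repos/{owner}/{repo}/issues", summary: "Create an issue" },
  { method: "GET", path: "/repos/{owner}/{repo}/issues", summary: "List repository issues" },
];

const hits = searchCatalog("create a github issue", catalog);
// hits[0] is the POST endpoint - the only entry with token overlap
```

Whatever the scoring function, the contract is the same: a natural language query goes in, a small ranked slice of the catalog comes out, and only that slice ever touches the model’s context.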

The catalog currently has 56 pre-loaded API specs. Adding a new API is just ingesting its OpenAPI spec via the admin endpoint - the search and execute tools work automatically because they’re operating on the spec structure, not hardcoded tool definitions.
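The core of that ingestion step can be sketched as flattening an OpenAPI document into per-endpoint entries the search index can consume. The `extractEndpoints` helper and `CatalogEntry` type below are hypothetical; the field names mirror OpenAPI 3.x, but the project’s real pipeline also chunks request/response schemas:

```typescript
// Minimal slice of an OpenAPI 3.x document - just enough for the sketch.
type OpenAPIDoc = {
  servers?: { url: string }[];
  paths: Record<string, Record<string, { summary?: string }>>;
};

interface CatalogEntry {
  method: string;
  path: string;
  summary: string;
  baseUrl: string;
}

// Walk paths -> operations and emit one catalog entry per endpoint.
function extractEndpoints(doc: OpenAPIDoc): CatalogEntry[] {
  const baseUrl = doc.servers?.[0]?.url ?? "";
  const entries: CatalogEntry[] = [];
  for (const [path, ops] of Object.entries(doc.paths)) {
    for (const [method, op] of Object.entries(ops)) {
      entries.push({ method: method.toUpperCase(), path, summary: op.summary ?? "", baseUrl });
    }
  }
  return entries;
}

const entries = extractEndpoints({
  servers: [{ url: "https://api.github.com" }],
  paths: {
    "/repos/{owner}/{repo}/issues": {
      post: { summary: "Create an issue" },
      get: { summary: "List repository issues" },
    },
  },
});
// two entries: POST and GET on the same path
```

This is why adding an API needs no code changes: the spec structure itself is the tool definition.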

Why Cloudflare Workers specifically

I could’ve run this on a VPS or a Lambda function. I went with Workers for three reasons.

First, edge deployment means low latency from wherever the agent is running. An agent mid-task waiting on an API lookup is a bad experience - every millisecond of overhead compounds across a multi-step workflow.

Second, R2 and KV are native integrations. No external database config, no connection pooling, no cold start issues with a separate storage layer. The spec storage and caching are just Workers primitives.

Third, the GlobalOutbound security model fits perfectly for this use case. I can declare exactly which domains the Worker is allowed to call outbound - which is exactly the security property I want for a server that executes arbitrary API calls on behalf of agents.

Security model

Running arbitrary API calls through a single server sounds like a security nightmare. Here’s how I handled it.

GlobalOutbound restrictions on the Cloudflare Worker mean the server can only make outbound requests to explicitly allowlisted domains. You can’t use execute to hit evil.example.com.
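The exact GlobalOutbound configuration is a Workers platform detail, but the same property can be sketched at the application level as a hostname allowlist check. The domain list here is illustrative, not the server’s actual allowlist:

```typescript
// Application-level allowlist check, complementing the platform-level
// outbound restriction. Hosts listed here are examples only.
const ALLOWED_HOSTS = new Set(["api.github.com", "api.openweathermap.org"]);

function isAllowed(url: string): boolean {
  try {
    // Compare the parsed hostname, never a substring of the raw URL -
    // "evil.example.com/?x=api.github.com" must not slip through.
    return ALLOWED_HOSTS.has(new URL(url).hostname);
  } catch {
    return false; // unparseable URLs are rejected outright
  }
}
```

Checking the parsed hostname (rather than pattern-matching the URL string) is the part worth copying even outside Workers.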

Admin token authentication gates the catalog management endpoints. Ingesting or deleting specs requires the admin token. The two main tools (search and execute) are accessible to agents without admin rights.

Execution timeouts on every outbound request. An agent can’t hang the server by calling a slow endpoint.

Auth handling in execute is explicit - credentials get passed as parameters, not stored server-side in the MVP. There’s a planned hosted version where you’d configure auth per-catalog-entry, but for now, the agent passes credentials and they’re used once then discarded.
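Putting the last two points together, the outbound call inside execute might look roughly like this. The `buildHeaders` and `executeRequest` names and signatures are a sketch (simplified versus the earlier snippet), but `AbortSignal.timeout` is a real Web/Node API and is how a per-request deadline can be enforced:

```typescript
// Build request headers, injecting the caller-supplied credential.
// The credential is used for this one request and never persisted.
function buildHeaders(auth?: { header: string; value: string }): Record<string, string> {
  const headers: Record<string, string> = { accept: "application/json" };
  if (auth) headers[auth.header] = auth.value;
  return headers;
}

// Fire the upstream call with a hard timeout so a slow endpoint
// can't hang the server.
async function executeRequest(
  url: string,
  method: string,
  auth?: { header: string; value: string },
  timeoutMs = 10_000
): Promise<unknown> {
  const res = await fetch(url, {
    method,
    headers: buildHeaders(auth),
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
  return res.json();
}
```

When the timeout fires, `fetch` rejects with a `TimeoutError`, which the server can surface to the agent as an ordinary tool error rather than a hung call.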

Is this perfect? No. But it’s a real security model, not vibes.

The domain allowlist is probably the most important piece. A prompt injection attack that tries to exfiltrate data to an external server fails at the network level, not just the application level. Defense in depth.

Self-hosted mode

Not everyone wants to run on Cloudflare. The project supports a self-hosted mode via npx:

npx universal-codemode --port 3000

Point it at your OpenAPI specs, configure your MCP client, done. The Worker version is the “cloud native” path, the npx version is for local dev or running on your own infra.

Config looks like this:

{
  "mcpServers": {
    "universal-codemode": {
      "command": "npx",
      "args": ["universal-codemode"],
      "env": {
        "CATALOG_PATH": "./specs",
        "ADMIN_TOKEN": "your-token-here"
      }
    }
  }
}

That’s your entire MCP configuration. One entry. Covers every API in your catalog.

Compare that to what the equivalent config looks like with the standard one-server-per-API approach. If you’re running five integrations, you have five entries in your MCP config, five different sets of env vars, five different deployment concerns. Something breaks and you’re debugging which of the five servers is the problem. With this setup, there’s exactly one thing to look at.

Test coverage

13/13 E2E tests passing against real APIs - GitHub, JSONPlaceholder, and httpbin. The test suite covers the full search-then-execute flow, auth parameter handling, error responses, and the catalog management endpoints.

$ npm test

 search returns relevant endpoint for natural language query
 search handles queries with no matching endpoints
 execute calls GitHub API with correct parameters
 execute handles 404 responses gracefully
 execute respects timeout configuration
 catalog ingestion processes valid OpenAPI spec
 catalog ingestion rejects invalid spec
 admin endpoints reject requests without a valid token... (13/13 passing)

Real tests against real endpoints, not mocks. If the GitHub API’s behavior changes, the tests catch it.

Testing against real APIs instead of mocks is a deliberate choice. Mocks give you confidence that your code does what you think it does. Real endpoint tests give you confidence that your code actually works. For infrastructure that agents depend on at runtime, I want the second kind of confidence. The tradeoff is that tests can fail for reasons outside my control - rate limits, API downtime, auth token expiry. That’s fine. Flaky tests that catch real issues are better than reliable tests that don’t.

Where this fits in the universal MCP server landscape

There are a few other projects trying to solve the “too many MCP servers” problem. Most of them are building aggregators - one entry point that proxies to multiple underlying servers. That doesn’t solve the token problem, it just reduces the config burden.

The Code Mode pattern is different because it changes the interface. Instead of the model needing to know about github_create_issue and github_list_repos and github_get_pull_request as separate tools, it just knows about search and execute. The catalog can grow to 200 APIs and the model’s tool interface stays exactly the same size.

OpenAPI as the common format is doing a lot of work here. Virtually every serious API publishes an OpenAPI spec now. That means the ingestion pipeline is universal - you’re not writing custom parsers for each API.

There’s also a maintenance angle worth thinking about. When Stripe updates their API, a traditional MCP server for Stripe needs to be updated and redeployed. With this approach, you ingest the updated OpenAPI spec via the admin endpoint and you’re done. The search and execute tools don’t change. The model’s interface doesn’t change. One operation, propagated everywhere.

Current status and what’s next

MVP is deployed on Cloudflare Workers. The landing page is embedded in the Worker itself (Hono handles the HTML route). Repo is at deeflect/universal-codemode - MIT licensed, free to use.

What’s still in progress:

  • Custom domain (cm.dee.ad is planned)
  • Real-world testing with Claude Code and Cursor agents, not just the E2E test suite
  • Seeding more API specs into the catalog - 56 is a start but the long tail of useful APIs is way bigger
  • Per-catalog auth configuration for the hosted version so agents don’t need to pass credentials explicitly

The honest status: this is a working MVP with real test coverage, not a demo. But it hasn’t been battle-tested in production with real agent workloads at scale yet. That’s the next phase.

The spec seeding problem is actually interesting. There are thousands of APIs with published OpenAPI specs - the APIs.guru directory catalogs over 2,000 of them. Bulk-ingesting from that kind of source is on the roadmap. The architecture already handles it - it’s a data pipeline problem, not an architectural one.

Why two tools is the right number

There’s a version of this project where I expose five tools - search, preview_spec, execute, list_catalogs, check_health. I went back and forth on it.

Two is correct. Here’s why.

Every tool you add to an MCP server is cognitive overhead for the model. The model has to decide which tool to use before it uses it. With search and execute, the decision tree is: do I know exactly which endpoint to call? No, search first. Yes, execute directly. That’s it.

More tools means more opportunities for the model to pick the wrong one, more tokens describing the tool schemas, more edge cases in your implementation. The constraint of two tools forced me to make each tool more capable rather than adding escape hatches.

This is the same reason Unix pipes work. A small number of composable primitives beats a large number of specialized commands every time.

I tested a three-tool version with an explicit preview_spec step between search and execute. In theory it lets the model inspect a full spec before committing to an execute call. In practice, the model just used search and execute 95% of the time anyway, and the extra tool added noise to every session context. Cut it.

The lesson: when you’re designing a tool interface for models, err toward fewer, more capable tools. Models are good at using tools creatively. They’re less good at picking between tools when the distinctions are subtle. Make the distinctions obvious by minimizing the surface area.

Building this as part of a larger agent stack

Universal CodeMode is one piece of my broader setup. If you’re curious about how the agent system it connects to actually works - borb, OpenClaw, the cron-based orchestration layer - check out the 31 Rust CLI tools for agents post for background, or the piece on why prompt engineering died for more on how I think about building agent systems.

The short version: I run multi-agent workflows where different models handle different job types. Having one MCP server that gives all of them access to 56+ APIs without per-model tool configuration changes is the kind of infrastructure win that compounds. Less config, lower token costs, one place to update when an API spec changes.

Every agent in the system gets the same two tools. Sonnet, Codex, Gemini Flash, Opus - they all connect to the same universal server and they all work identically. When I add a new API to the catalog, every agent can use it immediately. Zero redeployment, zero config changes, zero new tool descriptions to fit in context.

That compounding is the real argument for this pattern. Each individual win - fewer tokens, less config, one deployment - is incremental. All of them together, across every agent, across every session, across every API call, adds up to something that materially changes what I can run sustainably as a solo builder.

If you’re building anything in this space - agents that need to talk to external services, MCP server implementations, OpenAPI tooling - the repo is open. Issues and PRs welcome. I’m particularly interested in feedback from people running this with Claude Code or Cursor in real projects.

The pattern works. The implementation is early. Both of those things can be true.