Stop the m×n Spaghetti: An MCP Gateway Architecture You Can Reuse



This content originally appeared on Level Up Coding – Medium and was authored by Parth Saxena

If you have more than a couple of agents talking to more than a couple of tools, the connections multiply fast. Every new integration comes with its own auth glue, error handling, logging, and one-off assumptions. After a few sprints, your whiteboard looks like spaghetti.

An MCP gateway fixes that. Instead of wiring every client to every MCP server, you give teams one doorway. The gateway handles discovery, routing, authentication, sessions, policy, and observability in one place. Clients stay simple. Tools stay independent. Your architecture gets a backbone instead of a ball of yarn.

This post walks through a reusable MCP gateway you can stand up and adapt. It is opinionated where it matters and modular where you will want to customize. The core ideas: a Service Registry that declares what exists, an Authentication layer that can pass through tokens or mint new ones on behalf of the user, a Session Manager that keeps connections warm and isolated, a Circuit Breaker and Rate Limiter for resilience and fairness, and an Audit trail so your security and SRE folks can actually sleep.

The project behind this post is open source and built for collaboration. If this solves a problem you have today, jump in. Star the repo if you want to follow along, open issues for gaps or edge cases you hit, send PRs for docs or features, and pick up a “good first issue” if you want to get your feet wet. Maintainers are active and friendly, and we’re especially excited about contributions around RBAC, observability dashboards, multi-tenant policies, and transport adapters.

The problem: m×n spaghetti

As soon as you have multiple agents talking to multiple MCP servers, the connections explode. Six agents and eight tools is not 6 + 8 = 14 integrations; it is 6 × 8 = 48 possible paths to wire, secure, log, and maintain. Each path grows its own tiny rules and secrets. After a few releases, no one is sure which client is using which token or where to look when a tool call stalls.

What breaks

  • Authentication drifts by client: slightly different headers, token lifetimes, and refresh logic.
  • Error handling is inconsistent: some clients retry, others fail fast, a few do both.
  • Observability is scattered: logs live in many places with different formats.
  • Policy is hard to prove: rate limits, audit, and access rules vary from team to team.
  • Changes are expensive: adding a new tool means touching every client that needs it.

What a gateway changes

  • One doorway for all clients to discover services and call tools.
  • Centralized authentication with a single place to validate, pass through, or exchange tokens.
  • Session management that keeps connections warm and scoped to the caller.
  • Uniform resilience with shared circuit breaker and rate limiting.
  • End-to-end audit trail that security and SRE can trust.

High-level architecture

Architecture diagram (source: https://github.com/codingjam/bridge-mcp/blob/main/docs/Agent_Integration_Guide.md)

The pieces (and why they exist)

  • Service Registry (declarative source of truth)
    A simple .yaml config that lists each MCP server, where it lives, which transport it uses (Streamable HTTP or stdio), and what auth strategy applies. This turns “integration” work into configuration and lets you roll out new services without touching client code.
  • Auth & Token Strategy (pass-through or On-Behalf-Of)
    The gateway implements middleware and validates incoming user tokens once, then either forwards them as-is or exchanges them for service-scoped tokens (OBO) per downstream server. Centralizing this logic keeps secrets out of clients and makes audits sane.
  • Session Manager (warm, scoped connections)
    Creates the MCP session, runs the initialize → initialized handshake, and keeps per-caller state isolated. It reuses healthy connections, cleans up stale ones, and knows how to retry without duplicating calls.
  • Transport Layer (HTTP streamable / stdio)
    Speaks the wire protocol to downstream servers. Over HTTP, it can negotiate streamable responses so results arrive incrementally (great for long-running tools or big outputs). If a server only does plain JSON, that still works — no special handling required from clients.
  • Policy & Guardrails (rate limits + circuit breaker)
    Applies per-user / per-service / per-tool limits, returns clean 429 with retry hints, and trips a circuit breaker if a server starts failing. That prevents one flaky service from dragging everything down.
  • Observability & Audit (see what actually happened)
    Emits structured events for auth decisions, tool calls, durations, and errors with correlation IDs. Your security and SRE teams get a single place to search, alert, and prove policy.
  • Minimal Admin Surface (small, purposeful endpoints)
    Service discovery, health checks, and a single JSON-RPC proxy (plus optional explicit session endpoints). Small surface area = easier to secure and reason about.
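
To make that surface concrete, here is a minimal sketch of what those endpoints could look like with FastAPI. The /mcp/proxy route and its body shape come from this post; the other route names and the stub responses are illustrative, not the repo's actual API.

# Minimal admin surface sketch (FastAPI assumed; routes other than /mcp/proxy are illustrative).
from fastapi import FastAPI, Request

app = FastAPI(title="mcp-gateway")

@app.get("/health")
async def health() -> dict:
    # Liveness of the gateway itself; per-service health is tracked via the registry's health checks.
    return {"status": "ok"}

@app.get("/api/v1/mcp/services")
async def list_services() -> dict:
    # Service discovery: in the real gateway this would echo what the Service Registry declares.
    return {"services": ["example-service", "local-tool"]}

@app.post("/mcp/proxy")
async def proxy(request: Request) -> dict:
    # Single JSON-RPC proxy: the body carries {"method": ..., "server_name": ...}.
    body = await request.json()
    # A real implementation would run auth, policy, and session lookup here, then forward downstream.
    return {"received_method": body.get("method"), "server": body.get("server_name")}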

How a tool call flows (end-to-end)

  1. Client calls the gateway with a normal MCP request.
  2. Auth runs: validate the user, optionally mint OBO token for the target service.
  3. Session Manager finds or creates a session and runs the MCP handshake if needed.
  4. Policy checks: rate limit buckets and circuit breaker state.
  5. Transport forwards the JSON-RPC request to the MCP server (HTTP streamable or stdio).
  6. Streaming (if available): chunks begin flowing back as they’re produced; otherwise a standard JSON body returns when ready.
  7. Audit logs the outcome with timings and correlation IDs; response goes to the client.
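
To see how those seven steps line up in code, here is a condensed sketch of the request pipeline. Every component name and method signature is an illustrative stand-in for the gateway's real modules, not the repo's API.

# Condensed pipeline sketch; auth, registry, sessions, policy, transport, and audit
# are illustrative stand-ins for the modules described above.
async def handle_tool_call(request, registry, auth, sessions, policy, transport, audit):
    user = await auth.validate(request.headers["Authorization"])          # 2. validate the caller
    service = registry.lookup(request.server_name)                        # 1. resolve the target service
    token = await auth.token_for(user, service)                           # 2. pass-through or mint OBO
    session = await sessions.get_or_create(user, service, token)          # 3. handshake if needed
    policy.check(user, service, request.tool_name)                        # 4. rate limit + breaker state
    try:
        result = await transport.forward(session, request.jsonrpc_body)   # 5-6. JSON or streamed chunks
        await audit.record(user, service, request, outcome="ok")          # 7. audit with timings
        return result
    except Exception:
        await audit.record(user, service, request, outcome="error")
        raise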

Service Registry (temporarily managed via .yaml config)

The gateway shouldn’t require code changes every time you add or tweak a tool. The Service Registry turns “integration” into configuration: you declare each MCP server once (its endpoint, transport, health check, and auth strategy), and the rest of the system picks it up.

# config/services.yaml
services:
  example-service:
    name: "Example MCP Server"
    endpoint: "https://api.example.com"
    transport: "http"
    health_check_path: "/health"
    auth:
      strategy: "obo_required"
      target_audience: "example-service"
      required_scopes: ["mcp:call", "example:read"]
      custom_headers:
        "X-Service-Name": "mcp-gateway"
        "X-Version": "1.0.0"

  local-tool:
    name: "Local Dev Tool"
    transport: "stdio"
    command: ["python", "-m", "my_mcp_server"]
    working_directory: "./tools/local"
    auth:
      strategy: "no_auth"

What you declare (and why)

  • Identity & endpoint: a stable service_id, human name/description, and the URL or local command to reach it. Supports both HTTP and stdio so you can mix cloud services with local dev tools.
  • Transport details: transport: "http" (with optional base_path and health_check_path) or transport: "stdio" (with command, working_directory, environment).
  • Auth strategy per service: no_auth, passthrough, obo_preferred, or obo_required, plus audience/scopes and any custom headers. This lets you onboard strict services and legacy ones side-by-side without special client logic.
  • Global knobs: defaults for timeouts, health-check cadence, and discovery flags, so platform teams can set sane baselines once.

This same layout covers hybrids: modern services with OBO tokens, legacy systems with passthrough, and local stdio tools for fast iteration.
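
A few lines of Python are enough to turn that YAML into typed objects the rest of the gateway can consume. This is a minimal sketch assuming the layout above; the dataclass shape is illustrative and it requires PyYAML and Python 3.10+.

# Load config/services.yaml into a simple in-memory registry.
from dataclasses import dataclass, field
import yaml  # pip install pyyaml

@dataclass
class ServiceConfig:
    name: str
    transport: str                     # "http" or "stdio"
    endpoint: str | None = None        # http services
    command: list[str] | None = None   # stdio services
    auth: dict = field(default_factory=dict)

def load_registry(path: str = "config/services.yaml") -> dict[str, ServiceConfig]:
    with open(path) as fh:
        raw = yaml.safe_load(fh)
    return {
        service_id: ServiceConfig(
            name=spec.get("name", service_id),
            transport=spec["transport"],
            endpoint=spec.get("endpoint"),
            command=spec.get("command"),
            auth=spec.get("auth", {}),
        )
        for service_id, spec in raw["services"].items()
    }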

Authentication that scales (OIDC + On-Behalf-Of)

If every client hand-rolls auth, drift creeps in: different headers, token lifetimes, and refresh logic. The gateway centralizes all of that. It validates the caller once, decides the right strategy per downstream service, and (when needed) mints a fresh, service-scoped token so tools never see the user’s raw JWT.

The moving parts

  • Middleware + Validator — Intercepts requests, extracts Authorization: Bearer …, and validates the JWT against Keycloak’s JWKS (the reference setup runs Keycloak as a local IdP in Docker), with claim checks.
  • OBO Token Service — Implements OAuth2 Token Exchange (RFC 8693) to trade the user token for a service-specific token; includes caching and refresh.
  • Authenticated Proxy — Applies the right token when calling each MCP server (either pass-through or OBO), wired through the Service Registry.

One flow for callers, many strategies for services

  1. Client → Gateway: Send a normal request with a Bearer token.
  2. Gateway validates: Signature + claims, builds a UserContext.
  3. Pick strategy per service: NO_AUTH, PASSTHROUGH, OBO_PREFERRED, or OBO_REQUIRED (declared in services.yaml).
  4. If OBO is required/preferred: Exchange the user token for a service token with the right audience/scopes, then call the MCP server.
  5. Done: Response returns; audit and metrics capture the outcome.
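
Step 4 is standard OAuth2 Token Exchange (RFC 8693). Here is a sketch of that call against a Keycloak-style token endpoint, assuming the gateway is registered as a confidential client; the realm URL, client credentials, and scope values are placeholders.

# RFC 8693 token-exchange sketch: trade the user's token for a service-scoped one.
import httpx

async def exchange_token(user_token: str, audience: str, scopes: list[str]) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://keycloak.local/realms/demo/protocol/openid-connect/token",  # placeholder realm
            data={
                "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
                "subject_token": user_token,
                "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
                "audience": audience,            # e.g. "example-service" from services.yaml
                "scope": " ".join(scopes),       # e.g. ["mcp:call", "example:read"]
                "client_id": "mcp-gateway",
                "client_secret": "placeholder",  # load from secrets in practice
            },
        )
        resp.raise_for_status()
        return resp.json()["access_token"]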

Session-aware by design

When you do use OBO, the exchanged token is bound to the session so follow-up tool calls reuse the same, scoped credentials until refresh.

Session Management

Session flow diagram (source: https://github.com/codingjam/bridge-mcp)

Sessions make MCP calls feel snappy and predictable. The gateway creates a session once, performs the MCP handshake, and reuses that warm path for follow-ups, so you aren’t paying connection and auth overhead on every call. Sessions are isolated per client and bound to the right auth context.

What a “session” does here

  • Pools & reuses connections to each MCP server, with proper transport lifecycle (HTTP streamable or stdio).
  • Keeps per-caller isolation (your token, your state) and cleans up resources on close.
  • Guards the init flow so only one thread initializes per (client, server), avoiding thundering herds.

The handshake you must get right

On the first call, the gateway opens a transport, runs initialize() and then sends notifications/initialized, caches the session, and continues with the original request. Status codes matter: initialize() returns 200 + JSON; notifications/initialized is 202 + empty; subsequent methods like tools/list and tools/call return 200 (JSON or stream).
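
At the wire level, the handshake is two JSON-RPC posts. The sketch below assumes a Streamable HTTP server mounted at /mcp and uses an illustrative protocol version; real servers may also return a session identifier header that must be echoed on follow-up calls.

# Wire-level handshake sketch over Streamable HTTP.
import httpx

async def initialize_session(endpoint: str, headers: dict) -> None:
    async with httpx.AsyncClient(base_url=endpoint, headers=headers) as client:
        # initialize -> expect 200 with a JSON result describing server capabilities
        init = await client.post("/mcp", json={
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2025-03-26",  # illustrative; use the version your servers speak
                "capabilities": {},
                "clientInfo": {"name": "mcp-gateway", "version": "0.1.0"},
            },
        })
        init.raise_for_status()  # 200 + JSON

        # notifications/initialized -> expect 202 with an empty body
        note = await client.post("/mcp", json={
            "jsonrpc": "2.0",
            "method": "notifications/initialized",
        })
        assert note.status_code == 202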

Flow at a glance

A typical connect looks like this: client → /servers/connect → create transport → establish connection → create ClientSession → return session_id. The repo docs include a Mermaid diagram of this flow you can embed.

Session states

Sessions move through CREATING → CONNECTED → (ERROR) → CLOSING → CLOSED, with actions gated per state. This makes retries and cleanup deterministic.
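
Expressed as code, that lifecycle is a small state machine. The states below follow the list above; the exact transition table is illustrative.

# Session lifecycle as an explicit state machine.
from enum import Enum, auto

class SessionState(Enum):
    CREATING = auto()
    CONNECTED = auto()
    ERROR = auto()
    CLOSING = auto()
    CLOSED = auto()

# Allowed transitions; actions are gated by checking this table before acting.
TRANSITIONS = {
    SessionState.CREATING: {SessionState.CONNECTED, SessionState.ERROR},
    SessionState.CONNECTED: {SessionState.ERROR, SessionState.CLOSING},
    SessionState.ERROR: {SessionState.CLOSING},
    SessionState.CLOSING: {SessionState.CLOSED},
    SessionState.CLOSED: set(),
}

def transition(current: SessionState, nxt: SessionState) -> SessionState:
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt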

Cleanup & error recovery

Cleanup closes the MCP client, the transport, and any underlying HTTP session, logging errors with useful context. On tool-call failure, the manager marks the session “suspect” and applies recovery logic.

Auth is bound to the session

If a service requires OBO, the exchanged token is stored with the session and refreshed for long-lived use — so downstream tools never see the caller’s raw JWT.

Concurrency without chaos

The manager uses per-client init locks and per-audience refresh locks. That prevents duplicate inits and thundering-herd token refreshes under load.
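
Here is a sketch of what keyed locking can look like with asyncio. The keying mirrors the text (one lock per (client, server) for init; the same pattern works per audience for refresh), and _initialize is a hypothetical stand-in for the real transport and handshake work.

# Per-key asyncio locks to serialize session initialization.
import asyncio
from collections import defaultdict

class KeyedLocks:
    """One asyncio.Lock per key, created lazily."""
    def __init__(self) -> None:
        self._locks: dict[tuple, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_key(self, *key: str) -> asyncio.Lock:
        return self._locks[key]

init_locks = KeyedLocks()
sessions: dict[tuple, dict] = {}

async def _initialize(client_id: str, server_id: str) -> dict:
    # Hypothetical stand-in for the real transport + MCP handshake work.
    await asyncio.sleep(0)
    return {"client": client_id, "server": server_id, "state": "CONNECTED"}

async def get_or_create_session(client_id: str, server_id: str) -> dict:
    # Only one coroutine initializes a given (client, server) pair; others wait on
    # the lock and then reuse the cached session instead of re-running the handshake.
    async with init_locks.for_key(client_id, server_id):
        if (client_id, server_id) not in sessions:
            sessions[(client_id, server_id)] = await _initialize(client_id, server_id)
        return sessions[(client_id, server_id)]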

How agents use it (two styles)

  • Session endpoints: POST /api/v1/mcp/servers/connect → GET /mcp/sessions/{id}/tools → POST /mcp/sessions/{id}/tools/call. Great for long conversations.
  • JSON-RPC proxy: POST /mcp/proxy with {"method":"tools/list|tools/call","server_name":"…"}. Simple and stateless for quick jobs.
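
For the stateless style, a client call can be as small as this. The /mcp/proxy route and the method/server_name fields come from the list above; the params shape follows standard MCP tools/call arguments, and the tool name, gateway URL, and token are placeholders.

# Client-side sketch: call a tool through the gateway's JSON-RPC proxy.
import httpx

async def call_tool_via_proxy(gateway_url: str, user_token: str) -> dict:
    async with httpx.AsyncClient(base_url=gateway_url) as client:
        resp = await client.post(
            "/mcp/proxy",
            headers={"Authorization": f"Bearer {user_token}"},
            json={
                "method": "tools/call",
                "server_name": "example-service",
                "params": {"name": "search", "arguments": {"query": "hello"}},
            },
        )
        resp.raise_for_status()
        return resp.json()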

Resilience: Circuit breaker + rate limiting

When a downstream server gets flaky or a single client hammers a popular tool, you want graceful degradation, not a pile-up. The gateway builds this in with a circuit breaker (protect you from failing services) and rate limiting (protect you from noisy neighbors). They’re independent and complementary.

Circuit breaker (don’t let one bad server sink the ship)

The breaker watches each service’s health and flips through three states: CLOSED (normal), OPEN (block calls fast), and HALF_OPEN (probe to see if recovery has started). It trips after a configurable number of failures, waits a recovery timeout (with optional exponential backoff), then lets a small test call through. If that succeeds enough times, it closes; if not, it opens again. All of this is per server so one failure doesn’t cascade.

It integrates right where it matters — session creation, tool calls, and heartbeat checks — so the gateway can short-circuit quickly when a backend is sick, and recover automatically when it gets healthy.

What you tune

  • failure_threshold, recovery_timeout, success_threshold, timeout, and optional exponential_backoff with a max cap.
  • Different profiles for critical vs standard services.
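
Here is a compact, single-service sketch of that behavior using the same tunables. It is illustrative logic rather than the repo's implementation (no exponential backoff or per-call timeout shown).

# Minimal per-service circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def allow(self) -> bool:
        # After the recovery timeout, move to HALF_OPEN and let a probe call through.
        if self.state == "OPEN" and time.monotonic() - self.opened_at >= self.recovery_timeout:
            self.state, self.successes = "HALF_OPEN", 0
        return self.state != "OPEN"

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state, self.failures = "CLOSED", 0   # enough probes succeeded: close
        else:
            self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", time.monotonic()  # trip the breaker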

How failures look to callers
If the circuit is OPEN, calls fail immediately with a clear error and a retry_after hint—no timeouts, no mystery.

Ops hooks you get for free
Stats and state changes are exposed for dashboards; the heartbeat loop is breaker-aware; and you can add a lightweight health endpoint to surface which circuits are OPEN.

Rate limiting (fairness without friction)

Rate limiting sits in middleware and applies before auth and breaker checks. It builds a composite key — user × service × tool — so limits are precise and fair. If a bucket is exhausted, the gateway returns 429 with Retry-After. Default behavior uses a simple fixed window counter and a thread-safe in-memory backend; a Redis backend is documented for scale-out.

How a request is evaluated
A small pipeline extracts context (user ID, service ID from the path, tool name from the JSON body), builds the key, checks/increments the counter, and either lets the call through or returns 429.

What you tune

  • Defaults via env: limit, window seconds, and backend choice.
  • Policy objects for custom per-route or per-tenant rules.
  • Fixed-window behavior and storage layout (key + window).
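
A fixed-window counter over that composite key fits in a few lines. The sketch below uses a plain in-memory dict behind a lock as a stand-in for the thread-safe backend; in middleware, a False result becomes a 429 with Retry-After set to the returned hint.

# Fixed-window rate limiter keyed by user x service x tool x window.
import time
import threading

class FixedWindowLimiter:
    def __init__(self, limit: int = 60, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self._counters: dict[tuple, int] = {}   # old windows would be pruned in practice
        self._lock = threading.Lock()

    def check(self, user_id: str, service_id: str, tool: str) -> tuple[bool, int]:
        """Returns (allowed, retry_after_seconds)."""
        now = int(time.time())
        window_start = now - (now % self.window)
        key = (user_id, service_id, tool, window_start)   # composite key + window
        with self._lock:
            count = self._counters.get(key, 0) + 1
            self._counters[key] = count
        if count > self.limit:
            return False, self.window - (now - window_start)   # hint for Retry-After
        return True, 0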

Breaker + limiter together
The limiter protects shared capacity; the breaker protects against failing dependencies. In practice the limiter runs first in middleware, and the breaker then decides whether a call that passes it may reach the downstream server at all, failing fast when the circuit is OPEN. Two layers, different jobs.

Audit ties it all together
Every decision (allowed, throttled, tripped, or recovered) lands in structured audit events with correlation IDs, so you can query, alert, and explain what happened later. (That's the idea; it is still a work in progress.)
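
In practice each of those decisions can be one structured log line. A minimal sketch, assuming JSON logs; the field names are illustrative.

# Emit one JSON audit event per decision, tagged with a correlation ID.
import json, time, uuid, logging

logger = logging.getLogger("mcp_gateway.audit")

def audit_event(decision: str, user_id: str, service_id: str, tool: str,
                duration_ms: float, correlation_id: str | None = None) -> None:
    event = {
        "ts": time.time(),
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "decision": decision,          # "allowed" | "throttled" | "tripped" | "recovered"
        "user": user_id,
        "service": service_id,
        "tool": tool,
        "duration_ms": duration_ms,
    }
    logger.info(json.dumps(event))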

Final thoughts

If your architecture feels like m×n spaghetti, a gateway gives you a single, dependable doorway for every agent and tool. You centralize auth, sessions, policy, and observability. You gain resilience with rate limits and circuit breakers. And with streamable HTTP, long-running tools feel responsive instead of heavy.

This space is moving fast. There are already a few strong open-source MCP efforts in more advanced stages, yet MCP itself is still evolving. That means there is real room to learn, shape patterns, and make an impact early.

If this resonated, jump in:

  • Repo: codingjam/bridge-mcp
  • Clone: git clone https://github.com/codingjam/bridge-mcp
  • First steps: skim the README, then browse the docs/ folder for the Service Registry, Auth, Sessions, Circuit Breaker, and Rate Limiting modules.
  • Contribute: star the repo, open issues for gaps or edge cases, send PRs, or grab a “good first issue.” Docs, tests, and examples are especially welcome.
  • Impact: it is a great way to learn modern agent tooling and contribute patterns others can reuse.

Thanks for reading. If you build on this or have ideas to improve it, I would love to hear from you.

