Solving the 25K Response Token Limit Wall: Introducing mcp-cache




I’ve been working with Claude and MCP servers extensively—building web automation, analyzing codebases, automating testing workflows. But I kept hitting the same frustrating wall:

Error: Response exceeds maximum allowed tokens (25,000)

The Problem

Modern applications generate massive responses:

  • Web page DOMs: 1.3MB+ (154K tokens)
  • GitHub PR diffs: 36K tokens (44% over limit)
  • Figma exports: 351K tokens (1,300% over)

Every time I asked Claude to analyze a real web page, the request failed. Not because the AI couldn’t handle it, but because MCP responses had a hard ceiling at 25,000 tokens.

The Real-World Impact

Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:

  • Chrome MCP: “screenshot always gives ‘exceeds maximum tokens’ error”
  • GitHub MCP: “get_pull_request_diff fails for any substantial PR”
  • Playwright MCP: “DOM content returns ‘Conversation Too Long’ error”

The pattern was clear: MCP works beautifully for toy examples but breaks on real-world complexity.

The Solution: mcp-cache

I built mcp-cache: a universal response manager that wraps any MCP server and handles the 25K token limit automatically.

How it works:

Claude Desktop
    ↓
mcp-cache (transparent proxy)
├─ Intercepts large responses
├─ Caches full data locally
├─ Returns summary + query tools
└─ AI searches cached data on demand
    ↓
Target MCP Server (unchanged)
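
To make the flow concrete, here is a minimal sketch of the decision the proxy makes on every response. This is my own TypeScript illustration, not mcp-cache’s actual code: estimateTokens, cacheStore, and the summary shape are assumptions (mcp-cache caches to local files, not an in-memory map).

// Hypothetical sketch: pass small responses through, cache large ones
// and hand back a queryable ID instead.
const TOKEN_LIMIT = 25_000; // the MCP response ceiling discussed above

// Rough heuristic of ~4 characters per token (an assumption,
// not mcp-cache's actual estimator)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const cacheStore = new Map<string, string>(); // stand-in for the local file cache

function handleResponse(raw: string): string {
  if (estimateTokens(raw) <= TOKEN_LIMIT) {
    return raw; // under the limit: forward unchanged
  }
  const id = `resp_${Math.random().toString(36).slice(2, 8)}`;
  cacheStore.set(id, raw); // keep the full payload locally
  // Return a small summary plus the handle the AI can query later
  return JSON.stringify({
    cached: true,
    id,
    bytes: raw.length,
    hint: `Use query_response('${id}', ...) to search the full data`,
  });
}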

Before mcp-cache:

→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length

After mcp-cache:

→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms

Zero Configuration

The best part? It’s completely transparent:

# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest

That’s it. No server modifications. No client changes.
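
For Claude Desktop, this means changing a single entry in claude_desktop_config.json. A sketch, assuming mcp-cache takes the wrapped server’s full command as trailing arguments, exactly mirroring the npx line above:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
    }
  }
}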

Works with ANY MCP server:

  • ✅ Playwright, Chrome, GitHub, Filesystem
  • ✅ Python, Node.js, Go, Rust servers
  • ✅ Your custom MCP servers

Real Results

Since integrating mcp-cache:

E-Commerce Testing:

  • ✅ Full accessibility trees cached (was: 250K token errors)
  • ✅ AI queries specific elements from 1.2MB+ responses
  • ✅ Complex multi-page flows automated successfully

Performance:

  • ⚡ <10ms overhead for normal responses
  • ⚡ <200ms for cached queries
  • ⚡ 90%+ cache hit rate

What’s Next

Current: Local file-based caching
Coming: Redis-backed distributed caching for teams
Vision: Vector embeddings + semantic search

Imagine:

  • 🏢 Organization-wide shared cache
  • 🔍 Semantic search: “Find pages similar to our checkout flow”
  • 📊 Compliance audit trails
  • 🧠 Knowledge graphs from cached responses

Key Technical Highlights

Client-Aware Intelligence:

  • Auto-detects client (Claude Desktop, Cursor, Cline)
  • Adjusts token limits accordingly
  • No manual configuration needed
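
Under the MCP protocol, clients identify themselves via clientInfo during the initialize handshake, which is plausibly what the proxy inspects. A minimal sketch of per-client limits; only the 25K Claude Desktop ceiling comes from this article, and the other values are placeholders:

// Hypothetical per-client ceilings; only the Claude Desktop value
// is documented above, the rest are placeholders
const CLIENT_LIMITS: Record<string, number> = {
  "claude-desktop": 25_000,
  "cursor": 30_000,
  "cline": 30_000,
};

function limitFor(clientName: string): number {
  // Fall back to the strictest known limit for unrecognized clients
  return CLIENT_LIMITS[clientName.toLowerCase()] ?? 25_000;
}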

Powerful Query Interface:

// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')

Try It Today

npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>


Looking For

✅ Testers – Try it with your MCP workflows
✅ Feedback – What features would help you most?
✅ Contributors – Interested in building Redis/vector DB layers?
✅ Use cases – What are you trying to automate?

This started as a side project to scratch my own itch. Now I’m hoping it helps others facing the same problem.

