This content originally appeared on DEV Community and was authored by Swapnil Surdi
I’ve been working with Claude and MCP servers extensively: building web automation, analyzing codebases, and automating testing workflows. But I kept hitting the same frustrating wall:
```
Error: Response exceeds maximum allowed tokens (25,000)
```
The Problem
Modern applications generate massive responses:
- Web page DOMs: 1.3MB+ (154K tokens)
- GitHub PR diffs: 36K tokens (44% over limit)
- Figma exports: 351K tokens (1,300% over)
I hit that error every time I asked Claude to analyze a real web page. Not because the AI couldn’t handle it, but because MCP had a hard ceiling of 25,000 tokens per response.
The Real-World Impact
Looking at GitHub issues across popular MCP servers, I found hundreds of developers facing identical problems:
- Chrome MCP: “screenshot always gives ‘exceeds maximum tokens’ error”
- GitHub MCP: “get_pull_request_diff fails for any substantial PR”
- Playwright MCP: “DOM content returns ‘Conversation Too Long’ error”
The pattern was clear: MCP works beautifully for toy examples but breaks on real-world complexity.
The Solution: mcp-cache
I built mcp-cache, a universal response manager that wraps any MCP server and works around the token limit automatically.
How it works:
```
Claude Desktop
    ↓
mcp-cache (transparent proxy)
    ├─ Intercepts large responses
    ├─ Caches full data locally
    ├─ Returns summary + query tools
    └─ AI searches cached data on demand
    ↓
Target MCP Server (unchanged)
```
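To make that flow concrete, here’s a minimal sketch of the pass-through-or-cache decision in TypeScript. This is illustrative only, not the actual mcp-cache source; the 4-characters-per-token estimate, the ID scheme, and the summary shape are all assumptions.

```typescript
// Minimal sketch of the proxy's core decision. Not the real mcp-cache
// implementation; names and heuristics here are illustrative.
const TOKEN_LIMIT = 25_000;

// Rough heuristic: ~4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

const cache = new Map<string, string>(); // response id -> full payload

function handleResponse(raw: string): string {
  // Small responses pass through to the client untouched.
  if (estimateTokens(raw) <= TOKEN_LIMIT) return raw;

  // Large responses are cached locally; the client gets a short summary
  // plus an id it can query instead of the full payload.
  const id = `resp_${Math.random().toString(36).slice(2, 8)}`;
  cache.set(id, raw);
  return JSON.stringify({
    cached: true,
    id,
    bytes: raw.length,
    hint: `Full data cached. Use query_response('${id}', ...) to search it.`,
  });
}
```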
Before mcp-cache:

```
→ "Get the DOM and find payment forms"
❌ Error: Response exceeds maximum length
```

After mcp-cache:

```
→ "Get the DOM and find payment forms"
✅ Cached as resp_xyz (1.2MB)
→ "Show forms with 'payment' in action"
✅ Found 3 forms
```
Zero Configuration
The best part? It’s completely transparent:
```bash
# Instead of:
npx @playwright/mcp@latest

# Just add mcp-cache:
npx @hapus/mcp-cache npx @playwright/mcp@latest
```
That’s it. No server modifications. No client changes.
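If you register servers through Claude Desktop’s claude_desktop_config.json, the same wrapping is just a prepended command. A sketch, assuming the standard mcpServers format (the "playwright" key is an arbitrary label):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@hapus/mcp-cache", "npx", "@playwright/mcp@latest"]
    }
  }
}
```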
Works with ANY MCP server:
- Playwright, Chrome, GitHub, Filesystem
- Python, Node.js, Go, Rust servers
- Your custom MCP servers
Real Results
Since integrating mcp-cache:
E-Commerce Testing:
- Full accessibility trees cached (was: 250K-token errors)
- AI queries specific elements from 1.2MB+ responses
- Complex multi-page flows automated successfully

Performance:
- <10ms overhead for normal responses
- <200ms for cached queries
- 90%+ cache hit rate
What’s Next
- Current: Local file-based caching
- Coming: Redis-backed distributed caching for teams
- Vision: Vector embeddings + semantic search
Imagine:
- Organization-wide shared cache
- Semantic search: “Find pages similar to our checkout flow”
- Compliance audit trails
- Knowledge graphs from cached responses
Key Technical Highlights
Client-Aware Intelligence:
- Auto-detects client (Claude Desktop, Cursor, Cline)
- Adjusts token limits accordingly
- No manual configuration needed
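As a rough sketch of what client-aware limits could look like (the environment-variable detection and the limit table below are placeholders, not the library’s actual logic):

```typescript
// Placeholder sketch of client-aware limits. The detection heuristic and
// the per-client values below are assumptions, not mcp-cache's real table.
const CLIENT_LIMITS: Record<string, number> = {
  "claude-desktop": 25_000, // the MCP ceiling discussed above
  "cursor": 25_000,         // placeholder value
  "cline": 25_000,          // placeholder value
};

function detectClient(): string {
  // Hypothetical heuristic: read a hint from the environment,
  // falling back to a conservative default when nothing matches.
  return process.env.MCP_CLIENT_NAME ?? "unknown";
}

const limit = CLIENT_LIMITS[detectClient()] ?? 25_000;
```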
Powerful Query Interface:
```javascript
// Text search
query_response('resp_id', 'submit button')

// JSONPath for structured data
query_response('resp_id', '$.div[?(@.class=="navbar")]')

// Regex patterns
query_response('resp_id', '/href=".*\\.pdf"/')
```
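Under the hood, dispatching on the query shape could look something like this. A sketch only: the slash-wrapped-regex convention is inferred from the examples above, and JSONPath handling is omitted for brevity.

```typescript
// Sketch of query dispatch over a cached response. The dispatch rules are
// inferred from the examples above, not taken from mcp-cache's source.
const cache = new Map<string, string>(); // response id -> full payload

function queryResponse(id: string, query: string): string[] {
  const data = cache.get(id);
  if (data === undefined) throw new Error(`Unknown response id: ${id}`);

  // '/pattern/' queries are treated as regular expressions.
  if (query.length > 2 && query.startsWith("/") && query.endsWith("/")) {
    const re = new RegExp(query.slice(1, -1), "g");
    return data.match(re) ?? [];
  }

  // Everything else is a case-insensitive text search over lines.
  const needle = query.toLowerCase();
  return data.split("\n").filter((line) => line.toLowerCase().includes(needle));
}
```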
Try It Today
```bash
npm install -g @hapus/mcp-cache

# Or use directly:
npx @hapus/mcp-cache <your-server-command>
```
Links:
- GitHub: https://github.com/swapnilsurdi/mcp-cache
- npm: @hapus/mcp-cache
Looking For
- Testers – Try it with your MCP workflows
- Feedback – What features would help you most?
- Contributors – Interested in building Redis/vector DB layers?
- Use cases – What are you trying to automate?
This started as a side project to scratch my own itch. Now I’m hoping it helps others facing the same problem.