Filling the Memory Gap: Building MCPMem to Fix AI Assistant Forgetfulness



This content originally appeared on DEV Community and was authored by Jay @ Designly


How I hacked together a semantic memory system for AI assistants with the Model Context Protocol

The Problem: Assistants With Goldfish Memory

You’ve probably run into this: you’re mid-project, bouncing ideas off Cursor, Claude, or whatever AI assistant you like. After hours of hashing through architecture choices and debugging strategies, you start a fresh session and… everything’s gone.

No history. No context. No sense of continuity. You’re left re-explaining the same project details that should have been “obvious” from earlier conversations.

That’s the context window problem. AI assistants don’t actually remember anything — they just replay what’s in the current conversation buffer. Once that buffer’s gone, so is your context.

Why the Current Fixes Don’t Cut It

Plenty of tools try to patch the problem, but none of them really solve it:

  • File context injection → fine for raw code, useless for design decisions
  • Project summaries → stale as soon as the code changes
  • Chat history → bounded by token limits and resets every new session
  • Manual notes → slow, brittle, not semantic

What we actually need is memory that sticks — and more importantly, memory that understands meaning instead of just matching keywords.

Introducing MCPMem!

That’s why I built MCPMem: a Model Context Protocol (MCP) server that gives AI assistants a way to store and retrieve memories semantically.

Why it’s different

  • Stores and searches by meaning (via OpenAI embeddings)
  • Persists across sessions (your assistant actually remembers)
  • MCP-native — integrates with any MCP-capable assistant
  • Fast vector search via SQLite + sqlite-vec
  • Minimal setup, works out of the box

It’s basically a lightweight memory layer you can drop in to instantly upgrade your assistant.

Under the Hood

1. Semantic Embeddings

Every memory gets embedded with OpenAI’s text-embedding-3-small, so searches return relevant context even if the words don’t match exactly.
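Under the hood, semantic search reduces to comparing embedding vectors, usually by cosine similarity. Here’s a minimal sketch in pure Python — the 3-dimension vectors are toy stand-ins for the 1536-dimension embeddings that text-embedding-3-small actually returns:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" -- real ones from text-embedding-3-small have 1536 dims.
memory = [0.2, 0.9, 0.1]     # e.g. "use strict TypeScript mode"
query  = [0.25, 0.85, 0.05]  # e.g. "typescript config" -- different words, close vector
other  = [0.9, 0.1, 0.4]     # an unrelated memory

print(cosine_similarity(memory, query) > cosine_similarity(memory, other))  # True
```

Because the query and the stored memory land near each other in embedding space, the search matches even though the two texts share almost no keywords.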

2. SQLite Vector Search

Memories and embeddings live in SQLite with sqlite-vec. Queries come back in milliseconds, even across thousands of entries.
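MCPMem relies on the sqlite-vec extension for fast indexed lookups, but the storage model itself is easy to see without it. This hedged sketch uses only Python’s stdlib sqlite3 — embeddings serialized as float blobs, retrieval by a brute-force cosine scan (slower than sqlite-vec, same idea):

```python
import sqlite3
import struct
import math

# Sketch of the storage model: text + embedding blob in one SQLite table.
# MCPMem uses sqlite-vec for indexed search; this brute-force scan just
# illustrates the same query path with nothing but the stdlib.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")

def pack(vec):
    """Serialize a float vector into a compact binary blob."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def store(text, vec):
    db.execute("INSERT INTO memories (text, embedding) VALUES (?, ?)", (text, pack(vec)))

def search(query_vec, k=1):
    """Rank every stored memory by similarity to the query vector."""
    rows = db.execute("SELECT text, embedding FROM memories").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    return [text for _, text in sorted(scored, reverse=True)[:k]]

store("use strict TypeScript mode", [0.2, 0.9, 0.1])
store("deploy with Docker on Fridays", [0.9, 0.1, 0.4])
print(search([0.25, 0.85, 0.05]))  # ['use strict TypeScript mode']
```

With sqlite-vec, that full-table scan becomes an indexed nearest-neighbor query, which is how lookups stay in the millisecond range across thousands of entries.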

3. MCP Integration

Because it’s an MCP server, assistants can call it directly as part of their workflow. Store, search, and retrieve are exposed as standard MCP tools.
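On the wire, an MCP tool invocation is just a JSON-RPC message. Here’s a rough illustration of what a store request could look like — the tool name and argument schema shown are hypothetical placeholders (MCPMem defines its own); only the outer `tools/call` shape comes from the MCP spec:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "store_memory",
    "arguments": {
      "content": "We chose SQLite over a hosted vector DB to keep setup minimal"
    }
  }
}
```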

How It Feels in Practice

  • Project knowledge base: store architecture decisions, bug fixes, team agreements — pull them up later with semantic queries.
  • Learning log: stash notes, patterns, gotchas — search them when you hit similar problems.
  • Team memory: assistants can keep track of past discussions, design calls, and decisions without rehashing them.

Setup

It’s dead simple:

npm install -g mcpmem
export OPENAI_API_KEY=your-key-here
mcpmem store "Remember: use strict TypeScript mode"
mcpmem search "typescript config"

Or wire it into your MCP config for Cursor/Claude:

{
  "mcpServers": {
    "mcpmem": {
      "command": "npx",
      "args": ["mcpmem"],
      "env": {
        "OPENAI_API_KEY": "sk-svcacct-...",
        "OPENAI_MODEL": "text-embedding-3-small",
        "MCPMEM_DB_PATH": "/Users/johndoe/mcpmem/mcpmem.db"
      }
    }
  }
}

Cursor MCP Example

What Changed for Me

Before MCPMem

  • Explaining the same context repeatedly
  • Losing details between sessions
  • Wasting time writing notes I’d never search

After MCPMem

  • My assistant remembers context across chats
  • Semantic search brings back the right info fast
  • Project knowledge actually compounds over time

Lessons Learned

  • Semantic > keyword search. It feels like cheating once you’ve used it.
  • MCP is a surprisingly clean way to extend assistants.
  • You don’t need Pinecone or a massive vector DB — SQLite does just fine.
  • UX trumps everything. If memory isn’t seamless, you won’t use it.

Roadmap

  • Local embedding generation (no API calls needed)
  • Memory clustering and tagging
  • Import/export for team knowledge bases
  • Multi-modal memory (code, docs, images)

If you’re sick of AI tools that reset every conversation, MCPMem gives them something closer to real memory.

👉 GitHub repo
👉 NPM

Thank you for reading! Please visit my portfolio site when you have some free time!

https://yaa.bz

Also, read my blog:

https://blog.designly.biz

I post regular articles about full-stack development and systems administration.

