Filling the Memory Gap: Building MCPMem to Fix AI Assistant Forgetfulness



This content originally appeared on DEV Community and was authored by Jay @ Designly


How I hacked together a semantic memory system for AI assistants with the Model Context Protocol

The Problem: Assistants With Goldfish Memory

You’ve probably run into this: you’re mid-project, bouncing ideas off Cursor, Claude, or whatever AI assistant you like. After hours of hashing through architecture choices and debugging strategies, you start a fresh session and… everything’s gone.

No history. No context. No sense of continuity. You’re left re-explaining the same project details that should have been “obvious” from earlier conversations.

That’s the context window problem. AI assistants don’t actually remember anything — they just replay what’s in the current conversation buffer. Once that buffer’s gone, so is your context.

Why the Current Fixes Don’t Cut It

Plenty of tools try to patch the problem, but none of them really solve it:

  • File context injection → fine for raw code, useless for design decisions
  • Project summaries → stale as soon as the code changes
  • Chat history → bounded by token limits and resets every new session
  • Manual notes → slow, brittle, not semantic

What we actually need is memory that sticks — and more importantly, memory that understands meaning instead of just matching keywords.

Introducing MCPMem!

That’s why I built MCPMem: a Model Context Protocol (MCP) server that gives AI assistants a way to store and retrieve memories semantically.

Why it’s different

  • Stores and searches by meaning (via OpenAI embeddings)
  • Persists across sessions (your assistant actually remembers)
  • MCP-native — integrates with any MCP-capable assistant
  • Fast vector search via SQLite + sqlite-vec
  • Minimal setup, works out of the box

It’s basically a lightweight memory layer you can drop in to instantly upgrade your assistant.

Under the Hood

1. Semantic Embeddings

Every memory gets embedded with OpenAI’s text-embedding-3-small, so searches return relevant context even if the words don’t match exactly.
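Under the hood, semantic search reduces to comparing embedding vectors, usually by cosine similarity. Here’s a minimal sketch in pure Python — the 3-dimension vectors are toy stand-ins for the 1536-dimension embeddings that text-embedding-3-small actually returns:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" -- real ones from text-embedding-3-small have 1536 dims.
memory = [0.2, 0.9, 0.1]     # e.g. "use strict TypeScript mode"
query  = [0.25, 0.85, 0.05]  # e.g. "typescript config" -- different words, close vector
other  = [0.9, 0.1, 0.4]     # an unrelated memory

print(cosine_similarity(memory, query) > cosine_similarity(memory, other))  # True
```

Because the query and the stored memory land near each other in embedding space, the search matches even though the two texts share almost no keywords.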

2. SQLite Vector Search

Memories and embeddings live in SQLite with sqlite-vec. Queries come back in milliseconds, even across thousands of entries.
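MCPMem relies on the sqlite-vec extension for fast indexed lookups, but the storage model itself is easy to see without it. This hedged sketch uses only Python’s stdlib sqlite3 — embeddings serialized as float blobs, retrieval by a brute-force cosine scan (slower than sqlite-vec, same idea):

```python
import sqlite3
import struct
import math

# Sketch of the storage model: text + embedding blob in one SQLite table.
# MCPMem uses sqlite-vec for indexed search; this brute-force scan just
# illustrates the same query path with nothing but the stdlib.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")

def pack(vec):
    """Serialize a float vector into a compact binary blob."""
    return struct.pack(f"{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def store(text, vec):
    db.execute("INSERT INTO memories (text, embedding) VALUES (?, ?)", (text, pack(vec)))

def search(query_vec, k=1):
    """Rank every stored memory by similarity to the query vector."""
    rows = db.execute("SELECT text, embedding FROM memories").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    return [text for _, text in sorted(scored, reverse=True)[:k]]

store("use strict TypeScript mode", [0.2, 0.9, 0.1])
store("deploy with Docker on Fridays", [0.9, 0.1, 0.4])
print(search([0.25, 0.85, 0.05]))  # ['use strict TypeScript mode']
```

With sqlite-vec, that full-table scan becomes an indexed nearest-neighbor query, which is how lookups stay in the millisecond range across thousands of entries.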

3. MCP Integration

Because it’s an MCP server, assistants can call it directly as part of their workflow. Store, search, and retrieve are exposed as standard MCP tools.
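On the wire, an MCP tool invocation is just a JSON-RPC message. Here’s a rough illustration of what a store request could look like — the tool name and argument schema shown are hypothetical placeholders (MCPMem defines its own); only the outer `tools/call` shape comes from the MCP spec:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "store_memory",
    "arguments": {
      "content": "We chose SQLite over a hosted vector DB to keep setup minimal"
    }
  }
}
```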

How It Feels in Practice

  • Project knowledge base: store architecture decisions, bug fixes, team agreements — pull them up later with semantic queries.
  • Learning log: stash notes, patterns, gotchas — search them when you hit similar problems.
  • Team memory: assistants can keep track of past discussions, design calls, and decisions without rehashing them.

Setup

It’s dead simple:

npm install -g mcpmem
export OPENAI_API_KEY=your-key-here
mcpmem store "Remember: use strict TypeScript mode"
mcpmem search "typescript config"

Or wire it into your MCP config for Cursor/Claude:

{
  "mcpServers": {
    "mcpmem": {
      "command": "npx",
      "args": ["mcpmem"],
      "env": {
        "OPENAI_API_KEY": "sk-svcacct-...",
        "OPENAI_MODEL": "text-embedding-3-small",
        "MCPMEM_DB_PATH": "/Users/johndoe/mcpmem/mcpmem.db"
      }
    }
  }
}

Cursor MCP Example

What Changed for Me

Before MCPMem

  • Explaining the same context repeatedly
  • Losing details between sessions
  • Wasting time writing notes I’d never search

After MCPMem

  • My assistant remembers context across chats
  • Semantic search brings back the right info fast
  • Project knowledge actually compounds over time

Lessons Learned

  • Semantic > keyword search. It feels like cheating once you’ve used it.
  • MCP is a surprisingly clean way to extend assistants.
  • You don’t need Pinecone or a massive vector DB — SQLite does just fine.
  • UX trumps everything. If memory isn’t seamless, you won’t use it.

Roadmap

  • Local embedding generation (no API calls needed)
  • Memory clustering and tagging
  • Import/export for team knowledge bases
  • Multi-modal memory (code, docs, images)

If you’re sick of AI tools that reset every conversation, MCPMem gives them something closer to real memory.

👉 GitHub repo
👉 NPM

Thank you for reading! Please visit my portfolio site when you have some free time!

https://yaa.bz

Also, read my blog:

https://blog.designly.biz

I post regular articles about full-stack development and systems administration.

