57-71% of AI Agents Leak Data Between Users. Here's the Fix.

A 2026 Mem0 survey found 57-71% cross-user memory contamination across major agent frameworks. Here is why it happens and how to stop it.

If you run agents for more than one user, your memory layer is probably leaking.

The Mem0 2026 agent memory survey looked at eight agent frameworks: Claude Code, Codex, Copilot, OpenClaw, Hermes, Bedrock AgentCore, Windsurf, and Devin. It measured cross-user memory contamination at 57 to 71 percent across the group. That is not a rounding error. That is the default behavior.

What cross-user contamination actually means

Contamination is simple to state. Memory written while serving user A gets recalled into user B's context.

Your agent stores something for one person. Later it answers a different person and pulls that stored memory back in. The model never knew the two requests came from different humans. The memory layer did not tell it.

In a single-user toy, this is invisible. You are user A and user B. Once you have two real accounts, the boundary matters, and most setups do not enforce one.

Why keyword retrieval fails here

Most of the surveyed frameworks recall memory with keyword retrieval. You store text, you search text, you get fuzzy matches back. There is no principal attached to the memory and no principal check at recall time.

So a query from user B matches a memory written by user A on topic overlap alone. The retrieval has no idea those two users should never share state. It just returns the closest text.
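
To make the failure concrete, here is a minimal sketch of an unscoped keyword store. It is not taken from any surveyed framework; the `remember` and `recall` helpers and the sample text are made up for illustration.

```python
# Hypothetical shared memory store. Entries carry no owner at all.
memories = []

def remember(text):
    """Store text with no record of which user it was written for."""
    memories.append(text)

def recall(query):
    """Return every memory that shares at least one word with the query."""
    words = set(query.lower().split())
    return [m for m in memories if words & set(m.lower().split())]

# Written while serving user A.
remember("shipping address is 12 Elm Street")

# Later, user B asks an unrelated question that overlaps on a word or two.
leaked = recall("what is your shipping policy")
print(leaked)  # user A's address is now headed for user B's context
```

Nothing in `recall` can refuse the match, because nothing in the stored entry says who it belongs to.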

The survey also found weak staleness handling everywhere. No consistent TTL. No eviction when the user or session changes. No cryptographic scoping. Old memory lingers and crosses boundaries it should never cross.

Three ways this bites you

Style bleed is the mild case. User B's answers start sounding like user A because tone notes carried over. Annoying, not dangerous.

PII leak is the real one. User A's email, address, or order details surface in user B's session because a keyword matched. Now you have a privacy incident.

Credential bleed is the worst. An API key or token stored as "context" during one session gets recalled into another. Decision contamination is the quiet sibling: a choice made for one account silently steers the agent for another.

The fix patterns

None of these are exotic. They are the boring controls the surveyed frameworks skip.

Per-user namespaces. Every memory write carries a user id, and every read filters on it. No id, no recall. This is the single highest-value change.
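
A minimal sketch of that rule, assuming an in-memory store and a simple word-overlap match. The `NamespacedMemory` class and its method names are hypothetical, not any framework's API.

```python
from collections import defaultdict

class NamespacedMemory:
    """Hypothetical store where every entry lives under exactly one user id."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def write(self, user_id, text):
        # Refuse writes that do not say who they belong to.
        if not user_id:
            raise ValueError("memory write requires a user id")
        self._by_user[user_id].append(text)

    def recall(self, user_id, query):
        # No id, no recall.
        if not user_id:
            return []
        words = set(query.lower().split())
        # Only this user's namespace is ever searched.
        own = self._by_user.get(user_id, [])
        return [m for m in own if words & set(m.lower().split())]

store = NamespacedMemory()
store.write("user_a", "shipping address is 12 Elm Street")
print(store.recall("user_b", "what is your shipping policy"))  # []
```

The same query that would match in a shared pool returns nothing here, because user B's namespace is empty.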

Recall-time principal check. Do not trust the namespace alone. At recall, assert that the requesting principal owns the memory before it enters context. Treat a mismatch as a hard stop, not a warning.
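
One way to sketch the check, assuming each memory records its owner. `Memory`, `PrincipalMismatch`, and `admit_to_context` are illustrative names, not part of any real library.

```python
from dataclasses import dataclass

class PrincipalMismatch(Exception):
    """Raised when a memory owned by one user is about to enter another's context."""

@dataclass(frozen=True)
class Memory:
    owner_id: str
    text: str

def admit_to_context(requesting_principal, candidates):
    """Verify ownership of every candidate before any text reaches the prompt."""
    for memory in candidates:
        if memory.owner_id != requesting_principal:
            # Hard stop: abort the whole recall instead of dropping the entry quietly.
            raise PrincipalMismatch(
                f"memory owned by {memory.owner_id!r} requested by {requesting_principal!r}"
            )
    return [memory.text for memory in candidates]

print(admit_to_context("user_a", [Memory("user_a", "prefers short answers")]))
```

Raising instead of filtering is a deliberate choice in this sketch. A mismatch means the namespace layer already failed, and that is worth surfacing loudly.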

TTL and staleness rules. Give memory an expiry. Evict on session or user change. Stale memory is the fuel for most leaks.
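
A rough sketch of both rules, assuming session-scoped entries and a fixed TTL. The `ExpiringMemory` class is hypothetical, and the injectable clock exists only so the expiry can be tested without sleeping.

```python
import time

class ExpiringMemory:
    """Hypothetical store with a per-entry expiry and eviction on session end."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = []  # tuples of (expires_at, session_id, text)

    def write(self, session_id, text):
        self._entries.append((self._clock() + self._ttl, session_id, text))

    def end_session(self, session_id):
        # Evict everything the session wrote as soon as it ends.
        self._entries = [e for e in self._entries if e[1] != session_id]

    def recall(self, session_id):
        now = self._clock()
        # Drop expired entries first so stale memory can never be returned.
        self._entries = [e for e in self._entries if e[0] > now]
        return [text for _, sid, text in self._entries if sid == session_id]

store = ExpiringMemory(ttl_seconds=900)
store.write("session_1", "user asked about invoices")
store.end_session("session_1")
print(store.recall("session_1"))  # []
```

The 900-second TTL is an arbitrary placeholder. The right value depends on how long a memory stays useful in your product.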

Vector partitioning. If you use embeddings, partition the index by user instead of keeping one shared pool and bolting a metadata filter on after the search. Isolation at the storage layer beats filtering after the fact.
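
A toy sketch of a partitioned index, using plain lists as vectors and cosine similarity. `PartitionedIndex` is a hypothetical class. A real deployment would map this onto whatever per-tenant collection or namespace feature its vector store provides.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class PartitionedIndex:
    """Hypothetical index holding one separate list of vectors per user."""

    def __init__(self):
        self._partitions = defaultdict(list)  # user_id -> [(vector, text)]

    def add(self, user_id, vector, text):
        self._partitions[user_id].append((vector, text))

    def search(self, user_id, query_vector, k=3):
        # The search only ever sees the caller's partition.
        partition = self._partitions.get(user_id, [])
        ranked = sorted(
            partition,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

index = PartitionedIndex()
index.add("user_a", [1.0, 0.0], "A: order 1234 ships Friday")
index.add("user_b", [0.9, 0.1], "B: prefers email updates")
print(index.search("user_b", [1.0, 0.0]))  # only user B's entry
```

User A's vector is the closest match to the query, but it is never a candidate, because the search never touches A's partition.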

Where runtime enforcement fits

Here is the pattern worth keeping. Memory isolation is enforcement at the recall surface. The framework decides what state an agent is allowed to see.

The action surface needs the same thing. When an agent calls tools, spends tokens, or hits an API, something has to enforce what it is allowed to do, per user and per session. If the framework already leaks state across users at the memory layer, you cannot assume it guards the action layer either.

That is the gap AgentGuard fills. It is an open-source runtime control layer for AI agents: budget caps, token limits, and rate limits enforced at call time. Same enforcement idea as per-user memory scoping, applied to what the agent does instead of what it remembers.

The point is that enforcement belongs in the runtime, not in good intentions. When the underlying framework leaks state, a runtime hook is the right place to draw the boundary, because it sees every call regardless of how the memory layer behaves.
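
To show the shape of call-time enforcement, here is a generic sketch of a per-user token budget. It is not AgentGuard's API. `CallBudget`, `BudgetExceeded`, and `guarded_call` are names invented for this example, and the token counts are supplied by the caller.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push a user past their token budget."""

class CallBudget:
    """Hypothetical per-user token cap, checked before each call runs."""

    def __init__(self, max_tokens_per_user):
        self._max = max_tokens_per_user
        self._spent = {}  # user_id -> tokens spent so far

    def charge(self, user_id, tokens):
        spent = self._spent.get(user_id, 0)
        if spent + tokens > self._max:
            raise BudgetExceeded(f"{user_id!r} would exceed {self._max} tokens")
        self._spent[user_id] = spent + tokens

def guarded_call(budget, user_id, tokens, fn, *args):
    """Charge the budget first; the wrapped call runs only if the charge succeeds."""
    budget.charge(user_id, tokens)
    return fn(*args)

budget = CallBudget(max_tokens_per_user=100)
print(guarded_call(budget, "user_a", 60, str.upper, "hello"))  # HELLO
```

A second 60-token call for `user_a` would raise `BudgetExceeded` before `fn` ever runs, while `user_b` still has a full budget.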

Start here

Audit your own stack first. For every agent serving more than one user, ask three questions. Does every memory write carry a user id? Does recall check the requesting principal? Does anything expire?

If the answer to any of those is no, you are likely inside that 57 to 71 percent. The fix is namespaces, principal checks, and TTLs at the memory layer, plus runtime enforcement on the action side.

Lock down the action surface with budget and scope controls at runtime. Start with AgentGuard: https://bmdpat.com/tools/agentguard

Patrick Hughes

Building BMD HODL, a one-person AI-operated holding company. Nashville, Tennessee. Twenty-two agents.
