[bmdpat]
All writing
7 min read

AI Agent Memory: What Actually Works in 2026

Most agent memory systems add complexity faster than value. This is the small set that actually compounds for one person running a fleet: files, ledgers, and strict verification.

Share LinkedIn

AI agents forget. They also hallucinate that they remembered. The result is the same: you open a file or check a ledger and the thing the agent swore it wrote is not there.

The fix is not a bigger vector database. It is a tiny set of durable surfaces plus deterministic checks that run on every "done" claim.

The surfaces that survive

Keep memory in plain files the operator can read without a special viewer.

  • Append-only logs (log.md, run-ledger rows, event streams). Newest at top. One line per real event when possible.
  • Knowledge wiki with layers (sources, entities, concepts, syntheses). Write sources once. Touch entities on every new signal.
  • Queue and Reports. The task files and the proof they produced.
  • Published artifacts and frontmatter. The only place that counts is the final output location.

Everything else is cache or scratch. If it is not in one of these, it did not happen for the purpose of later work.

How we actually use it

Every scheduled agent writes to the ledger on start and finish. The think checker reads the ledger's declared artifacts and compares them to disk and DB. No "the agent said it was green."

Vault health and nightshift linters walk the wiki for orphans and contradictions. They repair what they can and surface the rest.

The digest agent writes a source page for each article and links it into 10+ existing pages. That compounds.

When an agent claims it updated a file, the claim is recorded and a later pass asserts the bytes are there.

What fails in practice

Long context windows. The model predicts the next token, not the prior state. After 30 turns it confidently "remembers" a step that was overwritten.

Vector stores without grounding. They return plausible passages. You still have to open the file to know if the patch landed.

Shared mutable state without locks. Two agents write the same report. The second clobbers the first. The operator sees one version and assumes the story is complete.

No TTLs. Old drafts, stale ideas, and resolved requests sit forever and pollute search.

The rules that keep it small

  1. One line per log entry. If it needs more, link to a report.
  2. Sources are append-only. Never edit a past source page.
  3. Every completion that mutates state records a falsifiable claim.
  4. Every claim is re-checked by an independent pass (claims checker, think, doctor).
  5. Drafts in Queue have TTLs. Approved drafts in Review have age gates. Old stuff moves or dies.
  6. The operator can always cat the file and understand the state without running the agent.

When fancy memory is a trap

If you are building multi-agent research loops with long retrieval chains, you may need embeddings and rerankers. For a one-person holdco that ships one artifact per day, the cost in maintenance exceeds the gain.

Files + git history (for code) + the structured event/claim layer + simple date folders beat a custom memory service 95% of the time. You can read them on a plane with no wifi.

What we ship instead

We ship the checker, not the memory bank.

brain think --heal exists because the ledger said the autopublisher ran but the published count was zero. The heal path wrote the post, ran the sub-QA, posted it, and left the proof in Published/ and the live URL.

The memory that matters is the one a script can query at 07:30 and decide whether the thing that should have shipped actually shipped.

Accompanying prompt

What the prompt does: Audits an agent workflow and turns vague memory into plain-file state, append-only events, and checks that prove each completion claim.

Copy-ready prompt

Paste the exact block into your coding agent.

No article chrome, no footnotes, no formatting drift.

You are auditing the memory layer for an AI agent workflow. Your job is to replace vague "the agent remembers" claims with durable state that a human can inspect and a script can verify. Work in five passes: 1. List every place the agent currently stores state. Separate durable files, append-only ledgers, databases, caches, prompts, and scratch output. 2. For each surface, answer: who writes it, who reads it, what proves it is current, and what happens if two agents write at once. 3. Find any completion claim that is not backed by a file, ledger row, API readback, test result, or live URL. 4. Propose the smallest durable memory system that would survive a cold restart: plain files first, append-only events second, database rows only where they are already part of the product path. 5. Write the exact checks that should run before an agent is allowed to say "done." Return: - A table of current memory surfaces and their risk. - A short list of state that should be deleted, archived, or ignored. - The proposed durable memory contract. - The verification commands or scripts that prove the contract. - One example "done" claim rewritten as a falsifiable claim. Constraints: - Prefer files a senior engineer can open and read. - Do not add a vector database unless the workflow has a specific retrieval failure that plain files cannot solve. - Do not trust model output as evidence. - Every important claim needs a deterministic readback.
26 lines1455 chars
Ready

Copy the block above.


See the full set of checks in config/think/check.py and config/claims/check.py. When an agent tells you it is done, make it prove it with a file you can open.

https://bmdpat.com/tools/agentguard

Want more like this?

AI agent builds, real costs, what works. M-F only when there is something worth sending. No fluff.

PH

Patrick Hughes

Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Twenty-Two agents.

More writing