AI Agent Memory: What Actually Works in 2026
Most agent memory systems add complexity faster than value. This is the small set that actually compounds for one person running a fleet: files, ledgers, and strict verification.
AI agents forget. They also hallucinate that they remembered. The result is the same: you open a file or check a ledger and the thing the agent swore it wrote is not there.
The fix is not a bigger vector database. It is a tiny set of durable surfaces plus deterministic checks that run on every "done" claim.
The surfaces that survive
Keep memory in plain files the operator can read without a special viewer.
- Append-only logs (log.md, run-ledger rows, event streams). Newest at top. One line per real event when possible.
- Knowledge wiki with layers (sources, entities, concepts, syntheses). Write sources once. Touch entities on every new signal.
- Queue and Reports. The task files and the proof they produced.
- Published artifacts and frontmatter. The only place that counts is the final output location.
Everything else is cache or scratch. If it is not in one of these, it did not happen for the purpose of later work.
How we actually use it
Every scheduled agent writes to the ledger on start and finish. The think checker reads the ledger's declared artifacts and compares them to disk and DB. No "the agent said it was green."
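A minimal sketch of that comparison, assuming the ledger is a JSON Lines file whose finish rows carry a list of declared artifact paths. The field names (`event`, `agent`, `artifacts`) and the function `missing_artifacts` are hypothetical, chosen for illustration; the real checker's format is not shown in this post.

```python
import json
from pathlib import Path


def missing_artifacts(ledger_path: Path, root: Path) -> list[tuple[str, str]]:
    """Return (agent, path) pairs declared in 'finish' rows but absent or empty on disk."""
    missing = []
    for line in ledger_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the ledger
        row = json.loads(line)
        if row.get("event") != "finish":
            continue  # only finish rows declare artifacts in this sketch
        for rel in row.get("artifacts", []):
            target = root / rel
            # An empty file is treated as missing: the claim was "I wrote something".
            if not target.is_file() or target.stat().st_size == 0:
                missing.append((row.get("agent", "?"), rel))
    return missing
```

An empty return value means every declared artifact exists. Anything else is a list the operator can read directly.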
Vault health and nightshift linters walk the wiki for orphans and contradictions. They repair what they can and surface the rest.
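The orphan half of that walk can be sketched in a few lines. This assumes the wiki is a folder of `.md` pages that link to each other with `[[Page Name]]` wikilinks, which is a guess at the layout rather than something stated here; `find_orphans` is a hypothetical name. Contradiction detection is not covered.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def find_orphans(wiki_root: Path) -> list[str]:
    """Return page names (file stems) that no other page links to."""
    pages = {p.stem: p for p in wiki_root.rglob("*.md")}
    linked = set()
    for name, path in pages.items():
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target != name:  # a self-link does not count as an inbound link
                linked.add(target)
    return sorted(set(pages) - linked)
```

Pages are keyed by file stem, so two pages with the same name in different folders would collide. A real linter would need a stricter identity rule.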
The digest agent writes a source page for each article and links it into 10+ existing pages. That compounds: every new source makes the pages it touches more useful.
When an agent claims it updated a file, the claim is recorded and a later pass asserts the bytes are there.
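One way to make such a claim falsifiable is to record a content hash next to the path, then recompute it later. This is a sketch under assumptions: the claims file format and the names `record_claim` and `failed_claims` are hypothetical, not the author's implementation.

```python
import hashlib
import json
from pathlib import Path


def record_claim(claims_path: Path, file_path: Path) -> dict:
    """Append a claim: this path held exactly these bytes when the agent finished."""
    claim = {
        "path": str(file_path),
        "sha256": hashlib.sha256(file_path.read_bytes()).hexdigest(),
    }
    with claims_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(claim) + "\n")
    return claim


def failed_claims(claims_path: Path) -> list[dict]:
    """Re-read every claimed file and return the claims that no longer hold."""
    failed = []
    for line in claims_path.read_text(encoding="utf-8").splitlines():
        claim = json.loads(line)
        path = Path(claim["path"])
        if not path.is_file():
            failed.append(claim)  # the file the agent swore it wrote is not there
        elif hashlib.sha256(path.read_bytes()).hexdigest() != claim["sha256"]:
            failed.append(claim)  # the file exists but the bytes differ
    return failed
```

A strict hash check also flags legitimate later edits. If files are expected to change after the claim, a looser check (exists and is non-empty) may fit better.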
What fails in practice
Long context windows. The model predicts the next token; it does not look up prior state. After 30 turns it confidently "remembers" a step that was overwritten.
Vector stores without grounding. They return plausible passages. You still have to open the file to know if the patch landed.
Shared mutable state without locks. Two agents write the same report. The second clobbers the first. The operator sees one version and assumes the story is complete.
No TTLs. Old drafts, stale ideas, and resolved requests sit forever and pollute search.
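For the shared-report failure above, a cheap guard is to create the file exclusively so the second writer fails loudly instead of clobbering the first. This is a minimal sketch using the standard `O_EXCL` flag; `write_report_once` is a hypothetical helper, and a fleet that needs true read-modify-write would need a real lock instead.

```python
import os
from pathlib import Path


def write_report_once(path: Path, body: str) -> bool:
    """Create the report only if it does not exist; return False instead of overwriting."""
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one writer can win.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    return True
```

The losing agent gets a `False` it has to handle, for example by writing to a differently named report or logging the conflict to the ledger.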
The rules that keep it small
- One line per log entry. If it needs more, link to a report.
- Sources are append-only. Never edit a past source page.
- Every completion that mutates state records a falsifiable claim.
- Every claim is re-checked by an independent pass (claims checker, think, doctor).
- Drafts in Queue have TTLs. Approved drafts in Review have age gates. Old stuff moves or dies.
- The operator can always cat the file and understand the state without running the agent.
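The TTL rule above can be sketched as a small sweep. It assumes drafts are `.md` files in a queue folder and that file modification time is an acceptable age signal; the folder names, the `expire_drafts` helper, and the choice to move rather than delete are all assumptions for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def expire_drafts(queue: Path, archive: Path, ttl_days: float,
                  now: Optional[float] = None) -> list[str]:
    """Move drafts older than ttl_days out of the queue; return the moved file names."""
    now = time.time() if now is None else now
    cutoff = now - ttl_days * 86400  # seconds per day
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for draft in sorted(queue.glob("*.md")):
        if draft.stat().st_mtime < cutoff:
            shutil.move(str(draft), str(archive / draft.name))
            moved.append(draft.name)
    return moved
```

Returning the moved names keeps the sweep itself loggable: one ledger line per run with the list of what expired.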
When fancy memory is a trap
If you are building multi-agent research loops with long retrieval chains, you may need embeddings and rerankers. For a one-person holdco that ships one artifact per day, the cost in maintenance exceeds the gain.
Files + git history (for code) + the structured event/claim layer + simple date folders beat a custom memory service 95% of the time. You can read them on a plane with no wifi.
What we ship instead
We ship the checker, not the memory bank.
brain think --heal exists because the ledger said the autopublisher ran but the published count was zero. The heal path wrote the post, ran the sub-QA, posted it, and left the proof in Published/ and at the live URL.
The memory that matters is the one a script can query at 07:30 and decide whether the thing that should have shipped actually shipped.
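A sketch of what such a morning check might look like, assuming published artifacts land in date-named subfolders (for example `Published/2026-01-05/`). That layout and the function names are hypothetical; the post only says proof is left in Published/.

```python
from datetime import date
from pathlib import Path


def published_count(published_root: Path, day: date) -> int:
    """Count non-empty files in the folder for the given day (0 if the folder is missing)."""
    day_dir = published_root / day.isoformat()
    if not day_dir.is_dir():
        return 0
    return sum(1 for p in day_dir.iterdir() if p.is_file() and p.stat().st_size > 0)


def shipped(published_root: Path, day: date, expected: int = 1) -> bool:
    """True if at least `expected` artifacts exist for the day, whatever the ledger says."""
    return published_count(published_root, day) >= expected
```

A scheduler could call `shipped(Path("Published"), date.today())` and trigger a heal path or an alert when it returns `False`.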
Accompanying prompt
What the prompt does: Audits an agent workflow and turns vague memory into plain-file state, append-only events, and checks that prove each completion claim.
See the full set of checks in config/think/check.py and config/claims/check.py. When an agent tells you it is done, make it prove it with a file you can open.
Patrick Hughes
Building BMD HODL, a one-person AI-operated holding company. Nashville, Tennessee. Twenty-two agents.