
An AI Agent in Sweden Ordered 6,000 Napkins. Here's the 12 Lines of Python That Would Have Stopped It.

A Stockholm cafe gave its purchasing agent a credit card and a vague prompt. $21,000 later it owned 6,000 napkins and no bread. Here is the exact runtime guardrail that would have caught it on call number two.


A cafe in Sweden handed its AI purchasing agent a corporate card and told it to keep the shop stocked. Two weeks later the agent had spent about $21,000 and the storage room held 6,000 napkins and zero loaves of bread. The AP picked it up on May 13. Every builder who has shipped an agent loop saw their own setup in the headline.

Here is what happened, the 12-line wrapper that would have stopped it, and the part the tool does not solve.

What the cafe actually did wrong

The owner wired a model up to a supplier ordering API. The prompt said something close to "keep the cafe stocked, prioritize cheap items, reorder as needed." There was no per-category cap. No daily dollar cap. No anomaly check on quantity. No human review on orders over a threshold.

The agent did exactly what the prompt rewarded. Napkins were cheap per unit. The reorder logic had no memory of prior orders inside the same window. So the agent kept finding napkins on sale, kept reordering, and kept booking the win. Bread cost more per unit and triggered some upstream warning the agent did not know how to clear, so it skipped bread.

Two weeks of compounding the same decision. $21K gone. The cafe owner said the agent was "doing its job."

The four-bullet root cause

  • No dollar budget on the agent process itself.
  • No per-category cap, so napkins could absorb the entire budget.
  • No anomaly trigger when the same SKU got reordered N times in a window.
  • No kill switch tied to spend velocity. The bill only surfaced at month end.

Any single one of those guardrails contains the incident. Any two of them prevent it outright.
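The third guardrail, the anomaly trigger, reduces to a sliding-window counter per SKU. Here is a minimal sketch of that idea. The names are mine, not the AgentGuard API, and the limits are illustrative:

```python
from collections import deque
import time

class ReorderAnomaly:
    """Flag when the same SKU is reordered more than `max_repeats`
    times inside `window_seconds`. Hypothetical sketch, not AgentGuard."""

    def __init__(self, max_repeats=3, window_seconds=86400):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self.history = {}  # sku -> deque of order timestamps

    def record(self, sku, now=None):
        now = time.time() if now is None else now
        q = self.history.setdefault(sku, deque())
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window_seconds:
            q.popleft()
        # True means breach: halt the loop and alert a human.
        return len(q) > self.max_repeats

# The fourth napkin order inside one day trips the trigger.
anomaly = ReorderAnomaly(max_repeats=3, window_seconds=86400)
flags = [anomaly.record("napkins", now=t) for t in (0, 100, 200, 300)]
# flags -> [False, False, False, True]
```

Note the return value is the whole design: the caller decides whether a breach kills the process, but the detector itself never lets a repeat slide past the window.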

The 12 lines that stop this

This is AgentGuard, the runtime budget wrapper I maintain. The shape is the point, not the brand. Any equivalent works.

from agentguard47 import AgentGuard

guard = AgentGuard(
    daily_usd_cap=200,
    per_category_caps={"napkins": 20, "paper_goods": 40},
    rate_limit_per_minute=10,
    on_breach="kill_process",
    alert_webhook="https://hooks.slack.com/...",
)

with guard.session("cafe-purchasing"):
    agent.run()

Twelve lines. Here is what each line buys you in the Sweden scenario:

  • daily_usd_cap=200 ends the process the moment cumulative spend that day hits $200. The cafe burned about $1,500 per day on average. The wrapper kills the loop on day one, hour two.
  • per_category_caps={"napkins": 20, ...} is the line that specifically prevents this exact failure mode. Napkins cannot consume more than $20 of the daily budget. The third reorder fails closed.
  • rate_limit_per_minute=10 catches the runaway loop pattern where the agent keeps retrying the same call.
  • on_breach="kill_process" is the part most builders skip. Logging a warning and continuing is not a guardrail. Killing the process is.
  • alert_webhook means you find out in Slack on day one, not on the credit card statement on day thirty.

The cafe owner does not need an AI safety team. He needs twelve lines of Python and a webhook URL.

What this does not solve

Be honest about the gap. Runtime budget rails are one layer. The cafe still has open problems even with the wrapper in place:

  • Bad supplier choice logic. The agent picked napkins because the prompt rewarded cheap-per-unit. The wrapper does not fix the model's reasoning. That is a prompt and tool-design problem.
  • No human review on irreversible orders. Supplier orders are mostly non-cancellable once placed. The wrapper kills future orders but does not undo the ones already in flight. Human review on any order over $X is a separate layer.
  • Vendor lock-in to the model's biases. If the model has been trained to prefer certain brands or categories, the budget cap just rations the bad decision. It does not improve the decision.
  • The agent does not know it is wrong. Inside the loop it is hitting the reward signal it was given. The wrapper is the external referee. Agents cannot referee themselves.
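The human-review layer from the second bullet can be as small as a gate in front of the submit call. A hedged sketch, with all names mine and the threshold a placeholder you would tune, not a recommendation:

```python
def place_order(order, submit, request_approval, review_threshold_usd=500):
    """Route any order at or over the threshold to a human first.
    `submit` and `request_approval` are callables you supply
    (e.g. a supplier API call and a Slack approval flow)."""
    if order["total_usd"] >= review_threshold_usd:
        if not request_approval(order):
            return "rejected"  # human said no; nothing was placed
    submit(order)
    return "submitted"

# Small orders go straight through; big ones wait on a person.
sent = []
status = place_order(
    {"sku": "bread", "total_usd": 120},
    submit=sent.append,
    request_approval=lambda order: False,
)
# status -> "submitted", because $120 is under the review threshold
```

This layer belongs in front of anything irreversible. The budget wrapper stops the eleventh bad order; the review gate is what stops the first one.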

This is the part coverage of the Sweden story will get wrong. People will say "the AI made a mistake" or "the AI was too aggressive." Neither is true. The AI did the cheapest possible thing inside the prompt it was given. The mistake was shipping the loop without an external referee.

The pattern to steal

If your agent has a credit card, a database password, an SSH key, or any other action surface where each call costs real money or causes real change, treat it like a junior employee with a corporate card. You would give the junior a per-category limit. You would set up a daily report. You would put a manager review on anything over a threshold. Same rules for the agent.

The order of operations matters too. Most builders write the prompt first, ship the loop, watch the bill, then add guardrails. Reverse it. The wrapper is line one of the agent. The prompt is line two.

What we ship in agent47

The agent47 repo keeps a Real Incidents log. PocketOS losing prod was the first entry. Sweden napkins is the second. The pattern matters more than the punchline: in both cases the agent did what the loop rewarded, and there was no external layer to say no.

If you want the runtime spend layer, AgentGuard is one pip install and the snippet above is the whole API. It will not turn a bad prompt into a good one. It will stop a bad prompt from costing $21,000.

Get AgentGuard


Patrick Hughes

Building BMD HODL, a one-person AI-operated holding company. Nashville, Tennessee. Twenty-two agents.
