The CrewAI demo worked. Then the tool call retried 913 times.
The demo worked. Then the same CrewAI tool call retried until the run became an operator problem.
Here is the agent failure mode nobody shows in the demo.
The CrewAI flow works on the first run. The agents are named well. The tasks are clear. The tool call returns the right data.
Then production input changes.
The vendor API returns a 429. The search tool returns the same empty page. The file tool cannot find the path.
The agent does what agents do.
It tries again.
Why this gets expensive
One retry is fine. Ten retries might be fine.
Nine hundred retries is not a bug report. It is a bill.
The problem is not CrewAI. The problem is shipping an autonomous loop without runtime limits.
If the agent can retry, the runtime needs to know:
- How many times did this action repeat?
- Did the tool input actually change?
- Is cost rising faster than expected?
- Has a human approved this path?
- Should the run stop now?
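Those five checks fit in a small wrapper around every tool call. A minimal sketch, assuming a hypothetical `ToolGuard` class (the names, caps, and cost accounting here are illustrative, not a CrewAI API):

```python
import hashlib

class ToolGuard:
    """Tracks repeated tool calls and spend; decides whether a run may continue."""

    def __init__(self, max_repeats=5, max_cost_usd=2.00):
        self.max_repeats = max_repeats
        self.max_cost_usd = max_cost_usd
        self.seen = {}        # hash of (tool, input) -> repeat count
        self.cost_usd = 0.0

    def check(self, tool_name, tool_input, call_cost_usd):
        """Return True if the call may proceed, False if the run should stop."""
        key = hashlib.sha256(f"{tool_name}:{tool_input}".encode()).hexdigest()
        self.seen[key] = self.seen.get(key, 0) + 1
        self.cost_usd += call_cost_usd
        if self.seen[key] > self.max_repeats:
            return False  # same tool, same input, too many times: a loop, not progress
        if self.cost_usd > self.max_cost_usd:
            return False  # budget cap crossed: stop before the bill grows
        return True
```

Hashing the tool name plus the exact input is what distinguishes a loop from legitimate retries: if the input never changes, repeat number six is not new work.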
What I want to see live
For a CrewAI workflow, I want a simple map:
Crew -> agent -> task -> tool call -> retry -> budget -> alert -> kill state.
Not a wall of spans. Not a giant trace log.
A control map.
When a tool repeats, I want it obvious. When spend crosses a cap, I want it obvious. When a kill switch is armed, I want it obvious.
That is the difference between watching an agent and operating one.
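That control map can be as small as a state enum plus one transition function. A sketch with hypothetical state names (this is illustrative, not a CrewAI API):

```python
from enum import Enum

class RunState(Enum):
    RUNNING = "running"
    RETRYING = "retrying"
    BUDGET_EXCEEDED = "budget_exceeded"
    KILLED = "killed"

def next_state(state, repeats, spend_usd, cap_usd, kill_armed):
    """Collapse the run into one operator-visible state, worst condition first."""
    if kill_armed:
        return RunState.KILLED            # human pulled the switch: everything stops
    if spend_usd > cap_usd:
        return RunState.BUDGET_EXCEEDED   # obvious, not buried in a trace
    if repeats > 1:
        return RunState.RETRYING          # the tool is repeating: surface it
    return state
```

One state per run is the point: an operator glances at it, not at a wall of spans.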
The synthetic failure
Example:
Agent: vendor research agent. Task: enrich a vendor before contract review. Tool: company search API.
The API starts returning 429. The agent keeps asking the same question. The retries produce no new data. Cost rises. No one gets an alert.
That is exactly the kind of run that should stop itself.
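The self-stopping version is a few lines. A sketch with a stubbed search tool that always rate-limits, the way the vendor API did (tool and function names are made up for the example):

```python
def flaky_search(query):
    """Stub: the vendor API is rate-limiting, so every call fails the same way."""
    return {"status": 429, "results": []}

def run_agent(query, max_attempts=5):
    """Bounded retry loop: identical input plus identical failure ends the run."""
    for attempt in range(1, max_attempts + 1):
        resp = flaky_search(query)
        if resp["status"] == 200:
            return resp["results"]
        # Same question, same 429: another attempt cannot produce new data.
    raise RuntimeError(f"stopped after {max_attempts} identical failures on {query!r}")
```

Five attempts, one exception, one incident to review. Not 913 calls and a bill.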
What good looks like
Good does not mean the agent never fails.
Good means the failure is bounded.
- Retry count is capped.
- Budget burn is capped.
- Alert delivery is visible.
- A human can kill the run.
- The incident is retained.
That is what a client should see before trusting an agent with real work.
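Those five controls compose into one small object. A sketch of a hypothetical `RunController` (caps, field names, and the incident format are assumptions, not a real library's API):

```python
import time

class RunController:
    """Bounded-run controller: retry cap, budget cap, kill switch, retained incidents."""

    def __init__(self, max_retries=5, max_spend_usd=2.0):
        self.max_retries = max_retries
        self.max_spend_usd = max_spend_usd
        self.killed = False
        self.incidents = []  # retained in memory here; persist these in real use

    def kill(self, reason):
        """Human kill switch: stops the run and records who-knows-why."""
        self.killed = True
        self._record("human_kill", reason)

    def allow(self, retries, spend_usd):
        """Gate every action: False means the run must stop now."""
        if self.killed:
            return False
        if retries > self.max_retries:
            self._record("retry_cap", retries)
            return False
        if spend_usd > self.max_spend_usd:
            self._record("budget_cap", spend_usd)
            return False
        return True

    def _record(self, kind, detail):
        self.incidents.append({"ts": time.time(), "kind": kind, "detail": detail})
```

The incident list is the part clients actually ask about: not whether the agent failed, but whether the failure was bounded and whether anyone can show the record.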
Patrick Hughes
Building BMD HODL — a one-person AI-operated holding company. Nashville, Tennessee. Twenty-two agents.