The CrewAI demo worked. Then the tool call retried 913 times.
The demo worked. Then the same CrewAI tool call retried until the run became an operator problem.
Here is the agent failure mode nobody shows in the demo.
The CrewAI flow works on the first run. The agents are named well. The tasks are clear. The tool call returns the right data.
Then production input changes.
The vendor API returns a 429. The search tool returns the same empty page. The file tool cannot find the path.
The agent does what agents do.
It tries again.
Why this gets expensive
One retry is fine. Ten retries might be fine.
Nine hundred retries is not a bug report. It is a bill.
The problem is not CrewAI. The problem is shipping an autonomous loop without runtime limits.
If the agent can retry, the runtime needs to know:
- How many times did this action repeat?
- Did the tool input actually change?
- Is cost rising faster than expected?
- Has a human approved this path?
- Should the run stop now?
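Here is a minimal sketch of those five questions as one runtime check in plain Python. It does not use any CrewAI API. `RunGuard`, `fingerprint`, and the dollar figures are hypothetical, and you would call `check()` from wherever your tool wrapper runs. "Is cost rising faster than expected?" is approximated here with a hard run budget rather than a spend rate.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional


def fingerprint(tool_name: str, tool_input: dict) -> str:
    """Stable hash of a tool call: same tool and same arguments give the same key."""
    payload = json.dumps({"tool": tool_name, "input": tool_input}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class RunGuard:
    """Hypothetical runtime guard; not part of CrewAI."""
    max_repeats: int = 3                                  # identical calls allowed
    budget_usd: float = 5.00                              # hard spend cap for the run
    approval_required: set = field(default_factory=set)  # tools that need a human OK
    approved: set = field(default_factory=set)           # tools a human has approved
    spent_usd: float = 0.0
    repeats: dict = field(default_factory=dict)          # fingerprint -> call count
    stop_reason: Optional[str] = None                    # set once; the run stays stopped

    def check(self, tool_name: str, tool_input: dict, est_cost_usd: float) -> bool:
        """Return True if the call may proceed, False if the run should stop now."""
        if self.stop_reason:
            return False
        # How many times did this action repeat? Did the tool input actually change?
        key = fingerprint(tool_name, tool_input)
        self.repeats[key] = self.repeats.get(key, 0) + 1
        if self.repeats[key] > self.max_repeats:
            self.stop_reason = (
                f"{tool_name} repeated {self.repeats[key]} times with identical input"
            )
        # Would this call push spend past the cap?
        elif self.spent_usd + est_cost_usd > self.budget_usd:
            self.stop_reason = f"budget cap ${self.budget_usd:.2f} would be exceeded"
        # Has a human approved this path?
        elif tool_name in self.approval_required and tool_name not in self.approved:
            self.stop_reason = f"{tool_name} needs human approval"
        if self.stop_reason:
            return False
        self.spent_usd += est_cost_usd
        return True
```

Called five times with the same query and `max_repeats=3`, the first three calls pass, the fourth sets `stop_reason` to "company_search repeated 4 times with identical input", and every call after that is refused.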
What I want to see live
For a CrewAI workflow, I want a simple map:
Crew -> agent -> task -> tool call -> retry -> budget -> alert -> kill state.
Not a wall of spans. Not a giant trace log.
A control map.
When a tool repeats, I want it obvious. When spend crosses a cap, I want it obvious. When a kill switch is armed, I want it obvious.
That is the difference between watching an agent and operating one.
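The control map can be as small as one rendered line per tool-call path. This is a sketch assuming a hypothetical `ControlRow` record that your own instrumentation fills in; none of these fields come from CrewAI. Over-cap and unsent states are spelled out in the text so they are obvious at a glance.

```python
from dataclasses import dataclass


@dataclass
class ControlRow:
    """One line of a control map for a single tool-call path (hypothetical schema)."""
    crew: str
    agent: str
    task: str
    tool: str
    retries: int
    retry_cap: int
    spent_usd: float
    budget_usd: float
    alert_sent: bool
    kill_armed: bool

    def render(self) -> str:
        # Mark over-cap states in the text itself so they cannot be missed.
        retry = f"retry {self.retries}/{self.retry_cap}"
        if self.retries > self.retry_cap:
            retry += " OVER"
        budget = f"budget ${self.spent_usd:.2f}/${self.budget_usd:.2f}"
        if self.spent_usd > self.budget_usd:
            budget += " OVER"
        alert = "alert sent" if self.alert_sent else "alert NOT sent"
        kill = "kill ARMED" if self.kill_armed else "kill off"
        path = [self.crew, self.agent, self.task, self.tool]
        return " -> ".join(path + [retry, budget, alert, kill])


row = ControlRow("vendor_crew", "vendor_research", "enrich_vendor", "company_search",
                 retries=4, retry_cap=3, spent_usd=0.04, budget_usd=1.00,
                 alert_sent=True, kill_armed=True)
print(row.render())
# vendor_crew -> vendor_research -> enrich_vendor -> company_search -> retry 4/3 OVER -> budget $0.04/$1.00 -> alert sent -> kill ARMED
```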
The synthetic failure
Example:
- Agent: vendor research agent.
- Task: enrich a vendor before contract review.
- Tool: company search API.
The API starts returning 429. The agent keeps asking the same question. The retries produce no new data. Cost rises. No one gets an alert.
That is exactly the kind of run that should stop itself.
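The synthetic failure is easy to reproduce without a real API. This sketch fakes the 429 with a hypothetical `RateLimited` exception and treats "no new data" as the same (input, outcome) pair coming back again. The thresholds and per-call cost are made-up numbers, and `alert` defaults to `print` as a stand-in for a real pager or chat hook.

```python
import hashlib
import json
from typing import Callable


class RateLimited(Exception):
    """Stand-in for an HTTP 429 from the vendor API (hypothetical)."""


def company_search(query: str) -> dict:
    # Synthetic tool: the vendor API rate-limits every request.
    raise RateLimited("429 Too Many Requests")


def outcome_key(query: str, outcome: str) -> str:
    """Hash of (input, outcome). Seeing the same key again means the retry learned nothing."""
    blob = json.dumps({"query": query, "outcome": outcome}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def run_enrichment(query: str, max_no_progress: int = 3, cost_per_call: float = 0.01,
                   budget_usd: float = 0.50, alert: Callable[[str], None] = print) -> dict:
    seen: dict[str, int] = {}
    spent = 0.0
    attempts = 0
    reason = "budget cap reached"  # used if the while condition ends the loop
    while spent + cost_per_call <= budget_usd:
        attempts += 1
        spent += cost_per_call
        try:
            data = company_search(query)
            return {"status": "ok", "data": data, "attempts": attempts, "spent_usd": spent}
        except RateLimited as exc:
            outcome = f"error: {exc}"
        key = outcome_key(query, outcome)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] >= max_no_progress:
            reason = f"{seen[key]} identical attempts returned no new data"
            break
    # The alert the original run never sent.
    alert(f"STOPPED enrichment for {query!r}: {reason} (spent ${spent:.2f})")
    return {"status": "stopped", "reason": reason, "attempts": attempts,
            "spent_usd": round(spent, 2)}
```

With the defaults, `run_enrichment("Acme Corp")` stops after three identical 429 outcomes and fires one alert, instead of looping until someone notices the bill.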
What good looks like
Good does not mean the agent never fails.
Good means the failure is bounded.
- Retry count is capped.
- Budget burn is capped.
- Alert delivery is visible.
- A human can kill the run.
- The incident is retained.
That is what a client should see before trusting an agent with real work.
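Retry cap, budget cap, visible alert delivery, remote kill, and a retained incident can all live in one small loop. A minimal sketch, assuming a hypothetical `KillSwitch` that an operator endpoint trips from another thread, a `send_alert` callable that returns whether delivery succeeded, and a plain list standing in for durable incident storage. Each step is represented only by its estimated cost; the real tool call would sit where the comment says.

```python
import json
import threading
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class Incident:
    run_id: str
    reason: str
    attempts: int
    spent_usd: float
    alert_delivered: bool
    stopped_at: float = field(default_factory=time.time)


class KillSwitch:
    """Remote kill: any thread (for example an operator API handler) can trip it."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self.reason = ""

    def trip(self, reason: str) -> None:
        self.reason = reason
        self._event.set()

    def tripped(self) -> bool:
        return self._event.is_set()


def run_with_controls(run_id: str, step_costs: Iterable[float], kill: KillSwitch,
                      incident_log: List[str], send_alert: Callable[[str], bool],
                      max_steps: int = 50, budget_usd: float = 1.00) -> Optional[Incident]:
    """Run steps under caps; retain an incident if the run is stopped."""
    spent, attempts, reason = 0.0, 0, ""
    for cost in step_costs:
        if kill.tripped():                      # a human can kill the run
            reason = f"killed by operator: {kill.reason}"
            break
        if attempts >= max_steps:               # retry count is capped
            reason = "retry cap reached"
            break
        if spent + cost > budget_usd:           # budget burn is capped
            reason = "budget cap reached"
            break
        attempts += 1
        spent += cost                           # the real tool call would go here
    else:
        return None                             # finished normally: nothing to retain
    delivered = bool(send_alert(f"{run_id} stopped: {reason}"))  # alert delivery is visible
    incident = Incident(run_id, reason, attempts, round(spent, 2), delivered)
    incident_log.append(json.dumps(asdict(incident)))  # the incident is retained
    return incident
```

The incident is recorded whether the run was killed or hit a cap, and `alert_delivered` is stored alongside it, so a record showing `false` is itself the signal that nobody was told.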
Patrick Hughes
Building BMD HODL, a one-person AI-operated holding company. Nashville, Tennessee. Twenty-two agents.