AI Agent Cost Control: How AgentGuard Stops Runaway LLM Spend
AI agents that run unsupervised can burn through your API budget in minutes. Here's how I built AgentGuard — a Python SDK that enforces hard cost limits at runtime — and why every autonomous agent needs one.
I ran 100 ML experiments overnight with an autonomous AI agent. It worked — 25% model performance gain, zero human intervention. But before I got comfortable leaving agents running unsupervised, I had to solve one problem first: what happens when an agent loops, hallucinates tool calls, or just keeps going when it should have stopped?
The answer, without guardrails, is a large API bill and a lot of regret.
So I built AgentGuard — a Python SDK that enforces budget, token, time, and rate limits on AI agents at runtime. This post covers the problem it solves, how it works, and how to drop it into your own agents in about four lines of code.
The Real Problem: Agents Don't Know When to Stop
Monitoring tools like LangSmith and Langfuse are useful. They tell you what happened. But they can't stop anything mid-run. By the time you get a cost alert from your dashboard, the damage is done.
The failure modes are predictable:
- Loop traps: An agent keeps calling the same tool because each response is slightly different and the termination condition never triggers
- Hallucinated tool chains: The agent invents multi-step plans that require dozens of API calls to execute
- Runaway research tasks: A research agent finds one more source, then another, then another — with no concept of diminishing returns
- Clock drift: A task that should take 2 minutes runs for 45 because something upstream is slow
None of these are bugs in the traditional sense. They're emergent behaviors from giving a language model agency. The fix isn't better prompting — it's enforcement at the infrastructure level.
How AgentGuard Works
AgentGuard installs as a lightweight wrapper around your existing LLM calls. It tracks cost, tokens, time, and tool call patterns in real time, and kills the agent the moment any limit is breached.
Install it:
pip install agentguard47
Drop it into an existing OpenAI agent in four lines:
from agentguard import Tracer, BudgetGuard, patch_openai

tracer = Tracer(guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)])
patch_openai(tracer)  # From here, all OpenAI calls are tracked and enforced automatically
That's it. Your existing agent code doesn't change. AgentGuard patches the OpenAI client and intercepts every call. When the cumulative cost hits $4.00 (80% of the limit), it fires a warning callback. At $5.00, it raises a BudgetExceededError and terminates the run.
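To make the warn-then-kill pattern concrete, here is a minimal sketch of that behavior in plain Python. The names (`MiniBudgetGuard`, `record_cost`, the callback shape) are illustrative stand-ins, not AgentGuard's actual internals:

```python
class BudgetExceededError(RuntimeError):
    pass

class MiniBudgetGuard:
    """Sketch of a cumulative-cost guard: warn once at a threshold,
    raise a hard error at the limit."""
    def __init__(self, max_cost_usd, warn_at_pct=0.8, on_warn=None):
        self.max_cost_usd = max_cost_usd
        self.warn_threshold = max_cost_usd * warn_at_pct
        self.on_warn = on_warn
        self.spent = 0.0
        self.warned = False

    def record_cost(self, cost_usd):
        self.spent += cost_usd
        # Fire the warning callback exactly once, at the threshold
        if not self.warned and self.spent >= self.warn_threshold:
            self.warned = True
            if self.on_warn:
                self.on_warn(self.spent)
        # Hard stop: terminate the run by raising
        if self.spent >= self.max_cost_usd:
            raise BudgetExceededError(
                f"spent ${self.spent:.2f} of ${self.max_cost_usd:.2f} budget"
            )
```

In a real wrapper, `record_cost` would be called from inside the patched client after each API response, so the exception propagates out of the agent loop and ends the run.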
The Full Guard Suite
A dollar budget isn't always the right constraint. AgentGuard ships five guard types:
BudgetGuard — Hard dollar and token limits with configurable warning thresholds. Supports per-model pricing for OpenAI, Anthropic, Google, Mistral, and Meta out of the box.
BudgetGuard(max_cost_usd=10.00, max_tokens=100_000, warn_at_pct=0.75)
LoopGuard — Detects exact repeated tool calls. Useful when an agent is stuck calling the same function with the same arguments.
LoopGuard(max_repeats=3)
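The core idea behind exact loop detection is simple enough to sketch in a few lines. This is an illustrative version, not AgentGuard's implementation: track the last (tool, arguments) pair and count consecutive repeats.

```python
class LoopDetected(RuntimeError):
    pass

class MiniLoopGuard:
    """Sketch: raise when the same tool is called with identical
    arguments more than max_repeats times in a row."""
    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self.last_call = None
        self.count = 0

    def record_tool_call(self, tool_name, args):
        # Normalize args so dict ordering doesn't matter
        call = (tool_name, tuple(sorted(args.items())))
        if call == self.last_call:
            self.count += 1
        else:
            self.last_call = call
            self.count = 1
        if self.count > self.max_repeats:
            raise LoopDetected(f"{tool_name} repeated {self.count} times")
```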
FuzzyLoopGuard — Detects similar (not identical) patterns. Better for agents that slightly vary their inputs but are functionally stuck.
FuzzyLoopGuard(max_tool_repeats=5)
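One way to detect "similar but not identical" calls is a string-similarity ratio over consecutive calls. The sketch below uses `difflib.SequenceMatcher` with an assumed 0.9 threshold; AgentGuard's actual heuristic may differ.

```python
import difflib

class MiniFuzzyLoopGuard:
    """Sketch: count a streak of near-identical consecutive tool calls
    and report when the streak exceeds max_tool_repeats."""
    def __init__(self, max_tool_repeats=5, similarity=0.9):
        self.max_tool_repeats = max_tool_repeats
        self.similarity = similarity
        self.last = None
        self.streak = 0

    def record_tool_call(self, tool_name, args_text):
        current = f"{tool_name}:{args_text}"
        is_similar = (
            self.last is not None
            and difflib.SequenceMatcher(None, self.last, current).ratio()
            >= self.similarity
        )
        self.streak = self.streak + 1 if is_similar else 1
        self.last = current
        # True means the agent looks functionally stuck
        return self.streak > self.max_tool_repeats
```

An agent paginating through "query page 1", "query page 2", ... would slip past an exact-match check but trip this one, which is exactly the failure mode fuzzy detection targets.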
TimeoutGuard — Wall-clock enforcement. If the agent hasn't finished in N seconds, it's terminated.
TimeoutGuard(max_seconds=300)
RateLimitGuard — Caps calls per minute. Useful for shared environments or when you're working within upstream API rate limits.
RateLimitGuard(max_calls_per_minute=60)
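A per-minute cap is typically implemented as a sliding window of call timestamps. Here is a minimal sketch of that mechanism (illustrative only; the injectable `clock` parameter is there to make it testable, not something AgentGuard necessarily exposes):

```python
import collections
import time

class MiniRateLimitGuard:
    """Sketch: allow at most max_calls_per_minute calls in any
    rolling 60-second window."""
    def __init__(self, max_calls_per_minute=60, clock=time.monotonic):
        self.max_calls = max_calls_per_minute
        self.clock = clock
        self.calls = collections.deque()  # timestamps of recent calls

    def allow_call(self):
        now = self.clock()
        # Evict timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```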
Combine them:
tracer = Tracer(guards=[
    BudgetGuard(max_cost_usd=5.00),
    LoopGuard(max_repeats=3),
    TimeoutGuard(max_seconds=600),
])
Framework Support
Most real agents aren't built with raw OpenAI calls. AgentGuard integrates with the frameworks people actually use:
- LangChain: AgentGuardCallbackHandler plugs into the standard callback interface
- LangGraph: @guarded_node decorator wraps individual nodes
- CrewAI: AgentGuardCrewHandler via step callbacks
- Direct patching: patch_openai() and patch_anthropic() for everything else
Tracing and Evaluation
Every run generates a JSONL trace file with full event history, span data, and cost attribution. You can run assertions against it after the fact:
from agentguard import EvalSuite

EvalSuite("traces.jsonl") \
    .assert_no_loops() \
    .assert_budget_under(tokens=50_000) \
    .assert_completes_within(seconds=30) \
    .run()
This is useful for testing agent behavior before deploying to production, or for building regression tests after you've tuned a prompt.
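Under the hood, an assertion like a token cap is just an aggregation over the JSONL events. Here is a hedged sketch of that idea in plain Python; the event field name (`tokens`) is assumed for illustration, and real AgentGuard traces may use a different schema:

```python
import json

def total_tokens_under(trace_path, max_tokens):
    """Sketch: sum per-event token counts from a JSONL trace file and
    check the total against a cap. Returns the total on success."""
    total = 0
    with open(trace_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            event = json.loads(line)  # one JSON object per line
            total += event.get("tokens", 0)
    assert total <= max_tokens, f"{total} tokens exceeds cap of {max_tokens}"
    return total
```

Because traces are plain JSONL, you can also point standard tools (jq, pandas) at them for ad hoc analysis.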
Why Not Just Use LangSmith / Langfuse?
Observability tools are necessary but not sufficient. They show you the trace after execution. AgentGuard acts during execution. It's the difference between a security camera and a deadbolt.
A few other differences:
| Feature | LangSmith / Langfuse | AgentGuard |
|---|---|---|
| Cost monitoring | ✅ | ✅ |
| Hard budget enforcement | ❌ | ✅ |
| Kill switch mid-run | ❌ | ✅ |
| Loop detection | ❌ | ✅ |
| Zero external dependencies | ❌ | ✅ |
| Self-hosted | Partial | ✅ |
AgentGuard isn't an alternative to observability — it's what you add when you move from development to unsupervised production runs.
The Overnight Experiment Context
This library came directly from the autonomous ML research agent I wrote about in an earlier post. When you're running 100 experiments with zero human intervention, you need to know that if the agent enters a degenerate loop at 2am, it will stop itself — not run until morning and hand you an $800 API bill.
AgentGuard is how I made that guarantee. It's MIT licensed, has 93% test coverage, and zero runtime dependencies. If you're building anything that runs unsupervised, it belongs in your stack.
Install: pip install agentguard47
Repo: github.com/bmdhodl/agent47
Putting It Into Production
The setup I use for serious overnight runs:
from agentguard import Tracer, BudgetGuard, LoopGuard, TimeoutGuard, patch_anthropic
import logging

def on_warning(event):
    logging.warning(f"AgentGuard warning: {event}")

tracer = Tracer(
    guards=[
        BudgetGuard(max_cost_usd=20.00, warn_at_pct=0.8, on_warn=on_warning),
        LoopGuard(max_repeats=4),
        TimeoutGuard(max_seconds=3600),
    ],
    trace_file="run_traces.jsonl",
)
patch_anthropic(tracer)
That's a $20 hard cap, a warning at $16, loop detection, and a 1-hour timeout. If any of those trigger, the agent stops cleanly and I get a trace file explaining exactly what happened.
If you're building autonomous agents and want them running safely in production, get in touch. This is the exact kind of infrastructure work I help with.