
AI Agent Cost Control: How AgentGuard Stops Runaway LLM Spend

AI agents that run unsupervised can burn through your API budget in minutes. Here's how I built AgentGuard — a Python SDK that enforces hard cost limits at runtime — and why every autonomous agent needs one.

Patrick Hughes
8 min read


I ran 100 ML experiments overnight with an autonomous AI agent. It worked — 25% model performance gain, zero human intervention. But before I got comfortable leaving agents running unsupervised, I had to solve one problem first: what happens when an agent loops, hallucinates tool calls, or just keeps going when it should have stopped?

The answer, without guardrails, is a large API bill and a lot of regret.

So I built AgentGuard — a Python SDK that enforces budget, token, time, and rate limits on AI agents at runtime. This post covers the problem it solves, how it works, and how to drop it into your own agents in about four lines of code.


The Real Problem: Agents Don't Know When to Stop

Monitoring tools like LangSmith and Langfuse are useful. They tell you what happened. But they can't stop anything mid-run. By the time you get a cost alert from your dashboard, the damage is done.

The failure modes are predictable:

  • Loop traps: An agent keeps calling the same tool because each response is slightly different and the termination condition never triggers
  • Hallucinated tool chains: The agent invents multi-step plans that require dozens of API calls to execute
  • Runaway research tasks: A research agent finds one more source, then another, then another — with no concept of diminishing returns
  • Clock drift: A task that should take 2 minutes runs for 45 because something upstream is slow

None of these are bugs in the traditional sense. They're emergent behaviors from giving a language model agency. The fix isn't better prompting — it's enforcement at the infrastructure level.


How AgentGuard Works

AgentGuard installs as a lightweight wrapper around your existing LLM calls. It tracks cost, tokens, time, and tool call patterns in real time, and kills the agent the moment any limit is breached.

Install it:

pip install agentguard47

Drop it into an existing OpenAI agent in four lines:

from agentguard import Tracer, BudgetGuard, patch_openai

tracer = Tracer(guards=[BudgetGuard(max_cost_usd=5.00, warn_at_pct=0.8)])
patch_openai(tracer)

# From here, all OpenAI calls are tracked and enforced automatically

That's it. Your existing agent code doesn't change. AgentGuard patches the OpenAI client and intercepts every call. When the cumulative cost hits $4.00 (80% of the limit), it fires a warning callback. At $5.00, it raises a BudgetExceededError and terminates the run.
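AgentGuard's actual internals aren't shown here, but the general technique behind `patch_openai` — monkey-patching the client's call method so every request routes through a metering layer — can be sketched in plain Python. Everything below (`MiniTracer`, `FakeClient`, `patch_client`) is a hypothetical illustration, not AgentGuard's real implementation:

```python
# Hypothetical sketch of runtime patching: wrap a client's call method so every
# call is metered and the budget enforced. Not AgentGuard's actual code.

class BudgetExceeded(Exception):
    pass

class MiniTracer:
    def __init__(self, max_cost_usd):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0

    def record(self, cost):
        self.spent += cost
        if self.spent > self.max_cost_usd:
            raise BudgetExceeded(f"spent ${self.spent:.2f} > cap ${self.max_cost_usd:.2f}")

class FakeClient:
    """Stands in for a real LLM client in this sketch."""
    def complete(self, prompt):
        return {"text": "ok", "cost_usd": 0.03}

def patch_client(client, tracer):
    original = client.complete
    def metered(prompt):
        result = original(prompt)          # the real API call happens here
        tracer.record(result["cost_usd"])  # then the guard checks the budget
        return result
    client.complete = metered              # swap in the wrapper

client = FakeClient()
tracer = MiniTracer(max_cost_usd=0.10)
patch_client(client, tracer)
for _ in range(3):
    client.complete("hi")  # three $0.03 calls: $0.09 total, still under the cap
```

The agent keeps calling `client.complete` exactly as before; the enforcement is invisible until a limit is crossed, which is what lets the real SDK claim "your existing agent code doesn't change."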


The Full Guard Suite

A dollar budget isn't always the right constraint. AgentGuard ships five guard types:

BudgetGuard — Hard dollar and token limits with configurable warning thresholds. Supports per-model pricing for OpenAI, Anthropic, Google, Mistral, and Meta out of the box.

BudgetGuard(max_cost_usd=10.00, max_tokens=100_000, warn_at_pct=0.75)
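Per-model pricing boils down to multiplying token counts by a price table. The sketch below shows the idea with made-up prices; AgentGuard ships its own pricing data, and these numbers are illustrative only:

```python
# Hypothetical illustration of per-model cost attribution. Prices are example
# values (USD per 1M tokens), not AgentGuard's bundled pricing table.

PRICING = {
    "gpt-4o": (2.50, 10.00),        # (input, output) per 1M tokens
    "claude-sonnet": (3.00, 15.00),
}

def call_cost(model, prompt_tokens, completion_tokens):
    in_price, out_price = PRICING[model]
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

# 1,000 prompt tokens + 500 completion tokens on the example "gpt-4o" rates
cost = call_cost("gpt-4o", prompt_tokens=1_000, completion_tokens=500)
```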

LoopGuard — Detects exact repeated tool calls. Useful when an agent is stuck calling the same function with the same arguments.

LoopGuard(max_repeats=3)
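Exact-repeat detection is conceptually simple: fingerprint each (tool, arguments) pair and count consecutive duplicates. A minimal sketch of that logic, with hypothetical names that are not AgentGuard's internals:

```python
# Hypothetical sketch of exact loop detection: flag the run once the same
# tool call with the same arguments repeats more than max_repeats times.

class LoopDetected(Exception):
    pass

class MiniLoopGuard:
    def __init__(self, max_repeats):
        self.max_repeats = max_repeats
        self.last_call = None
        self.count = 0

    def check(self, tool, args):
        call = (tool, tuple(sorted(args.items())))  # hashable fingerprint
        self.count = self.count + 1 if call == self.last_call else 1
        self.last_call = call
        if self.count > self.max_repeats:
            raise LoopDetected(f"{tool} repeated {self.count} times")

guard = MiniLoopGuard(max_repeats=3)
for _ in range(3):
    guard.check("search", {"query": "llm pricing"})  # three repeats: allowed
```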

FuzzyLoopGuard — Detects similar (not identical) patterns. Better for agents that slightly vary their inputs but are functionally stuck.

FuzzyLoopGuard(max_tool_repeats=5)
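"Similar but not identical" is the hard case: the agent appends a word, tweaks punctuation, and the exact-match check never fires. One way to approximate this — purely illustrative, AgentGuard's real heuristic may differ — is a string-similarity threshold between consecutive calls:

```python
import difflib

# Hypothetical sketch of fuzzy loop detection: count consecutive tool calls
# whose arguments are near-duplicates of the previous call.

def similar(a, b, threshold=0.9):
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

calls = [
    "search: best llm pricing 2024",
    "search: best llm pricing 2024!",   # trivially varied, functionally stuck
    "search: best llm pricing 2024!!",
]
near_repeats = sum(similar(a, b) for a, b in zip(calls, calls[1:]))
```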

TimeoutGuard — Wall-clock enforcement. If the agent hasn't finished in N seconds, it's terminated.

TimeoutGuard(max_seconds=300)
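Wall-clock enforcement needs nothing more than a start timestamp on the monotonic clock and an elapsed-time check before each call. A hypothetical sketch (not AgentGuard's implementation):

```python
import time

# Hypothetical sketch of wall-clock enforcement: record a start time on the
# monotonic clock (immune to system clock changes) and check elapsed time.

class TimedOut(Exception):
    pass

class MiniTimeoutGuard:
    def __init__(self, max_seconds):
        self.max_seconds = max_seconds
        self.start = time.monotonic()

    def check(self):
        if time.monotonic() - self.start > self.max_seconds:
            raise TimedOut(f"run exceeded {self.max_seconds}s")

guard = MiniTimeoutGuard(max_seconds=300)
guard.check()  # fine immediately after start
```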

RateLimitGuard — Caps calls per minute. Useful for shared environments or when you're working within upstream API rate limits.

RateLimitGuard(max_calls_per_minute=60)
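A per-minute cap is typically a sliding window over recent call timestamps. The sketch below uses an injectable clock so the behavior is testable; again, the names are hypothetical, not AgentGuard's internals:

```python
from collections import deque

# Hypothetical sketch of sliding-window rate limiting: keep timestamps of
# calls in the last 60 seconds and reject new calls once the window is full.

class RateLimited(Exception):
    pass

class MiniRateLimitGuard:
    def __init__(self, max_calls_per_minute, clock):
        self.max_calls = max_calls_per_minute
        self.clock = clock            # injectable for testing
        self.calls = deque()

    def check(self):
        now = self.clock()
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()      # drop timestamps older than the window
        if len(self.calls) >= self.max_calls:
            raise RateLimited("max calls per minute reached")
        self.calls.append(now)

fake_now = [0.0]
guard = MiniRateLimitGuard(max_calls_per_minute=3, clock=lambda: fake_now[0])
for _ in range(3):
    guard.check()  # three calls inside the same second: allowed
```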

Combine them:

tracer = Tracer(guards=[
    BudgetGuard(max_cost_usd=5.00),
    LoopGuard(max_repeats=3),
    TimeoutGuard(max_seconds=600),
])

Framework Support

Most real agents aren't built with raw OpenAI calls. AgentGuard integrates with the frameworks people actually use:

  • LangChain: AgentGuardCallbackHandler plugs into the standard callback interface
  • LangGraph: @guarded_node decorator wraps individual nodes
  • CrewAI: AgentGuardCrewHandler via step callbacks
  • Direct patching: patch_openai() and patch_anthropic() for everything else

Tracing and Evaluation

Every run generates a JSONL trace file with full event history, span data, and cost attribution. You can run assertions against it after the fact:

from agentguard import EvalSuite

EvalSuite("traces.jsonl") \
    .assert_no_loops() \
    .assert_budget_under(tokens=50_000) \
    .assert_completes_within(seconds=30) \
    .run()

This is useful for testing agent behavior before deploying to production, or for building regression tests after you've tuned a prompt.
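Because the trace is plain JSONL, it's also easy to run your own ad-hoc checks outside the EvalSuite API. The event schema below is invented for the example; AgentGuard's real trace format may differ:

```python
import json
import io

# Hypothetical illustration of post-hoc trace analysis: parse a JSONL trace
# and compute run totals. The event fields here are assumed, not AgentGuard's
# documented schema.

trace = io.StringIO(
    '{"event": "llm_call", "tokens": 1200, "cost_usd": 0.02}\n'
    '{"event": "tool_call", "tool": "search"}\n'
    '{"event": "llm_call", "tokens": 800, "cost_usd": 0.01}\n'
)
events = [json.loads(line) for line in trace]
total_tokens = sum(e.get("tokens", 0) for e in events)
total_cost = sum(e.get("cost_usd", 0.0) for e in events)
```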


Why Not Just Use LangSmith / Langfuse?

Observability tools are necessary but not sufficient. They show you the trace after execution. AgentGuard acts during execution. It's the difference between a security camera and a deadbolt.

A few other differences:

Feature                        LangSmith / Langfuse    AgentGuard
Cost monitoring                Yes                     Yes
Hard budget enforcement        No                      Yes
Kill switch mid-run            No                      Yes
Loop detection                 No                      Yes
Zero external dependencies     No                      Yes
Self-hosted                    Partial                 Yes

AgentGuard isn't an alternative to observability — it's what you add when you move from development to unsupervised production runs.


The Overnight Experiment Context

This library came directly from the autonomous ML research agent I described in an earlier post. When you're running 100 experiments with zero human intervention, you need to know that if the agent enters a degenerate loop at 2am, it's going to stop itself — not run until morning and hand you an $800 API bill.

AgentGuard is how I made that guarantee. It's MIT licensed, has 93% test coverage, and zero runtime dependencies. If you're building anything that runs unsupervised, it belongs in your stack.

Install: pip install agentguard47
Repo: github.com/bmdhodl/agent47


Putting It Into Production

The setup I use for serious overnight runs:

from agentguard import Tracer, BudgetGuard, LoopGuard, TimeoutGuard, patch_anthropic
import logging

def on_warning(event):
    logging.warning(f"AgentGuard warning: {event}")

tracer = Tracer(
    guards=[
        BudgetGuard(max_cost_usd=20.00, warn_at_pct=0.8, on_warn=on_warning),
        LoopGuard(max_repeats=4),
        TimeoutGuard(max_seconds=3600),
    ],
    trace_file="run_traces.jsonl",
)
patch_anthropic(tracer)

That's a $20 hard cap, a warning at $16, loop detection, and a 1-hour timeout. If any of those trigger, the agent stops cleanly and I get a trace file explaining exactly what happened.


If you're building autonomous agents and want them running safely in production, get in touch. This is the exact kind of infrastructure work I help with.
