§ 001 / 5090 REPORTS

The 5090 Reports

Weekly benchmark and build logs for local AI agent experiments on controlled compute, starting with an RTX 5090 workstation. The hook is the 5090. The moat is the operating system around local, private, production agents.

RTX 5090, Local agents, Benchmarks

Get the lab notes | Check your GPU fit


§ 002 / CURRENT RUN

The report includes misses.

The 2026-07-09 sweep produced six measured tokens/sec rows on the RTX 5090: llama3.1:8b Q4_K_M at 207-229 tok/s and gemma4:26b Q4_K_M at 180-207 tok/s. It also surfaced a gotcha: changing num_ctx between requests triggers a 140-second model reload. The June 12 Ollama runner timeout stays in the notebook as the first entry.
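A minimal sketch of how a tokens/sec row could be derived and how the reload gotcha could be avoided, assuming the runs go through Ollama's HTTP API. The `eval_count` and `eval_duration` (nanoseconds) response fields and the `options.num_ctx` request field belong to that API; the helper names, the 8192 context size, and the sample response values are illustrative assumptions, not the report's actual harness or data.

```python
from typing import Any

NS_PER_SECOND = 1_000_000_000


def tokens_per_second(response: dict[str, Any]) -> float:
    """Generation speed from the timing fields of an Ollama generate response."""
    eval_ns = response["eval_duration"]  # time spent generating, in nanoseconds
    if eval_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] * NS_PER_SECOND / eval_ns


def build_request(model: str, prompt: str, num_ctx: int = 8192) -> dict[str, Any]:
    """Request body for /api/generate with num_ctx pinned.

    Sending the same num_ctx on every request keeps the context size from
    changing between calls, which is the condition tied to the reload.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }


# Hypothetical response values, chosen only to exercise the arithmetic.
sample = {"eval_count": 512, "eval_duration": 2_400_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")

body = build_request("llama3.1:8b", "Summarize the failure log.")
print(body["options"])
```

Keeping the context size in one helper means every request in a sweep shares the same value, so a stray per-request override is easier to spot.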

Tokens/sec: Model x quant; Measured on real local-agent prompts, not placeholder demos.
VRAM pressure: Context + cache; What fits, what spills, and what changes after quantization.
Cost curve: Local vs API; Per-workload math for agents that run often enough to matter.
Failure log: Timeouts included; Runner crashes, bad configs, and dead ends stay in the record.
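The per-workload math behind the cost curve can be sketched as follows. This is an electricity-only estimate under stated assumptions: the 463 W and 180 tok/s inputs come from this report's gemma4:26b figures, while the electricity price, hardware cost, and API price are placeholder values, not measurements.

```python
import math


def electricity_usd_per_million_tokens(
    watts: float, tokens_per_sec: float, usd_per_kwh: float
) -> float:
    """Electricity cost to generate one million tokens at a steady rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    seconds = 1_000_000 / tokens_per_sec
    kwh = watts * seconds / 3_600_000  # watt-seconds to kilowatt-hours
    return kwh * usd_per_kwh


def break_even_million_tokens(
    hardware_usd: float, api_usd_per_million: float, local_usd_per_million: float
) -> float:
    """Millions of tokens after which local hardware has paid for itself.

    Returns infinity when the API is not more expensive per token.
    """
    saving = api_usd_per_million - local_usd_per_million
    if saving <= 0:
        return math.inf
    return hardware_usd / saving


# Peak power and low-end gemma4:26b speed from the report; prices are assumed.
local = electricity_usd_per_million_tokens(watts=463, tokens_per_sec=180, usd_per_kwh=0.15)
print(f"local electricity: ${local:.3f} per million tokens")
print(break_even_million_tokens(hardware_usd=3000, api_usd_per_million=1.0, local_usd_per_million=local))
```

Using peak power with the low end of the speed range makes the local figure a conservative upper bound. Idle draw, hardware depreciation, and operator time are left out and would shift the curve.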

Latest public artifact

The 5090 Reports - 2026-07-09

Six measured tokens/sec rows on the RTX 5090: llama3.1:8b Q4_K_M holds 207-229 tok/s during generation and gemma4:26b Q4_K_M holds 180-207 tok/s, with VRAM, power draw, and a 140-second model-reload gotcha documented in the notes.

GPU: NVIDIA GeForce RTX 5090; Captured from nvidia-smi on the primary dev machine.
Driver: 610.62; Driver reported by the 2026-07-09 hardware snapshot.
Memory: 20233 / 32607 MiB peak; Peak memory during the gemma4:26b Q4_K_M benchmark runs.
Power: 463 / 575.00 W peak; Peak power draw during generation and the board power limit.
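A small sketch of how a snapshot like the one above could be turned into headroom numbers. It assumes the values arrive as one CSV line in the shape printed by `nvidia-smi --query-gpu=memory.used,memory.total,power.draw,power.limit --format=csv,noheader,nounits`. The sample line is assembled from the peak figures in this report rather than copied from a real capture, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSnapshot:
    memory_used_mib: float
    memory_total_mib: float
    power_draw_w: float
    power_limit_w: float

    @property
    def memory_free_mib(self) -> float:
        return self.memory_total_mib - self.memory_used_mib

    @property
    def memory_fraction(self) -> float:
        return self.memory_used_mib / self.memory_total_mib

    @property
    def power_fraction(self) -> float:
        return self.power_draw_w / self.power_limit_w


def parse_snapshot(line: str) -> GpuSnapshot:
    """Parse 'used, total, draw, limit' from one unit-less CSV line."""
    used, total, draw, limit = (float(part.strip()) for part in line.split(","))
    return GpuSnapshot(used, total, draw, limit)


# Peak values from the report above, arranged as a single CSV line.
snap = parse_snapshot("20233, 32607, 463.00, 575.00")
print(
    f"{snap.memory_free_mib:.0f} MiB free, "
    f"{snap.memory_fraction:.1%} of VRAM, "
    f"{snap.power_fraction:.1%} of the power limit"
)
```

With the report's figures this works out to 12374 MiB of VRAM headroom (about 62% in use) and roughly 80.5% of the board power limit at peak.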

Raw report | Failure log | RSS feed | JSON feed | Share kit

Distribution kit

Three drafts from the same artifact.

LinkedIn, X, and r/LocalLLaMA drafts use the latest raw report, the failure log, and the email-list CTA.

LinkedIn: Lab note; One artifact, one current result, one email-list CTA.
X: Thread; Short claims that point to the raw report instead of screenshots.
r/LocalLLaMA: Build log; Failure-first technical context for local-inference builders.

Open share kit

AgentGuard install path

Guard the run before it becomes a system.

The 5090 Reports measure local agent experiments on controlled compute. AgentGuard is the installable runtime stop layer for runs that can overspend, loop, time out, or hammer tools.
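To make the idea of a runtime stop layer concrete, here is a generic sketch of a per-run budget that halts an agent loop once it exceeds a tool-call cap or a wall-clock limit. It is not AgentGuard's API: `RunBudget`, `BudgetExceeded`, and `charge_tool_call` are hypothetical names invented for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past one of its limits."""


@dataclass
class RunBudget:
    max_tool_calls: int
    max_seconds: float
    clock: Callable[[], float] = time.monotonic  # injectable for tests
    tool_calls: int = 0
    started_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def charge_tool_call(self) -> None:
        """Count one tool call and stop the run if any limit is exceeded."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded(f"tool-call limit of {self.max_tool_calls} exceeded")
        elapsed = self.clock() - self.started_at
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time limit of {self.max_seconds}s exceeded")


# A loop that would otherwise hammer a tool forever.
budget = RunBudget(max_tool_calls=3, max_seconds=30.0)
try:
    while True:
        budget.charge_tool_call()  # call this before each tool invocation
except BudgetExceeded as stop:
    print(f"run stopped: {stop}")
```

Raising an exception rather than returning a flag means a runaway loop cannot ignore the limit by accident; the caller decides whether to log, retry, or abort.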

Read quickstart | PyPI | Source

§ 003 / OPERATING RULES

Publish the model, quant, prompt, hardware, and result.

Run one local experiment per week, even when the result is a failure.

No fake benchmark numbers. A failed run is a valid artifact.

Prefer measured notes over broad claims.

§ 004 / PRODUCT PATH

Content is the sensor. Product is the output.

The loop is simple: benchmark in public, grow the list, take capped inbound deployment work only when it teaches the product, then ship the repeated tooling as self-serve software.

Phase 0: Instrument the lab; Weekly reports from hardware snapshots, benchmark CSVs, and failure logs.
Phase 1: Distribute artifacts; Three posts per week across LinkedIn, X, and r/LocalLLaMA, all pointing here.
Phase 2: Capped deployments; Inbound-only, async paid R&D for regulated teams that need local AI.
Phase 3: Extract product; Local agent observability, memory, or MCP tooling rebuilt from repeated deployment work.