3 posts tagged #local-llm.
Blackwell GPU rentals hit $4.08/hr. CoreWeave raised prices 20%. Anthropic restricted its newest model to 40 orgs. Meanwhile, consumer GPUs sit idle.
Anthropic shipped a pattern in which a cheap model runs the loop and escalates to Opus only when needed. The pattern works with any two-model setup. Here is the math and the playbook.
Setting n_gpu_layers to -1 offloads every layer to the GPU. Learn what each value means and how the VRAM math works, then pick the right setting using real benchmarks.