#local-llm

5 posts tagged #local-llm.

Jun 4, 2026 · 6 min read
llama.cpp ngl: when -ngl 99 still runs on your CPU
You set -ngl 99 and llama.cpp still runs on your CPU. The flag is fine. Here is the 30-second load-log diagnostic and the five real causes, ranked.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers
May 15, 2026 · 7 min read
Localmaxxing isn't theory. Here's what my 3-GPU rig actually does.
Tom Tunguz called it localmaxxing. I run a 3070 + 5070 Ti + 5090 in one box and serve Llama 3.1 8B locally every day. Here are the real tokens-per-second, the real watts, and the real cost per million tokens.
#local-llm #ai-economics #agent-cost-control #gpu-inference
Apr 16, 2026 · 5 min read
GPU Prices Up 48% in Two Months. I Run LLMs in My Garage.
Blackwell rental hit $4.08/hr. CoreWeave raised prices 20%. Anthropic restricted its newest model to 40 orgs. Meanwhile, consumer GPUs are sitting idle.
#local-llm #gpu #infrastructure #agentguard
Apr 15, 2026 · 6 min read
Anthropic's Advisor Tool Is the Cost-Split Pattern You Should Already Be Running
Anthropic shipped a pattern where a cheap model runs the loop and escalates to Opus only when it needs to. The pattern works on any two-model setup. Here is the math and the playbook.
#ai-agents #cost-optimization #anthropic #agentguard
Apr 6, 2026 · 7 min read
llama.cpp n-gpu-layers Explained: -1 vs 0 + VRAM Guide (2026)
Setting --n-gpu-layers wrong tanks your tokens/sec or crashes with OOM. Here's exactly what to use (-1, 0, or a number), the VRAM-per-layer math, and 4060-4090 benchmarks.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers

The AI agent build notes

Real costs, real tools, no fluff. M-F when I ship, publish, or learn something worth sending.
