#gpu-offloading

2 posts tagged #gpu-offloading.

Jun 4, 2026 · 6 min read
llama.cpp ngl: when -ngl 99 still runs on your CPU
You set -ngl 99 and llama.cpp still runs on your CPU. The flag is fine. Here is the 30-second load-log diagnostic and the five real causes, ranked.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers
Apr 6, 2026 · 7 min read
llama.cpp n-gpu-layers Explained: -1 vs 0 + VRAM Guide (2026)
Setting --n-gpu-layers wrong tanks your tokens/sec or crashes with OOM. Here's exactly what to use (-1, 0, or a number), the VRAM-per-layer math, and 4060-4090 benchmarks.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers