2 posts tagged #gpu-offloading.
You set -ngl 99 and llama.cpp still runs on your CPU. The flag is fine. Here is the 30-second load-log diagnostic and the five real causes, ranked.
Setting --n-gpu-layers wrong tanks your tokens/sec or crashes with OOM. Here's exactly what to use (-1, 0, or a number), the VRAM-per-layer math, and 4060-4090 benchmarks.