#llama-cpp

6 posts tagged #llama-cpp.

Jun 9, 2026 · 5 min read
How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)
A practical guide to picking llama.cpp --n-gpu-layers: VRAM math, KV cache, OOM fixes, and a fast tuning loop.
#local-llm #llama-cpp #gpu #vram
Jun 9, 2026 · 5 min read
GGUF Quantization and VRAM: How to Pick Q4, Q5, or Q8 for Your GPU (2026)
VRAM decides your GGUF quant, not vibes. How I assign Q4, Q5, and Q8 across an 8GB 3070, a 16GB 5070 Ti, and a 32GB 5090.
#local-llm #gguf #quantization #llama-cpp
Jun 8, 2026 · 6 min read
How to Tune --n-gpu-layers for Your VRAM Budget
How to actually pick --n-gpu-layers: the offload math, finding the number with nvidia-smi, multi-GPU splits, and the top OOM mistakes.
#local-llm #llama-cpp #gpu #vram
Jun 8, 2026 · 6 min read
How to Pick a GGUF Quant Level for Your VRAM Budget
Given your GPU, which GGUF quant do you actually pick? The VRAM math, a card-by-card table, and the quality tradeoff in plain terms.
#local-llm #gguf #quantization #gpu
Jun 4, 2026 · 6 min read
llama.cpp ngl: when -ngl 99 still runs on your CPU
You set -ngl 99 and llama.cpp still runs on your CPU. The flag is fine. Here is the 30-second load-log diagnostic and the five real causes, ranked.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers
Apr 6, 2026 · 7 min read
llama.cpp n-gpu-layers Explained: -1 vs 0 + VRAM Guide (2026)
Setting --n-gpu-layers wrong tanks your tokens/sec or crashes with OOM. Here's exactly what to use (-1, 0, or a number), the VRAM-per-layer math, and 4060-4090 benchmarks.
#llama-cpp #local-llm #gpu-offloading #n-gpu-layers

The AI agent build notes

Real costs, real tools, no fluff. M-F when I ship, publish, or learn something worth sending.