5 posts tagged #gpu.
Split a 70B model across multiple GPUs with llama.cpp. How --tensor-split, --main-gpu, and --split-mode work on a real consumer rig.
How to actually pick --n-gpu-layers: the offload math, finding the number with nvidia-smi, multi-GPU splits, and the top OOM mistakes.
Given your GPU, which GGUF quant do you actually pick? The VRAM math, a card-by-card table, and the quality tradeoff in plain terms.
Q4_K_M cuts model size by about 75% versus FP16 with minimal quality loss, but Q5, Q6, and Q8 each win in specific cases. We benchmarked every quant level on real hardware. Here's which to pick. (2026)
Blackwell rental hit $4.08/hr. CoreWeave raised prices 20%. Anthropic restricted its newest model to 40 orgs. Meanwhile, consumer GPUs are sitting idle.
Real costs, real tools, no fluff. Sent on weekdays, whenever I ship, publish, or learn something worth sending.