#GPU

7 posts tagged #GPU.

Jun 9, 2026 · 5 min read
Which GGUF Quant Should You Actually Pick? Q4 vs Q5 vs Q6 vs Q8 (2026)
Q4_K_M vs Q5_K_M vs Q6_K vs Q8_0. A practical decision guide for picking the right GGUF quant on consumer GPUs.
#local-llm #gguf #quantization #gpu
Jun 9, 2026 · 5 min read
How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)
A practical guide to picking llama.cpp --n-gpu-layers: VRAM math, KV cache, OOM fixes, and a fast tuning loop.
#local-llm #llama-cpp #gpu #vram
Jun 8, 2026 · 7 min read
llama.cpp Multi-GPU: Splitting a Model Across Cards with --tensor-split
Split a 70B model across multiple GPUs with llama.cpp. How --tensor-split, --main-gpu, and --split-mode work on a real consumer rig.
#llama.cpp #local-llm #gpu #multi-gpu
Jun 8, 2026 · 6 min read
How to Tune --n-gpu-layers for Your VRAM Budget
How to actually pick --n-gpu-layers: the offload math, finding the number with nvidia-smi, multi-GPU splits, and the top OOM mistakes.
#local-llm #llama-cpp #gpu #vram
Jun 8, 2026 · 6 min read
How to Pick a GGUF Quant Level for Your VRAM Budget
Given your GPU, which GGUF quant do you actually pick? The VRAM math, a card-by-card table, and the quality tradeoff in plain terms.
#local-llm #gguf #quantization #gpu
May 13, 2026 · 8 min read
GGUF Quantization: Q4_K_M vs Q5_K_M vs Q8 — Which to Pick (2026)
Q4_K_M cuts model size 75% with barely any quality loss — but Q5, Q6, and Q8 each win in specific cases. We benchmarked every quant level on real hardware. Here's which to pick. (2026)
#llama.cpp #GGUF #Quantization #Local AI
Apr 16, 2026 · 5 min read
GPU Prices Up 48% in Two Months. I Run LLMs in My Garage.
Blackwell rental hit $4.08/hr. CoreWeave raised prices 20%. Anthropic restricted their newest model to 40 orgs. Meanwhile, consumer GPUs are sitting idle.
#local-llm #gpu #infrastructure #agentguard
