[bmdpat]

Local LLM Toolkit

See the GGUF quality tradeoff before you download.

Compare Q4, Q5, Q8, and the IQ quants by size, quality, speed, and GPU fit. Pick the smallest file that still keeps the model useful.

What I would pick

Q4_K_M

At 40GB, Q4_K_M does not fit in 24GB of VRAM, so it needs CPU offload (46 of 80 layers stay on the GPU), but it preserves more quality than the ultra-small quants.

Size: 40GB
Quality: 97.5%
GPU layers: 46/80
Speed: 8.3-16 tok/s
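
The GPU layers figure can be approximated by hand. The sketch below assumes every layer is the same size and holds back 1GB of VRAM for the KV cache and runtime buffers. Both are illustrative assumptions, not the toolkit's stated formula, and `gpu_layers_that_fit` and `reserve_gb` are hypothetical names.

```python
import math


def gpu_layers_that_fit(model_size_gb, total_layers, vram_gb, reserve_gb=1.0):
    """Estimate how many layers fit on the GPU.

    Assumes equal-sized layers and a fixed VRAM reserve (reserve_gb)
    for KV cache and buffers. Both are simplifications.
    """
    per_layer_gb = model_size_gb / total_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(total_layers, math.floor(usable_gb / per_layer_gb))


# Q4_K_M of the 70B baseline (40GB, 80 layers) on a 24GB card.
print(gpu_layers_that_fit(40, 80, 24))
```

Under these assumptions a 40GB file on a 24GB card keeps 46 layers on the GPU; a larger reserve lowers the count. In llama.cpp, a count like this is what you would pass to `-ngl` / `--n-gpu-layers`.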

GGUF tradeoff curve

Generic 70B on RTX 4090 24GB

Sweet spot: Q4 to Q5

| Quant level | Size | Quality vs F16 | Speed boost | VRAM saved | Verdict |
| --- | --- | --- | --- | --- | --- |
| IQ2_XXS | 20GB | 87% | 2.4x | 120GB (85.7%) | Last resort. Small, but the model changes. |
| IQ3_XXS | 28GB | 93% | 2.3x | 112GB (80%) | Quality loss is noticeable. Use for fit. |
| IQ4_XS | 36GB | 96.5% | 2.1x | 104GB (74.3%) | Compact pick when Q4_K_M is too tight. |
| Q4_0 | 37GB | 96% | 2x | 103GB (73.6%) | Older Q4. Prefer Q4_K_M when available. |
| Q4_K_M (sweet spot) | 40GB | 97.5% | 2x | 100GB (71.4%) | Default sweet spot for most local runs. |
| Q5_K_M (sweet spot) | 47GB | 98.5% | 1.8x | 93GB (66.4%) | Quality pick. Worth it when it fits. |
| Q6_K | 54GB | 99% | 1.6x | 86GB (61.4%) | High quality. Still a large download. |
| Q8_0 | 70GB | 99.5% | 1.4x | 70GB (50%) | Near lossless. Good when memory is abundant. |
| F16 | 140GB | 100% | 1x | 0GB (0%) | Native quality. Usually too large for local GPUs. |
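
The "VRAM saved" column follows directly from the file sizes: it is the F16 size minus the quant size, also shown as a share of the F16 size. A minimal sketch of that arithmetic, using the sizes from the table (`vram_saved` is a hypothetical helper name):

```python
F16_SIZE_GB = 140  # F16 baseline for the generic 70B model

# File sizes in GB, copied from the table.
QUANT_SIZES_GB = {
    "IQ2_XXS": 20, "IQ3_XXS": 28, "IQ4_XS": 36, "Q4_0": 37,
    "Q4_K_M": 40, "Q5_K_M": 47, "Q6_K": 54, "Q8_0": 70, "F16": 140,
}


def vram_saved(size_gb, baseline_gb=F16_SIZE_GB):
    """Return (GB saved, percent saved) relative to the baseline size."""
    saved_gb = baseline_gb - size_gb
    saved_pct = round(100 * saved_gb / baseline_gb, 1)
    return saved_gb, saved_pct


for name, size_gb in QUANT_SIZES_GB.items():
    saved_gb, saved_pct = vram_saved(size_gb)
    print(f"{name:8s} {size_gb:4d}GB  saved {saved_gb}GB ({saved_pct}%)")
```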

Default model: 70B baseline

Sweet spot: Q4_K_M to Q5_K_M

FAQ

What is the best GGUF quantization for local LLMs?
Q4_K_M is the default pick for most local runs. Q5_K_M is better when you have enough VRAM. Q8_0 is useful when quality matters more than size.

How much quality do you lose with Q4_K_M?
For many models, Q4_K_M lands around 97% to 98% of the F16 baseline in practical use. It is the main sweet spot because the file shrinks sharply (140GB to 40GB for the 70B baseline, about 71% smaller) while quality usually holds.

Should I use Q5_K_M or Q8_0?
Use Q5_K_M when you want strong quality on one prosumer GPU. Use Q8_0 when you have enough VRAM and want a near-lossless local copy.
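
The advice above reduces to a simple rule: take the highest-quality quant whose file fits your memory budget. A rough sketch using the 70B numbers from the table; `pick_quant` is a hypothetical helper, and the budget ignores KV cache and runtime overhead, so leave yourself some headroom.

```python
# (name, size in GB, quality as % of F16), copied from the table.
QUANTS = [
    ("IQ2_XXS", 20, 87.0),
    ("IQ3_XXS", 28, 93.0),
    ("IQ4_XS", 36, 96.5),
    ("Q4_0", 37, 96.0),
    ("Q4_K_M", 40, 97.5),
    ("Q5_K_M", 47, 98.5),
    ("Q6_K", 54, 99.0),
    ("Q8_0", 70, 99.5),
    ("F16", 140, 100.0),
]


def pick_quant(budget_gb):
    """Return the highest-quality quant that fits in budget_gb, or None."""
    fitting = [q for q in QUANTS if q[1] <= budget_gb]
    if not fitting:
        return None
    # Choose by quality rather than size, so IQ4_XS beats the larger Q4_0.
    return max(fitting, key=lambda q: q[2])[0]


print(pick_quant(24))  # VRAM only on a 24GB card
print(pick_quant(48))  # e.g. VRAM plus system RAM used for offload
```

The budget can be VRAM alone (full GPU fit) or VRAM plus whatever system RAM you are willing to use for CPU offload, at the cost of speed.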
