Local LLM Toolkit
See the GGUF quality tradeoff before you download.
Compare IQ2_XXS, IQ3_XXS, IQ4_XS, Q4_0, Q4_K_M, Q5_K_M, Q6_K, Q8_0, and F16 by size, quality, speed, and GPU fit. Pick the smallest file that still keeps the model useful.
Q4_K_M
On the RTX 4090 24GB reference, Q4_K_M needs CPU offload, but it preserves more quality than the ultra-small quants.
- Size: 40GB
- Quality: 97.5% of F16
- GPU layers: 46/80
- Speed: 8.3-16 tok/s
- Fit: Needs more VRAM
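The 46/80 GPU layers figure is consistent with simple proportional arithmetic. The tool does not say how it computes this number, so what follows is a minimal sketch under two assumptions: every layer is the same size, and a hypothetical 1GB of VRAM is reserved for the KV cache and runtime buffers. Under those assumptions, 40GB over 80 layers is 0.5GB per layer, and the 23 usable GB hold 46 of them. `gpu_layers_that_fit` and `reserve_gb` are illustrative names.

```python
import math

def gpu_layers_that_fit(model_gb: float, n_layers: int, vram_gb: float,
                        reserve_gb: float = 1.0) -> int:
    """Estimate how many layers of a GGUF file can sit in VRAM.

    Assumptions (hypothetical, not the tool's published formula):
    - every layer is the same size, model_gb / n_layers
    - reserve_gb of VRAM is held back for the KV cache and runtime buffers
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Never report more layers than the model actually has.
    return min(n_layers, math.floor(usable_gb / per_layer_gb))

# Q4_K_M of the generic 70B (40GB, 80 layers) on a 24GB RTX 4090
print(f"{gpu_layers_that_fit(40, 80, 24)}/80 layers on GPU")  # 46/80 layers on GPU
```

The layers that do not fit stay in system RAM and run on the CPU, which is what "needs CPU offload" means here.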
Some links are affiliate links. If you buy or rent through them I may earn a commission at no extra cost to you.
GGUF tradeoff curve
Generic 70B on RTX 4090 24GB
| Quant level | Size | Quality vs F16 | Speed vs F16 | Saved vs F16 | Verdict |
|---|---|---|---|---|---|
| IQ2_XXS | 20GB | 87% | 2.4x | 120GB (85.7%) | Last resort. Small, but the model changes. |
| IQ3_XXS | 28GB | 93% | 2.3x | 112GB (80%) | Quality loss is noticeable. Use only to make it fit. |
| IQ4_XS | 36GB | 96.5% | 2.1x | 104GB (74.3%) | Compact pick when Q4_K_M is too tight. |
| Q4_0 | 37GB | 96% | 2x | 103GB (73.6%) | Older Q4. Prefer Q4_K_M when available. |
| Q4_K_M (sweet spot) | 40GB | 97.5% | 2x | 100GB (71.4%) | Default sweet spot for most local runs. |
| Q5_K_M (sweet spot) | 47GB | 98.5% | 1.8x | 93GB (66.4%) | Quality pick. Worth it when it fits. |
| Q6_K | 54GB | 99% | 1.6x | 86GB (61.4%) | High quality. Still a large download. |
| Q8_0 | 70GB | 99.5% | 1.4x | 70GB (50%) | Near lossless. Good when memory is abundant. |
| F16 | 140GB | 100% | 1x | 0GB (0%) | Native quality. Usually too large for local GPUs. |
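The savings column follows directly from the sizes: subtract each file from the 140GB F16 file, then divide by 140GB for the percentage. A small sketch that recomputes it from the table's own numbers (the function and constant names are illustrative):

```python
F16_GB = 140  # F16 size of the generic 70B reference

# File sizes in GB, copied from the tradeoff table above
QUANT_SIZES_GB = {
    "IQ2_XXS": 20, "IQ3_XXS": 28, "IQ4_XS": 36, "Q4_0": 37, "Q4_K_M": 40,
    "Q5_K_M": 47, "Q6_K": 54, "Q8_0": 70, "F16": 140,
}

def saved_vs_f16(size_gb: float, f16_gb: float = F16_GB) -> tuple[float, float]:
    """Return (GB saved, percent saved) relative to the F16 file."""
    saved_gb = f16_gb - size_gb
    return saved_gb, round(100 * saved_gb / f16_gb, 1)

for name, size_gb in QUANT_SIZES_GB.items():
    saved_gb, pct = saved_vs_f16(size_gb)
    print(f"{name:<8} {size_gb:>4}GB  saved {saved_gb}GB ({pct}%)")
```

The same arithmetic gives the Q4_K_M figure of 100GB (71.4%). Note that these are file-size savings; actual VRAM use at runtime also includes the KV cache.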
- Default model: Generic 70B
- Sweet spot: Q4_K_M to Q5_K_M
FAQ
- What is the best GGUF quantization for local LLMs?
- Q4_K_M is the default pick from this tool's reference table: 97.5% of F16, 40GB for the 70B reference, and marked as a sweet spot. Q5_K_M is the quality-leaning sweet spot at 98.5% when it fits. Q8_0 is the near-lossless option at 99.5% when memory matters less than fidelity.
- How much quality do you lose with Q4_K_M?
- Q4_K_M scores 97.5% of F16 on the reference quality proxy, a 2.5% loss. For the 70B reference, the file drops from 140GB to 40GB, saving 100GB (71.4%).
- Should I use Q5_K_M or Q8_0?
- Use Q5_K_M when you want the higher sweet-spot quality (98.5%) and it fits. Use Q8_0 when you have VRAM to spare and value the near-lossless row (99.5%) over a smaller download. For most local runs, compare both against Q4_K_M.
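"When it fits" and "the smallest file that still keeps the model useful" are two lookups over the reference table. A minimal sketch, assuming the budget is compared against file size alone (it ignores the KV cache and runtime overhead) and that the table's quality percentages are the only quality signal; `best_that_fits` and `smallest_meeting` are hypothetical helper names:

```python
from typing import Optional

# (name, size_gb, quality_pct_vs_f16) from the generic 70B reference table
QUANTS = [
    ("IQ2_XXS", 20, 87.0), ("IQ3_XXS", 28, 93.0), ("IQ4_XS", 36, 96.5),
    ("Q4_0", 37, 96.0), ("Q4_K_M", 40, 97.5), ("Q5_K_M", 47, 98.5),
    ("Q6_K", 54, 99.0), ("Q8_0", 70, 99.5), ("F16", 140, 100.0),
]

def best_that_fits(budget_gb: float) -> Optional[str]:
    """Highest-quality quant whose file fits in budget_gb (None if nothing fits)."""
    fitting = [q for q in QUANTS if q[1] <= budget_gb]
    return max(fitting, key=lambda q: q[2])[0] if fitting else None

def smallest_meeting(min_quality: float) -> Optional[str]:
    """Smallest quant whose quality proxy is at least min_quality percent."""
    ok = [q for q in QUANTS if q[2] >= min_quality]
    return min(ok, key=lambda q: q[1])[0] if ok else None

print(best_that_fits(24))    # IQ2_XXS
print(best_that_fits(48))    # Q5_K_M
print(smallest_meeting(97))  # Q4_K_M
```

With only 24GB counted, nothing above IQ2_XXS fits as a whole file, which is why the Q4_K_M result card relies on CPU offload instead.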
§ 002 / PRICING
Unlimited local LLM decisions with Pro.
The toolkit is free for up to 5 runs per tool per day. Upgrade to Pro to remove the limit and keep your rig history in one place.
Free
$0
- 5 free runs per tool per day
- Standard GPU presets
Pro
$7/mo
- Unlimited calculator runs
- Save my rig and get new-fit alerts
- Import custom models from Hugging Face URLs
- Benchmark history across model and quant choices
- Early access to new toolkit surfaces
- No ads
Or $49/year