Qwen 2.5 72B
72B dense. Strong reasoning and data work when 70B fits your setup.
Quant: Q5_K_M | Speed: 6.8-13 tok/s | GPU layers: 41/80 | Score: 86
Local LLM Toolkit
Tell it your GPU, workload, and tradeoff. It ranks 18 local models across 6 workloads and 3 priorities, then prefills VRAM checks before you download a 350GB+ weight file.
Current pick
Qwen 2.5 72B: strong reasoning and data work when 70B fits your setup.
Quant: Q5_K_M | Speed: 6.8-13 tok/s
Ranked recommendations
Strong reasoning and data work when 70B fits your setup.
Quant: Q5_K_M | Speed: 6.8-13 tok/s | GPU layers: 41/80 | Score: 86

Best general chat quality before the models get huge.
Quant: Q5_K_M | Speed: 7-13.4 tok/s | GPU layers: 42/80 | Score: 86

Strongest reasoning under 20B for tight VRAM.
Quant: Q8_0 | Speed: 18.6-35.8 tok/s | GPU layers: 40/40 | Score: 83

Strongest code model that still fits many prosumer GPUs.
Quant: Q8_0 | Speed: 9.6-18.5 tok/s | GPU layers: 46/64 | Score: 83

Vision-capable model for midrange GPUs.
Quant: Q8_0 | Speed: 20.1-38.7 tok/s | GPU layers: 48/48 | Score: 82

Good all-around quality without 70B memory pressure.
Quant: Q8_0 | Speed: 13.1-25.2 tok/s | GPU layers: 39/40 | Score: 82
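
The GPU layers figures in the recommendations (for example 41/80) describe how many of a model's layers are offloaded to the GPU. As a rough illustration of the kind of VRAM math involved, here is a minimal sketch. It is not the toolkit's actual method: the bits-per-weight values are approximate assumptions, the `reserve_gib` headroom for KV cache and runtime buffers is a made-up default, and the function names are hypothetical. It also assumes weights are spread evenly across layers, which real models only approximate.

```python
# Approximate bits per weight for two GGUF quant types.
# These are assumed ballpark values; real files vary by architecture.
BITS_PER_WEIGHT = {"Q5_K_M": 5.7, "Q8_0": 8.5}


def weights_gib(params_billion: float, quant: str) -> float:
    """Estimate the size of the quantized weights in GiB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billion * 1e9 * bits / 8 / 2**30


def layers_on_gpu(
    params_billion: float,
    quant: str,
    n_layers: int,
    vram_gib: float,
    reserve_gib: float = 2.0,  # assumed headroom for KV cache and buffers
) -> int:
    """Estimate how many layers fit in VRAM, assuming equal-sized layers."""
    per_layer = weights_gib(params_billion, quant) / n_layers
    usable = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable // per_layer))


if __name__ == "__main__":
    # Hypothetical rig: a 72B model at Q5_K_M with 80 layers on a 24 GiB card.
    size = weights_gib(72, "Q5_K_M")
    fit = layers_on_gpu(72, "Q5_K_M", n_layers=80, vram_gib=24)
    print(f"weights ~{size:.1f} GiB, ~{fit}/80 layers on GPU")
```

An estimate like this will not match the page's numbers exactly, since context length, KV cache size, and per-layer differences all shift the result. It is only meant to show why a 72B model at Q5_K_M ends up partially offloaded on a consumer GPU while smaller Q8_0 models can fit entirely.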
Default context: 4K tokens
Scope: 18 models / 6 workloads / 3 priorities
§ 002 / PRICING
The toolkit is free for up to 5 runs per tool per day. Upgrade to Pro to remove the limit and keep your rig history in one place.
Free: $0
Pro: $7/mo or $49/year