You can try it with this model here: https://hugston.com/models/56tps-tested-autoround-qwen35-35b-a3b-q2-k-s
It is really well done and runs fast with a context of up to 300k tokens.
Just 11.65 GB.
Grab the mmproj file as well for vision/image processing.
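For anyone who prefers Python over the CLI, here is a minimal sketch of loading a GGUF like this with the llama-cpp-python bindings; the filename and context size are placeholders, and the mmproj file is wired in separately through llama.cpp's multimodal support:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen35-35b-a3b-q2_k_s.gguf",  # hypothetical local filename
        n_ctx=32768,       # raise toward ~300k only if you have RAM for the KV cache
        n_gpu_layers=-1,   # offload as many layers as fit on the GPU
    )

    out = llm("Explain AutoRound quantization in one sentence:", max_tokens=64)
    print(out["choices"][0]["text"])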
Hmm... per Intel's own numbers, at Q4_K_M stock-style quantization retains ~99–99.8% of BF16 accuracy, while AutoRound pushes that to ~99.4% up to just over 100% (if I'm reading the tables right), so the gap is roughly 0.1–0.7 percentage points.

https://github.com/intel/auto-round/blob/main/docs/gguf_alg_ext_acc.md
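If anyone wants to try reproducing that kind of comparison, a rough sketch with Intel's auto-round library looks like the following; the GGUF format string and the small stand-in model are my assumptions, so check the repo's README for the current API:

    # pip install auto-round
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from auto_round import AutoRound

    model_name = "Qwen/Qwen3-0.6B"  # small stand-in, not the 35B model above
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # 4-bit weight-only quantization with AutoRound's sign-gradient tuning
    ar = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)

    # Export to a llama.cpp-compatible GGUF (exact format string is my assumption)
    ar.quantize_and_save("./qwen-q4km-gguf", format="gguf:q4_k_m")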