3 comments

  • trilogic 13 minutes ago
    You can try it with this model: https://hugston.com/models/56tps-tested-autoround-qwen35-35b-a3b-q2-k-s — it is really well done and runs fast with context up to 300k. Just 11.65 GB. Get the mmproj file as well for vision/image processing.
  • netdur 1 hour ago
    Hmm... at Q4_K_M, stock-style quantization retains ~99–99.8% of BF16 accuracy, while AutoRound pushes that to ~99.4–100.n% (??); the gap is roughly 0.1–0.7 percentage points.
    https://github.com/intel/auto-round/blob/main/docs/gguf_alg_ext_acc.md