3 comments

  • everlier 5 minutes ago
    Owning the GGUF conversion step is good in some circumstances, but running in fp16 is suboptimal for this hardware due to its low-ish memory bandwidth.

    It looks like the context is set to 32k, which is the bare minimum needed for OpenCode with its ~10k initial system prompt. So overall, something like Unsloth's UD Q8 XL or Q6 XL quants frees up a lot of memory and bandwidth, moving into the next tier of usefulness.
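    As a rough illustration of that suggestion, a llama.cpp server launch with a Q6-class quant and the 32k context mentioned might look like this (a sketch; the model filename and flag values are assumptions, not from the original post):

    ```shell
    # Serve a Q6-class GGUF quant instead of fp16 to save memory and bandwidth.
    # The model filename is hypothetical; substitute your actual quantized file.
    # -c 32768 sets the 32k context mentioned above; -ngl 99 offloads all layers.
    llama-server -m model-UD-Q6_K_XL.gguf -c 32768 -ngl 99 --host 127.0.0.1 --port 8080
    ```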
  • JSR_FDED 4 minutes ago
    Perfect. No fluff, just the minimum needed to get things working.
  • timmy777 56 minutes ago
    Thanks for sharing. However, this missed being a good writeup due to the lack of numbers and data.

    I'll give a specific example in my feedback. You said:

    ``` so far, so good, I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window ```

    But there are no numbers, results, output pastes, performance figures, or timings.

    Anyone with RAM can run these models; it will just be impracticably slow. The Strix Halo is meant for decent performance, so your sharing numbers would be valuable here.

    Do you mind sharing these? Thanks!
    • gessha 16 minutes ago
      This is more of a “succeeding to get anywhere close to messing around” rather than an “it works, so now I can run some benchmarks” type of article.
    • l33tfr4gg3r 16 minutes ago
      To give the benefit of the doubt, the author does state multiple times (including in the title) that these were "first impressions", so perhaps they should have mentioned something like "...In the next post, we'll explore performance and numbers" to avoid a cliffhanger, or done a part 1 (assuming the intention was to follow up with a part 2).
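For reference, the kind of numbers requested in the thread above is typically gathered with llama.cpp's bundled llama-bench tool; a minimal invocation might look like this (a sketch; the model filename is a placeholder):

```shell
# llama-bench reports prompt-processing (pp) and token-generation (tg)
# throughput in tokens/s. The model filename is a placeholder; -p and -n
# set the prompt and generation lengths, -ngl 99 offloads all layers.
llama-bench -m model-Q6_K.gguf -p 512 -n 128 -ngl 99
```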