3 comments

  • everlier 5 minutes ago
    Owning the GGUF conversion step is good in some circumstances, but running in fp16 is suboptimal for this hardware due to its low-ish memory bandwidth.

    It looks like the context is set to 32k, which is the bare minimum needed for OpenCode with its ~10k initial system prompt. So overall, something like Unsloth's UD Q8 XL or Q6 XL quants frees up a lot of memory and bandwidth, moving into the next tier of usefulness.
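    As a rough illustration of that suggestion, a llama.cpp server launch with a Q6-class quant and the 32k context mentioned might look like this (a sketch; the model filename and flag values are assumptions, not from the original post):

    ```shell
    # Serve a Q6-class GGUF quant instead of fp16 to save memory and bandwidth.
    # The model filename is hypothetical; substitute your actual quantized file.
    # -c 32768 sets the 32k context mentioned above; -ngl 99 offloads all layers.
    llama-server -m model-UD-Q6_K_XL.gguf -c 32768 -ngl 99 --host 127.0.0.1 --port 8080
    ```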
  • JSR_FDED 4 minutes ago
    Perfect. No fluff, just the minimum needed to get things working.
  • timmy777 56 minutes ago
    Thanks for sharing. However, this missed being a good writeup due to the lack of numbers and data.

    I'll give a specific example in my feedback. You said:

    ``` so far, so good, I was able to play with PyTorch and run Qwen3.6 on llama.cpp with a large context window ```

    But there are no numbers, results, output pastes, performance figures, or timings.

    Anyone with RAM can run these models; it will just be impracticably slow. The Strix Halo is meant for decent performance, so your sharing numbers would be valuable here.

    Do you mind sharing these? Thanks!
    • gessha 16 minutes ago
      This is more of a “succeeding to get anywhere close to messing around” rather than an “it works, so now I can run some benchmarks” type of article.
    • l33tfr4gg3r 16 minutes ago
      To give the benefit of the doubt, the author does state multiple times (including in the title) that these were "first impressions", so perhaps they should have mentioned something like "...In the next post, we'll explore performance and numbers" to avoid a cliffhanger, or done a part 1 (assuming the intention was to follow up with a part 2).
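For reference, the kind of numbers requested in the thread above is typically gathered with llama.cpp's bundled llama-bench tool; a minimal invocation might look like this (a sketch; the model filename is a placeholder):

```shell
# llama-bench reports prompt-processing (pp) and token-generation (tg)
# throughput in tokens/s. The model filename is a placeholder; -p and -n
# set the prompt and generation lengths, -ngl 99 offloads all layers.
llama-bench -m model-Q6_K.gguf -p 512 -n 128 -ngl 99
```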