4 comments

  • titanix88 22 hours ago
    It looks like the author has not used software pipelining compiler directives on the kernel loops. The AMD AIE architecture has a 5-cycle load/store latency and a 7-cycle FP unit latency; with software pipelining, they could see a 5-10x speedup on long loops (see the sketch below).
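    A minimal sketch of what such directives can look like in AIE kernel C++, assuming AMD's chess compiler loop annotations (chess_prepare_for_pipelining, chess_loop_range); the kernel name and parameters are hypothetical, and this only builds under the AIE toolchain, not a stock C++ compiler:

        // Hypothetical AIE kernel: element-wise scale with pipelining hints.
        // chess_prepare_for_pipelining asks the compiler to overlap loads,
        // FP ops, and stores across iterations; chess_loop_range promises a
        // minimum trip count so the pipelined schedule is always legal.
        void scale_kernel(const float* __restrict in,
                          float* __restrict out, int n) {
            for (int i = 0; i < n; i++)
                chess_prepare_for_pipelining
                chess_loop_range(16, )  // at least 16 iterations, no known max
            {
                // Once pipelined, the 5-cycle loads and 7-cycle FP latency
                // hide behind neighboring iterations instead of stalling.
                out[i] = in[i] * 2.0f;
            }
        }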
  • fooblaster 19 hours ago
    This architecture is likely going to be a dead end for AMD. It has been in the wild for several years, yet it still has no open programming model and suffers from multiple compiler stacks with poor software support. I find it likely that AMD drops this architecture and unifies its ML support around its GPGPU hardware.
  • nl 22 hours ago
    Note that this is BLAS on the AMD/Xilinx VCK5000 FPGA: https://www.amd.com/en/products/adaptive-socs-and-fpgas/evaluation-boards/vck5000.html
    • heavyset_go 21 hours ago
      How does this line compare to the Ryzen AI-branded Xilinx FPGAs in newer mobile AMD APUs?
      • wmf 21 hours ago
        The Ryzen AI NPU is from Xilinx, but it's not an FPGA, BTW.
        • heavyset_go 20 hours ago
          I thought the XDNA line was related to Xilinx's Versal (or Alveo, I forget) lines that use FPGA fabric?

          Or maybe I'm misinterpreting press releases, as evidently Notebookcheck.net lied to me years ago :(

          [1] https://www.notebookcheck.net/AMD-details-4-nm-Zen-4-Ryzen-7040-Phoenix-HS-APUs-with-integrated-Xilinx-AI-smarts-alongside-refreshed-Zen-3-and-Zen3-offerings.679093.0.html
          • wtallis 10 hours ago
            It's an IP block that Xilinx can provide for use on their FPGAs, but as implemented on the Ryzen parts it's synthesized into a hard IP block, not an FPGA block plus bitstream.
  • kouteiheika 23 hours ago
    So it's called an "AI Engine", but its performance is worse than just running the same thing on a CPU? Doesn't that make it essentially useless for anything AI-related? What's the point of this hardware, then? Better power efficiency for tiny models? Surely someone must be using it for something?
    • heavyset_go 22 hours ago
      The point is offloading ML workloads to hardware that is energy efficient, not necessarily "fast" hardware.

      You want to minimize the real and energy costs at the expense of time.

      Assuming NPUs don't get pulled from consumer hardware altogether, the time/efficiency trade-off gap should theoretically keep shrinking over time.
    • shetaye 23 hours ago
      The CPU baseline seems to be the beefy host CPU. The AIE is presumably faster than what you could do with the FPGA fabric (DSPs, LUTs, etc.) alone.