3 comments

  • RossBencina 1 hour ago
    > Can different iterations of this loop run independently?

    I'm not a compiler guy, but vectorisation algorithms typically analyze loop-carried dependencies and can vectorise loops that are not trivially data parallel as is the case in the post. Allen & Kennedy (MK, 2002) discusses the classical methods.

    Here's an example, I'm not sure whether the post's algorithm would handle it:

        phase = 0.0
        inc = 0.12345678899
        for i in [0, n):
            out[i] = table[phase % TABLE_LEN]
            phase = phase + inc

    Assuming n%4 == 0, this can be trivially 4-lane vectorised as:

        phasev = [phase, phase+inc, phase+inc+inc, phase+inc+inc+inc]
        incv = [4*inc, 4*inc, 4*inc, 4*inc]
        for i in [0, n) step 4:
            out[i..i+3] = table[phasev % TABLE_LEN]
            phasev = phasev + incv

    When I read Allen and Kennedy my impression was that vectorising arbitrary imperative code is a much harder problem than designing a language that only allows for vectorisable constructs to be expressed in the first place. For example, maybe it's better to express trivially data parallel kernels as pure functions over buffers and buffer indices. That's how shader programs work, isn't it? In my example that would produce different code, requiring a multiplication:

        lambda i: table[(phase0 + i*inc) % TABLE_LEN]
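For anyone who wants to poke at this, here is a rough Python sketch of the three forms above: the scalar loop, a 4-lane version, and the pure function of the index. It is only an illustration under some assumptions that are not in the comment: the phase is truncated with `int()` before indexing, `table` is a made-up lookup table, and each lane advances by `4*inc` per step so the lanes stay in sync with the scalar loop. With an arbitrary floating-point `inc` the three forms can round differently, so they are not guaranteed to agree bit for bit.

```python
TABLE_LEN = 8
table = [10 * k for k in range(TABLE_LEN)]  # hypothetical lookup table


def scalar(phase0, inc, n):
    """Original loop: phase is carried from one iteration to the next."""
    out = [0] * n
    phase = phase0
    for i in range(n):
        # Assumption: truncate the float phase to get an integer index.
        out[i] = table[int(phase) % TABLE_LEN]
        phase = phase + inc
    return out


def four_lane(phase0, inc, n):
    """4-lane version: lists of length 4 stand in for vector registers."""
    assert n % 4 == 0
    out = [0] * n
    phasev = [phase0 + k * inc for k in range(4)]  # lanes start inc apart
    incv = [4 * inc] * 4                           # each lane skips 4 elements
    for i in range(0, n, 4):
        out[i:i + 4] = [table[int(p) % TABLE_LEN] for p in phasev]
        phasev = [p + d for p, d in zip(phasev, incv)]
    return out


def closed_form(phase0, inc, n):
    """Pure function of the index: no carried state, one multiply per element."""
    return [table[int(phase0 + i * inc) % TABLE_LEN] for i in range(n)]
```

With an increment that is exactly representable in binary, such as `0.375`, all the arithmetic is exact and the three functions return the same list, which makes it a convenient sanity check.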
  • rapatel0 1 hour ago
    Look into the Eigen library. They use template metaprogramming to chain linear algebra operations in a way that the compiler *should* be able to optimize memory layout and kernels for vector instructions. Might give you some ideas.

    https://libeigen.gitlab.io/

    Though you can expect very verbose compiler output. (I once had 35 pages of compiler output for a single type error.) Probably not a big deal with LLMs.
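As a loose analogy for the chaining idea, here is a small Python sketch of lazy element-wise expressions: operators build an expression tree and a single loop evaluates it, so no intermediate vectors are created. Everything in it (`Expr`, `Vec`, `BinOp`, `evaluate`) is a hypothetical name for illustration and not Eigen's API. Eigen does this kind of thing at compile time with C++ templates, which this runtime sketch does not reproduce.

```python
class Expr:
    """Base class for lazy element-wise expressions (illustrative only)."""

    def __add__(self, other):
        return BinOp(self, other, lambda a, b: a + b)

    def __mul__(self, other):
        return BinOp(self, other, lambda a, b: a * b)


class Vec(Expr):
    """A concrete vector holding its data."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def at(self, i):
        return self.data[i]


class BinOp(Expr):
    """A deferred binary operation; nothing is computed until at() is called."""

    def __init__(self, lhs, rhs, fn):
        assert len(lhs) == len(rhs)
        self.lhs, self.rhs, self.fn = lhs, rhs, fn

    def __len__(self):
        return len(self.lhs)

    def at(self, i):
        return self.fn(self.lhs.at(i), self.rhs.at(i))


def evaluate(expr):
    """One pass over the indices; the whole expression is fused per element."""
    return Vec(expr.at(i) for i in range(len(expr)))
```

For example, `evaluate(Vec([1, 2, 3]) + Vec([4, 5, 6]) * Vec([7, 8, 9])).data` computes `a[i] + b[i] * c[i]` in a single loop rather than building the product vector first.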
  • mgaunard 5 minutes ago
    essentially this is a mini-ISPC?