  • themafia 2 hours ago
        vpcmpestri xmm2, xmm3, BYTEWISE_CMP
        test cx, 0x10 ; if(rcx != 16)

    I see this test/cmp all the time after the instruction and I don't understand it. pcmpestri will set ZF if edx < 16, and it will set SF if eax < 16. It is already giving you the necessary status. Also testing sub words of the larger register is very slow and is a pipeline hazard.

    You've got this monster of an instruction and then people place all this paranoid slowness around it. Am I reading the x86 manual wrong?
    • kevinday 1 hour ago
      I think people started doing that after one of the Intel SSE examples did it and everyone just copied it.

      But on any modern CPU there should be essentially no penalty for doing that now. Testing the full register is basically free as long as you aren't doing a partial write followed by a full read (write AH then read AX), and I don't think there's any case where this could stall on anything newer than a Core 2 era processor. But just replacing that with a "jnc" or whatever you're exactly trying to test for would be fewer instructions at least. I'd love to see benchmarks though if someone has dug deeper into this than I have.
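
A minimal sketch of the two approaches discussed in this thread, written in C with SSE4.2 intrinsics rather than raw assembly. Variant A checks the returned index against 16, which is what the `test cx, 0x10` idiom does; variant B asks for the carry flag directly, which is what a `jnc` after the instruction would branch on. The function names, the `load_padded` helper, and the `MODE` constant are hypothetical: the thread never defines `BYTEWISE_CMP`, so an "equal any" byte comparison is assumed here. The `target` attribute is a GCC/Clang extension, and both lengths are assumed to be in the range 0..16.

```c
#include <nmmintrin.h>  /* SSE4.2: _mm_cmpestri, _mm_cmpestrc */
#include <string.h>     /* memcpy */

/* Assumed comparison mode; the thread's BYTEWISE_CMP is not defined there. */
#define MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_LEAST_SIGNIFICANT)

/* Hypothetical helper: copy len (0..16) bytes into a zero-padded 16-byte
   vector so the load never reads past the caller's buffer. */
static __m128i load_padded(const char *p, int len) {
    char buf[16] = {0};
    memcpy(buf, p, (size_t)len);
    return _mm_loadu_si128((const __m128i *)buf);
}

/* Variant A: inspect the index result (ECX). An index of 16 means that no
   byte of hay matched any byte of set. Mirrors `test cx, 0x10`. */
__attribute__((target("sse4.2")))
int first_match_by_index(const char *set, int set_len,
                         const char *hay, int hay_len) {
    __m128i a = load_padded(set, set_len);
    __m128i b = load_padded(hay, hay_len);
    int idx = _mm_cmpestri(a, set_len, b, hay_len, MODE);
    return idx == 16 ? -1 : idx;
}

/* Variant B: use the carry flag, which the instruction sets when at least
   one match exists. Mirrors branching with jc/jnc right after pcmpestri. */
__attribute__((target("sse4.2")))
int first_match_by_carry(const char *set, int set_len,
                         const char *hay, int hay_len) {
    __m128i a = load_padded(set, set_len);
    __m128i b = load_padded(hay, hay_len);
    if (!_mm_cmpestrc(a, set_len, b, hay_len, MODE))
        return -1;  /* CF clear: no match */
    return _mm_cmpestri(a, set_len, b, hay_len, MODE);
}
```

Both variants should return the same result for the same inputs. Whether an optimizing compiler folds the two intrinsic calls in variant B into a single `pcmpestri` depends on the compiler and flags, so it is worth checking the generated assembly before drawing performance conclusions.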