6 comments

  • andikleen 22 hours ago
    Early x86-64 Linux had a similar problem. The x86-64 ABI uses registers for the first 6 arguments. Supporting a variable number of arguments (as in printf) requires passing an argument count in an extra register (RAX), so that the callee can save the argument registers to memory for va_arg() and friends. Doing this for every call is too expensive, so it's only done when the prototype is marked as stdarg.

    Now, the initial gcc implemented this saving to memory with a kind of Duff's device: a computed jump into a block of register-saving instructions, so that only the needed registers were saved. There was no bounds check, so if the argument-count register (RAX) was not initialized correctly, it would jump randomly based on the junk and cause very confusing bug reports.

    This bit quite a lot of software that didn't use correct prototypes, calling stdarg functions without declaring them as variadic. On 32-bit code, which didn't use register arguments, this wasn't a problem.

    Later compiler versions switched to saving all registers unconditionally.
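    A minimal C sketch of the failure mode, with hypothetical names (two files, so the compiler can't catch the mismatch):

      /* impl.c -- the real definition is variadic, so the compiler emits
       * the register-save prologue that dispatches on the count in RAX. */
      #include <stdarg.h>
      #include <stdio.h>

      void log_ints(int count, ...)
      {
          va_list ap;
          va_start(ap, count);
          for (int i = 0; i < count; i++)
              printf("%d\n", va_arg(ap, int));
          va_end(ap);
      }

      /* caller.c -- wrong, non-variadic declaration: the caller never
       * initializes RAX, so the callee's computed jump runs on junk. */
      void log_ints(int count, int a, int b);

      int main(void)
      {
          log_ints(2, 10, 20);
          return 0;
      }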
    • veltas 48 minutes ago
      In the SysV ABI for AMD64, the AL register is used to pass an upper bound on the number of vector registers used; is this related to what you're talking about?
  • Joker_vD 4 hours ago
    Raymond Chen has a whole "Introduction to IA-64" series of posts on his blog, by the way. It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86: it's very poorly suited for general-purpose computations. Number crunching, sure, but for anything more freeform you stare at the specs and wonder how the hell the designers expected this thing to be programmed and used.
    • pjmlp 51 minutes ago
      Itanium only failed because AMD, for various reasons, was able to come up with AMD64 and rug-pull Intel's efforts.

      In an alternative universe without AMD64, Intel would have kept pushing Itanium while sorting out its issues; HP-UX ran on it, and Windows XP did as well.
    • eru 44 minutes ago
      > It's such an unconventional ISA that I am baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86 [...]

      I don't know, most people don't care about the ISA being weird as long as the compiler produces reasonably fast code?
    • kragen 1 hour ago
      They took technical risks that didn't pan out. They thought they'd be able to solve whatever problems they ran into, but they couldn't. They didn't know ahead of time that the result was going to suck. If you try to run an actual tech company, like Intel, without taking any technical risks, competitors who do take technical risks will leave you in the dust.

      This doesn't apply to fake tech companies like AirBnB, Dropbox, and Stripe, and if you've spent your career at fake tech companies, your intuition is going to be "off" on this point.
      • eru 41 minutes ago
        Computer hardware isn't the only 'tech' that exists, you know?

        Problems in operations research (like logistics) or fraud detection can be just as technical.
    • jcranmer 3 hours ago
      Some guesses here:

      First off, Itanium was definitely meant to be the 64-bit successor to x86 (that's why it's called IA-64, after all), and moving from 32-bit to 64-bit would absolutely have been a killer feature. It's basically only after the underwhelming launch of Itanium that AMD came out with AMD64, which became the actual 64-bit version of x86; once that arrived, the 64-bitness of Itanium was no longer a differentiator.

      Second... given that Itanium basically implements every weird architecture feature you've ever heard of, my guess is that they decided they had the resources to make all of this stuff work, and they got into a bubble where they simply ignored any countervailing viewpoint anytime someone brought up a problem. (This does seem to be a particular specialty of Intel.)

      Third, there's definitely a baseline assumption of a sufficiently smart compiler. And my understanding is that the Intel compiler was actually halfway decent at Itanium, whereas gcc was absolute shit at it. So while some aspects of the design are necessarily inferior (a sufficiently smart compiler will never be as good as hardware at scavenging ILP, hardware architects, so please stop trying to foist that job onto us compiler writers), it actually did do reasonably well on performance in the HPC sector.
      • happosai 2 hours ago
        It appeared to me (from far outside) that Intel was trying to segment the market into "affordable home and office PCs with x86" and "expensive serious computing with Itanium". Having everything so different was a feature, to justify the eye-wateringly expensive Itanium price tag.
    • fulafel 2 hours ago
      > baffled that Intel seriously thought they would've been able to persuade anyone to switch to it from x86

      They did persuade SGI, DEC and HP to switch from their RISCs to it, though. Which turned out to be rather good for business.
      • fredoralive 2 hours ago
        I suspect SGI and DEC / Compaq could look at a chart and see that with the P6, Intel was getting very close to their RISC chips through the power of MONEY (a simplification). They weren't hitting a CISC wall, and the main moat custom RISC had left was 64-bit. Intel's 64-bit chip would inevitably become the standard chip for PCs, and therefore Intel would be able to turn its money cannon onto overpowering all 64-bit RISCs in short order. May as well get aboard the 64-bit Intel train early.

        Which is nearly what happened: 64-bit Intel chips did (mostly) kill RISC. But not their (and HP's) fun science project IA64; they had to copy AMD's "what if x86, but 64-bit?" idea instead.
      • zinekeller 2 hours ago
        SGI and DEC, yes, but HP? Itanium was HP's idea all along! [1]

        [1] https://en.wikipedia.org/wiki/Itanium#History
    • yongjik 4 hours ago
      Well, they did persuade HP to ditch their own homegrown PA-RISC architecture and jump on board with Itanium, so there's that. I wonder how much that decision contributed to the eventual demise of HP's high-performance server division ...
      • classichasclass 2 hours ago
        A lot, I think. PA-RISC had a lot going for it: high performance, a solid ISA, even some low-end consumer-grade parts (not to the same degree as PowerPC, but certainly more so than, say, SPARC). It could have gone much farther than it did.

        Not that HP was the only one to lose their minds over Itanic (SGI in particular), but I thought they were the ones who walked away from the most.
    • AndrewStephens 3 hours ago
      I remember when IA-64 was going to be the next big thing, and being utterly baffled when the instruction set was made public. Even if you could somehow ship code that efficiently used the weird instruction bundles, there was no indication that future IA-64 CPUs would have the same limits for instruction grouping.

      It did make a tiny bit of sense at the time. Java was ascendant, and I think Intel assumed that JIT-compiled languages were going to dominate the new century and that a really good compiler could unlock performance. It was not to be.
      • kragen 1 hour ago
        That is not what happened.

        EPIC development at HP started in 01989, and the Intel collaboration was publicly announced in 01994. The planned ship date for Merced, the first Itanic, was 01998, and it was first floorplanned in 01996, the year Java was announced. Merced finally taped out in July 01999, three months after the first JIT option for the JVM shipped. *Nobody* was *assuming* that JIT-compiled languages were going to dominate the new century at that time, although there were some promising signs from Self and Strongtalk that maybe they could be half as fast as C.
    • msla 4 hours ago
      "We don't care, we don't have to, we're Intel."

      Plus, DEC managed to move all of its VAX users to Alpha through the simple expedient of no longer making VAXen, so I wonder if HP (which by that point had swallowed what used to be DEC) thought it could repeat that trick and sunset x86, which Intel has wanted to do for very nearly as long as x86 has existed. See also: Intel i860.

      https://en.wikipedia.org/wiki/Intel_i860
  • nayuki 2 hours ago
    > The ia64 is a very demanding architecture. In tomorrow's entry, I'll talk about some other ways the ia64 will make you pay the penalty when you take shortcuts in your code and manage to skate by on the comparatively error-forgiving i386.

    https://devblogs.microsoft.com/oldnewthing/20040120-00/?p=40993 "ia64 – misdeclaring near and far data"

    https://devblogs.microsoft.com/oldnewthing/2004/01
  • vardump 7 hours ago
    Pretty surprising. So IA64 registers were 65 bits, with the extra bit describing whether the register contains garbage or not. If NaT (Not a Thing) is set, the register contents are invalid, and that can cause "fun" things to happen...

    Not that this matters to anyone anymore. IA64 utterly failed long ago.
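    For context, the NaT bit backs control-speculative loads: the compiler hoists a load above the branch that guards it, and a load that would have faulted yields NaT instead of trapping. A sketch in C of the kind of code involved (the C itself is just illustrative; the ia64 instruction names in the comments are the real ones):

      int speculate(int *p, int use_it)
      {
          int v = 0;
          /* An ia64 compiler may hoist the load above the branch as a
           * speculative load (ld4.s). If the load would have faulted,
           * the target register is marked NaT instead of trapping. */
          if (use_it)
              v = *p;   /* a chk.s here tests the NaT bit and branches
                         * to recovery code that redoes the load */
          return v;
      }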
    • kragen 1 hour ago
      It matters to people designing new hardware and maybe new virtual machine instruction sets.
    • ashleyn 6 hours ago
      There are modern VLIW architectures. I think Groq uses one. The lessons on what works and what doesn't are worth learning from history.
      • bri3d 6 hours ago
        VLIW works for workloads where the compiler can somewhat accurately predict what will be resident in cache. It’s used everywhere in DSP, was common in GPUs for a while, and is present in lots of niche accelerators. It’s a dead end for situations where cache residency is not predictable, like any kind of multitenant general-purpose workload.
      • 0dyl 4 hours ago
        The new TI C2000 F29 series of microcontrollers is VLIW.
      • addaon 6 hours ago
        A more everyday example is the Hexagon DSP ISA in Qualcomm chips. Four-wide VLIW + SMT.
      • vardump 5 hours ago
        I meant that narrowly, only about IA64. There is surely some lessons-learned value.
      • msla 4 hours ago
        IA64 was EPIC, which was itself a "lessons learned" VLIW design: it had things like stop bits to explicitly demarcate dependency boundaries, so that instructions from multiple words could be combined on future hardware with more parallelism, and speculative execution and loads, which... well, see the article on how the speculative loads were a mixed blessing.

        https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing
    • msla 7 hours ago
      In case someone hasn't heard:

      https://en.wikipedia.org/wiki/Itanium

      > In 2019, Intel announced that new orders for Itanium would be accepted until January 30, 2020, and shipments would cease by July 29, 2021.[1] This took place on schedule.[9]
  • ronsor 7 hours ago
    Yet another reason IA64 was a design disaster.

    VLIW architectures still live on in GPUs and special-purpose (parallel) processors, where these sorts of constraints are more reasonable.
    • MindSpunk 6 hours ago
      Are any relevant GPUs VLIW anymore? As far as I'm aware they all dropped it too, moving to scalar ISAs on SIMT hardware. The last VLIW GPU I remember was AMD TeraScale, replaced by GCN, where one of the most important architecture changes was dropping VLIW.
    • nneonneo 7 hours ago
      I mean, there is a reason why these sorts of constructs are UB, even if they work on popular architectures. The problems aren’t unique to IA64, either; the better solution is to be aware that UB means UB and to avoid it studiously. (Unfortunately, that’s also hard to do in C.)
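      A classic instance of the kind of UB at issue, sketched (the function names are made up): reading an uninitialized variable usually just gives you stack junk on i386, but on ia64 the register backing it can carry a NaT, and consuming that can fault outright.

        #include <stdio.h>

        static int maybe_junk(int flag)
        {
            int x;                /* uninitialized */
            if (flag)
                x = 42;
            return x;             /* UB when flag == 0: junk on i386,
                                   * possibly a NaT consumption fault
                                   * on ia64 */
        }

        int main(void)
        {
            printf("%d\n", maybe_junk(0));  /* "works" on x86 */
            return 0;
        }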
      • loeg 6 hours ago
        It's a very weird architecture to have these NaT states representable in registers but *not* main memory. Register spilling is a common requirement!
        • Someone 53 minutes ago
          Old-time x86 sort-of has "states representable in registers but not main memory", too.

          Compilers used to use its 80-bit floating-point registers for 64-bit float computations, but also might spill them to memory as 64-bit float numbers.

          https://hal.science/hal-00128124v3/file/floating-point.pdf section 3 has some examples, including one where the *assert* can fail in:

            #include <assert.h>

            void do_nothing(double *x);  /* defined elsewhere */

            int main(void)
            {
                double x = 0x1p-1022, y = 0x1p100, z;
                do_nothing(&y);
                z = x / y;            /* nonzero in 80-bit x87 precision,
                                       * but rounds to 0 as a 64-bit double */
                if (z != 0) {
                    do_nothing(&z);   /* may force a spill to memory as 64-bit */
                    assert(z != 0);   /* can fail once z has been rounded to 0 */
                }
            }

          with

            void do_nothing(double *x) { }

          in a different compilation unit.
        • amluto 6 hours ago
          Hah, this is IA-64. It has special hardware support for register spills, and you can search for "NaT bits" here:

          https://portal.cs.umbc.edu/help/architecture/aig.pdf

          to discover at least two magical registers that hold up to 127 spilled registers' worth of NaT bits. So they tried.

          The NaT bits are truly bizarre, and I'm really not convinced they worked well. I'm not sure what happens to bits that don't fit in those magic registers. And it's definitely a mistake to have registers whose value cannot be reliably represented in the common in-memory form of the register. The x87 FPU's 80-bit registers, usually stored in 64-bit words in memory, are another example.
          • dwattttt 5 hours ago
            CHERI looks at this and says "64+1 bits? A childish effort", and brings 128+1 to the table.

            EDIT: to be fair to it, it carries the extra bit through to main memory too.
            • amluto 1 hour ago
              I have no real complaints about CHERI here. What's a pointer, anyway? Lots of old systems thought it was 8 or 16 bits giving a linear address. The 8086 thought it was 16 + 16 bits split across two registers, with some interesting arithmetic [0] (sketched below). You can't add, say, 20000 to a pointer and get a pointer to a byte 20000 farther into memory. The 80286 changed it so those high bits index into a table, and the actual segment registers are much wider than 16 bits and can't be read or written directly [1]. Unprivileged code certainly cannot load arbitrary values into a segment register. The 80386 added bits. Even x86_64 still technically has those extra segment registers, but they mostly don't work any more.

              So who am I to complain if CHERI pointers are even wider and have strange rules? At least you can write a pointer to memory and read it back again.

              [0] I could be wrong. I've hacked on Linux's v8086 support, but that's *virtual*, and I never really cared what its effect was in user mode so long as it worked.

              [1] You can read and write them via SMM entry or using virtualization extensions.
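              A sketch of the 8086 real-mode arithmetic referenced above (assuming the standard rule: linear = segment*16 + offset, with the offset wrapping at 64K):

                #include <stdint.h>
                #include <stdio.h>

                /* Real-mode 8086: a "pointer" is a 16-bit segment plus a
                 * 16-bit offset; the linear address is segment*16 + offset,
                 * modulo 1 MiB. */
                static uint32_t linear(uint16_t seg, uint16_t off)
                {
                    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
                }

                int main(void)
                {
                    uint16_t seg = 0x1234, off = 0xF000;
                    /* Adding 20000 only touches the 16-bit offset, which
                     * wraps at 64K, so the result is NOT 20000 bytes
                     * farther in linear memory. */
                    uint16_t off2 = (uint16_t)(off + 20000);
                    printf("%05X + 20000 -> %05X\n",
                           (unsigned)linear(seg, off),
                           (unsigned)linear(seg, off2));
                    return 0;
                }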
        • mwkaufma 6 hours ago
          I assume they were stored in an out-of-band mask word
      • awesome_dude 6 hours ago
        The bigger problem is that a user cannot avoid an application where someone wrote code with UB, unless they have both the source code and the expertise to understand it.
        • eru 38 minutes ago
          Isn't that a general problem?