Encoding x86 Instructions

(www-user.tu-chemnitz.de)

106 points by st_goliath101 days ago

10 comments

jcranmer100 days ago
This reminds me that at some point I should write up my exploration of the x86 encoding scheme, because a lot of the traditional explanations tend to be overly focused on how the 8086 would have decoded instructions, which isn't exactly the same way you look at them for a modern processors.I actually have a tool I wrote to automatically derive on x86 decoder from observing hardware execution (based in part on sandsifter, so please don't ask me if I've heard of it), and it turns out to largely be a lot simpler than people make it out to be... if you take a step back and ignore some of what people have said about the role of various instruction prefixes (they're not prefixes, they're extra opcode bits).(FWIW, this is fairly dated in that it doesn't cover the three-byte opcodes, or the 64-bit prefixes that were added, like the REX and VEX prefixes).
- peterfirefly100 days ago
 I handwrote some C code (years ago) to do parallel x86 decode in a realistic manner (= similar to how a modern CPU would do it). Was a lot easier than I feared. I didn't complete it, it was just exploratory to see what it would look like.
- QQ0099 days ago
 I would absolutely love to read about that. Please, if you have the time to do that, do it. also do you have a blog or a place where I can follow you and read your articles/code?
 - jcranmer98 days ago
 My blog is at quetzalcoatal.blogspot.com, but it has been some time since I've written a post.When I do get around to writing it up, I'll probably advertise it here on HN.
cube2222100 days ago
This is so relevant for me!I spent some time last weekend on a small side project which involves JIT encoding ARM64 instructions to run them on Apple Silicon.I’ve written assembly before, but encoding was always kind of black magic.How surprised was I to learn how simple instruction encoding is on arm64! Arguably simpler than implementing encoding wasm to byte code, which I played with a while ago.If you want to play with this, based on my very very limited experience so far, I’d suggest starting with arm - fixed length 4 byte instructions, nice register naming scheme, straightforward encoding of arguments, make it very friendly.
- aengelke100 days ago
 > I’d suggest starting with armI agree: AArch64 is a nice instruction set to learn. (Source: I taught ARMv7, AArch64, x86-64 to first-year students in the past.)> how simple instruction encoding is on arm64Having written encoders, decoders, and compilers for AArch64 and x86-64, I disagree. While AArch64 is, in my opinion, very well designed (also better than RISC-V), it's certainly not simple. Here's some of my favorite complexities:- Many instructions have (sometimes very) different encodings. While x86 has a more complex encoding structure, most encodings follow the same structure and are therefore remarkably similar.- Huge amount of instruction operand types: memory + register, memory + unsigned scaled offset, memory + signed offset, optionally with pre/post-increment, but every instruction supports a different subset; vector, vector element, vector table, vector table element; sometimes general-purpose register encodes a stack pointer, sometimes a zero register; various immediate encodings; ...- Logical immediate encoding. Clever, but also very complex. (To be sure that I implemented the decoding correctly, I brute-force test all inputs...)- Register constraints: MUL (by element) with 16-bit integers has a register constraint on the lowest 16 registers. CASP requires an even-numbered register. LD64B requires an even-numbered register less than 24 (it writes Xt..Xt+7).- Much more instructions: AArch64 SIMD (even excluding SVE) has more instructions than x86 including up to AVX-512. SVE/SME takes this to another level.
 - userbinator100 days ago
 A32 is simpler, but looking at A64 instructions certainly begs the question "they still call this a RISC?"
 - aengelke100 days ago
 Actually, nowadays Arm describes the ISA as a load-store architecture. The RISC vs. CISC debate is, in my opinion, pretty pointless nowadays and I'd prefer if we'd just stop using these words to describe ISAs.
 - Joker_vD100 days ago
 Hey, if you can't write "ADD [r0], r1" but have instead to do "LDR r2, [r0]; ADD r2, r1; STR [r0], r2", it means it's still RISC.
davikr100 days ago
x86 is an octal machine (1995): <a href="https://gist.github.com/seanjensengrey/f971c20d05d4d0efc0781f2f3c0353da" rel="nofollow">https://gist.github.com/seanjensengrey/f971c20d05d4d0efc0781...</a>
- userbinator100 days ago
 Discussed here: <a href="https://news.ycombinator.com/item?id=30409100">https://news.ycombinator.com/item?id=30409100</a>I've memorised most of the 1st-page instructions - in octal - and it's easier than it sounds.
- peterfirefly100 days ago
 Originally, yes. These days, not so much. It's "whatever bit confetti was necessary to squeeze in all the opcode/operand bits".
 - userbinator100 days ago
 By "these days" you mean "since AMD messed it up".
 - peterfirefly100 days ago
 Obviously not.Even if you dislike the REX prefixes of AMD64, you've got to think about the 66/F2/F3 prefixes used for older SIMD instructions. They were introduced by Intel and basically contribute to more opcode bits. There's also the 2E/3E prefixes used for static branch prediction hints in some Pentium 4's (and also in new/upcoming Intel CPUs). VEX is from Intel, EVEX is from Intel, and the upcoming REX2 is also from Intel.
 - userbinator96 days ago
 REX is horrible because they occupy two full rows of what were otherwise very useful instructions (inc/dec), and there were other gaps and less-useful instructions (segment override prefixes? If segmentation doesn't work in AMD64 mode, what's the point?) in the opcode map they could've used instead.
1718627440100 days ago
This uses frames and the link is to the inner frame, maybe is should rather link to <a href="https://www-user.tu-chemnitz.de/~heha/hs/chm/x86.chm/" rel="nofollow">https://www-user.tu-chemnitz.de/~heha/hs/chm/x86.chm/</a> . Nevertheless, this is a nice looking website.
khedoros1100 days ago
I was recently working on some x86 emulation code. This is one of the best links that I found to summarize how it works, skipping the giant Intel instruction set references.
- __alexander100 days ago
 It’s been a little bit since I watched it but I recall this playlist being useful/interesting<a href="https://youtube.com/playlist?list=PLJRRppeFlVGIvcTQNISPTxvNmBk3m80qD&si=wsJWDB6GO2U7iCJG" rel="nofollow">https://youtube.com/playlist?list=PLJRRppeFlVGIvcTQNISPTxvNm...</a>
PaulHoule100 days ago
Here is some explanation from the source plus some code that can encode/decode x86 instructions in software
non_obsolete100 days ago
<a href="https://www.sandpile.org/" rel="nofollow">https://www.sandpile.org/</a>concise & complete1996 - 2025with APX
fweimer100 days ago
It would be really nice to have something like this for the x86-64 variant.
- aengelke100 days ago
 The same site hosts [1], but that's not nearly as nice as the 32-bit version. It's also a bit outdated.[1]: <a href="https://www-user.tu-chemnitz.de/~heha/hs/chm/x86.chm/x64.htm" rel="nofollow">https://www-user.tu-chemnitz.de/~heha/hs/chm/x86.chm/x64.htm</a>
 - fweimer100 days ago
 Thanks. Looks like the original now has some clarifications, including more detail regarding the REX prefixes: <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding" rel="nofollow">https://wiki.osdev.org/X86-64_Instruction_Encoding</a>
 - non_obsolete100 days ago
 sandpile.org is your friend.
timonoko100 days ago
[flagged]
- ainiriand100 days ago
 You have the proof right there that it will never be a waste of time. You can't understand why the C one is faster, and someone who does will be superior to a machine because they can apply this learning, context, and much more, to solve really though problems.
- PaulHoule100 days ago
 If it was a real superhuman AI it would use the closed form expansion<a href="https://en.wikipedia.org/wiki/Fibonacci_sequence" rel="nofollow">https://en.wikipedia.org/wiki/Fibonacci_sequence</a>Writing Fibonacci in assembly as recursive functions using C-like calling conventions is like asking Superman to mainline Kryptonite, rather I'd expect an assembly implementation to look like a BASIC version that calculates iteratively.
- d_tr100 days ago
 You have a really sad view on what constitutes a waste of time.
- timonoko100 days ago
 [flagged]
LarsDu88100 days ago
X86 feels painful. So many instructions that wind up being decoded by physical hardware, eating up unnecessary die space and electricity, all to save ram and storage space which is now abundant and cheap compared to when x86 was designed
- jeffbee100 days ago
 The instruction decoder was a large part of the die in 1985. Today you won't be able to identify it in a die photo. In a world with gigantic vector register files, the area used by decode simply is not relevant. Anyway x86 does not save storage space. x86_64 code tends to be larger than armv8 code.
 - buildbot100 days ago
 All the various bits that get tacked on for doing prefetch and branch prediction all are fairly large too, given the amount of random caching, which often is what people account for when measuring decode power usage I think. That’s going to be the case in any arch besides something like a DSP without any kind of dynamic dispatch.
 - jeffbee100 days ago
 I think it's safe to say that a modern x86 branch predictor with its BTBs is significantly larger than the decode block.
 - adgjlsfhk1100 days ago
 Sure, but branch prediction is (as far as we know) a necessary evil. Decode complexity simply isn't.
 buildbot100 days ago
 Right, but decode compexity doesn't matter because of the giant BTB and such. At least that's what I understand.
 throwaway31131100 days ago
 For the cores working hardest to achieve the absolute lowest cpi running user code, this is true. But these days the computers have computers in them to manage the system. And these kinds of statements aren’t necessarily true for these “inner cores” that aren’t user accessible.“ RTKit: Apple's proprietary real-time operating system. Most of the accelerators (AGX, ANE, AOP, DCP, AVE, PMP) run RTKit on an internal processor. The string "RTKSTACKRTKSTACK" is characteristic of a firmware containing RTKit.”<a href="https://asahilinux.org/docs/project/glossary/#r" rel="nofollow">https://asahilinux.org/docs/project/glossary/#r</a>
 saagarjha100 days ago
 And those cores do not run x86.
 jeffbee100 days ago
 I was pretty surprised to find out that the weird non-architectural cores in a Core or Xeon really do run x86 code.
 - LarsDu88100 days ago
 You say its irrelevant, but that's not the same as being necessary. These decode components are simply not necessary whereas a branch prediction actually makes the processor faster
 - Joker_vD100 days ago
 The high-end CPU designs, be it ARMv7, AArch64, RISC-V or x86(-64), have parallelized pre-decoding hardware and buffers for decoded microinstruction because it too, apparently, speeds the execution. From what I understand, the differences in those subsystems that are due to the ISA baroqueness are, again, minuscule.
 - LarsDu8899 days ago
 I've been reading up on this. The differences are indeed minimal. Still not zero, but not the explainer for why M series macs outperform intel x86 on power consumption <a href="https://chipsandcheese.com/p/why-x86-doesnt-need-to-die" rel="nofollow">https://chipsandcheese.com/p/why-x86-doesnt-need-to-die</a>