GCC already has this for x64, I thought. <a href="https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html" rel="nofollow">https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html</a><p>RISC-V has no carry bit, and this whole thing becomes awkward.<p>I am under the impression that boost::multiprecision has specialized templates for 128- and 256-bit math, but maybe I'm wrong. In practice, when I've wanted extended precision, I've just used GMP or a language with bignums.<p>I would expect the best x86 machine code for many 128-bit operations would use XMM instructions, no? But I haven't investigated.
Question for those smarter than me: What is an application for an int128 type anyways? I've never personally needed it, and I laughed at RISC-V for emphasizing that early on rather than... standardizing packed SIMD.
<p><pre><code> int64_t a, b, c, r;
r = (a * b) / c; /* the multiplication step could overflow, so use 128 bits */</code></pre>
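<p>For illustration (not from the article), a minimal sketch assuming GCC/Clang's __int128 extension: widen the intermediate so the product can't overflow, and only narrow the final result.<p><pre><code>#include <stdint.h>

int64_t muldiv(int64_t a, int64_t b, int64_t c)
{
    /* widen before multiplying; the 128-bit product cannot overflow */
    __int128 wide = (__int128)a * b;
    return (int64_t)(wide / c);   /* assumes the quotient fits in 64 bits */
}</code></pre>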
Cryptography would be one application. Many crypto libraries use an arbitrary-size `bigint` type, but the algorithms typically use modular arithmetic on fixed-width types (128-bit, 256-bit, 512-bit, or something in between like 384-bit).<p>They're typically implemented with arrays of 64-bit or 32-bit unsigned integers, but if 128-bit integers were available in hardware, we could get a performance boost. Any arbitrary-precision integer library would benefit from 128-bit hardware integers.
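<p>As a concrete illustration of the limb approach, here's a schoolbook 256-bit multiply over 64-bit limbs (a sketch, using GCC/Clang's __uint128_t as a stand-in for the widening 64x64 multiply that native 128-bit integers would provide directly):<p><pre><code>#include <stdint.h>

/* little-endian limbs: a[0] is the least significant 64 bits */
void mul256(const uint64_t a[4], const uint64_t b[4], uint64_t out[8])
{
    for (int k = 0; k < 8; ++k) out[k] = 0;
    for (int i = 0; i < 4; ++i) {
        uint64_t carry = 0;
        for (int j = 0; j < 4; ++j) {
            /* 64x64 -> 128 multiply plus two 64-bit additions cannot overflow 128 bits */
            __uint128_t t = (__uint128_t)a[i] * b[j] + out[i + j] + carry;
            out[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
        out[i + 4] += carry;   /* out[i + 4] is still untouched here */
    }
}</code></pre>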
Any application which uses arithmetic on 64-bit ints, because most operations can overflow. And most libs/compilers don't check for overflows.
I implemented a rational number library for media timestamps (think CMTime, AVRational, etc.) that uses 64-bit numerators and denominators. It uses 128-bit integers for intermediate operations when adding, subtracting, multiplying, etc. It even uses 128-bit floats (represented as 2 doubles and using double-double arithmetic[1]) for some approximation operations, and even 192-bit integers in one spot (IIRC it's multiplying a 128-bit int by a 64-bit int where I only want the high bits, so it shifts back down to 128 bits immediately after the multiplication).<p>I keep meaning to see if work will let me open source it.<p>[1]: <a href="https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format#Double-double_arithmetic" rel="nofollow">https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...</a>
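<p>Not the parent's library, but a sketch of the kind of 128-bit intermediate step described, assuming GCC/Clang's __int128 (multiply in 128 bits, reduce by the gcd, then narrow):<p><pre><code>#include <stdint.h>

struct rational { int64_t num, den; };

static __int128 gcd128(__int128 a, __int128 b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b) { __int128 t = a % b; a = b; b = t; }
    return a;
}

rational rational_mul(rational a, rational b)
{
    __int128 n = (__int128)a.num * b.num;   /* cannot overflow 128 bits */
    __int128 d = (__int128)a.den * b.den;
    __int128 g = gcd128(n, d);
    /* the narrowing can still overflow if the reduced result doesn't fit in 64 bits */
    rational r = { (int64_t)(n / g), (int64_t)(d / g) };
    return r;
}</code></pre>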
The last time I used one I wanted UNIX timestamps + fractional seconds. Since there was no difference between adding 1 bit or 64, I just gave it 32 bits for the fraction and 32 more bits for the integral part.
I made a time sync library over a local network that had to be more precise than NTP, and used i128 to make sure the i64 math I was doing couldn't overflow.<p>i32 didn't cover a wide enough time span, and f64 has edge cases from the nature of floats. This was for Windows (MSVC, not GCC), so I had to roll my own i128.
It's used fairly frequently (e.g. in turning 64-bit division by a constant into a multiply and shifts).
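<p>A concrete example of that trick (a sketch assuming GCC/Clang's __uint128_t), roughly the code a compiler emits for x / 10 on x86-64, replacing the divide with a multiply-high and a shift:<p><pre><code>#include <stdint.h>

/* floor(x / 10) for any uint64_t x: multiply by ceil(2^67 / 10) and keep
   the top bits of the 128-bit product, i.e. one mul + shr instead of a div */
uint64_t div10(uint64_t x)
{
    return (uint64_t)(((__uint128_t)x * 0xCCCCCCCCCCCCCCCDULL) >> 67);
}</code></pre>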
It's an opaque way to hold a GUID or an IP6 address
Intersection calculations from computational geometry. Intersection calculations generally require about 2n + log2(n) bits.<p>If you like your CAD accurate, you have to operate in integer space.
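<p>For example (a sketch, assuming GCC/Clang's __int128 and coordinates bounded below 2^62 in magnitude so the determinant fits in 127 bits): an exact 2D orientation test, the basic predicate behind robust intersection code.<p><pre><code>#include <stdint.h>

/* sign of the cross product (b-a) x (c-a):
   +1 left turn, 0 collinear, -1 right turn */
int orient2d(int64_t ax, int64_t ay, int64_t bx, int64_t by,
             int64_t cx, int64_t cy)
{
    __int128 det = (__int128)(bx - ax) * (cy - ay)
                 - (__int128)(by - ay) * (cx - ax);
    return (det > 0) - (det < 0);
}</code></pre>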
I am so happy that MSVC added 128-bit integers to their standard library in order to do ranges distance of uint64_t iota views. One type alias away from int128s on most machines running GCC/Clang/MSVC.
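<p>Something like the sketch below. Caveat: the MSVC spelling (std::_Signed128 in the internal header __msvc_int128.hpp) is an undocumented STL detail, so treat the exact name and header as an assumption rather than a supported API.<p><pre><code>#if defined(_MSC_VER) && !defined(__clang__)
  #include <__msvc_int128.hpp>   /* internal MSVC STL header; unsupported name */
  using i128 = std::_Signed128;
#else
  using i128 = __int128;         /* GCC/Clang built-in */
#endif</code></pre>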
I understand why a non-standard, compiler-specific implementation of int128 was not used (besides being compiler specific, the point of the article is to walk through an implementation of it), but why use<p>> using u64 = unsigned long long;<p>? Although in practice this is _usually_ an unsigned 64-bit integer, the C++ Standard does not technically guarantee it; all it says is that the type needs to be _at least_ 64 bits. [0]<p>I would use std::uint64_t, which guarantees a type of that size, provided it is supported. [1]<p>Re: Multiplication: regrouping our u64 digits<p>I am aware more advanced and faster algorithms exist, but I wonder if something simple like Karatsuba's algorithm [2], which uses 3 multiplications instead of 4, could be a quick win for performance over the naive method used in the article. Though since it was mentioned that the compiler-specific unsigned 128-bit integers closely resemble the ones created in the article, I suppose there must be a reason for that method to be used instead, or something I missed that makes Karatsuba unsuitable here.<p>Speaking of which, I would be interested to see how all these operations fare against compiler-specific implementations (as well as the comparisons between different compilers). [3] The article only briefly mentioned that their multiplication method is similar to the builtin `__uint128_t` [4], but did not go into detail or mention similarities/differences with their implementation of the other arithmetic operations.<p>[0] <a href="https://en.cppreference.com/w/cpp/language/types.html" rel="nofollow">https://en.cppreference.com/w/cpp/language/types.html</a> The official standard needs to be purchased, which is why I did not reference that. But it should be under the section basic.fundamental<p>[1] <a href="https://en.cppreference.com/w/cpp/types/integer.html" rel="nofollow">https://en.cppreference.com/w/cpp/types/integer.html</a><p>[2] <a href="https://en.wikipedia.org/wiki/Karatsuba_algorithm" rel="nofollow">https://en.wikipedia.org/wiki/Karatsuba_algorithm</a><p>[3] I suppose I could see for myself using godbolt, but I would like to see some commentary/discussion on this.<p>[4] And did not state for which compiler, though by context, I suppose it would be MSVC?
> I would use std::uint64_t which guarantees a type of that size, provided it is supported.<p>The comment on the typedef points out that the signature of intrinsics uses `unsigned long long`, though he incorrectly states that `uint64_t` is `unsigned long` - which isn't true, as long is only guaranteed to be at least 32-bits and at least as large as `int`. In ILP64 and LLP64 for example, `long` is only 32-bits.<p>I don't think this really matters anyway. `long long` is 64-bits on pretty much everything that matters, and he is using architecture-specific intrinsics in the code so it is not going to be portable anyway.<p>If some future arch had 128-bit hardware integers and a data model where `long long` is 128-bits, we wouldn't need this code at all, as we would just use the hardware support for 128-bits.<p>But I agree that `uint64_t` is the correct type to use for the definition of `u128`, if we wanted to guarantee it occupies the same storage. The width-specific intrinsics should also use this type.<p>> I would be interested to see how all these operations fair against compiler-specific implementations<p>There's a godbolt link at the top of the article which has the comparison. The resulting assembly is basically equivalent to the built-in support.
Since they don't calculate the upper 128 bits of the product, they use only 3 multiplications anyway.
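<p>Spelling that out with a sketch (not the article's code; __uint128_t stands in for a widening 64x64 multiply helper like _umul128): the a.hi * b.hi term only contributes to the discarded upper half, so it is skipped entirely.<p><pre><code>#include <stdint.h>

struct u128 { uint64_t lo, hi; };

/* low 128 bits of a 128x128 product, using three 64-bit multiplies */
u128 mul128_lo(u128 a, u128 b)
{
    __uint128_t low = (__uint128_t)a.lo * b.lo;   /* full 64x64 -> 128 */
    uint64_t hi = (uint64_t)(low >> 64)
                + a.lo * b.hi                     /* low 64 bits only */
                + a.hi * b.lo;                    /* low 64 bits only */
    u128 r = { (uint64_t)low, hi };
    return r;
}</code></pre>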
Makes me think of the bad old days when the platform gave you 8-bit ints and you built everything else yourself... or AVR-8.
I guess modern compilers (meaning anything Arduino-era and up, at least when I first got into them in the mid-2010s) abstract that away; it's still doing all that under the hood, but at least we don't have to worry about it.
Tangential: a long time ago at a company far far away, this is how we did UUIDs made up of a TenantId and a UserId, using this exact same logic, minus the arithmetic. Great stuff.<p>(We wanted something UUID-like but deterministic that we could easily decompose and do RBAC with. This was prior to the invention of JWTs, OAuth, and scopes; it worked at the time.)
> On division: There is no neat codegen for division.<p>Wait, what? I'm fairly certain that you can do a 128-bit by 128-bit division using a x64's 128-bit by 64-bit division instruction that gives you only 64-bit quotient and remainder. The trick is to pre-multiply both dividend and divisor by a large enough power of 2 so that the "partial" quotient and remainders that the hardware instruction would need to produce will fit into 64 bits. On the whole, IIRC you need either 1 or 2 division instructions, depending on how large the divisor is (if it's too small, you need two divisions).
<i>> we use 256-bit integers in our hot paths and go up to 564 bits for certain edge cases.</i><p>Why 564 bits? That’s 70.5 bytes.
Maybe it's a typo for 512. I'm not even sure how you would achieve 564 in this context.
It was a nice, round number.