> Consider, for instance, bitcasting a [2]u8 to a u16. Under the old semantics, the result of this operation depends on the target endian: on big-endian targets, the first array element became the 8 most significant bits, whereas on little-endian targets, the first array element became the 8 least significant bits. Under the new semantics, because we only care about logical bit representation (which is endian-agnostic), the operation behaves identically on every target:<p>This is a huge mistake. You would never expect something like bitCast to do this.<p>I don't understand this approach. Why change something so simple and low level to be complicated and high level?<p>Just don't allow casting to u24, as it makes no sense unless you define u24 to be u32 sized as I think c standard does.<p>I think this approach as an idea is bad but at least just add another built-in that implements this higher level idea to not break a simple expectation and current behavior?
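A minimal sketch of the two byte-order interpretations the quoted passage contrasts. It does not use @bitCast at all; it uses std.mem.readInt with the byte order spelled out, so both results are fixed on every target. The helper name readU16 is made up for illustration, and the lowercase .little/.big names assume Zig 0.12 or newer.

```zig
const std = @import("std");

/// Hypothetical helper: reads two bytes as a u16 with an explicit byte order,
/// so the result does not depend on the target.
fn readU16(bytes: *const [2]u8, endian: std.builtin.Endian) u16 {
    return std.mem.readInt(u16, bytes, endian);
}

pub fn main() void {
    const bytes = [2]u8{ 0xaa, 0x55 };
    // Little-endian reading: the first byte supplies the low 8 bits.
    std.debug.print("little: 0x{x}\n", .{readU16(&bytes, .little)});
    // Big-endian reading: the first byte supplies the high 8 bits.
    std.debug.print("big:    0x{x}\n", .{readU16(&bytes, .big)});
}
```

Under the old semantics described in the quote, a [2]u8 to u16 @bitCast would have matched one of these two readings depending on the target.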
> Just don't allow casting to u24, as it makes no sense unless you define u24 to be u32 sized as I think c standard does.<p>The reason u32->u24 casting must be well defined is because some hardware (e.g. many GPUs, microcontrollers) <i>only</i> have floating point multipliers. A 24 bit unsigned integer (stored in a 32 bit register) can be losslessly converted to a 32 bit float by the hardware, multiplied, then converted back.<p>This is much faster than doing 32 bit multiplication in software, however, you still need to tell the compiler about this constraint.
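A rough illustration of the "lossless conversion" claim above, assuming IEEE 754 single precision: an f32 has a 24-bit significand, so every u24 value survives a round trip through f32, while a value that needs 25 significant bits does not. The function name viaF32 is invented for this sketch and it is only meant for small inputs like the ones shown.

```zig
const std = @import("std");

/// Hypothetical helper: converts an integer to f32 and back.
/// Only intended for inputs well below the u32 maximum.
fn viaF32(x: u32) u32 {
    const f: f32 = @floatFromInt(x);
    return @intFromFloat(f);
}

pub fn main() void {
    // Largest 24-bit value: exactly representable in f32.
    const max24: u32 = std.math.maxInt(u24);
    std.debug.print("{d} -> {d}\n", .{ max24, viaF32(max24) });

    // 2^24 + 1 needs 25 significant bits, so it cannot round-trip.
    const too_wide: u32 = max24 + 2;
    std.debug.print("{d} -> {d}\n", .{ too_wide, viaF32(too_wide) });
}
```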
I am criticizing the part where they allowed [3]u8 to u24 bitCast in the first place. It doesn't make sense logically, as u24 is likely not 24 bits on any target, let alone portably on every target.<p>Interpreting u24 as if it were actually 24 bits sounds like programming in crazy land, since it is not 24 bits on any relevant architecture afaik.<p>They didn't allow []u24 with a similar rationale as far as I can remember. I agree with this, as someone programming at this level should be able to understand that there is no real u24 layout and that they should use []u32. Going with the same magical rationale they went with here, the compiler should generate unaligned u24 loading code when you use []u24, since it is "logically 24 bits".
The ease of dealing with arbitrary bit-width integers and packed structs is actually one of the 'killer features' for me in zig.<p>Zig natively supports arbitrary bit-width integers, the ABI is defined and you <i>could</i> simply think of it as a slice of the next larger backing integer.<p>The [3]u8 to u24 bitCast will simply be backed by a 32-bit int, using the same ABI. As you have u1 - u65535, sometimes it can be multiple words.<p>The 24 bits (3 bytes) [3]u8 to u24 example is <i>exactly</i> related to UTF-8 that covers all the languages but excludes the emojis.<p>There are very valid use cases when you want to limit UTF-8 to U+0000-U+FFFF, and it is valuable if your language allows you to make those decisions.<p>Remember, in zig packed structs are just integers and integers are just a group of logically consecutive bits.<p>Arrays like []u24 do not have the same ABI, arrays are not bit/byte packed, are not universally LSB across archs, etc.<p>The compiler isn't producing unaligned code, don't confuse the abstraction with the concrete implementation. And yes, [8]u1 and [8]u8 are exactly the same size and shape, even though they are arrays.<p>My current project is parsing ELF/Mach-O files. I can easily have zero allocations in my hot path with zig; the same is far more challenging in C, so I am biased, especially with zig allowing methods on structs.<p>And yes, I do use that crazy casting to 0xdeadbeef and other ascii metadata that is in those files.<p>To be clear here, I am not trying to prove you wrong, this is one of the places zig is very different and (IMHO) useful. Especially with streaming data or where you have network ordering etc. It is so nice to only cast what you need to, but it does take a little while to wrap your head around how this interacts with buffers which are not your native endianness. At least for me, once I figured out to separate the shape of those data streams from their values, it was super useful.
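A small sketch of the points above about bit width versus storage size and about packed structs being integers. The Rgb type and toInt function are hypothetical names for illustration. The byte size printed for u24 is whatever the target's ABI picks; this sketch assumes it is 4 on common 64-bit targets rather than asserting it.

```zig
const std = @import("std");

/// Hypothetical 24-bit record. In a packed struct the first field occupies
/// the least significant bits of the backing integer.
const Rgb = packed struct(u24) {
    r: u8,
    g: u8,
    b: u8,
};

/// Reinterprets the packed struct as its backing integer.
fn toInt(c: Rgb) u24 {
    return @bitCast(c);
}

pub fn main() void {
    // Logical width versus ABI storage size of u24.
    std.debug.print("u24: bits={d} bytes={d}\n", .{ @bitSizeOf(u24), @sizeOf(u24) });
    // Arrays are not bit-packed: each u1 element still takes a whole byte.
    std.debug.print("[8]u1: {d} bytes, [8]u8: {d} bytes\n", .{ @sizeOf([8]u1), @sizeOf([8]u8) });
    // r lands in the low byte, b in the high byte.
    std.debug.print("0x{x}\n", .{toInt(.{ .r = 0x11, .g = 0x22, .b = 0x33 })});
}
```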
> many GPUs<p>Citation please - every single GPU in the literal world supports integer arithmetic for operating on tid, gid, etc.
GCC has had __int24 for the AVR backend for some time. Useful for larger integers than int16_t while saving 25% over a 32-bit value. C23 does not mandate padding for _BitInt types. It is wrong to assume that will happen or is the optimal implementation for portable code.
Thanks for the context, but what I am criticising is this part:<p>> it became allowed to use @bitCast to reinterpret a [3]u8 as a u24<p>This can't make sense unless u24 is defined to be 24 bits in the first place. It is just silly to allow something like this. It would make so much more sense to me if they started disallowing this, or even just printed a deprecation notice for it for one release version.<p>> Useful for larger integers than int16_t while saving 25% over a 32-bit value<p>You can't even do []u24 in zig as far as I can remember and understand anyway, so this is only happening in a packed struct context.<p>C doesn't mandate padding, but C compilers allow having pointers and arrays of irregular _BitInt types as far as I can understand.<p>In this [1] document, in the ABI considerations section, it says that it is defined to have next-power-of-two layout size.<p>Also here (for RISC-V) [2] it seems like it is defined with next-power-of-two layout.<p>Also the document here (for x86_64) defines it similarly [3]<p>[1] <a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf" rel="nofollow">https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2709.pdf</a><p>[2] <a href="https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/300" rel="nofollow">https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/...</a><p>[3] <a href="https://gitlab.com/x86-psABIs/x86-64-ABI/-/tree/master?ref_type=heads" rel="nofollow">https://gitlab.com/x86-psABIs/x86-64-ABI/-/tree/master?ref_t...</a>
> This can't make sense unless u24 is defined to be 24bits in the first place<p>It's worth remembering that zig is a ~hll that should be platform agnostic. Suppose someone built a byte-chip with a 24-bit word. The "new" zig way of doing things will be more portable and slot right in, <i>and support 32- and 16-bit datatypes just fine</i>.
> This is a huge mistake. You would never expect something like bitCast to do this.<p>Is there at least some sort of @transmute or something? If Zig wants to say "bitCast" means this odd operation, but provides the thing most people actually want under some plausible name, that's just an extra thing to learn, which seems OK.
@intCast
Since I don't write Zig, I had to go look this up. To save anyone else the bother: this is what Rust would call an 'as' cast, or what C programmers might think of as a value cast. It tries to make a value which has a similar meaning but of another type, which may be arbitrarily expensive. What people often want here is a transmute (Rust's core::mem::transmute), which changes nothing about the bits except what those bits mean. Since the bits didn't change and the machine only has bits anyway, this is "free".
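For readers who want the value-cast versus reinterpretation distinction in Zig terms, here is a minimal sketch. The function names are made up for illustration. It only uses same-size scalar casts (i8 to u8, f32 to u32), which as far as I can tell are not the cases the endianness discussion is about.

```zig
const std = @import("std");

/// Value conversion: keeps the numeric value, and is range-checked in safe builds.
fn narrow(x: i32) i8 {
    return @intCast(x);
}

/// Bit reinterpretation: keeps the same 8 bits and changes only the type.
fn reinterpretSign(x: i8) u8 {
    return @bitCast(x);
}

/// Bit reinterpretation of a float: returns its IEEE 754 encoding.
fn floatBits(f: f32) u32 {
    return @bitCast(f);
}

pub fn main() void {
    std.debug.print("narrow(-1) = {d}\n", .{narrow(-1)});
    std.debug.print("reinterpretSign(-1) = {d}\n", .{reinterpretSign(-1)});
    std.debug.print("floatBits(1.0) = 0x{x}\n", .{floatBits(1.0)});
}
```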
If I understand it correctly, it basically boils down to copying bits from the source to the destination, in order from the least significant bit to the most significant bit. It's not equivalent to C++'s reinterpret_cast.<p>I'm no Zig expert, but if you want endian-dependent semantics I'd assume either @ptrCast or a packed union would do the job.
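A hedged sketch of the @ptrCast route mentioned above: it looks at the bytes of a u16 as they sit in memory, so the answer follows the target's byte order. The function name firstByteInMemory is hypothetical, and the lowercase .little/.big names assume Zig 0.12 or newer.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: views a u16 through a pointer cast and returns the
/// byte stored at the lowest address.
fn firstByteInMemory(value: *const u16) u8 {
    const bytes: *const [2]u8 = @ptrCast(value);
    return bytes[0];
}

pub fn main() void {
    const value: u16 = 0x55aa;
    const first = firstByteInMemory(&value);
    // The low byte comes first on little-endian targets, the high byte on big-endian ones.
    const expected: u8 = switch (builtin.cpu.arch.endian()) {
        .little => 0xaa,
        .big => 0x55,
    };
    std.debug.assert(first == expected);
    std.debug.print("first byte in memory: 0x{x}\n", .{first});
}
```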
But doesn't that show why this is a bad idea? If I understand correctly, this code:<p><pre><code> const MyUnion = packed union {
full: u16,
bytes: [2]u8,
};
const value: u16 = 0x55aa;
const in_union: MyUnion = @bitCast(value);
const without_union: [2]u8 = @bitCast(value);
std.debug.assert(without_union[0] == in_union.bytes[0]);
std.debug.assert(without_union[1] == in_union.bytes[1]);
</code></pre>
...will now succeed or fail depending on the endianness of the target. That looks like the type of footgun that will bring decades of joy.
zig does not allow arrays in packed structs/unions specifically for endianness reasons (there may be other reasons as well but endianness is what i know of)
Ah, that is useful to know. Is that documented somewhere? From what I can quickly find in the obvious place [0], the only requirement is that "all fields in a packed union must have the same @bitSizeOf" and [2]u8 does satisfy that requirement.<p>[0] <a href="https://ziglang.org/documentation/0.16.0/#packed-union" rel="nofollow">https://ziglang.org/documentation/0.16.0/#packed-union</a>
I wonder if packed union also got/will get the same "logical bits" treatment?
You don't need to use @bitCast for the behavior you're talking about. @ptrCast still exists.
@ptrCast,<p>> Converts a pointer of one type to a pointer of another type. [1]<p>[1] <a href="https://ziglang.org/documentation/master/#toc-ptrCast" rel="nofollow">https://ziglang.org/documentation/master/#toc-ptrCast</a><p>So it is not the same.<p>You could use it to define a function that implements bitCast. Which defeats the purpose of having any @bitCast intrinsic instead of using @memcpy for everything.
Take the address and deref afterwards, and it's exactly the same. Or to say another way: if you want bits to be reinterpreted raw as if they're in memory, then... put them in memory, then reinterpret them.<p>> You could use it to define a function that implements bitCast. Which defeats the purpose of having any @bitCast intrinsic<p>Yes, and this is one reason @bitCast was changed to have different semantics that are not trivially achieved with @ptrCast.
> Take the address and deref afterwards, and it's exactly the same.<p>It is significantly worse to take address and deref afterwards.<p>You have to do something like:<p>@as(*const u32, @ptrCast(&x)).*<p>instead of just<p>@bitCast(x)<p>> Yes, and this is one reason @bitCast was changed to have different semantics that are not trivially achieved with @ptrCast.<p>This makes sense except breaking existing code that properly handled endianness by doing a conditional @byteSwap. And what you end up with is a more complicated intrinsic compared to something that reinterprets values with same layout size.
> This makes sense except breaking existing code<p>Before Zig hits 1.0, users should expect language changes. Has anyone claimed otherwise?<p>If you need the old thing often enough, you can write a wrapper for it. It's a trivial one-liner, as you've shown.
Your example is incorrect. @ptrCast has the same (similar, if you want to be pedantic to the exclusion of good faith) result rules. If you need @as to @ptrCast, you'd need it to @bitCast as well.<p>> It is significantly worse to take address and deref afterwards.<p>How are you measuring worse? Because my understanding from the article is that's exactly the behavior @bitCast used to have. So, instead of worse, it'd be exactly the same?<p>If you mean it's simply more things that you have to type... You're describing a core language feature as "worse". For all the builtins, some of them can help the compiler emit better code, but <i>can</i> for some doesn't mean will for all. As an example:<p><pre><code> const thing: f64 = @floatFromInt(int_ish);
const result = thing + other_float;
return @intFromFloat(result);
</code></pre>
Could zig auto convert between these types? Yes, absolutely. But it doesn't, as a design decision. On some architectures, converting between float and int can be very expensive. A competent engineer will ensure they're type converting in a reasonable order. Zig requires this painfully verbose syntax in order to make it painful. Are there times where it is actually the only reasonable option? Yes, but even if there weren't, it'd still need to exist, because I'm not rewriting my whole program to avoid a single float conversion. But because it's a bit painful, I will rewrite this one function to make it less painful.<p>And yes, having already made that exact mistake... I now write better code from the start, because there's no way I'm gonna ruin all my beautiful code with a bunch of ugly, annoying, hard to read casts.<p>I used to complain about unused variable errors, unhandled enum branch, var unmodified (hint: use const) errors, hell, even result ignored or error ignored when I'm trying to test some unrelated single line of code. But now that I'm used to them, I emit better code without thinking. It's made me a better programmer. Is it annoying? abso-fucking-lutely, but I'm better now than I used to be, so: worth it; and: thank you sir, can I have another. :D
To me it makes sense. If you don't know what endianness is, it doesn't make sense that a program you write in one programming language works for one target but doesn't work for the other.<p>I think endianness is the footgun that Zig is solving, rather than Zig being the one introducing a footgun when you deal with endianness.
I understand the reaction, but I don't agree. I suggest reading the associated proposal[0] along with the devlog, and having a real think about what's going on here. I'm responding to you saying that you "don't understand" the approach: reasonable, and resembles my initial reaction.<p>I was inclined to agree with you, but what decided it for me is that Zig has another mechanism for "reinterpret bytes". It's exposed on the stdlib as std.mem.asBytes, but this is literally a wrapper for the following:<p><pre><code> @ptrCast(@alignCast(ptr));
</code></pre>
So nothing is lost here: if you need, for whatever reason (and those do exist), to get a raw array of underlying bytes, you absolutely may. std.mem also has bytesToValue(T, bytes) T, which makes a copy. All the ingredients are there, and this family of mem functions is a set of thin wrappers over builtins, which boil down to pointer casting, dereferencing, and comptime magic.<p>Also worth noting: packed structs in Zig are already defined as logically little-endian: the first field is of low significance, the second is above that, and so on. So this makes `@bitCast` consistent with an existing convention of treating integers as logically little-endian, without regard to how they're actually arrayed in memory.<p>Plus it stands to make low-level bit-twiddling, using oddly-sized integers, optimize better. I like that, especially when what we trade for that is: nothing. Nothing at all, this is a pure win.<p>I'd even guess it's that rare language update which silently fixes buggy code, where someone figured "well, basically everything is little-endian already" (or just didn't think about it), and now that code works properly on big-endian machines.<p>[0]: <a href="https://github.com/ziglang/zig/issues/19755" rel="nofollow">https://github.com/ziglang/zig/issues/19755</a>
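A short sketch of the std.mem route described above, for anyone who wants the in-memory bytes rather than the logical bits. std.mem.asBytes and std.mem.bytesToValue are the functions named in the comment; the wrapper names memoryImage and roundTrip are invented for this example, and the readInt call shape assumes Zig 0.12 or newer.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: returns a pointer to the bytes of a u32 exactly as
/// they are laid out in memory on this target.
fn memoryImage(value: *const u32) *const [4]u8 {
    return std.mem.asBytes(value);
}

/// Hypothetical helper: copies the bytes back out into a fresh u32.
fn roundTrip(value: u32) u32 {
    return std.mem.bytesToValue(u32, memoryImage(&value));
}

pub fn main() void {
    const value: u32 = 0xdeadbeef;
    // Reinterpreting the memory image and copying it back loses nothing.
    std.debug.assert(roundTrip(value) == value);
    // Reading the same bytes with the native byte order gives the same integer.
    const native = builtin.cpu.arch.endian();
    std.debug.assert(std.mem.readInt(u32, memoryImage(&value), native) == value);
    std.debug.print("bytes in memory: {any}\n", .{memoryImage(&value).*});
}
```

The printed byte order depends on the target, which is the point: this path stays endian-dependent while the value round trip does not.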