Hash tables in Go and advantage of self-hosted compilers

(rushter.com)

62 points by f311a55 days ago

9 comments

pdpi49 days ago
It's worth noting that the "self-hosted compiler" thing here is a red herring. E.g. the JVM is a C++ project, but you can easily read the HashMap implementation, because it's part of the standard library, not part of the runtime.
- f311a48 days ago
 In Go, the hashmap implementation is pretty low level. Even the linker carries some of its implementation details.
 - pdpi48 days ago
 Huh, that's pretty cool. Do you happen to have any links pointing at any of that stuff?
 - f311a48 days ago
 https://github.com/golang/go/blob/7ecb1f36acab7b48d77991d58d456a34074a2d0e/src/cmd/link/internal/ld/deadcode.go#L562 Here the linker relies on the current memory layout. Changing it requires updating the linker code as well.
- vips7L49 days ago
 FWIW javac is self-hosted.
 - pdpi49 days ago
 Right, but the point stands — javac could've been implemented in whatever language you like, it doesn't have to be the same language as the JVM itself, and neither of those affect the fact that HashMap is a plain old java class, not a builtin type.
j1elo49 days ago
Interesting how the Go team is the utmost example of thinking through and bikeshedding ad infinitum even the tiniest angles of each proposal (something that I like a lot, by the way), which is part of the reason that popular feature requests take years to arrive, and others such as the `Set` type are binned for not providing enough added value. But an implementation change that will for sure balloon the memory usage of everybody's code making heavy use of hashmap-as-set (a popular idiom)? Yeah, no problem, change shipped.
- voidfunc49 days ago
 The Go team has a lot of old-school nerd cred; that's why it gets away with a lot of stupid shit. Then a fan base of nerd hero worshippers beats down any discussion about doing things a better way with: SIMPLICITY. It's frustrating, and I say this as someone who has been writing Go for around a decade.
 - yomismoaqui49 days ago
 Go is the worst programming language except for all those others that I have tried from time to time
- avianlyric49 days ago
 There’s a big difference between a change that modifies the language's API and one that just modifies the implementation of the API. Given GoLang's compatibility guarantee, any mistake in the design of a language API has to be preserved forever and is very difficult to improve. But implementations of the GoLang spec and language APIs are much easier to evolve. There’s nothing preventing the Go team from rolling out future improvements to deal with this issue, without having to worry about long-term consequences. There’s also nothing preventing other implementations of the GoLang spec from choosing a different approach.
 - Someone49 days ago
 I think all language definitions have lots of implicit non-functional requirements. When the main implementation of a language changes one of them, that technically isn’t a breaking change, but in practice it still is one, as existing programs may have to be changed in order to keep satisfying their own non-functional requirements.
- 9rx49 days ago
 It's called marketing. If Go quietly made something perfect, nobody would know of its existence. Do stupid things that get people talking and everyone soon learns about you.
ncruces49 days ago
Issue tracking this “regression”: https://github.com/golang/go/issues/71368
cabirum49 days ago
> Using empty structs also hurts readability
An empty struct is idiomatic and expected to be used in a Set type. When/if the memory optimization is reintroduced, no code change will be needed to take advantage of it.
- tym049 days ago
 Using a bool instead of an empty struct also means that there are more ways to use it wrong: checking the bool instead of whether the key exists, setting the bool incorrectly, etc. I would argue using bool hurts readability more. Even better, write/use a simple library that calls things that are sets `Set`.
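A minimal sketch of the point above, assuming Go 1.18+ generics. The `Set` type and its `Add`/`Has` methods are hypothetical names for illustration, not a standard library API. It also shows how presence and value can disagree in a `map[T]bool`.

```go
package main

import "fmt"

// Set is a hypothetical minimal generic set backed by map[T]struct{}.
type Set[T comparable] map[T]struct{}

// Add inserts v into the set.
func (s Set[T]) Add(v T) { s[v] = struct{}{} }

// Has reports whether v is in the set, using the comma-ok form.
func (s Set[T]) Has(v T) bool {
	_, ok := s[v]
	return ok
}

func main() {
	seen := Set[string]{}
	seen.Add("a")
	fmt.Println(seen.Has("a"), seen.Has("b")) // true false

	// With map[T]bool, "is the key present" and "is the value true"
	// are two different questions that can give different answers.
	flags := map[string]bool{"a": false}
	_, present := flags["a"]
	fmt.Println(flags["a"], present) // false true
}
```

With the empty struct there is only one way to ask the question (is the key present?), which is the argument being made in this subthread.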
 - Joker_vD49 days ago
 I could've sworn we got "sets" in Go's standard library along with the "maps" module but... apparently not? Huh.
 - kevindamm49 days ago
 Almost made it into 1.18, but it looks like it doesn't add enough value and has some open questions, like what to use for a backing data type and what complexity promises to make. https://github.com/golang/go/discussions/47331
 - vips7L49 days ago
 Honestly insane in 2025 to not have a generic Set.
 - frou_dh49 days ago
 e.g. https://pkg.go.dev/github.com/zyedidia/generic/mapset
- ioanaci49 days ago
 I also feel like map[T]struct{} communicates its purpose way better than map[T]bool. When I see a bool I expect it to represent a bit of information, I don't see why using it as a placeholder for "nothing" would be more readable than a type that can literally store nothing.
- rplnt49 days ago
 Isn't it the empty interface that's idiomatic? Or was, anyway? edit: I may be wrong here
nickcw49 days ago
I wonder if the compiler really needs to allocate 1 byte so you can get the address of the struct{}. In the general case, yes, but here you can't take addresses of dictionary values (the compiler won't let you), so adding 1 byte to make a unique pointer for the struct{} shouldn't be necessary. Unless it is used in the implementation of the map, I suppose. So I conjecture a bit of internal magic could fix this.
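The padding being discussed can be observed on ordinary structs. The sketch below assumes the standard gc toolchain; the two struct types are hypothetical stand-ins for a key/value slot, not the runtime's actual map types, and whether the map internals use exactly this layout is an assumption here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// slotValueLast mimics a key/value slot whose final field is zero-size.
// Hypothetical layout for illustration only.
type slotValueLast struct {
	key int64
	val struct{}
}

// slotValueFirst has the same fields with the zero-size one first.
type slotValueFirst struct {
	val struct{}
	key int64
}

// sizes returns the size of an empty struct and of both slot layouts.
func sizes() (empty, last, first uintptr) {
	return unsafe.Sizeof(struct{}{}),
		unsafe.Sizeof(slotValueLast{}),
		unsafe.Sizeof(slotValueFirst{})
}

func main() {
	empty, last, first := sizes()
	// struct{} itself has size 0, but a trailing zero-size field is padded
	// so that a pointer to it never points past the end of the struct.
	fmt.Println(empty, last, first)
}
```

On a typical 64-bit build the trailing-field layout comes out larger than the other one, even though both hold the same data.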
- occamrazor49 days ago
 I’m curious, what was the rationale for forbidding it?
 - kbolino49 days ago
 I interpret this as asking "why can't you get the address of a value in a map?" There are two reasons, and we could also ask "why can't you get the address of a key in a map?" The first reason is flexibility in implementation. Maps are fairly opaque, their implementation details are some of the least exposed in the language (see also: channels), and this is done on purpose to discourage users of the language from mucking with the internals and thus making it harder for the developers of the language to change them. Denying access to internal pointers makes it a lot easier to change the implementation of a map. The second reason is that most ways of implementing a map move the value around copiously. Supposing you could get a pointer p := &m[k] for some map m and key k, what would it even point to? Just the value position of a slot in a hash table. If you do delete(m, k), now what does it point to? If you assign m[k2] but hash(k2) == hash(k) and the map handles the collision by picking a new slot for k, now what does it point to? And eventually you may assign so many keys that the old hash table is too small and so a new one somewhere else in memory has to be allocated, leaving the pointer dangling. While the above also apply to pointers-to-keys, there is another reason you can't get one of those: if you mutated the key, you would (with high probability) violate the core invariant of a hash table, namely that the slot for an entry is determined exactly by the hash of its key. The exact consequences of violating this would depend on the specific implementation, but they are mostly quite bad. For comparison, Rust, with its strong control over mutability and lifetimes, can give you safe references to the entries of a HashMap in a way Go cannot.
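A short sketch of what this restriction looks like in practice and the two usual workarounds. The `counter` type and `bump` helper are hypothetical names for illustration.

```go
package main

import "fmt"

type counter struct{ n int }

// bump increments the counter stored under k using copy-modify-store,
// since the map element itself cannot be modified in place.
func bump(m map[string]counter, k string) {
	c := m[k] // copy the value out of the map
	c.n++
	m[k] = c // write the modified copy back
}

func main() {
	m := map[string]counter{"a": {n: 1}}

	// p := &m["a"] // does not compile: map elements are not addressable
	// m["a"].n++   // does not compile either, for the same reason

	bump(m, "a")
	fmt.Println(m["a"].n) // 2

	// Storing pointers as values gives addresses that stay valid
	// even when the map grows and moves its slots around.
	pm := map[string]*counter{"a": {n: 1}}
	p := pm["a"]
	for i := 0; i < 1000; i++ {
		pm[fmt.Sprint(i)] = &counter{}
	}
	p.n++
	fmt.Println(pm["a"].n) // 2
}
```

The pointer-valued map trades an extra allocation per entry for stable addresses; the map still only moves the pointers, never the pointed-to values.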
 - hiddendoom4549 days ago
 I was burnt by the mutability of keys in Go maps a few months ago. I'm not sure exactly how Go handles it internally, but it ended up with the map growing and duplicate keys in the key list when looking at it with a debugger. The footgun was that url.QueryUnescape returned a slice of the original string if nothing needed to be unescaped, so if the original string was modified, it would modify the key in the map if you put the returned slice directly into the map.
 - kbolino49 days ago
 This sounds like a bug, whether it be in your code, the map implementation, or even the debugger. Map keys are not mutable, and neither are strings.
 hiddendoom4549 days ago
 This shouldn't be a race condition: reads were done by taking an RLock() from a mutex in a struct with the map, with a deferred RUnlock(); writes were similar, where a Lock() was taken on the same mutex with a deferred Unlock(). All these functions did was get/set values in the map, and they operated on a struct with just a mutex and the map. Unless I have a fundamental misunderstanding of how to use mutexes to avoid race conditions, this shouldn't have been the case. This also feels a lot like an LLM response with the Hypotheses section. edit: this part below was originally a part of the comment I'm replying to. Hypotheses: you were modifying the map in another goroutine (do not share maps between goroutines unless they all treat it as read-only), the map implementation had some short-circuit logic for strings which was broken (file a bug report/it's probably already fixed), the debugger paused execution at an unsafe location (e.g. in the middle of non-user code), or the debugger incorrectly interpreted the contents of the map.
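The locking pattern described in this comment can be sketched as follows. The `store` type and its method names are hypothetical, since the commenter's actual code is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical struct holding just a mutex and a map.
type store struct {
	mu sync.RWMutex
	m  map[string]string
}

func newStore() *store { return &store{m: make(map[string]string)} }

// Get takes the read lock, so many readers can proceed concurrently.
func (s *store) Get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

// Set takes the write lock, excluding all readers and other writers.
func (s *store) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	s := newStore()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Set(fmt.Sprint("k", i), "v")
			s.Get("k0")
		}(i)
	}
	wg.Wait()
	_, ok := s.Get("k3")
	fmt.Println(ok) // true
}
```

This protects the map's own structure against concurrent access. It does not protect against the bytes behind a key changing after insertion, which is a separate problem from a data race on the map.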
 - arccy49 days ago
 That just means fiber is a bad library that abuses unsafe, resulting in real bugs.
 - ncruces49 days ago
 Just how are you modifying strings? Cause that's your bug to fix.
 hiddendoom4549 days ago
 That was probably done by fiber [1]. The code took the param from it in the handler function passed to the Get(path string, handlers ...Handler) Router function; c is the *fiber.Ctx passed by fiber to the handler. My code took the string from c.Param("name"), passed it to url.QueryUnescape, then to another function which had a mutex around setting the key/value in the map. I got the hint that it was slices and something modifying the keys when I found truncated keys in the key list. My guess is fiber reused the same string for the param to avoid allocations. The fix is just to create a copy of the string with strings.Clone() to ensure it does not get mutated while it is used as a key. I understand it was an issue with my code; it just wasn't something I expected to be the case, so it took several hours and a debugger to find the root cause. It probably didn't help that a lot of the code was generated by Grok-4-Code/Sonic as a vibe coding test when I decided to go back a few months later to try to fix some of the issues I had myself. [1] https://github.com/gofiber/fiber
 ncruces48 days ago
 Go strings are supposed to be immutable. I see that fiber goes behind your back and produces potentially mutable strings: https://github.com/gofiber/utils/blob/c338034/convert.go#L18 And… I actually don't have an issue with it, to be honest. I've done the same myself. But this mutability should never escape. I'd never persist in using a library that would let it escape. But apparently… it's intentional: https://github.com/gofiber/fiber/issues/185 Oh well. You get what you ask for. Please don't complain about maps if you're using a broken library.
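A minimal sketch of the strings.Clone() fix mentioned in this subthread, assuming Go 1.20+ for unsafe.String and unsafe.SliceData. The `aliasString` helper is a hypothetical stand-in for a zero-copy bytes-to-string conversion; it is not fiber's code.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// aliasString returns a string sharing memory with buf. Hypothetical helper
// for illustration; writing to buf afterwards breaks the immutability that
// Go assumes for strings.
func aliasString(buf []byte) string {
	return unsafe.String(unsafe.SliceData(buf), len(buf))
}

// safeKey copies s into fresh memory so later writes to the original
// backing buffer cannot change the map key.
func safeKey(s string) string { return strings.Clone(s) }

func main() {
	buf := []byte("alice")
	param := aliasString(buf) // shares memory with buf

	seen := map[string]bool{}
	seen[safeKey(param)] = true // store an independent copy as the key

	copy(buf, "bob__") // simulate the buffer being reused for another request

	fmt.Println(seen["alice"]) // true: the cloned key is unaffected
}
```

Inserting `param` directly instead of the clone would leave the map holding a key whose bytes change after it was hashed, which matches the truncated and duplicated keys described above.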
Hendrikto49 days ago
> Another takeaway here, as always, is not to trust everything LLMs say.
I would go even farther and say to not trust anything they say. Always be skeptical, always verify.
- nasretdinov49 days ago
 Applies to humans as well :)
 - rplnt49 days ago
 Not at all. With a human you can have some expectations based on context and expertise. They are also far less likely to make up extremely specific details.
 - nasretdinov49 days ago
 Sure. I was agreeing with the conclusion though, where you should aim to verify what you hear from other humans, no matter how confident they sound. Been burned by that a few times by blindly trusting some statements from some respected people only for it to blow up in production because they were wrong :).
 - lenkite49 days ago
 There are many humans who are far more reliable than LLMs, on a 99.9999% win streak.
 - rat998849 days ago
 Yes, now generalize the theorem to any human to make it usable on a daily basis.
 - 9rx49 days ago
 That is, strangely, until those humans turn to a topic I know something about. Then their reliability drops like a hot potato. At least they get everything else right!
yosefk49 days ago
Rust HashSets are HashMaps with an empty type as the value type, but the compiler actually optimizes away the storage for the values based on the type being empty. Go doesn't bother either to define a set type like most languages do, or to optimize the map implementation when an empty type is the value type.
gethly49 days ago
An empty struct is good for representing non-nil zero-length information; for example, this is ideal for many use cases where channels are involved. Or if you have an HTTP route and you want to return an empty response (200 OK or 204 No Content, instead of an error). A boolean, on the other hand, inherently carries one of two pieces of information: either true or false, i.e. there will always be information and it will always be one of two values. This is similar to *struct{}, where we can signal no information, or false, by returning/passing nil, and true/value present with an initialized pointer to an empty struct. For maps, bool makes more sense, as otherwise we just want a list with fast access to determine whether a value in the list exists or not. Which is often something we might want. But it should not detract from the fact that each type has its own place, and just because the new implementation for maps ignores this in this particular use case does not make it worse than the previous version. tl;dr it is good to know this fact about the new swiss maps, but it should not have any impact on programming and design decisions whatsoever.
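As an illustration of the channel use case mentioned here, a minimal sketch of a done-signal built on `chan struct{}`. The `waitFor` helper is a hypothetical name.

```go
package main

import "fmt"

// waitFor runs work in a goroutine and returns a channel that is closed
// when the work finishes. The element type struct{} carries no data;
// only the close event matters.
func waitFor(work func()) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		work()
	}()
	return done
}

func main() {
	result := 0
	done := waitFor(func() { result = 42 })
	<-done              // blocks until the goroutine closes the channel
	fmt.Println(result) // 42
}
```

A `chan bool` would work too, but it invites the question of what sending `false` means; with `struct{}` there is no value to misread.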
andunie49 days ago
So what is this article about? 1. How to do sets in Go? 2. What changed between Go 1.24 and 1.25? 3. Trusting an LLM? 4. Self-hosted compilers? It is not clear at all. Also there are no conclusions, it's purely a waste of time, basically the story of a guy figuring out for no reason that the way maps are implemented has changed in Go. And the title is about self-hosted compilers, whose "advantage" turned out to be just that the guy was able to read the code? How is that an advantage? I guess it is an advantage for him. The TypeScript compiler is also written in Go instead of in TypeScript. So this shouldn't be an advantage? But this guy likes to read Go, so it would also be an advantage to him.
- bxparks49 days ago
 I agree that the article is a bit unfocused about the supporting material. But the primary topic is clear: it's about the memory consumption of the Go map implementation. This is an article written by a real human person, who's going to meander a bit. I prefer that over an LLM article which is 100% focused, 100% confident, and 100% wrong. Let's give the human person a little bit of slack.
- gethly49 days ago
 I think it is quite obvious: the author has found out that a memory trick that used to work in previous Go versions no longer works, in this singular use case.