I had a similar problem when I was making a tool that processed a lot of data in the browser. I'd naively made a large array of identical objects, each holding a bunch of numeric fields.<p>Turns out this works completely fine in Firefox. In Chrome, however, it produces millions of individual HeapNumber allocations (why is that a thing??) on top of the objects themselves, uses GBs of RAM, and is slow to access, making the whole thing unusable.<p>Replacing it with a SoA structure backed by TypedArrays made it fast in both browsers and fixed the memory overhead in Chrome.<p>As someone more familiar with systems programming than the web, the concept of an individual heap allocation for a single double baffles me beyond belief. What were they thinking?
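<p>Roughly the shape of the change, as a minimal sketch (field names are made up here, not the actual tool's code):<p><pre><code>  // Array of Structures: one object per element; in V8 each double field
  // can end up as a separate HeapNumber allocation
  const n = 1_000_000;
  const aos = [];
  for (let i = 0; i < n; i++) {
    aos.push({ x: Math.random(), y: Math.random(), mass: Math.random() });
  }

  // Structure of Arrays: three flat typed arrays, zero per-element objects
  const soa = {
    x: new Float64Array(n),
    y: new Float64Array(n),
    mass: new Float64Array(n),
  };
  for (let i = 0; i < n; i++) {
    soa.x[i] = Math.random();
    soa.y[i] = Math.random();
    soa.mass[i] = Math.random();
  }
</code></pre>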
Yeah, this is a historical design difference between Firefox's Spidermonkey JS engine and Chrome's V8.<p>Spidermonkey uses (I'm simplifying here; there are cases where this isn't true) a trick where all values are 64 bits, and anything that isn't a double-precision float gets smuggled inside the bits of a NaN. This means you can store a double, a float32, an int, or an object pointer in a field of the same size. Great, but it creates some problems and complications for asm.js/wasm, because you can't rely on all the bits of a NaN surviving a trip through the JS engine.<p>V8 instead allocates doubles on the heap. I forget the exact historical reason why. IIRC they also do some fancy stuff with integers: if your integer fits in 31 bits it counts as a "smi" (small int) in that engine and gets special performance treatment. So letting your integers get too big is also a performance trap, not just having double-precision numbers.<p>EDIT: I found something just now that suggests Smis are now 32 bits instead of 31 in 64-bit builds of V8, so that's cool!
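<p>If you want to watch the Smi/double split happen in V8, a quick sketch (assuming Node.js with the --allow-natives-syntax flag; %DebugPrint is a V8 debug intrinsic, and the element-kind names are V8 internals):<p><pre><code>  // Run with: node --allow-natives-syntax elements.js
  const a = [1, 2, 3];   // every value fits in a Smi
  %DebugPrint(a);        // elements kind: PACKED_SMI_ELEMENTS

  a.push(2 ** 40);       // too big for a Smi
  %DebugPrint(a);        // backing store widens to PACKED_DOUBLE_ELEMENTS

  a.push({});            // mix in a non-number
  %DebugPrint(a);        // degrades to generic PACKED_ELEMENTS
</code></pre>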
I suspect it's just circumstantial - two different design approaches, each with its own advantages and disadvantages.<p>IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, since the discriminator is stored in the high bits. That's OK for now, when virtual addresses typically only need 48 bits of representation, but it's already starting to slip with newer systems.<p>On the other hand, I love the fact that NaN-boxing basically lets you eliminate all heap allocations for doubles.<p>I actually wrote a small article a while back on a hybrid approach called Ex-boxing (exponent boxing), which tries to get the best of both worlds: decouple the boxing representation from the significant bits of virtual addresses, and also represent most (almost all) doubles that show up at runtime as immediates.<p><a href="https://medium.com/@kannanvijayan/exboxing-bridging-the-divide-between-tag-boxing-and-nan-boxing-07e39840e0ca" rel="nofollow">https://medium.com/@kannanvijayan/exboxing-bridging-the-divi...</a>
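<p>You can poke at the bit pattern that makes NaN-boxing work from plain JS - this is only an illustration of the float format, not something any engine exposes:<p><pre><code>  // View the same 8 bytes as a double and as raw bits
  const f64 = new Float64Array(1);
  const u64 = new BigUint64Array(f64.buffer);

  f64[0] = NaN;
  console.log(u64[0].toString(16)); // typically 7ff8000000000000: exponent all ones, quiet bit set

  // The remaining 51 mantissa bits (plus the sign bit) are a free payload,
  // which is where an engine can hide a pointer or an int plus a type tag.
  u64[0] = 0x7ff8000000000000n | 0x0000deadbeef1234n;
  console.log(Number.isNaN(f64[0])); // true - it still reads back as a NaN
</code></pre>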
> IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, since the discriminator is stored in the high bits.<p>Is this right? You get 51 tag bits, of which you must use one to distinguish pointer-to-object from other uses of the tag bits (assuming Huffman-ish coding of tags). But objects are presumably a minimum of 8 bytes in size and alignment, and on most platforms I'd assume 16 bytes, which means the low three (four) bits of the address are implicit, giving 53 (54) bit object addresses. This is quite a few years of runway...
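<p>A toy version of that arithmetic, assuming a 51-bit payload, one pointer/non-pointer tag bit, and 16-byte-aligned objects (BigInt used purely for illustration):<p><pre><code>  const PAYLOAD_BITS = 51n; // NaN payload bits available for tagging
  const TAG_BITS = 1n;      // marks "this payload is an object pointer"
  const ALIGN_BITS = 4n;    // 16-byte alignment => low 4 bits are always zero

  // Pack: drop the implicit low bits, set the pointer tag
  const pack = (addr) => (addr >> ALIGN_BITS) | (1n << (PAYLOAD_BITS - TAG_BITS));
  // Unpack: mask off the tag, restore the alignment bits
  const mask = (1n << (PAYLOAD_BITS - TAG_BITS)) - 1n;
  const unpack = (boxed) => (boxed & mask) << ALIGN_BITS;

  const addr = 0x2ffffffffffff0n;                    // a 54-bit, 16-byte-aligned address
  console.log(unpack(pack(addr)) === addr);          // true
  console.log(PAYLOAD_BITS - TAG_BITS + ALIGN_BITS); // 54n usable address bits
</code></pre>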
There's a bit of time, yes, but for an engine that relies on this format (e.g. Spidermonkey), the assumptions associated with the value-boxing format will have leaked into the codebase all over the place. It's the kind of thing that's far less painful to take care of before you're forced to than after.<p>But fair point on the aligned pointers - that would give you some free bits to keep using, though it gets ugly.<p>You're right about the 51 bits - I always get mixed up about whether it's 12 bits of exponent, or the 12 includes the sign. The point is that it puts hard constraints on a pretty large number of the high bits of a pointer being free, as opposed to an alignment requirement for low-bit tagging, which will never run out of bits.
48/56-bit numbers would be useful in many circumstances
For those interested in browser differences, on my machine:<p>Firefox<p><pre><code> AoS: 2951.00ms
SoA: 1624.00ms
Interleaved: 1961.00ms
</code></pre>
Chrome<p><pre><code> AoS: 2133.30ms
SoA: 884.30ms
Interleaved: 1457.60ms
</code></pre>
Seems interleaved being slower than SoA is consistent across browsers!
SoA is nothing new. Not sure why it even needs to be discussed...<p>Anyway, here are a few videos of interest:<p><a href="https://www.youtube.com/watch?v=WwkuAqObplU" rel="nofollow">https://www.youtube.com/watch?v=WwkuAqObplU</a><p><a href="https://www.youtube.com/watch?v=IroPQ150F6c" rel="nofollow">https://www.youtube.com/watch?v=IroPQ150F6c</a><p>Odin also supports SoA natively: <a href="https://odin-lang.org/docs/overview/#soa-data-types" rel="nofollow">https://odin-lang.org/docs/overview/#soa-data-types</a>
> Eliminating per-element object overhead — This is the biggest win (~5-6x)<p>I feel like this phrasing might be easy to misinterpret. Without carefully considering the context, it seems to imply that objects have higher overhead than arrays, and my intuition is that this is true, but I'd argue there's a more relevant way of looking at things.<p>In the "object of arrays" layout, you have one object and three arrays, but in the "array of objects" layout, you have one array and N objects, where N is the size of the array. Even if the overhead of an object were the same as an array, you'd be looking at more overhead as soon as you went past three elements. In fact, even if the overhead of an object were <i>lower</i> than the overhead of an array, you'd still reach more overhead with the "array of objects" layout if you have enough elements to make up the difference. With a TypedArray (or an array of a single type that's optimized by the runtime in the way described fairly early on in the article), you're not looking at an extra level of indirection per element like you would with an object.<p>I'd be curious to see the results if they repeated the "array of objects" benchmark with an "array of arrays", where each element is an array of size 3. I could imagine them being quite similar, but I'm also not sure if there's even more nuance that I'm failing to account for (e.g. maybe the runtime would recognize that an array of N/3 elements, each being an array of 3 numbers, could be "flattened" to an underlying representation of an array of size N in memory and perform the same optimizations).<p>I think the meta lesson here may be that intuition about the performance of arrays in JavaScript can be pretty tricky. At least in terms of external semantics, they're supposed to be roughly equivalent to objects with numbers as keys, but in practice there are probably enough optimizations being done (like the implicit recognition of an array of numbers described in the article) that intuition might be a bit naive in a lot of cases, and it's probably better to verify what's actually happening at runtime rather than trying to guess. (The meta-meta lesson is that this is true for a lot of things in pretty much every language, and it's sometimes necessary to verify your assumptions about performance, but that's easy to fail to do even when you know it's a trap, so having some general things to look out for - like arrays potentially not behaving intuitively - can still be helpful.)
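<p>For reference, the layout I mean would look something like this (just the data shape, not the article's actual benchmark harness; three numbers per element to match the three fields discussed):<p><pre><code>  // "Array of arrays": each element is a length-3 array instead of an object
  const n = 1_000_000;
  const aoa = new Array(n);
  for (let i = 0; i < n; i++) {
    aoa[i] = [Math.random(), Math.random(), Math.random()];
  }

  // Same traversal shape as the array-of-objects benchmark
  let sum = 0;
  for (let i = 0; i < n; i++) {
    const p = aoa[i];
    sum += p[0] + p[1] + p[2];
  }
  console.log(sum);
</code></pre>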
That Structure of Arrays performs <i>much</i> better than an Array of Structures has been a well-known tidbit for a long time; what I'd love to read more about is how people design code around this fact.<p>For bigger codebases, where you might not even be the person (or team) who designed the data types and are just given a class or struct that needs to be collected into a large array, it's not simple to just decompose structures into an array per property.<p>In fact, this is a basic thing to want to do, but I've seen no language support for it ever (in popular, industry-standard languages, at least). The `FancyList<Point>` idea from another comment is indeed interesting, but it would require reflection, I guess.
A lot of words to say "it's quicker when you allocate three large arrays rather than 1,000,000 objects"<p>That feels sufficiently intuitive that describing it as "a JavaScript performance issue" is a bit confusing.<p>(There are other optimizations they're applying, but that's the only one that really matters.)
Running the benchmark from the article on my laptop (M4 MacBook Air) had a few interesting results:<p>* when running the script with Node.js, the results are in line with the article (SoA is the fastest)<p>* Bun is slower than Node.js with both SoA and AoS<p>* Bun has similar performance between SoA and AoS<p>* in Bun, Interleaved is the fastest by a significant margin, consistently across runs<p><pre><code>  % bun bench.js
  AoS: 924.54ms
  SoA: 1148.57ms
  Interleaved: 759.01ms
</code></pre>
Bun's performance profile (it uses JavaScriptCore rather than V8) seems very different from Firefox and the V8-based runtimes here. I wonder how QuickJS would fare. The article didn't mention the CPU used either; the performance difference may depend on the architecture as well.
Are there any libs that take advantage of that?<p>i.e. a `FancyList<Point>` would internally create a list for every field of `Point` and reconstruct it appropriately when indexing the FancyList.
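<p>In plain JS you can get most of the way there if the caller hands over the field layout up front, since there's no reflection to lean on. A sketch (FancyList and the Point fields here are hypothetical, not an existing library):<p><pre><code>  // SoA-backed list: one Float64Array column per field
  class FancyList {
    constructor(fields, capacity) {
      this.fields = fields;
      this.columns = Object.fromEntries(
        fields.map((f) => [f, new Float64Array(capacity)])
      );
    }
    set(i, values) {
      for (const f of this.fields) this.columns[f][i] = values[f];
    }
    get(i) {
      // Reconstructs a plain object on demand; hot loops should read the columns directly
      const out = {};
      for (const f of this.fields) out[f] = this.columns[f][i];
      return out;
    }
  }

  // Usage with a "Point" of x/y fields
  const points = new FancyList(['x', 'y'], 1000);
  points.set(0, { x: 1.5, y: -2 });
  console.log(points.get(0)); // { x: 1.5, y: -2 }
</code></pre>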
> Sometimes even SIMD (eez nuts?)<p>I'm a bit rusty here: does V8 actually do auto-vectorization of JavaScript code these days?
> This test is a manufactured problem, a silly premise, false test cases and honestly dishonest if not ignorant<p>It’s amazing how vitriolically wrong people can be. Before publicly criticizing someone in the above way, prove them wrong first. Don’t just assume they’re wrong.