36 comments

  • netcoyote 15 hours ago
    Warcraft 1 (1994), Warcraft 2 (1995), and StarCraft (1998) all use power-of-2 aligned map sizes (64, 128, and 256 blocks) so the shift factor could be pre-computed to avoid division/multiplication, which was dang slow on those old 386/486 computers.

    Each map block was 2x2 cells, and each cell 8x8 pixels. That made rendering background cells and fog-of-war overlays very straightforward in assembly language.

    All of Warcraft/etc. had only a few thousand lines of assembly language to render maps/sprites/fonts/fog-of-war into the offscreen buffer, and to blit from the offscreen buffer to the screen.

    The rest of the code didn't need to be in assembly, which is too time-consuming to write for code where the performance doesn't matter. Everything else was written in portable assembler, by which I mean C.

    Edit:

    By way of comparison, Blackthorne for the Super Nintendo was all 65816 assembly. The Genesis version (Motorola 68000) and DOS version (Intel 80386) were manually transcribed into their respective assembly languages.

    The PC version of Blackthorne also had a lot of custom assembler macros to generate 100K of rendering code to do pixel-scrollable chunky-planar VGA mode X (written by Bryan Waters - https://www.mobygames.com/person/5641/bryan-waters/).

    At Blizzard we learned from working on those console ports that writing assembly code takes too much programmer time.

    Edit 2:

    I recall that Comanche: Maximum Overkill (1992, a voxel-based helicopter simulator) was written entirely in assembly for DOS real mode. A huge technical feat, but so much work to port to protected mode that I think they switched to polygon rendering for later versions.
    • CursedSilicon 13 hours ago
      It's a shame that when a Redditor discovered the source code for the original StarCraft "gold master" on a CD, they sent it back to Blizzard in exchange for some fucking *Blizzard merch* [1].

      EA a while back released the source code to (most of) the old Command & Conquer games [2], though it interestingly left out Tiberian Sun and Red Alert 2, StarCraft's closest competitors at the time.

      It would've been nice for historical preservation to be able to peek behind the curtain and see StarCraft's code in a similar fashion.

      [1] https://old.reddit.com/r/gamecollecting/comments/68xzxt/starcraft_gold_master_source_code_update/

      [2] https://github.com/electronicarts
      • Blikkentrekker 3 hours ago
        That person obviously did not want to be at risk of legal issues from Blizzard by publishing it, though. I personally wouldn't take that risk either.
      • imtringued 8 hours ago
        This entire Reddit thread aged really poorly now that Blizzard is a shell of its former self. If anything, the attitude in that thread is what paved the way for Blizzard's decline: complete disrespect for its origins.

        The StarCraft source code is something that must be kept behind closed walls, under tight control by Blizzard, even though the original people who worked on the game at Blizzard have already left and there is nothing to protect here other than eternal shame.
    • Lorin 13 hours ago
      If you worked on The Lost Vikings, I'd like to thank you for the entertainment during my childhood. Given your background, did you ever get involved in the demoscene?
    • wjnc 9 hours ago
      Both Comanche and Settlers 1 were so magic to me as a kid. You learned to work with DOS in text mode; the shiniest thing on the PC was WordPerfect. And suddenly your text computer was capable of displaying graphics and ... games. Hooked me for life.
    • ListenLinda 13 hours ago
      Were you at Blizzard when they lost their source code server and had no backups? I was there for a short time consulting around the time WC3 was released.
    • quietsegfault 15 hours ago
      Maximum Overkill was an amazing game. I probably played hundreds and hundreds of hours.
  • ConceptJunkie 56 minutes ago
    > The same trick can also be used for the other direction to save a division:
    >
    >     NewValue = OldValue >> 3;
    >
    > This is basically the same as
    >
    >     NewValue = OldValue / 8;
    >
    > RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn't been changed, since compilers won't do this optimization for you.

    The author loses a lot of credibility by suggesting the compiler won't replace multiplying or dividing by a power of 2 with the equivalent bit shift. That's a trivial optimization that's always been done. I'm sure compilers were doing that in the 70s.
  • applfanboysbgon 20 hours ago
    > Imagine a programmer asking a game designer if they could change their formula to use an 8 instead of a 9.5 because it is a number that the CPU prefers to calculate with. There is a very good argument to be made that a game designer should never have to worry about the runtime performance characteristics of binary arithmetic in their life, that's a fate reserved for programmers

    Numeric characteristics are *absolutely* still a consideration for game designers even in 2026, one that influences what numbers they use in their game designs. The good ones, anyways. There are, of course, also countless bad developers/designers who ignore these things these days, but not because it is free to do so; rather, because they don't know better, and in many cases it is one of many silent contributing factors to a noticeable decrease in the quality of their game.
    • cogman10 17 hours ago
      > Numeric characteristics are absolutely still a consideration for game designers even in 2026, one that influences what numbers they use in their game designs. The good ones, anyways.

      I used to think like this; not anymore.

      What convinced me that these sorts of micro-optimizations just don't matter was reading up on the cycle counts of modern processors.

      On a Zen 5, integer addition is a single cycle, multiplication 3, and division ~12. But that's not the full story. The CPU can have 5 in-flight multiplications running simultaneously. It can have about 3 divisions running simultaneously.

      Back in the day of RCT, there was much less pipelining. On the original Pentium, a multiplication took 11 cycles, and division could take upwards of 46 cycles. These were CPUs with 100 MHz clocks. So not only did operations take more cycles to finish and couldn't be pipelined; the CPUs were also running at 1/30th to 1/50th the clock rate of common CPUs today.

      And this isn't even touching on SIMD instructions.

      Integer tricks and optimizations are pointless. Far more important than those in a modern game is memory layout. That's where the CPU is actually going to be burning most of its time. If you can create and do operations on an int[], you'll be MUCH faster than if you are doing operations against a Monster[]. A cache miss is going to mean anywhere from a 100 to 1000 cycle penalty. That blows out any sort of hit you take cutting your cycles from 3 to 1.
      • moregrist 14 hours ago
        > Integer tricks and optimizations are pointless.

        They're not pointless; they're just not the first thing to optimize.

        It's like worrying about cache locality when you have an inherently O(n^2) algorithm and could have an O(n log n) or O(n) one. Fix the biggest problem first.

        Once your data layout is good and your CPU isn't taking a 200-cycle lunch break to chase pointers, then you worry about cycle count and keeping the execution units fed.

        That's when integer tricks can matter. Depending on the microarchitecture, you may have twice as many execution units that can take integer instructions. And those instructions (outside of division) tend to have lower latency and higher throughput.

        And if you're doing SIMD, your integer SIMD instructions can have 2 or 4x higher throughput than float32 if you can use int16 / int8 data.

        So it can very much matter. It's just usually not the lowest-hanging fruit.
        • Dylan16807 10 hours ago
          > And if you're doing SIMD, your integer SIMD instructions can be 2 or 4x higher throughput than float32 if you can use int16 / int8 data.

          Your float instructions can also be 2x the throughput if you use f16, with no need to go for specific divisors.

          For values that even can pack into 8 bits, you rarely have a way to process enough at once to actually get more throughput than with wider numbers.

          I'm sure there's *a* program where it very much matters, but my bet is on it not even mildly mattering, and there basically always being a hundred more useful optimizations to work on.
          • benchloftbrunch 7 hours ago
            The problem with f16 is that hardware support is still "new" and can't be relied on in consumer-grade CPUs yet.
      • Pannoniae 17 hours ago
        This is all true, but IMO it's forest for the trees... For example, the compiler basically doesn't do anything useful with your float math unless you enable fast-math. Period. Very few transformations are done automatically there.

        For integers the situation is better, but even there it hugely depends on your compiler and how much it cheats. You can't replace trig with intrinsics in the general case (sets errno, for example), and inlining is at best an adequate heuristic which completely fails to take into account what the hot path is unless you use PGO and keep it up to date.

        I've managed to improve a game's worst-case performance by like 50% just by shrinking a method's code size from 3000 bytes to 1500. Barely even touched the hot path there, keep in mind. Mostly due to icache usage.

        The takeaway from this shouldn't be "computers are fast and compilers are clever, no point optimising" but more "you can afford not to optimise in many cases, computers are fast."
        • WalterBright 16 hours ago
          I originally got into writing compilers because I was convinced I could write a better code generator. I succeeded for about 10 years in doing very well with code generation. But then all the complexities of the evolving C++ (and D!) took up most of my time, and I haven't been able to work much on the optimizer since.

          Fortunately, the D compilers gdc and ldc take advantage of the gcc and llvm optimizers to stay even with everyone else.
          • tialaramex 3 hours ago
            The thing which would really help, IMNSHO, is to nail down the IR to eliminate weird ambiguities where optimisation A is valid according to one understanding and optimisation B is valid under another, but alas, if we use both, sometimes it breaks stuff.
        • cogman10 16 hours ago
          I actually agree with you.

          My point wasn't "don't optimize", it was "don't optimize the wrong thing".

          Trying to replace a division with a bit shift is an example of worrying about the wrong thing, especially since that's a simple optimization the compiler can pick up on.

          But as you said, it can be very worth it to optimize around things like the icache. Shrinking and aligning a hot loop can ensure your code isn't spending a bunch of time loading instructions. Cache behavior, in general, is probably the most important thing you can optimize. It's also the thing that can often make it hard to know if you actually optimized something. Changing the size of code can change cache behavior, which might give you the mistaken impression that the code change was what made things faster when in reality it was simply an effect of the code shifting.
      • YesBox 2 hours ago
        > A cache miss is going to mean anywhere from a 100 to 1000 cycle penalty. That blows out any sort of hit you take cutting your cycles from 3 to 1.

        A good example of this is using std::vector<bool> vs. std::vector<uint8_t> in the debug build vs. the release build.

        vector<bool> is much slower to access (it's a dynamic bitset). If you have a hot part of the code that frequently touches a vector<bool>, you'll see a multiple-X slowdown in the debug build.

        However, in the release build, there is no performance difference between the two (for me at least; I'm making a fairly complicated game). The cache misses bury it.
        • fragmede 12 minutes ago
          Fascinating, that's counterintuitive. I'd think the point of using vector<bool> is that the compiler would optimize it into a bit field, which is fewer bits and thus smaller and thus faster than using vector<uint8_t>. How did you come to figure that out?
      • rcxdude 16 hours ago
        Also, it's unusual for a game to be CPU-bottlenecked nowadays, and if it is, it's probably more constrained by memory bandwidth than raw FLOPS.
        • eru 11 hours ago
          Yes, though it depends a bit on your style of game.
    • timschmidt 20 hours ago
      Absolutely. I have written a small but growing CAD kernel which is seeing use in some games and realtime visualization tools ( https://github.com/timschmidt/csgrs ) and can say that computing with numbers isn't really even a solved problem yet.

      All possible numerical representations come with inherent trade-offs around speed, accuracy, storage size, complexity, and even the kinds of questions one can ask (it's often not meaningful to ask if two floats equal each other without an epsilon to account for floating-point error, for instance).

      "Toward an API for the Real Numbers" ( https://dl.acm.org/doi/epdf/10.1145/3385412.3386037 ) is one of the better papers I've found detailing a sort of staged-complexity technique for dealing with this, in which most calculations are fast and always return (arbitrary-precision calculations can sometimes go on forever or until memory runs out), but one can still ask for more precise answers, requiring more compute, if needed. But there are also other options entirely, like interval arithmetic, symbolic algebra engines, etc.

      One must understand the trade-offs or else be bitten by them.
      • ryandrake 18 hours ago
        Back in the early, early days, the game designer *was* the graphic designer, who *also was* the programmer. So, naturally, the game's rules and logic aligned closely with the processor's native types, memory layout, addressing, arithmetic capabilities, even cache size. Now we have different people doing different roles, and only one of them (the programmer) *might* have an appreciation for the computer's limits and happy paths. The game designers and artists? They might not even know what the CPU does or what a 32-bit word even means.

        Today, I imagine we have conversations like this happening:

        Game designer: We will have 300 different enemy types in the game.

        Programmer: Things could be much, much faster if you could limit it to 256 types.

        Game designer: ?????

        That ????? is the sign of someone who is designing a computer program who doesn't understand the basics of computers.
        • WalterBright 16 hours ago
          I wrote the Intellivision Mattel Roulette cartridge game back in the 1970s. It was all in assembler on a 10-bit (!) CPU. In order to get the game to fit in the ROM, you had to do every feelthy dirty trick imaginable.
          • biglost 13 hours ago
            Please write a comment, pastebin, gist or whatever; I would love to read it. Those are the stories in computer science I enjoy the most.
          • DougN7 15 hours ago
            I would love to hear more about that.
        • applfanboysbgon 14 hours ago
          Not all, or even most, games are made by billion-dollar studios. Overlapping roles are still the norm in small studios. And even studios that do have bespoke designer roles would likely benefit from telling those designers that computers have certain limitations where trade-offs in game design need to be selected for, because many AAA games run like shit. Many times for reasons other than the game design, sure, but also sometimes for reasons that could be worked around more easily if the game design were accommodating the trade-offs.
    • danbolt 14 hours ago
      Yeah, I'm quite surprised at this comment. Commercial video games are mass-produced products, and as much as I dislike designers being bogged down in technical minutiae, having a sense of industrial design for the thing you're making is an incredible boon.

      Fumito Ueda was notably quite concerned with the technical/production feasibility of his designs for *Shadow of the Colossus*. [1] *Doom* was an exercise in both creativity and expertise.

      [1] https://www.designroom.site/shadow-of-the-colossus-oral-history/
    • lukan 19 hours ago
      > and in many cases it is one of many silent contributing factors to a noticeable decrease in the quality of their game

      Game designers are not so constrained by the limits of the hardware anymore, unless they want to push boundaries. Quality of a game is not just the most efficient runtime performance; it is mainly a question of whether the game is fun to play. Do the mechanics work? Are there severe bugs? Is the story consistent and the characters relatable? Is something breaking immersion? So ... frequent stuttering because of bad programming is definitely a sign of low quality, but if the game runs smooth on the target audience's hardware, improvements should rather be made elsewhere.
      • crq-yml 13 hours ago
        There's an artistic thread in game coding - one that isn't the norm, but which I think RCT is exemplary of - that holds that mechanical sympathy is important to the game design process. A limit set around NPOT maximums and divisions and lengths of pathfinding is allowing the machine to opine: "you will actually do less work if you set the boundary here". Setting those limits tends to inform the shape of the resulting assets as something tiny and easy to hardcode.

        The thing that changed during the '90s is that mechanical sympathy became optional to achieving a large production. The data input defining the game world was decoupled into assets authored in disconnected ways and "crunched down" to optimized forms - scans, video, digital painting, 3D models. RCT exhibits some of this, too, in that it's using PCM audio samples and prerendered sprites. If the game weren't also a massive agent simulator, it would be unremarkable in its era. But even at this time, more complex scripting and treating gameplay code as another form of asset was becoming normalized in more genres.

        From the POV of getting a desired effect and shipping product, it's irrelevant to engage with mechanical sympathy, but it turns out that it's a thing that players gradually unravel, appreciate, and optimize their own play towards if they stick with it and play to competitive extremes, speedrun, mod, etc.

        The 64kb FPS QUOD released earlier this year is a good example of what can happen by staying committed to this philosophy even today: the result isn't particularly ambitious as a game design, but it isn't purely a tech demo, nor does it feel entirely arbitrary, nor did it take an outrageous amount of time to make (about one year, according to the dev).
      • scns 19 hours ago
        > it is mainly a question if the game is fun to play.

        10000x this. Miyamoto starts with a rudimentary prototype and asks himself this. Sadly, it seems that for many, fun is an afterthought they try to patch in somehow.
        • calvinmorrison 16 hours ago
          When Halo 2 (anniversary edition?) was released, there was also a video in the game about the development. The point that always stuck with me was "you must nail that 2 seconds that will keep people playing forever". The core mechanic of that game is just excellent.
          • ukuina 14 hours ago
            "30 seconds of fun"

            https://youtu.be/0q69Msy8ttM?t=287
      • sublinear 19 hours ago
        This way of thinking has caused at least a few prominent recurring bugs I can think of.

        Texture resolution mismatches causing blurriness/aliasing, floating-point errors and bad level design causing collision detection problems (getting stuck in the walls), frame rate and other update rates not being synced causing stutter and lag (and more collision detection problems), bad illumination parameters ruining the look they were going for, numeric overflow *breaking everything*, bad approximations of constants also breaking everything somewhere eventually, messy model mesh geometry causing glitches in texturing, lighting, animation, collision, etc.

        There's probably a lot more I'm not thinking of. They have nothing to do "with the hardware", but with the underlying math and logic.

        They're also not bugs to "let the programmer figure out". Good programmers and designers work together to solve them. I could just as easily hate on the many *criminally ugly, awkward, and plain unfun* games made by programmers working alone, but I'll let someone else do that. :)
        • WalterBright 16 hours ago
          > getting stuck in the walls

          I remember the early Simpsons video game. Sometimes, due to some bug in it (probably a sign error), you could go through the walls and see the rendered scenery from the other side. It was like going backstage in a play. It would have made a great Twilight Zone episode!
          • jkestner 5 hours ago
            That immediately made me think of the Treehouse of Horror episode where Homer got stuck in the third dimension.
          • lukan 10 hours ago
            I experienced those bugs in all sorts of games, and via cheats (fly mode) I intentionally went backstage to explore.
        • lukan 19 hours ago
          Game designer != game engine designer<p>(But it definitely helps if the game designer knows of the technical limits)
          • sublinear 18 hours ago
            Sorry, I'm not super familiar with professional game dev, but I am familiar with professional web dev. The problems seem similar, as evidenced by the constant complaining here on HN about the state of the web.

            Who formats or cleans up the assets and at least oversees that things are done according to a consistent spec, process, and guidelines? Is that not a game designer or someone under their leadership?

            I think in all the cases I gave, what might be completely delegated to "engine design" really should be teamwork with game design and art direction too. This is what the top-level comment was talking about. Even when a game is "well made", if they just adopted someone else's standards, that sucks all the soul out of it. This is a common problem in all creative work.

            (Adding this due to reply depth: coordination is a big aspect of design and can often be the most impactful to the result.)
            • lukan 18 hours ago
              It depends on how big the studio is, but the job of a game designer is usually not cleaning up assets. It is to, well, design the game. The big picture.
    • lifis 17 hours ago
      That makes no sense, since multiplication has been fast for the last 30 years (since the PS1) and floating point for the last 25 (since the PS2). Anyway, numbers relevant for game design are usually used just a few times per frame, so only program size matters, and that has not been significantly constrained for the last 40 years (since the NES).
      • applfanboysbgon 14 hours ago
        I wasn't talking about the specific example in the article. There are many, many other ways in which numeric characteristics can constrain game design, particularly if your game has any kind of scale to it (say, simulations with tons of moving parts or many NPCs, like RCT; or large open worlds, like Minecraft; or large multiplayer games, like WoW; all examples mentioned in this thread).

        If your game is small-scale, something like Super Mario Bros., you *should* be able to get away with not thinking about it in theory. But even then, people manage to write simple games with bloated loading times and stuttery performance, so never underestimate the impressive ability of people operating solely at the highest level of abstraction to make computers cry.
    • exmadscientist 17 hours ago
      Related to that: for a consumer electronics product I worked on using an ARM Cortex-M4 series microcontroller, I actually ended up writing a custom pseudorandom number generation routine (well, modifying one off the shelf). I was able to take the magic mixing constants and change them to things that could be loaded as single immediates using the crazy Thumb-2 immediate instructions. It passed every randomness test I could throw at it.

      By not having to pull in anything from the constant pools, and thereby avoiding memory stalls in the fast path, we got to use random numbers profligately and still run quickly and efficiently, and get to sleep quickly and efficiently. It was a fun little piece of engineering. I'm not sure how much it mattered, but I enjoyed writing it. (I think I did most of it after hours either way.)

      Alas, I don't think it ever shipped, because we eventually moved to an even smaller and cheaper Cortex-M0 processor which lacked those instructions. Also, my successor on that project threw most of it out and rewrote it, for reasons both good and bad.
    • WalterBright 16 hours ago
      I remember the older driving games. They'd progressively "build" the road as you progressed along it. Curves in the road were drawn as straight line segments.

      Which wasn't a problem, but it clearly showed how the programmers improvised to make it perform.
      • Sharlin 8 hours ago
        Limiting the draw distance and rendering as little geometry as possible is absolutely still a thing; devs can just afford to hide it better these days. The golden rule of graphics programming has always been "cheat as much as you can get away with, and then a bit more".
    • rkagerer 16 hours ago
      Now *that's* what being a *full* stack programmer really means.
    • eru 11 hours ago
      Constraints breed creativity.
      • 7bit 6 hours ago
        Today, constraints are simply ignored. (Looking at you, web devs and Microsoft devs.)
    • edflsafoiewq 20 hours ago
      Examples?
      • mort96 19 hours ago
        I think Minecraft's lighting system is a good example: there are 16 different brightness levels, from 0 to 15. This allows the game to store light levels in 4 bits per block.

        Similarly, redstone has 16 power levels, 0 to 15, which allows it to store the power level using 4 bits. In fact, quite a lot of attributes in Minecraft blocks are squeezed into 4 bits. I think the system has grown to be more flexible these days, but I'm pretty sure the chunk data structure used to set aside 4 bits per block for various metadata.

        And of course, the world height used to be capped at 255 blocks, so every block's Y position could be expressed as an 8-bit integer.

        A voxel game like that is a good example of where this kind of efficiency really matters, since there's just *so much data*. A single 16x16x256 chunk is 65.5k blocks. If a game designer says they want to add a new light source with brightness level 20, or a new kind of redstone which can go 25 blocks, it might very well be the right choice to say no.
        • tosti 17 hours ago
          I don't think Minecraft would be considered a cornerstone of optimal programming.
          • helterskelter 17 hours ago
            The 4-bit stuff is a hangover from Mojang having to squeeze every bit of perf they could from their Java-based engine. Their original sound engine was so sketchy that C418's (the music composer's) minimalist sound is partly because it really couldn't handle much more than what got released.

            MS has been loosening up on the 4-bit limit and has created a C++ variant of Minecraft which performs better, but they've also introduced their unified login garbage that has almost made me give up Minecraft completely.
            • Pannoniae 17 hours ago
              Hey, this isn't entirely accurate!

              The 4-bit stuff is a hangover from *Notch* doing this (I'd maybe even say a similar-calibre programmer to Chris Sawyer...). The sound has nothing to do with technical limits; that's a post-facto rationalisation.

              The game never played MIDI samples; it was always playing "real" audio. The style was an artistic choice; many similar retro-looking games were using chiptune and the sort. It's a deliberate juxtaposition...

              The C++ variant doesn't really perform better anymore either.
              • helterskelter 16 hours ago
                Fair enough; I mostly meant to point out that some of those design decisions predate MS, as much as I love to hate on them. The music was just an interesting bit of trivia I read the other day.
                • Pannoniae 15 hours ago
                  Yeah, 100% :) Ironically, the design constraints are one of the big things that made it work so well! If it had been designed in a "traditional" way, it would have been much less ambitious.
              • imtringued 7 hours ago
                Bedrock Edition has a smaller simulation distance, which is kind of the opposite of what you'd expect from the more "optimized" version.
          • mort96 16 hours ago
            Minecraft is, and always has been, handling vast amounts of data at pretty good performance. It's not an impossibly difficult task (many other people have made voxel game engines which are better), but it's something you can't do without paying attention to these things. Every voxel engine with remotely reasonable performance needs to carefully count the bits used per block.
          • kulahan 17 hours ago
            The entire program doesn't need to be a cornerstone of optimal programming for this one example to hold true.
      • andai 20 hours ago
        https://en.wikipedia.org/wiki/Nuclear_Gandhi

        From what I heard, there was a Civilization game which suffered from an unsigned integer underflow error where Gandhi, whose aggression was set to 0, would become "less aggressive" due to some event in the game, but due to integer underflow, this would cause his aggression to go to 255, causing him to nuke the entire map.

        The article says this was just an urban legend, though. Well, real or not, it's a perfect example of the principle!
        • luaKmua 20 hours ago
          Indeed an urban legend. Sid Meier himself debunked it in his memoir, which is a pretty great read.
          • mrguyorama 2 hours ago
            It's fascinating to live through the entire lifecycle of:

            Weird thing happens. People make up reasons why. One reason is *possible*. That becomes THE reason, and spreads wildly, without confirmation, as an accurate explanation. "Actually, that's not true." Now the debunking is widely disseminated, and if we are lucky, the original meme dies out!

            But it took 30 years. For a very meaningless rumor.
      • bombcar18 hours ago
        Read all of the Factorio Friday Facts <a href="https:&#x2F;&#x2F;factorio.com&#x2F;blog&#x2F;" rel="nofollow">https:&#x2F;&#x2F;factorio.com&#x2F;blog&#x2F;</a> - a number of the more obscure bug&#x2F;performance issues come down to making something fit naturally into a value the CPU can handle.
      • hcs19 hours ago
        Not the same thing but I was reminded of a joke about the puzzle game Stephen&#x27;s Sausage Roll:<p>&gt; I have calculated the value of Pi on Sausage Island and found it to be 2.<p><a href="https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20240405034314&#x2F;https:&#x2F;&#x2F;twitter.com&#x2F;ianmaclarty&#x2F;status&#x2F;723848396675506177" rel="nofollow">https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20240405034314&#x2F;https:&#x2F;&#x2F;twitter.c...</a>
      • Waterluvian20 hours ago
        Not really an example that proves any point, but one that comes to mind from a 20-year-old game:<p>World of Warcraft (at least originally) encoded every item as an ID. To keep the database simple and small (given millions of players with many characters with lots of items): if you wanted to permanently enchant your item with an upgrade, that was represented essentially as a whole new item. The item was replaced with a different item (your item + enchant). Represented by a different ID. The ID was essentially a bitmask type thing.<p>This meant that it was baked into the underlying data structures and deep into the core game engine that you could <i>never</i> have more than one enchant at a time. It wasn&#x27;t like there was a relational table linking what enchants an item in your character&#x27;s inventory had.<p>The first expansion introduced &quot;gems&quot; which you could socket into items. This was basically 0-4 more enchants per item. The way they handled this was to just lengthen item Ids by a whole bunch to make all that bitmask room.<p>I might have gotten some of this wrong. It&#x27;s been forever since I read all about these details. For a while I was obsessed with how they implemented WoW given the sheer scale of the game&#x27;s player base 20 years ago.
      • plopz18 hours ago
        One of the main issues with Kerbal Space Program is instability caused by floating point numbers. I know Starcraft 2 was built upon integers.
        • Gigachad18 hours ago
          Floating point issues are less a problem of performance here but one of precision. Particularly being a space game, the coordinates can be massive resulting in the precision deteriorating enough to cause issues.
      • ErroneousBosh19 hours ago
        Going way back into history, the Alesis MIDIVerb reverb unit had a really simple DSP core made out of discrete logic chips. It could add a memory location to an accumulator and divide it by two, invert it, add it and divide it by two, or store it in ram either inverted or not and divide the accumulator by two.<p>Four instructions, in about eight chips.<p>By combining shifts and adds Keith Barr was able to devise all the different filter and delay coefficients for 63 different reverb programs (the 64th one was just dead passthrough).
  • youarentrightjr19 hours ago
    &gt; The same trick can also be used for the other direction to save a division: NewValue = OldValue &gt;&gt; 3; This is basically the same as NewValue = OldValue &#x2F; 8; RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since <i>compilers won’t do this optimization for you</i>.<p>(emphasis mine)<p>Not at all true. Assuming the types are such that &gt;&gt; is equivalent to &#x2F;, modern compilers will implement division by a power of two as a shift every single time.
    • mid-kid5 hours ago
      Yeah. I&#x27;m surprised this along with the money thing are listed in the article at all. These are the sort of things you learn within the first month of writing assembly, and were widely used across the industry at the time (and times prior). The bit shifting optimization is performed by GCC even at -O0, and likely already was at the time, as it&#x27;s one of the simpler optimizations to make. It&#x27;s like calling &quot;xor eax, eax&quot; a masterful optimization tactic for clearing a register.<p>Looking at the macro-level optimizations like the rest of the article does is significantly more interesting.
      • ConceptJunkie51 minutes ago
        &quot;XOR AXAX&quot; was my license plate in the 90s.
    • spiffyk1 hour ago
      That whole section is kind of weird. The mention of operator overloading also seems out of place, since the operator is not overloaded here at all.
    • account423 hours ago
      Yeah and this can be verified with the minimum level of research you&#x27;d expect for this kind of article, e.g. by firing up compiler explorer.
    • dmitrygr12 hours ago
      They will do it for unsigned. For signed, they do a bit more work to get the same rounding that C promises.
      • grumbelbart24 hours ago
        Here is how that looks:<p><a href="https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;rooee4esd" rel="nofollow">https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;rooee4esd</a>
      • account423 hours ago
        It&#x27;s unfortunate that C tied overflow behavior to the signedness of integer types.
  • hermitcrab12 minutes ago
    &gt;the game was written in the low-level language Assembly<p>Surely it wasn&#x27;t <i>all</i> assembly. There is little to be gained in performance from writing non-bottleneck parts of the code in assembly.
    • its-summertime4 minutes ago
      &gt; &gt; What language was RollerCoaster Tycoon programmed in?<p>&gt; It&#x27;s 99% written in x86 assembler&#x2F;machine code (yes, really!), with a small amount of C code used to interface to MS Windows and DirectX.<p><a href="https:&#x2F;&#x2F;www.chrissawyergames.com&#x2F;faq3.htm" rel="nofollow">https:&#x2F;&#x2F;www.chrissawyergames.com&#x2F;faq3.htm</a>
      • hermitcrab1 minute ago
        Wow. It reminds me of those guys who run a marathon carrying a fridge. Impressive, but ...
  • TheGRS51 minutes ago
    This is a fun read, it&#x27;s one of my favorite games growing up by far, countless hours sunk into it. I didn&#x27;t need this write-up to know that Chris Sawyer was a god among men and that the open source version is a huge labor of love, but it&#x27;s a good reminder :) I will need to give OpenRCT a try some time, I&#x27;ve tried a little OpenTTD and really enjoy it, but RCT was always my jam.<p>For the lesson here, I think re-contextualizing the product design in order to ease development should be a core tenet of modern software engineering (or really any form of engineering). This is why we are usually saying that we need to shift left on problems; discussing the constraints up-front lets us inform designers how we might be able to tweak a few designs early in order to save big time on the calendar. All of the projects that I loved being a part of in my career did this well; all of the slogs were ones that employed a leadership-driven approach that amounted to waterfall.
  • HelloUsername21 hours ago
    Fun read, thx! I&#x27;d also recommend more about RCT:<p>&quot;Interview with RollerCoaster Tycoon&#x27;s Creator, Chris Sawyer (2024)&quot; <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46130335">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=46130335</a><p>&quot;Rollercoaster Tycoon (Or, MicroProse&#x27;s Last Hurrah)&quot; <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=44758842">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=44758842</a><p>&quot;RollerCoaster Tycoon at 25: &#x27;It&#x27;s mind-blowing how it inspired me&#x27;&quot; <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=39792034">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=39792034</a><p>&quot;RollerCoaster Tycoon was the last of its kind [video]&quot; <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42346463">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=42346463</a><p>&quot;The Story of RollerCoaster Tycoon&quot; <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=ts4BD8AqD9g" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=ts4BD8AqD9g</a>
  • jwilliams2 hours ago
    Definitely made me feel old to see bit-shifting needed an explainer! I must admit, as I was reading I kept thinking &quot;why is he explaining this? it&#x27;s obvious!&quot;.
  • Validark10 hours ago
    &quot;Since the number is stored in a binary system, every shift to the left means the number is doubled.<p>At first this sounds like a strange technical obscurity&quot;<p>Do we not know binary in 2026? Why is this a surprise to the intended audience?
    • xeonmc9 hours ago
      <a href="https:&#x2F;&#x2F;xkcd.com&#x2F;2501&#x2F;" rel="nofollow">https:&#x2F;&#x2F;xkcd.com&#x2F;2501&#x2F;</a>
  • fweimer20 hours ago
    What language is this article talking where compilers don&#x27;t optimize multiplication and division by powers of two? Even for division of signed integers, current compilers emit inline code that handles positive and negative values separately, still avoiding the division instruction (unless when optimizing for size, of course).
    • orthoxerox8 hours ago
      Well, Sawyer started writing Transport Tycoon in 1992, when free or affordable C compilers were not as widely available. Turbo C was never known for optimizations. GCC 1.40 was good enough for Linus, but I guess Chris was already a good assembly programmer.
    • shakow19 hours ago
      That&#x27;s what I would have thought as well, but looks like that on x86, both clang and gcc use variations of LEA. But if they&#x27;re doing it this way, I&#x27;m pretty sure it must be faster, because even if you change the ×4 for a &lt;&lt;2, it will still generate a LEA.<p><a href="https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;EKj58dx9T" rel="nofollow">https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;EKj58dx9T</a>
      • shaggie7618 hours ago
        Not only is LEA more flexible I believe it&#x27;s preferred to SHL even for simple operations because it doesn&#x27;t modify the flags register which can make it easier to schedule.
        • fweimer9 hours ago
          It&#x27;s more about the non-destructive destination part, which can avoid a move. Compilers tend to prefer SHL&#x2F;SAL of LEA because its encoding is shorter: <a href="https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;9Tsq3hKnY" rel="nofollow">https:&#x2F;&#x2F;godbolt.org&#x2F;z&#x2F;9Tsq3hKnY</a>
        • Cold_Miserable13 hours ago
          shlx doesn&#x27;t alter the flag register.
          • fweimer9 hours ago
            SHLX does not support an immediate operand. Non-destructive shifts with immediate operands only arrive with APX, where they are among the most commonly used instructions (besides paired pushes&#x2F;pops).
      • Validark10 hours ago
        Using an lea is better when you want to put the result in a different register than the source and&#x2F;or you don&#x27;t want to modify the flags registers. shlx also avoids modifying flags, but you can&#x27;t shift by an immediate, so you need to load the constant into a register beforehand. In terms of speed, all these options are basically equivalent, although with very slightly different costs to instruction caches and the register renaming in the scheduler. In terms of execution, a shift is always 1 cycle on modern hardware.
      • adrian_b19 hours ago
        They use LEA for multiplying with small constants up to 9 (not only with powers of two, but also with 3, 5 and 9; even more values could be achieved with two LEA, but it may not be worthwhile).<p>For multiplying with powers of two greater or equal to 16, they use shift left, because LEA can no longer be used.
    • cjbgkagh20 hours ago
      It was written in assembly so goes through an assembler instead of a compiler.
      • rawling19 hours ago
        I assume GP is talking about the bit in the article that goes<p>&gt; RCT does this trick all the time, and even in its OpenRCT2 version, this syntax hasn’t been changed, since compilers won’t do this optimization for you.
        • cjbgkagh18 hours ago
          That makes more sense, I second their sentiment, modern compilers will do this. I guess the trick is knowing to use numbers that have these options.
          • bombcar18 hours ago
            There was a recent article on HN about which compiler optimizations would occur and which wouldn&#x27;t and it was surprising in two ways - first, it would make some that you might not expect, and it would <i>not</i> make others that you would - because in some obscure calling method, it wouldn&#x27;t work. Fixing that path would usually get the expected optimization.
  • troad16 hours ago
    &gt; When reading through OpenRCT2’s source, there is a common syntax that you rarely see in modern code, lines like this:<p>&gt; NewValue = OldValue &lt;&lt; 2;<p>I disagree with the framing of this section. Bit shifts are used all the time in low-level code. They&#x27;re not just some archaic optimisation, they&#x27;re also a natural way of working with binary data (aka all data on a computer). Modern low-level code continues to use lots of bit shifts, bitwise operators, etc.<p>Low-level programming is absolutely crucial to performant games. Even if you&#x27;re not doing low-level programming yourself, you&#x27;re almost certainly using an engine or library that uses it extensively. I&#x27;m surprised an article about optimisation in gaming, of all things, would take the somewhat tired &quot;in ye olde days&quot; angle on low-level code.
    • Rendello15 hours ago
      I learned these low-level bit tricks by reading TempleOS&#x27; HolyC source code. I remember feeling like a genius when I worked out what this line does:<p>dc-&gt;color=c++&amp;15;<p>Hint: it&#x27;s from this &quot;Lines&quot; demo program, whose source is here: <a href="https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20180906060723&#x2F;https:&#x2F;&#x2F;templeos.holyc.xyz&#x2F;Wb&#x2F;Demo&#x2F;Graphics&#x2F;Lines.html" rel="nofollow">https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20180906060723&#x2F;https:&#x2F;&#x2F;templeos....</a><p>And this is what it looks like when it runs (ignore the fact it&#x27;s running in Minecraft): <a href="https:&#x2F;&#x2F;youtu.be&#x2F;pAN_Fza6Vy8?t=38" rel="nofollow">https:&#x2F;&#x2F;youtu.be&#x2F;pAN_Fza6Vy8?t=38</a>
  • bluelightning2k19 hours ago
    Great write up. Thank you. Really great!<p>I was reminded of the factorio blog. That game&#x27;s such a huge optimization challenge even by today&#x27;s standards and I believe works with the design.<p>One interesting thing I remember is if you have a long conveyor belt of 10,000 copper coils, you can basically simplify it to just be only the entry and exit tile are actually active. All the others don&#x27;t actually have to move because nothing changes... As long as the belts are fully or uniformly saturated. So you avoid mechanics which would stop that.
    • plopz18 hours ago
      I was pretty disappointed with how Factorio reworked how fluids worked in the expansion. The old system had its quirks and the new system is obviously more performant, but it throws realism out the window which is a bummer.
      • Cpoll15 hours ago
        I don&#x27;t miss it. I also found Satisfactory&#x27;s old fluid system (with concepts like sloshing) wildly unintuitive. I&#x27;ll go so far as to say that accurate fluid dynamics is detrimental to any game that&#x27;s not about beavers and water table management.
        • rkagerer11 hours ago
          That&#x27;s the second time I heard the beaver game come up here... Guess I really ought to try it!
          • Linosaurus10 hours ago
            It’s rather neat, and recently hit 1.0.<p>That game, Timberborn, shares some design elements with RollerCoaster Tycoon.<p>A block-based 3D world that can be modified by the player.<p>Units walking around on player-defined paths, with their mood influenced by pretty bushes.<p>But there are no obvious performance considerations like in the article.
      • Starlevel00416 hours ago
        The old system was nonfunctional, and any base that used lots of fluids (like modded ones, or new Space Age ones) was constantly running up against nonsensical mechanics.
  • evandale15 hours ago
    The pathfinding section reminded me that there&#x27;s a YouTube streamer, Marcel Vos, who goes into a deep dive of how the pathfinding works.<p><a href="https:&#x2F;&#x2F;youtu.be&#x2F;twU1SsFP-bE" rel="nofollow">https:&#x2F;&#x2F;youtu.be&#x2F;twU1SsFP-bE</a><p>He has lots of videos that are deep dives into how RCT works and how things are implemented!
    • mmcconnell16185 hours ago
      I&#x27;ve built a few transportation simulations where I started out with pathfinding methods like A* but the compute cost doesn&#x27;t scale well with 10,000 or 100,000 agents running around. Pre-computing flow fields for common map destinations is one of those areas where you trade off storage for compute. The agents just look for the signpost telling them &quot;this direction to destination x&quot; instead of actually calculating a path.<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;A*_search_algorithm#" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;A*_search_algorithm#</a><p><a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=zr6ObNVgytk" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=zr6ObNVgytk</a>
  • sroerick21 hours ago
    I had always heard about how RCT was built in Assembly, and thought it was very impressive.<p>The more I actually started digging into assembly, the more this task seems monumental and impossible.<p>I didn&#x27;t know there was a fork and I&#x27;m excited to look into it
    • kevincox18 hours ago
      Programming in assembly isn&#x27;t really &quot;hard&quot;; it mostly takes lots of discipline. Consistency and patterns are key. The language also provides very little implicit documentation, so always document which arguments are passed how and where, and which registers are caller- and callee-saved. Of course it is also very tedious.<p>Now, writing very optimized assembly is very hard, because you need to break your consistency and conventions to squeeze out all the possible performance. The larger the &quot;kernel&quot; you optimize, the more pattern-breaking code you need to keep in your head at a time.
    • mikkupikku20 hours ago
      Macros. Lots of macros.
      • cogman1017 hours ago
        Yup. I&#x27;ve done a bit of assembly and it&#x27;s really only a little harder than doing C. You simply have to get familiar with your assembler and the offered macros. Heck, I might even say that it&#x27;s simpler than BASIC.
      • timschmidt19 hours ago
        And presumably generous use of code comments
    • markus_zhang16 hours ago
      Back then a lot of people started with assembly because that was the only way to make games quick enough. Throughout the years they accumulated tons of experience and routines and tools.<p>Not saying that it was not a huge feat, but it’s definitely a lot harder to start from scratch nowadays, even for the same platform.
    • mrguyorama1 hour ago
      &gt;I didn&#x27;t know there was a fork and I&#x27;m excited to look into it<p>OpenRCT2 isn&#x27;t a fork, it&#x27;s like OpenTTD, a recreation.<p>Go look at GDC&#x27;s Classic Game Postmortems. They have tens of videos of the people who built famous games from the 80s and 90s, who often go into technical details of how they did it. For example, the Robotron one goes into how the code works.<p>It&#x27;s remarkably familiar. They basically built object-oriented programming and classes using convention only. You treat every actor you want to work with as a chunk of memory with a standard layout that includes pointers for behavior and slots for state, and you just try really hard to only operate on the right &quot;Types&quot; at the right places. From there you have your standard game loop of &quot;Get input, update all Actors, render, loop&quot;<p>The Pitfall postmortem is wonderful. The Atari 2600 had roughly zero RAM to work with, and barely any cartridge space to hold your game. To make their large, somewhat open world, they built each screen off just a few parameters, and created a bidirectional pseudorandom-ish function that would generate the parameters on a cycle, giving you a connected map space!
  • raffraffraff10 hours ago
    On huge games produced by large game studios, I wonder if the idea of using a real world technical challenge as a &quot;feature&quot; within the game is considered genius? Consider a coder and a game designer who are on different teams and don&#x27;t attend the same meetings.<p>But if you look at creative writing, story arcs are all about obstacles. A boring story is made interesting by an obstacle. It is what our protagonist needs to overcome. A one-man-band game dev who simultaneously holds the story and the technical challenge their head, might spot the opportunity to use a glitch or limitation as, I dunno, a mini game that riffs on the glitch.
  • ionwake5 hours ago
    Also, I remember the excitement of a new game that looked different from others.<p>Somehow even as a child I just knew that it would be a whole new emergent gameplay experience.<p>Of course I didn&#x27;t know what went into making RollerCoaster Tycoon, but I could tell just by a couple of screenshots that this was clearly a ground-up new game with new mechanics that would be extremely fun to play.<p>I don&#x27;t get this feeling anymore, as I just assume everything is a clone of another game in the same engine, generally.<p>Unless it&#x27;s been a decade in production like Breath of the Wild or GTA 5, I just don&#x27;t expect much.
  • derodero244 hours ago
    Biggest lesson from string matching work: layout beats instructions every time. Batch your comparisons into a contiguous buffer so the prefetcher can actually help, and you&#x27;ll outperform hand-rolled SIMD on random-access data. Compilers handle the arithmetic tricks fine now — they can&#x27;t fix your cache misses though.
  • maxglute15 hours ago
    Is there a place to find stories of recent game optimization? What&#x27;s the most ridiculous one, on the level of the fast inverse square root? As someone who spent way too much time V-Raying in a prior life, I still can&#x27;t believe we got real-time ray tracing.
  • pmg1029 hours ago
    The article refers several times to the benefits of the game designer and the coder being the same person. I&#x27;ve often felt that this is the only way to build anything impressive, and in fact I&#x27;m amazed that corporations with their hierarchical organisation model ever get anything built at all but I suppose you can brute force anything with enough employees.<p>It does make you wonder if the future of AI-assisted development will look more like the early days of coding, where one single mind can build and deliver a whole piece of software from beginning to end.
  • makapuf9 hours ago
    Compilers won&#x27;t turn multiplication by a power of two into a bit shift for you? I remember reading in ~2000 that the only thing writing a&lt;&lt;2 instead of a*4 will do is make your compiler yawn
    • zekica6 hours ago
      Even gcc&#x27;s -O0 will do the bitshift, but even dividing by 5 on x86_64 will not use idiv:<p><pre><code> imul rdx, rdx, 1717986919 shr rdx, 32 sar edx sar eax, 31 sub edx, eax mov eax, edx</code></pre>
  • bze129 hours ago
    &gt; This part is especially fascinating to me, since it turns an optimization done out of technical necessity into a gameplay feature.<p>Reminds me of blood moons in Zelda <a href="https:&#x2F;&#x2F;www.polygon.com&#x2F;legend-zelda-tears-kingdom&#x2F;23834440&#x2F;totk-blood-moon-hidden-trick&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.polygon.com&#x2F;legend-zelda-tears-kingdom&#x2F;23834440&#x2F;...</a>
  • egypturnash14 hours ago
    I have to wonder how much of the original assembly source looked a <i>lot</i> more succinct than whatever&#x27;s in OpenRCT due to the use of macros. Looking up his gameography on Mobygames, Chris had been writing stuff since 1984 when RCT came out in 1999, it&#x27;s hard to imagine he was still writing every single opcode out by hand given that I had some macros in the assembler I was fooling around with on my c64 back in the eighties.
  • random__duck1 hour ago
    So this is what programming on hard mode looks like?
  • MisterTea17 hours ago
    While it has been a while since playing RCT, one thing that was really nice about the game is that it runs flawlessly under Wine.<p>I really wish I could see the source code.
  • lefty220 hours ago
    &gt; The same trick can also be used for the other direction to save a division:<p>&gt; NewValue = OldValue &gt;&gt; 3;<p>You need to be careful, because this doesn&#x27;t work if the value is negative.
    • whizzter19 hours ago
      Most CPUs have signed and unsigned right shift instructions (left shift is the same), so yes, it works (you can test this in C by casting a signed to unsigned before shifting).<p>The biggest caveat is that right shifting -1 still produces -1 instead of 0, but that&#x27;s usually fine for much older game fixed-point maths since -1 is close enough to 0.
    • adrian_b18 hours ago
      It works fine when the value is negative.<p>However, there is a quirk of the hardware of most CPUs that has been inherited by the C language and by other languages.<p>There are multiple ways of defining integer division when the dividend is not a multiple of the divisor, depending on the rounding rule used for the quotient.<p>The 2 most frequently used definitions are to have a positive remainder, which corresponds to rounding the quotient with the floor function, and to have a remainder of the same sign as the dividend, which corresponds to rounding the quotient by truncation.<p>In most CPUs, the hardware is designed such that for signed integers the division instruction uses the second definition, while the right shift uses the first definition.<p>This means that when the dividend is a multiple of the divisor, division and right shift are the same, but otherwise the quotient may differ by one unit due to different rounding rules.<p>Because of this, compilers will not automatically replace divisions with plain right shifts, because there are operands where the result is different.<p>Nevertheless, the programmer can always replace a division by a power of two with a right shift. In all the programs that I have ever seen, either the rounding rule for the quotient does not matter or the desired definition for the division is the one with positive remainder, i.e. the definition implemented by right shift.<p>In those cases when the rounding rule matters, the worrisome case is when you must use division, not when you can use a right shift, so you must correct the result to correspond to rounding by floor, instead of the rounding by truncation provided by the hardware. For this, you must not use the &quot;&#x2F;&quot; operator of the C language, but one of the &quot;div&quot; functions from &quot;stdlib.h&quot;, or you may use &quot;&#x2F;&quot; but divide the absolute values of the operands, after which you compute the correct signed results.
  • londons_explore20 hours ago
    &gt; it turns an optimization done out of technical necessity into a gameplay feature<p>And this folks is why an optimizing compiler can never beat sufficient quantities of human optimization.<p>The human can decide when the abstraction layers should be deliberately broken for performance reasons. A compiler cannot do that.
    • nulltrace18 hours ago
      The LEA-vs-shift thread here kind of proves the point. Compilers are insanely good at that stuff now. Where they completely fall short is data layout. I had a message parser using `std::map&lt;int, std::string&gt;` for field lookup and the fix was just... a flat array indexed by tag number. No compiler is ever going to suggest that. Same deal with allocation. I spent a while messing with SIMD scanning and consteval tricks chasing latency, and the single biggest win turned out to be boring. Switched from per-message heap allocs to a pre-allocated buffer with `std::span` views into the original data. ~12 allocations per message down to zero. Compiler will optimize the hell out of your allocator code, it just won&#x27;t tell you to stop calling it.
    • timschmidt19 hours ago
      Agreed. It really requires an understanding of not just the software and computer it&#x27;s running on, but the goal the combined system was meant to accomplish. Maybe some of us are starting to feed that sort of information into LLMs as part of spec-driven development, and maybe an LLM of tomorrow will be capable of noticing and exploiting such optimizations.
    • hrmtst938379 hours ago
      If you think compilers can&#x27;t punch through abstractions, you haven&#x27;t seen what whole-program optimization does to an overengineered stack when the programmer gives it enough visibility. They still miss intent, so the game-specific hack can win.
    • gwern18 hours ago
      End-to-end optimization in action! Although I&#x27;d&#x27;ve liked more than 1 example (pathfinding) here.
  • throw_m2393392 hours ago
    Fantastic write-up, that&#x27;s exactly why I came to HN many years ago, to find such articles about mundane things or products, but the technical aspect is just fascinating.
  • rajan29 hours ago
    This is insane
  • atrealadam9 hours ago
    I’m quite surprised at this comment.
  • sghiassy18 hours ago
    Another great optimization is storing the year as two digits, because you only need the back half…<p>… oh wait, nvm. Don’t preoptimize!
    • seba_dos117 hours ago
      There&#x27;s a vast space between premature optimization and not caring about optimization until it bites you, and both extremes make you (or someone else) miserable.
    • neonstatic11 hours ago
      It&#x27;s a fun optimization to make in the 9th decade of a century :)
  • almostdeadguy4 hours ago
    The pathfinder algorithm is a great example of why constraints are so important for creativity and creative development.<p>If AI has any benefit to creative endeavors at all, it will be because the challenge of coaxing a machine defined to produce an averaging of a large corpus of work (inherently mediocre slop) provides novel limitations, not because it makes art any more &quot;accessible&quot;.