Is the Dragon book still relevant? Do you recommend any learning resources other than reading the source and contributing to LLVM?
IMHO absolutely. The basics of lexers and parsers are still there, and some of the optimizations are also still relevant. You just cannot expect to read the book and then be able to write GCC or LLVM from scratch (1).

For going deeper into other advanced topics there are:

https://www.cs.cornell.edu/courses/cs6120/2025fa/

and

https://mcyoung.xyz/2025/10/21/ssa-1/

So maybe writing a compiler with exactly one FE (for a simple language) and one BE (for a simple architecture), with, say, 80% of the optimizations, could be a doable project.

(1) We should define what we mean by that, because there are thousands of front-ends and back-ends.
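As a hypothetical sketch of what the FE/BE split looks like at its very smallest (OCaml here; all names are invented for illustration, not taken from any book or real compiler): a front end that produces an AST, and a back end that lowers it to a toy stack machine.

```ocaml
(* Front-end output: the AST of a tiny expression language. *)
type expr = Num of int | Add of expr * expr | Mul of expr * expr

(* Back-end target: instructions for a toy stack machine. *)
type instr = Push of int | IAdd | IMul

(* The back end is a post-order walk over the AST. *)
let rec compile = function
  | Num n -> [Push n]
  | Add (a, b) -> compile a @ compile b @ [IAdd]
  | Mul (a, b) -> compile a @ compile b @ [IMul]

(* An interpreter for the target, handy for testing the back end. *)
let run prog =
  let step stack = function
    | Push n -> n :: stack
    | IAdd -> (match stack with b :: a :: r -> (a + b) :: r | _ -> failwith "underflow")
    | IMul -> (match stack with b :: a :: r -> (a * b) :: r | _ -> failwith "underflow")
  in
  match List.fold_left step [] prog with
  | [result] -> result
  | _ -> failwith "bad program"

(* run (compile (Add (Num 1, Mul (Num 2, Num 3)))) = 7 *)
```

A real project would insert an optimizing IR between these two halves; the point of the sketch is only that the two ends meet at a narrow, well-defined interface.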
> Is Dragon book still relevant?<p>No, not at all, the teachings and techniques have been surpassed since four decades or so.<p>The algorithm LALR is flawed, it only works for a subset of CFG instead of all. That alone is already a death blow. If you want to try out BNF grammars in the wild, it is nearly guaranteed that they are complex enough for LALR to shit itself with S-R conflicts.<p>The technique of generating and dumping source code is awkward and the reasons that made that a necessity back then are no longer relevant. A good parser is simply a function call from a code library.<p>The technique of tokenising, then parsing in a second pass is awkward, introduces errors and again the reasons that made that a necessity back then are no longer relevant. A good parser works "on-line" (term of art, not meaning "over a computer network" here) by tokenising and parsing at the same time/single-pass.<p>The book precedes Unicode by a long time and you will not learn how to properly deal with text according to the rules laid out in its various relevant reports and annexes.<p>The book does not take into consideration the syntactic and semantic niceties and features that regex have gained since and thus should definitely also be part of a grammar parser.<p>> recommend any other learning resources<p>Depends on what your goals are. For a broad and shallow theoretical introduction and to see what's out there, browse the slide decks of university lectures for this topic on the Web.
I heard that the newer edition is updated with more recent material like data-flow analysis, garbage collection, etc. Either way, the book doesn't teach you how to build a basic working compiler, so you need to consult other materials.

Try Andrew Appel's *Modern Compiler Implementation in Java/C/ML*, or *Writing a C Compiler* (https://norasandler.com/book), which is much more recent.

Eventually, you'll want to hack on GCC/LLVM, because they are production-grade compilers.
I taught compilers in the past and still like this trilogy of books:

> *Modern Compiler Implementation* by Andrew W. Appel

It comes in three flavors: C, ML (Meta Language), and Java.

https://www.cs.princeton.edu/~appel/modern/

Writing a compiler in Standard ML is as natural as writing a grammar and denotational semantics.

Compiler writing is becoming an extinct art.
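As a hypothetical illustration of that claim (in OCaml, a close cousin of Standard ML; the example is invented, not taken from Appel's book): the variant datatype transcribes the grammar almost line for line, and the pattern-matching evaluator reads like denotational semantics, one equation per production.

```ocaml
(* Grammar:  e ::= n | e + e | e * e | let x = e in e | x *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Let of string * expr * expr
  | Var of string

(* Denotation: [[e]] env, one equation per grammar production. *)
let rec eval env = function
  | Num n -> n
  | Add (a, b) -> eval env a + eval env b
  | Mul (a, b) -> eval env a * eval env b
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2
  | Var x -> List.assoc x env

(* eval [] (Let ("x", Num 2, Mul (Var "x", Num 3))) = 6 *)
```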
I heard that the ML version was a translation of the C version and is thus not easy to follow. Or it may have been the other way around!
Thanks!

Are you sure it's an extinct art, though? LLVM is flourishing, interesting IRs like MLIR keep coming to life, many ML-adjacent projects build their own compilers (PyTorch, Mojo, tinygrad), big tech companies like Intel, AMD, Nvidia, and Apple contribute to multiple different compilers, and projects integrate with one another at different levels of abstraction (PyTorch -> Triton -> CUDA). There is a lot of compilation going on from one language to another.

Not to mention the many languages in the mainstream that weren't that popular 10 years ago: think Rust, Zig, Go.