Is the Dragon book still relevant? Do you recommend any learning resources other than reading the source and contributing to LLVM?
IMHO absolutely. The basics of lexers and parsers are still there, and some of the optimizations are also still relevant. You just cannot expect to read the book and then be able to write GCC or LLVM from scratch (1).

For going deeper into other advanced topics there are:

https://www.cs.cornell.edu/courses/cs6120/2025fa/

and

https://mcyoung.xyz/2025/10/21/ssa-1/

So maybe writing a compiler with exactly one FE (for a simple language) and one BE (for a simple architecture), with, say, 80% of the optimizations, could be a doable project.

(1) We should define what we mean by that, because there are thousands of front-ends and back-ends.
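As a hypothetical sketch of what the FE/BE split looks like at its very smallest (OCaml here; all names are invented for illustration, not taken from any book or real compiler): a front end that produces an AST, and a back end that lowers it to a toy stack machine.

```ocaml
(* Front-end output: the AST of a tiny expression language. *)
type expr = Num of int | Add of expr * expr | Mul of expr * expr

(* Back-end target: instructions for a toy stack machine. *)
type instr = Push of int | IAdd | IMul

(* The back end is a post-order walk over the AST. *)
let rec compile = function
  | Num n -> [Push n]
  | Add (a, b) -> compile a @ compile b @ [IAdd]
  | Mul (a, b) -> compile a @ compile b @ [IMul]

(* An interpreter for the target, handy for testing the back end. *)
let run prog =
  let step stack = function
    | Push n -> n :: stack
    | IAdd -> (match stack with b :: a :: r -> (a + b) :: r | _ -> failwith "underflow")
    | IMul -> (match stack with b :: a :: r -> (a * b) :: r | _ -> failwith "underflow")
  in
  match List.fold_left step [] prog with
  | [result] -> result
  | _ -> failwith "bad program"

(* run (compile (Add (Num 1, Mul (Num 2, Num 3)))) = 7 *)
```

A real project would insert an optimizing IR between these two halves; the point of the sketch is only that the two ends meet at a narrow, well-defined interface.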
> Is Dragon book still relevant?<p>No, not at all, the teachings and techniques have been surpassed since four decades or so.<p>The algorithm LALR is flawed, it only works for a subset of CFG instead of all. That alone is already a death blow. If you want to try out BNF grammars in the wild, it is nearly guaranteed that they are complex enough for LALR to shit itself with S-R conflicts.<p>The technique of generating and dumping source code is awkward and the reasons that made that a necessity back then are no longer relevant. A good parser is simply a function call from a code library.<p>The technique of tokenising, then parsing in a second pass is awkward, introduces errors and again the reasons that made that a necessity back then are no longer relevant. A good parser works "on-line" (term of art, not meaning "over a computer network" here) by tokenising and parsing at the same time/single-pass.<p>The book precedes Unicode by a long time and you will not learn how to properly deal with text according to the rules laid out in its various relevant reports and annexes.<p>The book does not take into consideration the syntactic and semantic niceties and features that regex have gained since and thus should definitely also be part of a grammar parser.<p>> recommend any other learning resources<p>Depends on what your goals are. For a broad and shallow theoretical introduction and to see what's out there, browse the slide decks of university lectures for this topic on the Web.
I heard that the newer edition is updated with more recent material like data-flow analysis, garbage collection, etc. Either way, the book doesn't teach you how to build a basic working compiler, so you need to consult other materials.

Try Andrew Appel's *Modern Compiler Implementation in Java/C/ML*, or *Writing a C Compiler* (https://norasandler.com/book), which is much more recent.

Eventually, you'll want to hack on GCC/LLVM, because they are production-grade compilers.
I taught compilers in the past and still like this trilogy of books:

> *Modern Compiler Implementation* by Andrew W. Appel

It comes in three flavors: C, ML (Meta Language), and Java.

https://www.cs.princeton.edu/~appel/modern/

Writing a compiler in Standard ML is as natural as writing a grammar and denotational semantics.

Compiler writing is becoming an extinct art.
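As a hypothetical illustration of that claim (in OCaml, a close cousin of Standard ML; the example is invented, not taken from Appel's book): the variant datatype transcribes the grammar almost line for line, and the pattern-matching evaluator reads like denotational semantics, one equation per production.

```ocaml
(* Grammar:  e ::= n | e + e | e * e | let x = e in e | x *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr
  | Let of string * expr * expr
  | Var of string

(* Denotation: [[e]] env, one equation per grammar production. *)
let rec eval env = function
  | Num n -> n
  | Add (a, b) -> eval env a + eval env b
  | Mul (a, b) -> eval env a * eval env b
  | Let (x, e1, e2) -> eval ((x, eval env e1) :: env) e2
  | Var x -> List.assoc x env

(* eval [] (Let ("x", Num 2, Mul (Var "x", Num 3))) = 6 *)
```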
I heard that the ML version was a translation of the C version and is thus not easy to follow. Or it may have been the other way around!
Thanks!

Are you sure it's an extinct art, though? LLVM is flourishing, interesting IRs like MLIR keep coming to life, many ML-adjacent projects build their own compilers (PyTorch, Mojo, tinygrad), big tech companies like Intel, AMD, Nvidia, and Apple contribute to multiple different compilers, and projects integrate with one another at different levels of abstraction (PyTorch -> Triton -> CUDA). There is a lot of compilation going on from one language to another.

Not to mention the many languages in the mainstream that weren't that popular 10 years ago: think Rust, Zig, Go.