How to make a fast dynamic language interpreter

(zef-lang.dev)

204 points by pizlonator 12 hours ago

10 comments

pansa2 7 hours ago
In a similar vein, see this page about the performance of the interpreter for the dynamic language Wren: https://wren.io/performance.html
Unlike the Zef article, which describes implementation techniques, the Wren page also shows ways in which language design can contribute to performance.
In particular, Wren gives up dynamic object shapes, which enables copy-down inheritance and substantially simplifies (and hence accelerates) method lookup. Personally I think that’s a good trade-off - how often have you really needed to add a method to a class after construction?
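A minimal Python sketch of the copy-down idea described above. It is not Wren's actual code: `ClassObj`, `symbol`, and `send` are hypothetical names chosen for illustration. Each class owns a flat method table indexed by an integer symbol, and a subclass starts from a copy of its superclass's table, so a call is a single array index rather than a walk up the class chain.

```python
SYMBOLS = {}  # method name -> dense integer id (hypothetical global symbol table)


def symbol(name):
    """Intern a method name; a compiler would do this once per call site."""
    return SYMBOLS.setdefault(name, len(SYMBOLS))


class ClassObj:
    def __init__(self, name, superclass=None, methods=None):
        self.name = name
        # Copy-down: begin with a copy of the superclass's table, then override.
        self.table = list(superclass.table) if superclass else []
        for method_name, fn in (methods or {}).items():
            self._bind(symbol(method_name), fn)

    def _bind(self, sym, fn):
        # Grow the table so that `sym` is a valid index, then store the method.
        if sym >= len(self.table):
            self.table.extend([None] * (sym + 1 - len(self.table)))
        self.table[sym] = fn


def send(cls, sym, *args):
    """Method lookup is one bounds-checked index; no superclass traversal."""
    fn = cls.table[sym] if sym < len(cls.table) else None
    if fn is None:
        raise AttributeError(f"{cls.name} does not implement symbol {sym}")
    return fn(*args)


animal = ClassObj("Animal", methods={"name": lambda: "animal", "legs": lambda: 4})
dog = ClassObj("Dog", animal, {"name": lambda: "dog"})
```

Because the table is copied when the subclass is created, a method bound on `animal` afterwards would not reach `dog`. That is why this layout depends on classes being closed once defined.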
- versteegen 4 hours ago
 Yes, language design is a hugely important determinant of interpreter or JIT speed. There are many highly optimised VMs for dynamic languages but LuaJIT is king because Lua is such a small and suitable language, and although it does have a couple of difficult-to-optimise features, they are few enough that you can expend the effort. It's nothing like Python. It's not much of an exaggeration to say Python is designed to minimise the possibility of a fast JIT, with compounding layers of dynamism. After years of work, the CPython 3.15 JIT finally managed to be ~5% faster than the stock interpreter on x86_64.
 - pjmlp 1 hour ago
 CPython's current state is more a reflection of resources spent than of what is possible.
 See the experience with Smalltalk and Self, where everything is dynamic dispatch, everything is an object, in a live image that can be monkey patched at any given second.
 PyPy and GraalPy, and the oldie IronPython, are much better experiences than where CPython currently stands.
 - dontlaugh 3 hours ago
 Python is worse, but not by all that much. After all, PyPy has been several times faster for many years.
- psychoslave 6 hours ago
 That’s basically what is done all the time in languages where monkey patching is accepted as idiomatic, notably Ruby. Ruby is not known for its speed-first mindset though.
 On the other hand, having a type hold a closed set of applicable functions is somewhat questionable.
 There are languages out there that allow you to define arbitrary functions and then use them as methods with dot notation on any variable matching the type of the first argument, including Nim (with macros), Scala (with implicit classes and type classes), Kotlin (with extension functions) and Rust (with traits).
 - pjmlp 1 hour ago
 It is getting better, now that they finally got the Smalltalk lessons from 1984.
 "Efficient implementation of the Smalltalk-80 system"
 https://dl.acm.org/doi/10.1145/800017.800542
jiusanzhou 6 hours ago
The jump from change #5 to #6 (inline caches + hidden-class object model) doing the bulk of the work here really tracks with how V8/JSC got fast historically — dynamic dispatch on property access is where naive interpreters die, and everything else is kind of rounding error by comparison. Nice that it's laid out so you can see the contribution of each step in isolation; most perf writeups just show the final number.
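For readers unfamiliar with the two techniques named above, here is a minimal Python sketch of a hidden-class object model with a monomorphic inline cache. It is an illustration under assumed names (`Shape`, `Obj`, `GetFieldSite`), not Zef's implementation. Objects built by adding the same fields in the same order share a `Shape`, so a property-access site can remember "this shape keeps the field in slot N" and skip the name lookup on later hits.

```python
class Shape:
    """Hidden class: maps field names to slot indices."""

    def __init__(self, slots=None):
        self.slots = slots or {}
        self.transitions = {}  # field name -> successor Shape

    def add_field(self, name):
        # Reuse the existing transition so same-order construction shares shapes.
        nxt = self.transitions.get(name)
        if nxt is None:
            nxt = Shape({**self.slots, name: len(self.slots)})
            self.transitions[name] = nxt
        return nxt


EMPTY_SHAPE = Shape()


class Obj:
    def __init__(self):
        self.shape = EMPTY_SHAPE
        self.values = []  # field values, stored by slot index

    def set(self, name, value):
        idx = self.shape.slots.get(name)
        if idx is None:
            self.shape = self.shape.add_field(name)
            self.values.append(value)
        else:
            self.values[idx] = value


class GetFieldSite:
    """One property-access site with a single-entry (monomorphic) cache."""

    def __init__(self, name):
        self.name = name
        self.cached_shape = None
        self.cached_index = -1
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        if obj.shape is self.cached_shape:  # fast path: one identity check
            self.hits += 1
            return obj.values[self.cached_index]
        self.misses += 1                    # slow path: name lookup, then cache
        idx = obj.shape.slots[self.name]
        self.cached_shape, self.cached_index = obj.shape, idx
        return obj.values[idx]
```

In Python the fast path is not actually faster than a dict lookup; the sketch only shows the control flow. The payoff comes in a native interpreter, where the hit path is a pointer comparison plus an indexed load.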
- jimmypk 25 minutes ago
 @jiusanzhou The interesting implementation detail in change #6 is how the inline caching is done in an AST-walking interpreter specifically. In bytecode interpreters, IC rewriting is natural: the "cache site" is a stable byte offset in the bytecode stream that you can patch. Here, the cache site is an AST node, so @pizlonator uses placement new to construct a specialized AST node on top of the generic one in-place (via constructCache<>). It's self-modifying code at the AST level.
 The tradeoff is that this requires mutable AST nodes, which conflicts with the immutable-AST assumption most compilers rely on (e.g., for sharing subtrees or parallelizing compilation). For a single-threaded interpreter it works cleanly, but it'd be a problem if you wanted to JIT-compile from the same AST on a background thread while the interpreter is mutating nodes.
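A rough Python analogue of the in-place node rewriting described above. Python has no placement new, so this sketch reassigns the node's `__class__` instead; that swap is a stand-in for the C++ technique, not a description of how Zef does it. All class names here (`Shape`, `Record`, `Var`, `GetField`, `CachedGetField`) are hypothetical.

```python
class Shape:
    """Fixed field layout: maps field names to slot indices."""

    def __init__(self, *names):
        self.index = {name: i for i, name in enumerate(names)}


class Record:
    def __init__(self, shape, *values):
        self.shape = shape
        self.values = list(values)


class Var:
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]


class GetField:
    """Generic node: does a name lookup, then rewrites itself in place."""

    def __init__(self, target, name):
        self.target = target  # child node that evaluates to a Record
        self.name = name

    def eval(self, env):
        return self.specialize(self.target.eval(env))

    def specialize(self, obj):
        idx = obj.shape.index[self.name]
        self.cached_shape, self.cached_index = obj.shape, idx
        # The parent still points at this same object; only its behaviour changes.
        self.__class__ = CachedGetField
        return obj.values[idx]


class CachedGetField(GetField):
    """Specialized node: a shape check guards a direct slot load."""

    def eval(self, env):
        obj = self.target.eval(env)
        if obj.shape is self.cached_shape:
            return obj.values[self.cached_index]
        return self.specialize(obj)  # guard failed: re-cache for the new shape
```

The node's identity never changes, so nothing that references it needs updating. That is the convenience of rewriting in place, and also the reason a concurrent reader of the same tree would see nodes change underneath it.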
- Someone 3 hours ago
 I agree, but there’s a tiny caveat that this is for one specific benchmark that, I think, doesn’t reflect most real-world code.
 I’m basing that on the 1.6% improvement they got on speeding up sqrt. That surprised me, because, to get such an improvement, the benchmark must spend over 1.6% of its time in there, to start with.
 Looking in the git repo, it seems that did happen in the nbody simulation (https://github.com/pizlonator/zef/blob/master/ScriptBench/nbody.zef).
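The bound used in that argument follows from Amdahl's law, and a small Python sketch makes the arithmetic explicit. The function names are hypothetical, and it assumes the 1.6% is a speedup ratio (old time / new time = 1.016). If it instead means a 1.6% reduction in run time, the bound is exactly 1.6%.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def min_fraction_for(speedup):
    """Smallest share of run time the optimized code must have occupied to
    produce `speedup`, reached in the limit of an infinitely fast replacement."""
    return 1.0 - 1.0 / speedup


# For a 1.016x overall speedup, sqrt must have taken at least ~1.57% of the time.
lower_bound = min_fraction_for(1.016)
```

Any realistic (finite) speedup of sqrt itself pushes the required share higher than this lower bound.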
grg0 11 hours ago
Interesting, thanks for sharing. It is a topic I'd like to explore in detail at some point.
I also like how, according to GitHub, the repo is 99.7% HTML and 0.3% C++. A testament to the interpreter's size, I guess?
- pizlonator 11 hours ago
 I committed the statically generated site, which is wastefully large because of how I generate the code browsers.
 But yeah, the interpreter is very small.
tnelsond4 3 hours ago
I use the bounds checker in TCC to check for memory errors in C. Should I switch to Fil-C instead to debug my code? Obviously Yolo-C is my target.
tiffanyh 9 hours ago
I see Lua was included, wish LuaJIT was as well.
- pizlonator 9 hours ago
 I bet LuaJIT crushes Zef! Or rather, I would hope that it does, given how much more engineering went into it.
 There are many runtimes that I could have included but didn’t.
 Also, it’s quite impressive how much faster PUC Lua is than QuickJS and Python.
 - raincole 8 hours ago
 Because QuickJS is really slow. Don't be fooled by the name. It's almost an order of magnitude slower than node/v8.
 (I suppose the quick in QuickJS means "quick for a pure interpreter without JIT compilation" or something...)
 - pizlonator 8 hours ago
 Based on this data, it’s probably slower than JSC’s or V8’s interpreter.
 So like that’s wild.
 - zephen 9 hours ago
 > it’s quite impressive how much faster PUC Lua is than QuickJS and Python
 Python's execution time is mostly spent looking up stuff. I don't think Lua is quite as dynamic.
 - pizlonator 9 hours ago
 Lua is way more dynamic
 - zephen 8 hours ago
 I suppose it depends on where you are looking for dynamicity. In some ways, Lua is much more laissez-faire, of course.
 But in Python, everything is an object, which is why, as I said, it spends much of its time looking things up. And things like bindings for closures are late, so that's more lookups as well.
 In Lua, many things aren't objects, and, for example, you can add two numbers without looking anything up. Another issue, of course, when you do that, is that you could conceivably overflow an integer, but that can't happen in Python.
 The Python interpreter has some fast paths for specific object types, but it is really limited in the optimizations it can do, because there simply aren't any unboxed types.
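To make the "looking things up" point concrete, here is a simplified Python sketch of what evaluating `a + b` involves at the language level. `binary_add` is a hypothetical helper, and it deliberately omits parts of the real protocol (for example, reflected-method priority for subclasses and the interpreter's internal fast paths), so treat it as an approximation rather than a specification.

```python
def binary_add(a, b):
    """Approximate the dispatch behind `a + b`: try __add__ on the left
    operand's type, then __radd__ on the right operand's type."""
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        if result is not NotImplemented:
            return result
    radd = getattr(type(b), "__radd__", None)
    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand types: {type(a).__name__} and {type(b).__name__}"
    )


# int + float falls through to float.__radd__; ints grow instead of overflowing.
mixed = binary_add(1, 2.5)
big = binary_add(2**63, 2**63)
```

Whether this cost is inherent to the language or only to a given implementation is exactly what the replies in this thread dispute.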
 pizlonator 48 minutes ago
 I think you’re describing deficiencies in the Python impl, not anything about the language
 psychoslave 5 hours ago
 Nope, Python is not fully object. Not even Ruby is fully object; try `if.class` for example. Self, Smalltalk, Lisp, and Io are fully object in that sense. But none, as far as I know, can handle something like `(.class`.
 zelphirkalt 4 hours ago
 Aren't you mixing up syntax and the concepts it expresses? Why would (.class have to be a thing? Is space dot class a thing? I don't think this makes sense and it doesn't inform about languages "being fully object". Such syntax is merely for producing an AST and that alone doesn't mean "object" or "not object". It could just as well be all kinds of different things, or functions, or stack pushes and pops or something.
 psychoslave 2 hours ago
 >Why would (.class have to be a thing?
 It doesn’t have to, in the absolute. It’s just that if some pitch sells a programming language as completely object-oriented, it’s fun to check how far that actually goes.
 There are many valid reasons why one would not do that, of course. But if the language is marketed in a way that implies one could expect it, it seems fair to debunk the myth that it’s actually a fully object language.
 >Is space dot class a thing?
 Could be, though generally spaces are not considered terms. But Whitespace shows that this is just a matter of what is conventionally retained.
 So, supposing that ` .class` and `.class` express the same value, the most obvious convention that comes to my mind would be to consider that it’s applied to the implicit narrowest "context object" in the current lexical scope.
 To give a concrete example of a related choice of convention, Raku evaluates both `.WHAT` and `(.WHAT)` as `(Any)`.
 >Such syntax is merely for producing an AST and that alone doesn't mean "object" or "not object".
 Precisely: if the language does not provide complete reflection facilities for every meaningful term, including syncategorematic ones, then it’s not fully object. Once again, being almost fully object is fine, but it’s not being fully object.
 https://en.wikipedia.org/wiki/Syncategorematic_term
 gdwatson 3 hours ago
 I think the idea is that Smalltalk replaced conditional syntax with methods on booleans. You could call `ifTrue:` on a boolean, passing it a code block; a true boolean would execute the block, and a false boolean would not. (There was also an `ifFalse:` method.)
 This feels more like a party trick than anything. But it does represent a deep commitment to founding the whole language on object orientation, even when it seems silly to folks like me.
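The mechanism described above is easy to mimic. This is a Python sketch with hypothetical names (`TrueObject`, `FalseObject`, `less_than`) in which the two boolean objects decide whether to run a block, with lambdas standing in for Smalltalk code blocks.

```python
class TrueObject:
    def if_true(self, block):
        return block()  # the true object runs the block

    def if_false(self, block):
        return None     # ...and ignores the "false" branch


class FalseObject:
    def if_true(self, block):
        return None

    def if_false(self, block):
        return block()


TRUE, FALSE = TrueObject(), FalseObject()


def less_than(a, b):
    # Host-language conditional used only inside this primitive; callers
    # branch purely by sending messages to the returned object.
    return TRUE if a < b else FALSE


answer = less_than(1, 2).if_true(lambda: "1 is smaller")
```

Control flow at the call site is ordinary method dispatch on two classes, with no dedicated conditional syntax needed.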
 zephen 4 hours ago
 You obviously realize that different languages have different syntactic requirements, yet you are willing to cut one language a break when its minimal syntactical elements aren't objects, and refuse to cut other languages a break because they have a few more syntactical elements?
boulos 10 hours ago
How's your experience with Fil-C been? Is it materially useful to you in practice?
- pizlonator 10 hours ago
 I’m biased since I’m the Fil.
 It was materially useful in this project.
 - Caught multiple memory safety issues in a nice deterministic way, so designing the object model was easier than it would have been otherwise.
 - C++ with accurate GC is a really great programming model. I feel like it speeds me up by 1.5x relative to normal C++, and maybe like 1.2x relative to other GC’d languages (because C++’s APIs are so rich and the lambdas/templates and class system are so mature).
 But I’m biased in multiple ways
 - I made Fil-C++
 - I’ve been programming in C++ for like 35ish years now
 - HarHarVeryFunny 50 minutes ago
 Are you using malloc + GC in preference to smart pointers, and if so why?
 It doesn't seem like that is necessarily a performance win, especially since you could always use a smart pointer's raw pointer (preferably const) in a performance critical path.
 - vlovich123 8 hours ago
 I’m curious. Given the overheads of Fil-C++, does it actually make sense to use it for greenfield projects? I like that Fil-C fills a gap in securing old legacy codebases, I’m just not sure I understand it for greenfield projects like this other than you happen to know C++ really well.
 - pizlonator 8 hours ago
 It made sense because I was able to move very quickly, and once perf became a problem I could move to Yolo-C++ without a full rewrite.
 > happen to know C++ really well
 That’s my bias yeah. But C++ is good for more than just perf. If you need access to low level APIs, or libraries that happen to be exposed as C/C++ API, or you need good support for dynamic linking and separate compilation - then C++ (or C) is a great choice
 - vlovich123 7 hours ago
 Hmmm… I did 20+ years of C++ coding, and since I’ve been doing Rust I haven’t seen any of these issues. It has trivial integrations with C/C++ libraries (often with wrappers already written), often better native libraries to substitute those C++ deps wholesale, and separate compilation out of the box. It has dynamic linking if you really need it via the C ABI or even rlib, although I’ll grant the latter is not as mature.
 The syntax and ownership rules can take some getting used to, but after doing it I start to wonder how I ever enjoyed the masochism of the rule of 5 magic incantation that no one else ever followed and writing the class definition twice. Plus, the language keeps gaining complexity without ever paying back tech debt or solving real problems.
injidup 7 hours ago
What is this Yolo-C++ compiler that is referenced in the article? Google searches turn up nothing and ChatGPT seems not to know it either.
- electroly 7 hours ago
 The author of Fil-C, who is also the author of this language, uses "Yolo-C/C++" to mean regular C/C++ without Fil-C.
valorzard 6 hours ago
Do you think this exercise has taught you anything that could make Fil-C itself better?
- pizlonator 46 minutes ago
 Yeah I really need to have a better fix for how I handle unions.
 And for the fact that having out-of-line calls to methods of value objects is so expensive.
catlifeonmars 3 hours ago
Do you run an optimization pass on the AST between parsing and evaluation?
- HarHarVeryFunny 44 minutes ago
 For an interpreter / AST executor, I think a big win would be efficient parsing in the first place, in particular using a precedence parser for expressions vs recursive descent, which would avoid the need to optimize the AST to remove the 1:1 "unit productions" in the grammar.
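As an illustration of the precedence-parsing approach suggested above, here is a small precedence-climbing parser in Python. It is a generic sketch, not Zef's parser; the `PRECEDENCE` table, the token pattern, and the tuple-based AST are assumptions made for the example. A single `expr` function handles every binary operator level, so no chain of one-rule-per-precedence-level functions (and no resulting pass-through nodes) is needed.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}  # higher binds tighter
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    src = src.rstrip()
    pos, tokens = 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def primary():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        tok = advance()
        if tok == "(":
            node = expr(1)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(min_prec):
        left = primary()
        # Consume operators that bind at least as tightly as min_prec.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            # The +1 makes binary operators left-associative.
            right = expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expr(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Adding an operator means adding one entry to `PRECEDENCE` rather than a new grammar rule and parsing function.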
- pizlonator 46 minutes ago
 I run a “resolve” pass.
 That’s where, for example, getter inference happens.