the "lambda the ultimate" papers and the birth of scheme was a loong time ago, so it grates on my ears to hear this topic presented as "an optimization". Yes, it is sometimes an optimization a compiler can make, but the idea is much better presented as a useful semantic of a language.<p>in the same way that passing parameters to a subfunction "creates" a special set of local variables for the subfunction, the tail recursion semantic updates this set of local variables in an especially clean way for loop semantics, allowing "simultaneous assignment" from old values to new ones.<p>(yes, it would be confusing with side effected C/C++ operators like ++ because then you'd need to know order of evaluation or know not to do that, but those are already issues in those languages quite apart from tail recursion)<p>because it's the way I learned it, I tend to call the semantic "tail recursion" and the optimization "tail call elimination", but since other people don't do the same it's somewhat pointless; but I do like to crusade for awareness of the semantic beyond the optimization. If it's an optimization, you can't rely on it because you could blow the stack on large loops. If it's a semantic, you can rely on it.<p>(the semantic is not entirely "clean" either. it's a bit of a subtle point that you need to return straightaway the return value of the tail call or it's not a tail call. fibonacci is the sum of the current with the next so it's not a tail call unless you somewhat carefully arrange the values you pass/keep around. also worth pointing out that all "tail calls" are up for consideration, not just recursive ones)
In a weird way it kinda reminds me of `exec` in sh (which replaces the current process instead of creating a child process). Practically, there's little difference between these two scripts:<p><pre><code> #!/bin/sh
foo
bar
</code></pre>
vs<p><pre><code> #!/bin/sh
foo
exec bar
</code></pre>
And you could perhaps imagine a shell that does "tail process elimination" to automatically perform the latter when you write the former.<p>But the distinction <i>can be</i> important due to a variety of side effects and if you could only achieve it through carefully following a pattern that the shell might or might not recognize, that would be very limiting.
this is pretty much exactly how my "forth" handles tail call elimination, and it's the main thing that's earned the quotes so far, since it shifts the mental burden to being aware of this when writing code that manipulates the return stack.<p>as you imply towards the end, I'm not confident this is a trick you can get away with as easily without the constraints of concatenative programming to railroad you into it being an easily recognizable pattern for both the human and the interpreter.
One of the issues with Java is that it is two levels of language: you compile Java into Java bytecode, which is further compiled into native machine code. There is no concept of a tail call in Java bytecode, so it is difficult to propagate the semantics. It really has to be a programmer or compiler optimization that bakes tail call elimination into the generated intermediate bytecode before that is further compiled.<p>.NET is an interesting contrast. The equivalent of Java bytecode in .NET (CIL) does have the concept of tail calls. This allows a functional language like F# to be compiled to the intermediate form without losing the tail call concept. It is still up to the first-level compiler, though. C#, for example, does not support tail calls even though its intermediate target (CIL) does.
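A sketch of what that rewrite amounts to, at the source level (Kotlin here, since it targets the same bytecode): the recursive call becomes argument re-binding plus a jump back to the top, which plain JVM bytecode can express with a GOTO even though it has no tail call instruction.<p><pre><code> // Tail-recursive form: fun sum(n: Long, acc: Long): Long =
//     if (n == 0L) acc else sum(n - 1, acc + n)
// Roughly what the compiler/bytecode pass turns it into:
fun sum(n0: Long, acc0: Long): Long {
    var n = n0
    var acc = acc0
    while (true) {            // the GOTO target
        if (n == 0L) return acc
        val nextN = n - 1     // evaluate all new arguments from the
        val nextAcc = acc + n // old values first...
        n = nextN             // ...then re-bind and jump back
        acc = nextAcc
    }
}
</code></pre>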
Sigh. I have been kicking this horse forever as well: an "optimization" implies a mere performance improvement.<p>Tail call elimination, if it exists in a language, allows coding certain (even infinite) loops as recursion - making loop data flow explicit, easier to analyze, and at least in theory easier to vectorize/parallelize, etc.<p>But if a language/runtime doesn't do tail call elimination, then you CAN'T code up loops as recursion, as you would be destroying your stack. So the WAY you code, the way you structure it, must be different.<p>It's NOT an optimization.<p>I have no idea who even came up with that expression.
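The reliability point, concretely (a Kotlin sketch, on the same JVM the article targets): the same shape of code either works or blows the stack, depending on whether the elimination is guaranteed.<p><pre><code> fun drain(n: Long) {
    if (n > 0) drain(n - 1)      // plain recursion: StackOverflowError for large n
}

tailrec fun drainSafe(n: Long) {
    if (n > 0) drainSafe(n - 1)  // tailrec: compiled to a loop, any n is fine
}

// drain(10_000_000L)     -> throws StackOverflowError
// drainSafe(10_000_000L) -> returns normally
</code></pre>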
I mean, in the particular case demonstrated in this blog post it can only be an optimization, because semantically guaranteeing it would require language features that Java doesn't have.
Every compiler should recognize and optimize for tail recursion. It's not any harder than most other optimizations, and some algorithms are far better expressed recursively.<p>Why is this not done?
In general, tail recursion destroys stacktrace information, e.g. if f calls g which tail calls h, and h crashes, you won't see g in the stacktrace, and this is bad for debuggability.<p>In lower level languages there are also a bunch of other issues:<p>- RAII can easily make functions that appear in a tail position not actually tail calls, due to destructors implicitly running after the call;<p>- there can be issues when reusing the stack frame of the caller, especially with caller-cleanup calling conventions;<p>- the compiler needs to prove that no pointers to the stack frame of the function being optimized have escaped, otherwise it would be reusing the memory of live variables which is illegal.
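The stacktrace point is easy to observe in the self-recursive case on the JVM, e.g. with Kotlin's tailrec (a sketch): however many logical calls preceded the crash, the eliminated frames are simply gone.<p><pre><code> tailrec fun walk(n: Int): Int {
    check(n != 0) { "boom after many logical calls" }
    return walk(n - 1)
}

// walk(1_000_000) throws IllegalStateException with a single walk() frame
// on the stack, because the million recursive calls became one loop.
</code></pre>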
I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had. Tail recursion is equivalent to looping. People should stop complaining when it gives them the same information as looping.
> I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had.<p>That is a common criticism. You're referring to the functional programmers. They would typically argue that building up state based on transient loop variables is a mistake. The body of a loop ideally should be (at the time any stack trace gets thrown) a pure function of constant values and a range that is being iterated over while being preserved. That makes debugging easier.
I mean, if I were doing an ordinary non-recursive function call that just happened to be in tail position, and it got eliminated, and this caused me to not be able to get the full stack trace while debugging, I might be annoyed.<p>In a couple languages I've seen proposals to solve this problem with a syntactic opt-in for tail call elimination, though I'm not sure whether any mainstream language has actually implemented this.
Language designers could keep taking ideas from Haskell, and allow functions to opt in to appearing in stack traces. Give the programmer control, and all.
Kotlin has a syntactic opt-in for tail call elimination (the "tailrec" modifier).
<a href="https://clojuredocs.org/clojure.core/recur" rel="nofollow">https://clojuredocs.org/clojure.core/recur</a>
Some of the issues are partially alleviated by using a limited form of tail recursion optimization. You mark a function with the tailrec keyword, and the compiler verifies that this function calls itself as its last operation. You also wouldn't expect a complete stack trace from that function. At the same time, it probably helps with 90% of the recursive algorithms that would benefit from tail recursion.
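What the verification buys you, sketched in Kotlin: if the recursive call isn't actually the last operation, the compiler flags it instead of silently leaving an un-eliminated recursion.<p><pre><code> // Not a tail call: the addition runs *after* the recursive call returns,
// so marking this tailrec gets flagged by the compiler:
// tailrec fun depthSum(n: Int): Int = if (n == 0) 0 else n + depthSum(n - 1)

// Accepted: the call is the entire result, so elimination is guaranteed.
tailrec fun depthSum(n: Int, acc: Int = 0): Int =
    if (n == 0) acc else depthSum(n - 1, acc + n)
</code></pre>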
AFAIK Zig is the only somewhat big and well-known low-level language with TCO. Obviously, Haskell/OCaml and the like support it and are decently fast too, but systems programming languages they are not.
For a guarantee:<p><a href="https://crates.io/crates/tiny_tco" rel="nofollow">https://crates.io/crates/tiny_tco</a><p><a href="https://crates.io/crates/tco" rel="nofollow">https://crates.io/crates/tco</a><p>As an optimization, my understanding is that GCC and LLVM implement it, so Rust, C, and C++ also have it implicitly as an optimization that may or may not apply to your code.<p>But yes, Zig does have formal language syntax for guaranteeing tail calls at the language level (which I agree is the right way to expose this optimization).
Zig's TCO support is not much different from Clang's `[[clang::musttail]]` in C++. Both have the big restriction that the two functions involved are required to have the same signature.
> Both have the big restriction that the two functions involved are required to have the same signature.<p>I did not know that! But I am a bit confused, since I don't really program in either language. Where exactly in the documentation could I read more about this? Or see more examples?<p>The language reference for @call[0] was quite unhelpful for my untrained eye.<p>[0] <a href="https://ziglang.org/documentation/master/#call" rel="nofollow">https://ziglang.org/documentation/master/#call</a>
Generally I also find Zig's documentation pretty lacking, so instead I try looking for the relevant issues/PRs. In this case I found comments on this issue [1] which seem to still hold true. That same issue also links to the relevant LLVM/Clang issue [2], and the same restriction is also being proposed for Rust [3]. This is where I first learned about it, and what prompted me to investigate whether Zig also suffers from the same issue.<p>[1]: <a href="https://github.com/ziglang/zig/issues/694#issuecomment-1567447672" rel="nofollow">https://github.com/ziglang/zig/issues/694#issuecomment-15674...</a>
[2]: <a href="https://github.com/llvm/llvm-project/issues/54964" rel="nofollow">https://github.com/llvm/llvm-project/issues/54964</a>
[3]: <a href="https://github.com/rust-lang/rfcs/pull/3407" rel="nofollow">https://github.com/rust-lang/rfcs/pull/3407</a>
This limitation is to ensure that the two functions use the exact same calling convention (input & output registers, and values passed via stack). It can depend on the particular architecture.
C++:<p>> All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade)<p><a href="https://stackoverflow.com/questions/34125/which-if-any-c-compilers-do-tail-recursion-optimization" rel="nofollow">https://stackoverflow.com/questions/34125/which-if-any-c-com...</a> (2008)
Depends on what you mean by "systems programming", you can definitely do that in OCaml.
"Unix system programming in OCaml"<p><a href="https://ocaml.github.io/ocamlunix/ocamlunix.html" rel="nofollow">https://ocaml.github.io/ocamlunix/ocamlunix.html</a><p>MirageOS<p><a href="https://mirage.io/" rel="nofollow">https://mirage.io/</a><p>House OS,<p><a href="https://programatica.cs.pdx.edu/House/" rel="nofollow">https://programatica.cs.pdx.edu/House/</a><p>Just saying.
I know of these. Almost added a disclaimer too -- that was not my point, as I am sure you understand. Also, OCaml has a GC, which makes it unsuitable for many applications common in systems programming.
My bigger issue with tail call optimization is that you really want it to be enforceable since if you accidentally deoptimize it for some reason then you can end up blowing up your stack at runtime. Usually failure to optimize some pattern doesn’t have such a drastic effect - normally code just runs more slowly. So tail call is one of those special optimizations you want a language annotation for so that if it fails you get a compiler error (and similarly you may want it applied even in debug builds).
Parroting something I heard at a Java conference several years ago: tail recursion removes stack frames, but the security model is based on stack frames, so it has to be a JVM optimization, not a compiler optimization.<p>I've no idea if this still holds once the security manager is removed.
The security manager was removed (well, “permanently disabled”) in Java 24. As you note, the permissions available at any given point can depend on the permissions of the code on the stack, and TCO affects this. Removal of the SM thus removes one impediment to TCO.<p>However, there are other things still in the platform for which stack frames are significant. These are referred to as “caller sensitive” methods. An example is Class.forName(). This looks up the given name in the classloader of the class that contains the calling code. If the stack frames were shifted around by TCO, this might cause Class.forName() to use the wrong classloader.<p>No doubt there are ways to overcome this — the JVM does inlining after all — but there’s work to be done and problems to be solved.
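A small sketch of that caller sensitivity (hypothetical wrapper, in Kotlin): the one-argument Class.forName resolves against the defining loader of the class whose frame makes the call, so merging or replacing that frame could change the answer.<p><pre><code> object PluginHost {
    // Class.forName(name) consults PluginHost's classloader because this
    // frame is the caller. If TCO replaced this frame with its caller's,
    // a different loader could be consulted for the same name.
    fun load(name: String): Class<*> = Class.forName(name)
}
</code></pre>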
In theory, if all you do is implement algorithms, this sounds fine. But most apps implement horrible business processes, so what would one do with missing stacktraces? Maybe in languages that can mark functions as pure.
Very nice article demonstrating a neat use of ASM bytecode. The Java language devs are also working on Project Babylon (code reflection), which will bring additional techniques to manipulate the output from the Java compiler: <a href="https://openjdk.org/projects/babylon/articles/code-models" rel="nofollow">https://openjdk.org/projects/babylon/articles/code-models</a>
Scala has been using this technique for years with its scala.annotation.tailrec annotation. Regardless, it's cool to see this implemented as a bytecode pass.
Kotlin as well, with the "tailrec" keyword, e.g. "tailrec fun fibonacci()"<p><a href="https://kotlinlang.org/docs/functions.html#tail-recursive-functions" rel="nofollow">https://kotlinlang.org/docs/functions.html#tail-recursive-fu...</a><p>Kotlin also has a neat other tool, "DeepRecursiveFunction<T, R>" that allows defining deep recursion that is not necessarily tail-recursive.<p>Really useful if you wind up a problem that is most cleanly solved with mutual recursion or similar:<p><a href="https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-deep-recursive-function/" rel="nofollow">https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-deep-r...</a>
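For reference, the shape of DeepRecursiveFunction usage (essentially the stdlib docs' tree-depth example): the recursion is written in direct style, but callRecursive suspends and keeps its state on the heap instead of the JVM stack.<p><pre><code> class Tree(val left: Tree? = null, val right: Tree? = null)

// Depth of a tree, safe even when the tree is 100_000 nodes deep.
val depth = DeepRecursiveFunction<Tree?, Int> { t ->
    if (t == null) 0
    else maxOf(callRecursive(t.left), callRecursive(t.right)) + 1
}

fun main() {
    val deepTree = generateSequence(Tree()) { Tree(it) }.take(100_000).last()
    println(depth(deepTree)) // 100000, no StackOverflowError
}
</code></pre>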
It's been a long time since I've messed with Java bytecode [1], but shouldn't the private method call use INVOKESPECIAL?<p>In general I don't think you can do this to INVOKEVIRTUAL (or INVOKEINTERFACE), as it covers cases where your target is not statically resolved (virtual/interface calls). This transformation should be limited to INVOKESTATIC and INVOKESPECIAL.<p>You also need lots more checks to make sure you can apply the transformation, like ensuring the call site is not covered by a try block; otherwise it is not semantics-preserving.<p>1: <a href="https://jauvm.blogspot.com/" rel="nofollow">https://jauvm.blogspot.com/</a>
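A rough sketch of those eligibility checks with ASM's tree API (my names, not the article's code): restrict to statically resolved opcodes, require an immediate return, and refuse call sites covered by any exception handler.<p><pre><code> import org.objectweb.asm.Opcodes
import org.objectweb.asm.tree.MethodInsnNode
import org.objectweb.asm.tree.MethodNode

fun isEligibleTailCall(method: MethodNode, call: MethodInsnNode): Boolean {
    // Only statically resolved calls; virtual/interface dispatch may pick
    // an override, so rewriting it into a backward GOTO is unsound.
    val staticallyResolved = call.opcode == Opcodes.INVOKESTATIC ||
        call.opcode == Opcodes.INVOKESPECIAL

    // The call must feed straight into a return (the xRETURN opcodes occupy
    // the contiguous range IRETURN..RETURN); a real pass would also skip
    // intervening labels and line-number nodes.
    val next = call.next
    val returnsImmediately = next != null &&
        next.opcode in Opcodes.IRETURN..Opcodes.RETURN

    // A call covered by a try block is not semantics-preserving to rewrite:
    // the handler expects a frame that would no longer exist.
    val callIndex = method.instructions.indexOf(call)
    val insideTry = method.tryCatchBlocks.any { tcb ->
        callIndex >= method.instructions.indexOf(tcb.start) &&
            callIndex < method.instructions.indexOf(tcb.end)
    }

    return staticallyResolved && returnsImmediately && !insideTry
}
</code></pre>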
I never understood the need for tail recursion optimization in imperative languages. Sure, you need it in FP if you don't have loops and recursion is your only option, but what is the benefit of recursive algorithms that could benefit from tail optimization (i.e. recursive loops) in a language like Java?
Cool, now ABCL can have TCO!
This isn't a _general_ tail call optimization--just tail recursion. The issue is that this won't support mutual tail recursion, e.g.:<p><pre><code> (defun func-a (x)
   (func-b (- x 34)))

 (defun func-b (x)
   (cond ((<= 0 x) x)
         (t (func-a (- x 3)))))
</code></pre>
Because func-a and func-b are different (JVM) functions, you'd need an inter-procedural goto (i.e. a tail call) in order to natively implement this.<p>As an alternative, some implementations will use a trampoline. func-a and func-b return a _value_ which says what function to call (and with what arguments) for the next step of the computation. The trampoline then calls the appropriate function. Because func-a and func-b _return_ instead of actually calling their sibling, the stack depth stays constant, and the trampoline takes care of the dispatch.
Finally.<p>The ANTLR guys went through terrible contortions for their parsers.<p>Never felt like working those details out for ABCL.