Since it mentions my paper again: "It also brings into question whether a “lean” approach that grabs the “environment pointer” or the “stack frame” pointer directly, as in n3654 is a good idea to start with." which is annoying, since the approach promoted in n3654 is not even tested here. At least he continues with "But, it’s premature to condemn n3654 because it’s unknown if the problem is the fact that the use of accessing variables through what is effectively __builtin_stack_address and a trampoline is why performance sucks so bad, or if it’s the way the trampoline screws with the stack". So why insinuate it when you have no idea? It is well known that the trampoline approach is costly, and it seems particularly bad on ARM. But n3654 does *not* promote trampolines; it promotes wide function pointers, which are similar to std::function_ref but without wasting the static chain register and without the need to create a thunk when converting a regular function pointer.

I am also not entirely sure whether "manorboy" is a good benchmark, but for estimating the function call overhead it should be ok. For the access-to-variables part, other considerations are far more important IMHO. So I would not put too much weight on the accessing-"k" part of the post.

n3654 contains a criticism of the lambda approach from a language-design perspective. The issue is that it does not seem to be a good fit for C. Copying captured values is cheap in toy examples, but not necessarily for interesting data structures. In C++ you could then capture a pointer by value, but then you haven't avoided an indirection either. In C++ this is generally fine because smart pointers deal with the memory management, but in C, having a captured value inside a lambda that you can no longer access explicitly does not make too much sense.

The "unfortunately, is that unlike C++ there are no templates in C" is also interesting. I fled from C++ back to C exactly because of templates. In a performance context, the fallacy is that you can always create super-optimized code using compile-time techniques that absolutely shine in microbenchmarks (such as this one) but cause a lot of bloat at larger scale (and long compilation times). If you want this, I think you should stick to C++.
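To make the wide-function-pointer idea concrete, here is a minimal sketch of what such a representation can look like in plain C today. This is not the n3654 syntax; the names (wide_fn, counter_env) are purely illustrative, and in the actual proposal the pair would be built and called by the compiler:

    #include <stdio.h>

    /* A "wide" function pointer: code pointer plus environment pointer,
       passed explicitly instead of through a hidden static-chain register. */
    struct counter_env { int count; };

    struct wide_fn {
        int (*code)(struct counter_env *, int);
        struct counter_env *env;
    };

    static int add_and_count(struct counter_env *env, int arg)
    {
        env->count++;
        return arg + env->count;
    }

    int main(void)
    {
        struct counter_env e = { 0 };
        struct wide_fn f = { add_and_count, &e };

        /* Calling through the wide pointer: one extra argument, no trampoline,
           no executable stack, and the environment stays explicitly accessible. */
        printf("%d\n", f.code(f.env, 10));  /* 11 */
        printf("%d\n", f.code(f.env, 10));  /* 12 */
        printf("count = %d\n", e.count);    /* 2 */
        return 0;
    }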
> I fled from C++ back to C exactly because of templates

Obviously I sympathize as someone who spent the front half of their career writing C and has no interest in writing C++ beyond toy examples. Nevertheless, IMO you're cutting off a choice here, forcing yourself to make potentially sub-optimal choices.

*Sometimes* the templates would have been better and writing X-macros is just worse, ergonomically at least. Perhaps C++ people use templates more often than they should, but I'm sure "never" is not the correct amount.

Still, if C gets fat pointers that's at least a step in the right direction. So I must wish you the best with that part.
I used to write expression template libraries, so I know what C++ can do. I just figured out that I am a lot more productive in C than in C++, and that in the time I waste on C++ I can do a lot more in C. I also rarely use macros in C, so what C++ people get wrong is assuming that a C programmer would do the same thing just with macros. This is not the case. One has to realize that compile-time programming is 95% of the time a waste of time and effort, and the same thing can be achieved in an easier and better way.

I also recently removed some code written with templates in C++ from one of my projects, because this single file caused compilation times for a ca. 450-file C project to increase by 50%, from 20s to ca. 30s.
> Still, if C gets fat pointers that's at least a step in the right direction.

AFAIK the C standard doesn't prevent an implementation from using fat pointers; this is one of the reasons why the conversion from pointer to integer only works in one direction. This is actually necessary for segmented memory. The compiler is allowed to optimize based on assumptions about which allocation a pointer comes from (including the allocation's boundaries, i.e. its size), even if bytewise the pointers would be equal, so you could argue that the C abstract machine already has fat pointers.
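A minimal sketch of the provenance point, using the classic one-past-the-end situation (variable names are illustrative; the write through p is undefined behaviour precisely because p's provenance is a, not b):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        int a = 1, b = 2;
        int *p = &a + 1;   /* one past the end of a */
        int *q = &b;

        /* Even if p and q happen to hold identical address bits, the compiler
           may assume a write through p cannot touch b, because p's provenance
           is the allocation a.  The allocation identity effectively travels
           with the pointer, much like the bounds of a fat pointer would. */
        if (memcmp(&p, &q, sizeof p) == 0) {
            *p = 10;            /* undefined behaviour: wrong provenance */
            printf("%d\n", b);  /* an optimizer may legitimately print 2 */
        }
        return 0;
    }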
Well his "Normal Functions" (benchmarks/closures/source/normal_functions.cpp in his repo) looks quite similar to what I had with my GNU nested functions using a stand in "wide pointer", and hence no generated trampoline.<p>(<a href="https://news.ycombinator.com/item?id=46243298">https://news.ycombinator.com/item?id=46243298</a>)<p>Which rather suggests to me that such a scheme, but generated by the compiler, should have a similar performance to said "Normal Functions" and hence similar to his preferred lambda form.<p>Since his benchmark environment is so unwieldy, I may have a go at extracting those two code sets to a standalone environment, and measure them so see...
So here are my preliminary benchmarks with my own implementation on an AMD EPYC 9334 32-core processor. I need to double-check things, so take this with a grain of salt for now. Time is in seconds for 100000 iterations of manorboy(10). So far, the only implementation which clearly sucks is std::function<>. Even trampolines are surprisingly good (but I can imagine that they are much worse on other CPUs / architectures).

    xgcc (GCC) 16.0.0 20260103 (experimental)
     1.50 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack
     1.11 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack -DREFARG
     7.21 gcc -ftrampoline-impl=heap
     7.34 gcc -ftrampoline-impl=heap -DREFARG
     0.93 gcc -DWIDEPTR
     1.38 gcc -DWIDEPTR -DREFARG
     1.40 gcc -DDIRECT
     1.05 gcc -xc++ -std=c++26 -DFUNCREF -DDEDUCING
    19.68 gcc -xc++ -std=c++26 -DDEDUCING
    20.73 gcc -xc++ -std=c++26
     6.31 gcc -xc++ -std=c++26 -DDEDUCING -DREFARG
     6.31 gcc -xc++ -std=c++26 -DREFARG

    Debian clang version 16.0.6 (15~deb12u1)
    21.11 clang -xc++
     6.16 clang -xc++ -DREFARG
     1.66 clang -fblocks
     1.70 clang -fblocks -DREFARG
Actually the author is building on my suggestion https://github.com/soasis/idk/pull/17/files, which I created several years ago but adapted a bit too quickly while submitting. And yes, it looks very similar to GNU nested functions, since I started tinkering with these first.

I am really not sure if all the observations mentioned in the article are 100% correct, though.

First: the code seems to be compiled with clang. On Linux with gcc, the native function one is way faster than the clang one.

Second: the author runs the code on ARM64/macOS.

At least on my Ryzen CPU on Linux with gcc, the "normal C code" is way faster than anything else. Not that we do not need to think about "closure"-type functionality, but one should be careful about extrapolating from implementations on one compiler and one platform to the rest of the pack.

Regarding N3654, I am not sure how to benchmark it here, since C could potentially use __builtin_call_with_static_chain, but I am not sure how to write the function to use the chain for accessing the variables.

I tried to estimate N3654 by using "tinygo", which is AFAIK using the usual calling ABI, but it was a factor of two slower than clang. Even "go" with its very specific ABI is still much slower. I then discovered this isn't representative, since the calling costs are totally shadowed by the cost of allocations.

Even the Rust example I usually use, http://www.reddit.com/r/rust/comments/2t80mw/the_man_or_boy_test_in_rust/, is much slower than anything else, presumably because of the "Cell" needed.

TLDR: this micro-benchmark might be misleading.
See my other comment in this thread with my preliminary benchmarking results. I have a patched GCC with two new builtins, __builtin_static_chain and __builtin_nested_code (needs a better name), that give you the static chain pointer and the code pointer without creating a trampoline. Then I put both into a structure to simulate a wide pointer. Later I call it with __builtin_call_with_static_chain.

A trick one can do is to let it create the trampoline and then read off the two pointers from the position in the trampoline code where they are stored. Not portable, and you still have the overhead of creating the trampoline, but you no longer need the executable stack.
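Roughly, the scheme looks like the sketch below. __builtin_call_with_static_chain is an existing GCC builtin; __builtin_static_chain and __builtin_nested_code are the two new builtins from the patched GCC described above, so this does not build on a stock compiler, and all names are illustrative:

    /* Sketch only: requires the patched GCC with the two new builtins. */
    struct widefn {
        int (*code)(int);   /* code pointer of the nested function */
        void *chain;        /* static chain (environment/frame pointer) */
    };

    int caller(int x)
    {
        int k = 42;

        int nested(int arg) { return arg + k; }   /* GNU nested function */

        /* Grab the code pointer and the static chain without ever
           materializing a trampoline (patched-GCC builtins). */
        struct widefn f = {
            .code  = __builtin_nested_code(nested),
            .chain = __builtin_static_chain(nested),
        };

        /* Call through the wide pointer: the chain is passed explicitly
           in the static-chain register instead of via a trampoline. */
        return __builtin_call_with_static_chain(f.code(x), f.chain);
    }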
This article is so badly written, it makes it hard to understand something that should be easy to understand...
Those graph axes units are... perplexing, to say the least.

*So, we need to swap to the logarithmic graphs to get a better picture*

I wish more people would know about decibels.
Logarithmic axes are infinitely better than decibels, because dB are just too overloaded semantically.
>> So, we need to swap to the logarithmic graphs to get a better picture

> I wish more people would know about decibels.

Huh? Is there any difference? https://en.wikipedia.org/wiki/Decibel:

*“The decibel […] expresses the ratio of two values of a power or root-power quantity on a logarithmic scale”*
Why struggle with qsort? std::sort from C++ is much better in terms of usability and performance.
FWIW, when the C compiler has access to the qsort implementation body and the comparator function, like the C++ compiler has, it will also stamp out a specialized version that will in most cases be identical to the C++ compiler output for std::sort (assuming it implements the same sorting algorithm as C's qsort).

That whole 'zero cost abstraction' idea is not unique to C++, since all the important work to make the 'zero cost thing' happen is performed down in the optimizer passes, and those are identical between C and C++.
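A minimal sketch of that effect, assuming the sort body and the comparator live in the same translation unit (function names are illustrative; libc's qsort cannot get this treatment because its body is opaque without LTO):

    #include <stddef.h>

    /* A tiny insertion sort whose body is visible to the compiler,
       much like a template instantiation would be in C++. */
    static void sort_ints(int *a, size_t n,
                          int (*cmp)(const int *, const int *))
    {
        for (size_t i = 1; i < n; i++) {
            int key = a[i];
            size_t j = i;
            while (j > 0 && cmp(&a[j - 1], &key) > 0) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    static int cmp_int(const int *x, const int *y)
    {
        return (*x > *y) - (*x < *y);
    }

    void sort_buffer(int *a, size_t n)
    {
        /* With optimization enabled, the indirect call through cmp is
           typically resolved and inlined, yielding a specialized sort. */
        sort_ints(a, n, cmp_int);
    }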
I see no inlining of qsort in at least two major compilers: https://godbolt.org/z/59xehaqnv
Yeah but C++ encourages you to put things into templates which are inside headers, so of course the implementation body is exposed to the compiler all the time. Whereas a C compiler usually won’t have access to the qsort implementation body.
Because some people will never move beyond C, no matter what.
People will never move beyond C (among other reasons) because C allows precise control over memory allocation. Closures and precise control over memory allocation don't play very well together.
Urban myths by people frozen in time; systems programming languages predate C by a decade, and it isn't as if the world stopped outside Bell Labs in the 1970s.
Which systems (as in zero-runtime) language would you choose if not C (even if we disregard the fact that no SoC vendor supports anything but C)?

- Ada? Many cool features, but that's also the problem - it's damn complicated.

- C++ combines simplicity of Ada with safety of C.

- Rust and Zig are only ~10 years old and haven't really stabilized yet. They also start to suffer from npm-like syndrome, which is much more problematic for a systems language.

- ATS? F#? Not all low-level stuff needs (or can afford) this level of assurance.

- Idris? Much nicer than ATS, but it's even younger than Rust and still a moving target (and I'm not sure if zero-runtime support is there yet).

I mean, yes, C is missing tons of potentially useful features (syntactic macros, for one thing), but closures are not one of them.
You can start by learning that even C has a tiny runtime, followed by the history of systems programming languages, starting with JOVIAL in 1958.

Even if not perfect, "Typescript for C" came out in 1983.
> even C has a tiny runtime

Which functions are required for a C program to run, aside from those that it runs explicitly? memcpy? That's basically a missing CPU instruction. malloc will never get called unless you call it yourself.

> followed by the history of systems programming languages, starting with JOVIAL in 1958.

Every systems language designed as such before C (e.g. BLISS, PL/S, NEWT) was vendor-specific. Some of them had nice things missing from C, but none propagated beyond its original platform. And today we have no option but C.

> "Typescript for C" came out in 1983.

C++ is not just "not perfect", it is far worse in every way. Let's let people overload everything. And let's overload the shift operator to do IO. And make every error message no less than 1000 lines, with the actual error somewhere in the middle. Let's break union and struct type punning, because screw you, that's why. You say C macros are unreadable? Behold, template metaprogramming! C is not perfect, but it has the justification of being minimal. C++ is a huge garbage dumpster on fire.
Everything that runs before main() and at exit, floating-point emulation when needed and IEEE support, signal handlers, threading support since C11 - at least; beyond that it depends on what extensions the C compiler provides.

C was also vendor-specific to Bell Labs.

C++ is definitely Typescript for C.
It is *insane* to assert that Rust hasn't "really stabilized yet" in comparison to C, a language which has been replaced *twice* (as C17 and C23) after Rust 1.0.
C17 didn't introduce any changes to the language. C23 mostly standardized existing compiler-specific goodies. None of this broke existing code. When I run `sudo dnf upgrade gcc` I can be 99.9% sure my old code still compiles.

Compare e.g. https://github.com/rust-lang/rust/pull/102750/ (I'm not following Rust development closely; I picked the first one I found). Yes, developers do rely on non-guaranteed behavior; that's their nature. C would likely standardize the old behavior. Basically all of the C standard is a promoted vendor-specific extension, for better or worse.

Here is another good one: https://internals.rust-lang.org/t/type-inference-breakage-in-1-80-has-not-been-handled-well/21374
C17 ostensibly "didn't change" the language, but it did still need to fix a bunch of stuff, and it chose to do this by issuing an entirely new language standards document.

C23, on the other hand, landed a bunch of hard changes, and "it didn't break my code" only matches the reality most people observe with Rust too. The changes you mention didn't break my Rust either.

But some trivial C did break, because C23 is a new language:

https://c.godbolt.org/z/n9vhMGYW5

In fact those Rust changes *aren't* language changes, unlike C23. The first one you linked is a choice by the compiler to improve a layout it never guaranteed; anybody who was relying on that layout (and there were a few) was *never* promised it wouldn't change - next version, next week, or indeed the next time they compiled the same code. You can even ask Rust's compiler to help light a fire under you here by arbitrarily changing some layout between builds, so that stuff which depends on unpromised layout breaks earlier, reminding you to actually tell Rust e.g. repr(C) ("I need the same layout guarantees for my type T as I'd get in C") or repr(transparent) ("I need to ensure my wrapper type T has the same layout as the type it wraps").

The second isn't a language change at all; it's a change to a library feature, so now we're down to "C isn't stable until the C library is forever unchanging", which hopefully you recognise as entirely silly. Needless to say, nobody does that.
Except that those are more like bug fixes; it's unlikely to find anything newer than C11 in use, and most code is still on C99.
Actually, lots is still on C89.

I'm trying to drag one program at $employer up to C99 (plus C11 _Generic), so I can then subsequently drag it to the bits of C23 which GCC 13 supports.

This all takes time, and means having to convince colleagues during code reviews.

What C23 has done is authorise some of the extensions which GCC has had for some time as legitimate things (typeof, etc).

However, the ability to adopt is also limited by what the third-party linters in use at $employer support.
Show us that you haven't read ISO C documents without telling us.
True, at most I read excerpts when I have a question. Can you tell me what gave it away? I thought the saying was that C17 is the sane version of C11, and that C23 has quite some changes but is way too new to be counted on.
C17 is indeed a bug fix release. C23 finally removed some features that had already been deprecated a long time ago, in the first standardized version (auto as a storage-class specifier, K&R function definitions, empty parentheses meaning "arbitrary arguments"), and also support for ones' complement. So yes, C is extremely backwards compatible.
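For illustration, here is a small compile-only sketch (not from the article) of pre-C23 code hit by two of these removals; the K&R definition is rejected outright in C23, and the empty parameter list changes meaning:

    /* Accepted as C89..C17, rejected as C23 (compile-only sketch). */

    /* K&R-style definition: removed in C23. */
    int add(a, b)
        int a, b;
    {
        return a + b;
    }

    /* Pre-C23: declares a function taking an unspecified number of
       arguments.  C23: equivalent to int f(void), so the call below
       becomes a constraint violation. */
    int f();

    int g(void)
    {
        return f(1, 2) + add(3, 4);
    }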
> Closures and precise control over memory allocation don't play very well together.

How so? In C++ a lambda is just a regular type that does not allocate any memory by itself. You in fact have precise control over how and where a lambda is allocated.
True, but once a capture needs to survive the parent function's scope you'll need to store it somewhere: either in a std::function object, which has opaque rules on whether a heap allocation happens, or in your own std::function implementation, where you define the heap allocation rules but then have to face discussions about why you're rewriting the stdlib ;)

Any C implementation of capturing lambdas has the same problem of course; that's why the whole idea doesn't really fit into the C language IMHO.
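The same problem spelled out as a minimal hand-rolled sketch in plain C (names are illustrative, error handling omitted): once the captured state has to outlive the creating function, somebody must decide where it lives and who frees it.

    #include <stdlib.h>

    struct ctx { int step; };          /* the "captured" state */

    struct counter {
        int (*next)(struct ctx *);
        struct ctx *ctx;               /* must be heap-allocated: the
                                          creator's stack frame is gone */
    };

    static int next_impl(struct ctx *c) { return c->step++; }

    struct counter make_counter(int start)
    {
        struct ctx *c = malloc(sizeof *c);   /* explicit allocation decision */
        c->step = start;
        return (struct counter){ next_impl, c };
    }

    void destroy_counter(struct counter *cnt)
    {
        free(cnt->ctx);                      /* explicit ownership decision */
    }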
I did a version with explicit capture objects for fun: https://github.com/fuhsnn/c-extensions#extended-capture

Basically, with each capturing lambda, a context type _Ctxof(fn) is created that users can then declare on stack or on heap themselves.
Because C is a much nicer language than C++ by some measures.
Nah, C++ gives you the safety tools, even if with some flaws, that WG14 has no interest in ever adding to C.
Nah - more that a lot of commercial code is written in it, and it doesn't make sense to replace (or rewrite) it at this time.

For example, I'm maintaining some 20-year-old C code, which the employer adopted around 10 years ago. It will likely stay in use at least until the current product is replaced, whenever that may be.
Let me rephrase that: I feel addressed by "some people will never move beyond C, no matter what", and I prefer C over C++ because it is a much simpler language. Each time I leave the cozy world of C and write C++, I am annoyed and miss things.
If you care about performance and are willing to use a different programming language, surely it would make more sense to use Rust's unstable sort?
Lisp solves the performance hit of closures with macros. However, given what macros look like in C, I hope it never amounts to that!
> Cost of a Closure in C

C does not have closures. You could simulate closures, but it is neither robust nor automatic compared to languages that truly support them.
I think maybe this post would make more sense if you were familiar with its antecedent [0], which compares existing closure extensions to C. IIRC the comparison was in the context of considering designs for standardization.

[0]: https://thephd.dev/the-cost-of-a-closure-in-c-c2y
C does not currently have closures; the post is looking at their performance properties with an eye toward the form in which closures should be added to the standard.
The post is about a potential *new* feature of C. The author is working on the C committee and has contributed new C features in the past.
Completely missed the point of the article; this is about ongoing C2y efforts at WG14.