I'd measure twice before cutting. Almost everyone who isn't deep into cross-language interop and VM design intuits, incorrectly, that the FFI mechanism itself drives interop costs. In practice, that's almost never the case. While compiling a libffi signature to native code could in principle be a win, it matters a lot less often than you'd think.<p>Keep in mind that optimizing the <i>call</i> doesn't optimize the <i>marshaling</i>: even with an AOT-compiled FFI trampoline, if you're, say, sending a string from one place to another, you usually need to transform the string in some manner (copy it, change its encoding, add or remove length prefixes, etc.), and JITing the libffi parameter passing won't help you do the string work any faster.<p>In fact, trying to AOT the connections can make your program worse, both by bloating it (adding some cache pressure, likely small but still real) and by complicating your build and deployment process.<p>libffi bytecode is good. I wouldn't bother with native code until I had a profile in hand showing the bytecode to be the bottleneck, and even then, I'd check it three or four times to make sure I didn't get the profiling wrong. FFI is just seldom the problem in real-world systems.
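<p>To make "measure first" concrete, here is a rough sketch of timing the marshaling step separately from the foreign call. It uses Python's ctypes (which sits on top of libffi) purely as a convenient stand-in; the function name profile_ffi and the choice of strlen are mine, and it assumes a POSIX-like system where CDLL(None) exposes libc symbols.

```python
import ctypes
import time

# Assumption: POSIX-like system where CDLL(None) exposes the C library.
libc = ctypes.CDLL(None)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def profile_ffi(text, rounds=10_000):
    """Time string marshaling and the foreign call as two separate stages.

    profile_ffi is a hypothetical helper for illustration only.
    """
    encoded = b""
    length = 0

    # Stage 1: marshaling. Re-encode the string the way a binding would
    # have to before handing it to C.
    t0 = time.perf_counter()
    for _ in range(rounds):
        encoded = text.encode("utf-8")
    t1 = time.perf_counter()

    # Stage 2: the call itself. This includes the FFI dispatch plus
    # whatever work strlen does on the already-encoded bytes.
    for _ in range(rounds):
        length = libc.strlen(encoded)
    t2 = time.perf_counter()

    return {"marshal_s": t1 - t0, "call_s": t2 - t1, "length": length}


if __name__ == "__main__":
    # Try several payload sizes; a single data point proves very little.
    for size in (16, 4096, 1_000_000):
        print(size, profile_ffi("x" * size, rounds=1_000))
```

<p>Which stage dominates depends heavily on payload size and on the runtime doing the calling, so treat the numbers as a prompt for a real profile rather than a verdict. Splitting the stages like this at least tells you whether a faster trampoline could help at all.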