System call instrumentation on Linux/x86‑64 using memory‑indirect calls, part I

(humprog.org)

23 points by matt_d 4 days ago

2 comments

freestanding 1 hour ago
That is graphomania. Syscalls are easy and don't require so much bloat. Besides, it's a lefty GNUnix license.
quotemstr 2 hours ago
Linux is unusual in OS kernels in that direct system calls from arbitrary userspace code are supported and ABI-stable. This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.

If, instead, as on OpenBSD, the kernel enforced the rule that all system calls had to go through libc (or perhaps a big ntdll.dll-like VDSO), then the whole problem the linked article tries in vain to solve would disappear. If you wanted to hook a system call, you'd just change the libc/VDSO dispatch. No need to rewrite any instructions.

If I were Linus, I'd make a new rule: starting today, all new system calls must go through VDSO. No exceptions. SYSCALL from anywhere else? SIGKILL.

This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.
- razighter777 4 minutes ago
 Direct system calls are an amazing idea. The NtDll and BSD models are worse. The whole libc becomes a security boundary without the protection of kernel space. So much Windows malware and process tampering happens because now you have a library (ntdll) fully in userspace that is given special privileges, which now becomes a huge attack surface. Then you have to deal with breakages between the built-in libc versions and the kernel.

 This syscall overhead isn't as much as you suppose it is; for workloads where the syscall overhead actually makes a difference, there are robust low-syscall paths for I/O- and latency-sensitive operations, with DPDK, io_uring, and futex being a few examples.

 And there are robust, performant methods on Linux for syscall interception/tracing: see seccomp unotify, bpf tracepoints, ftrace.
- yjftsjthsd-h 2 hours ago
 > This model has always been a terrible idea. It robs the system of an ability to intercept system calls in userspace before doing an expensive privilege-mode transition.

 This model has always been a trade-off. It has downsides, but it also has upsides, including an immense boost in flexibility; decoupling from any particular userspace is useful.

 > This way, you can just LD_PRELOAD in front of the VDSO and system call interception in userspace Just Works.

 Can you LD_PRELOAD in front of the vDSO? I was under the (possibly mistaken) impression that the kernel injects it directly.
 - mananaysiempre 16 minutes ago
 > Can you LD_PRELOAD in front of the vDSO? I was under the (possibly mistaken) impression that the kernel injects it directly.

 The kernel puts the vDSO in memory and tells ld.so where it is, but where, if anywhere, ld.so puts it in the search order it implements is its own concern. (TBH I don’t know whether ld.so will actually allow LD_PRELOAD to override the vDSO, but there’s no reason for it not to, except I guess for the syscalls that are needed to perform the dynamic linking itself.)
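
 A minimal sketch of the "tells ld.so where it is" step, assuming Linux and a libc that exports `getauxval` (glibc and musl both do): the kernel passes the vDSO base address to the process in the auxiliary vector under the key `AT_SYSINFO_EHDR`. The helper names `vdso_base` and `starts_with_elf_magic` are made up for illustration, and the sketch only locates the mapping; it says nothing about how ld.so orders it relative to LD_PRELOAD.

```python
import ctypes

# Auxiliary-vector key under which the kernel reports the vDSO base address
# (value taken from <elf.h>).
AT_SYSINFO_EHDR = 33


def vdso_base() -> int:
    """Return the vDSO base address from the auxiliary vector, or 0 if unavailable."""
    libc = ctypes.CDLL(None)  # handle to the already-loaded C library
    if not hasattr(libc, "getauxval"):
        return 0  # assumption: treat a libc without getauxval as "no vDSO info"
    libc.getauxval.restype = ctypes.c_ulong
    libc.getauxval.argtypes = [ctypes.c_ulong]
    return int(libc.getauxval(AT_SYSINFO_EHDR))


def starts_with_elf_magic(base: int) -> bool:
    """Check that the mapping at `base` begins with the 4-byte ELF magic."""
    return base != 0 and ctypes.string_at(base, 4) == b"\x7fELF"


if __name__ == "__main__":
    base = vdso_base()
    if base == 0:
        print("no vDSO advertised in the auxiliary vector")
    else:
        print(f"vDSO mapped at {base:#x}; ELF image: {starts_with_elf_magic(base)}")
```

 The address changes from run to run under ASLR, so the printed value is only meaningful within the process that read it.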
- throwaway7356 1 hour ago
 > all system calls had to go through libc (or perhaps a big ntdll.dll-like

 Which makes containers crap on Windows and *BSD, as they have to run the correct libc or equivalent. Thus you need to build a different container per OS version, which sucks compared to Linux.
 - Joker_vD 1 hour ago
 Windows doesn't even have its own libc.
 - yjftsjthsd-h 41 minutes ago
 They said "or equivalent", so ntdll
- freestanding 1 hour ago
 That's why OpenBSD is inconvenient for development: it binds you to libc bloatware.
- Gualdrapo 2 hours ago
 > If I were Linus, I'd make a new rule

 Or, you know, just propose your idea to him.
 - yjftsjthsd-h 1 hour ago
 Based on https://www.phoronix.com/news/Linus-Torvalds-No-Random-vDSO , I had been under the impression that he wasn't fond of adding more use of vDSO. On rereading, I can't tell if that's a vDSO thing or a preference against fast randomness being provided by the kernel.