Gathering Linux Syscall Numbers in a C Table

(t-cadet.github.io)

97 points by phi-system21 days ago

9 comments

halb16 days ago
There is an existing project that tracks and gathers syscalls in the Linux kernel, for all ABIs: <a href="https://github.com/mebeim/systrack" rel="nofollow">https://github.com/mebeim/systrack</a>. The author maintains a table here, which is incredibly useful: <a href="https://syscalls.mebeim.net/?table=x86/64/x64/latest" rel="nofollow">https://syscalls.mebeim.net/?table=x86/64/x64/latest</a>
- tanelpoder16 days ago
 I also wrote a little Python tool that iterates through syscall tracepoint declarations in debugfs (/sys/kernel/debug) and lists the syscalls and their arguments available on your currently running system: <a href="https://tanelpoder.com/posts/list-linux-system-call-arguments-with-syscallargs/" rel="nofollow">https://tanelpoder.com/posts/list-linux-system-call-argument...</a> Debugfs does not show the platform-specific internal syscall numbers though (only the stable syscall IDs). Apparently debugfs also does not show all syscalls, excluding "some weird ones" as mentioned by the mebeim/systrack author in an earlier HN discussion: <a href="https://news.ycombinator.com/item?id=41018135#41020166">https://news.ycombinator.com/item?id=41018135#41020166</a>
- westurner15 days ago
 TIL about systrack, which extracts syscalls from vmlinuz kernel images. <a href="https://github.com/mebeim/systrack" rel="nofollow">https://github.com/mebeim/systrack</a> /? tool to dump a list of all syscalls in a binary on Linux, like nm objdump, transitively searches dynamically linked <a href="https://www.google.com/search?q=tool+to+dump+a+list+of+all+syscalls+in+a+binary+on+Linux%2C+like+nm+objdump%2C+transitively+searches+dynamically+linked" rel="nofollow">https://www.google.com/search?q=tool+to+dump+a+list+of+all+s...</a> : - list-syscalls.rb "A script to statically list syscalls used by a given binary" <a href="https://gist.github.com/koute/166f82bfee5e27324077891008fca6eb" rel="nofollow">https://gist.github.com/koute/166f82bfee5e27324077891008fca6...</a> - "B-Side: Binary-Level Static System Call Identification" (2024) x86-64 <a href="https://arxiv.org/abs/2410.18053v1" rel="nofollow">https://arxiv.org/abs/2410.18053v1</a> - Systemd has SyscallFilter= From <a href="https://news.ycombinator.com/item?id=44947469">https://news.ycombinator.com/item?id=44947469</a> : > desbma/shh generates SyscallFilter and other systemd unit rules from straces similar to how audit2allow generates SELinux policies by grepping for AVC denials in permissive mode desbma/shh: <a href="https://github.com/desbma/shh" rel="nofollow">https://github.com/desbma/shh</a>
- rwmj16 days ago
 And <a href="https://gpages.juszkiewicz.com.pl/syscalls-table/syscalls.html" rel="nofollow">https://gpages.juszkiewicz.com.pl/syscalls-table/syscalls.ht...</a> , <a href="https://github.com/hrw/syscalls-table/" rel="nofollow">https://github.com/hrw/syscalls-table/</a>
- phkahler15 days ago
 Cosmopolitan also deals with things at this level across OSes too:<a href="https://cosmo.zip/" rel="nofollow">https://cosmo.zip/</a>
yjftsjthsd-h16 days ago
I remain surprised that people have to make unofficial syscall lists at all. Linux takes the view that syscalls are the official stable ABI of the kernel. I always assumed, therefore, that if that was the official interface, then it would of course be a documented interface. That assumption lasted right up until recently when I decided to learn assembly, and discovered that no, if I wanted to know what numbers to shove in registers before running syscall, I have to look up unofficial tables/docs. Like... that's weird, right? If this was another OS, they'd say that libc was the official interface so of course syscalls are an undocumented implementation detail. But when syscalls are the official interface, for them to be undocumented seems bizarre. Am I missing something?
- dundarious15 days ago
 There's nothing strange in this domain about relying on people to read the C header. On a running system `man syscall` points you to sys/syscall.h. Read /usr/include/sys/syscall.h which says it gets the actual numbers from kernel header asm/unistd.h. Read that and it redirects in simple if-else fashion to the specific header, e.g., asm/unistd_64.h. That file is clear as day:
<pre><code>#ifndef _ASM_UNISTD_64_H
#define _ASM_UNISTD_64_H
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
...
#endif /* _ASM_UNISTD_64_H */
</code></pre>
 That was all on my x86-64 machine. Same again on an aarch64:
<pre><code>#define __NR_io_setup 0
#define __NR_io_destroy 1
#define __NR_io_submit 2
...
</code></pre>
 I'm not saying that wanting a table on the web or a spreadsheet or whatever is bad or wrong, but it is not a difficult or obtuse task. I think people who write code that does such things are generally familiar with just reading some C headers, or if they're already using C they just `#include <sys/syscall.h>` and call it a day. Then on the calling convention, etc., the nolibc project (in the kernel tree) is great for learning or smaller projects (but of course Agner Fog's docs are the "canon" there).
 - adrian_b15 days ago
 The header mentioned by you does not belong to Linux. It is a glibc header. It is the right header to use when you invoke syscalls using the generic syscall wrappers provided by glibc. However, glibc frequently is not synchronized with your current Linux kernel, but with some older version, so this header typically does not contain the most recently added syscalls. Even for the older syscalls, I am not certain that glibc provides all of them. The authoritative list of syscalls must be gathered from the Linux kernel headers, not from glibc. What must be done for this is not as simple as you would expect, hence the several places mentioned by various posters where this tedious work has been done.
 - dundarious15 days ago
 True, I've had to deal with that for newer syscalls and the few that glibc neglects to cover. I didn't mention it, and I suppose the original post was about lack of kernel documentation, so mentioning a glibc source (or musl or whatever) is misleading in a way I didn't originally consider.
- t-315 days ago
 It is somewhat documented. man 2 syscalls tells me they are defined in /usr/include/asm/unistd.h. That file includes /usr/include/asm/unistd_{32,64}.h, which contain the definitions on my amd64 linux box. On my aarch64 they're in /usr/include/asm-generic/unistd.h, but the syscalls manfile doesn't mention the changed path.
 - adrian_b15 days ago
 Those are glibc headers, not Linux kernel headers. The glibc headers do not necessarily match your current Linux kernel. You should use the glibc headers when you use the glibc generic syscall wrappers, but otherwise you must not consider them as an authoritative source for syscalls, because they frequently do not contain all the syscalls provided by your current kernel.
 - jcalvinowens15 days ago
 > Those are glibc headers not Linux kernel headers.
 You're right in principle: but more precisely, they are the kernel headers for the kernel version which the system glibc was built against. But they are actually from the kernel source, not the glibc source.
 - dundarious15 days ago
 The manpage defines the interface in terms of sys/syscall.h where it leads to the right place on every platform I've ever worked on, and which is where I would first look, but yeah, maybe not all sections are clear.
 - t-315 days ago
 That's the syscall(9) manpage, not the syscalls(2) manpage. syscall(9) is present on BSD as well, and on my OpenBSD box points me to sys/syscall.h which has the syscalls. On linux sys/syscall.h is empty and includes asm/unistd.h.
 - adrian_b15 days ago
 The manpage syscall(2) exists on both Linux and FreeBSD, while syscall(9) does not exist on either of them. On Linux there is also a syscalls(2) manpage, while no syscalls page exists on FreeBSD. These man pages belong to libc (e.g. glibc on Linux), not to the kernel. This distinction does not matter on FreeBSD and other *BSD, where the kernel and the standard C library are always synchronized, but it matters on Linux, where glibc and the kernel are developed independently, so their lists of syscalls are not the same. Typically glibc is synchronized with an older Linux kernel, not with your current kernel.
 t-315 days ago
 Ah, you're right. syscall(2) must have been moved to syscall(9) on OpenBSD when the syscall function was removed and pinsyscalls was added.
 - adrian_b15 days ago
 That syscall man page is from glibc and provides information about how you can invoke syscalls through glibc. It does not have any direct connection with the Linux kernel. Because the Linux kernel promises that the syscall interface is stable, in the normal situation when the kernel is newer or at least of the same age with glibc, all the syscalls that can be invoked through glibc should be supported by the kernel, but the kernel may support extra syscalls. If you install a kernel that is older than glibc, which may happen in some embedded systems that are compatible only with some old kernels, then it may be that the kernel does not support all the syscalls from the glibc headers.
- surajrmal16 days ago
 Fun fact, those numbers are also not consistent across architectures. Even similarly named syscalls can have a very different ABI on different architectures.
 - yjftsjthsd-h15 days ago
 IIRC, there are also differences in what syscalls even exist per-arch... although I suppose technically that doesn't have to preclude their numbers lining up. For that matter, how you even make syscalls varies by arch, e.g. <pre><code>syscall</code></pre> on x86_64 vs <pre><code>int 0x80</code></pre> on i386.
- andrewmcwatters16 days ago
 All people unfamiliar with Linux at a documentation level assume that because Linux is Linux it must be pretty well documented, but in reality, just building the thing and creating an init is an extremely poorly documented process for such mature software. You’re not missing anything. It’s amazing Linux makes any progress at all, because the most high touch points about the damn thing are basically completely undocumented. And if they are, the documentation is out of date, and written by some random maintainer and describes a process no longer used or it’s by a third-party and obviously wrong or superfluous and they have no idea what they’re talking about. Edit: Oh it’s a cultural issue, too. Almost everything revolving around Linux documentation is also an amateur shitshow. Systemd, that init system and so much more that everyone uses? How do you build it and integrate it into a new image? I don’t know. They don’t either. It’s assumed you’re already using it from a major distribution. There’s no documentation for it.
 - hugmynutus15 days ago
 docs.kernel.org is generated from in-tree readmes, docs, and type/struct/function definitions, making it a lot easier to read/browse documentation that would (previously) require grepping the source code to find. I realize the site also hosts some fairly out-of-date articles, so there is room for improvement. Those hand-written articles start with an author & timestamp, so they're easy to filter.
pjmlp16 days ago
> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.
Because Linux is the exception, UNIX public API is the C library as defined later by POSIX. The goal to create C and rewrite UNIX V4 into C was exactly to move away from this kind of platform details. Also UNIX can be seen as C's runtime, in a way, thus traditionally the C compiler was the platform vendor's own; there was no picking and choosing among C compilers and standard libraries, that was left for non-UNIX platforms.
- wahern16 days ago
 All true, but note that BSD introduced, and both Linux/glibc and Linux/musl support, a syscall(2) wrapper routine that takes a syscall number, a list of arguments (usually as long's), and performs the syscall magic. The syscall numbers are defined as macros beginning with SYS_. The Linux kernel headers export syscall numbers with macros using the prefix __NR_, but to match the BSD interface Linux libc headers usually translate or otherwise define them using a SYS_ prefix. Using the macros is much better because the numbers often vary by architecture for the same syscall. See <a href="https://man7.org/linux/man-pages/man2/syscall.2.html" rel="nofollow">https://man7.org/linux/man-pages/man2/syscall.2.html</a>
 - pjmlp16 days ago
 Except with BSDs you are on your own if you go down that route, because there are no stability guarantees. It is more of an implementation detail for the rest of the C APIs than anything else.
 - markjdb16 days ago
 At least FreeBSD's syscall ABI is guaranteed to be stable, one can run ancient binaries on a modern kernel. I believe the same is not true of OpenBSD and maybe NetBSD however.
 - wahern16 days ago
 Indeed. Another reason to use the system's macros rather than hardcoding integer literals--the numbers can change between releases. Though that doesn't guarantee the syscall works the same way between releases wrt parameters and return value semantics, if it still exists at all. And I believe OpenBSD removed the syscall wrapper altogether after implementing the pinsyscalls feature.
 - toast015 days ago
 Do you need a guarantee or is it enough that it's painful enough for the BSD maintainers when they remove syscalls that they rarely do it? It's even worse if they renumber them so that really doesn't happen outside of syscalls that were only briefly available in a development branch. Varies a bit by flavor: OpenBSD values security more than stability, so they are willing to break old binaries more often; FreeBSD does require compat modules/etc for some things, but those are available for a long time and sometimes something slips through. If they break old syscalls, it breaks your code that skips libc, but it also breaks running an old userland with a new kernel and that needs to work for upgrade scenarios. It also breaks binaries that were statically linked with an older libc. When a new kernel breaks old binaries, people stop upgrading the kernel and that's not what maintainers want.
- adrian_b16 days ago
 Even if one would want to use Linux only through libc, that is not always possible. Linux has evolved beyond POSIX and many newer syscalls, which can enhance performance in certain scenarios, are not available as libc functions. They may be invoked either using the generic syscall wrappers provided by glibc besides the standard functions, or by using custom wrappers or possibly by using some special libraries, if such libraries are available.
 - pjmlp16 days ago
 That isn't a valid reason, given the existence of Solaris, HP-UX, DG/UX, Tru64, NeXTSTEP, and so many other UNIXes that grew beyond AT&T UNIX System V. All of them provide C APIs to their additional features not covered by POSIX. What Linux has is that due to the way syscalls are exposed there is a certain laziness to cover everything on glibc, or its replacements like musl.
 - pm21516 days ago
 I think rather than "laziness" I would say it's an instance of the widespread phenomenon of "shipping the orgchart". Because for Linux the kernel developers and the libc developers are separate communities, the boundary between those components becomes more meaningful, more visible to the end-user, and more likely to have lag where one side supports something and the other doesn't yet. (That goes both ways, incidentally -- the handling of POSIX threads was driven more from the libc side of the fence and it took a while before the kernel provided enough facilities to make it cleanly doable, and there are still corners like setuid() where there is a mismatch between what the standard wants and the primitives the kernel provides). Where an OS has a more tightly integrated development team the kernel/libc boundary is more likely to stay an internal one.
 - cb32116 days ago
 This description matches my own experience. E.g., I recall having to use my own macro-based syscall() things when the inotify system was first introduced because glibc did not have support for years and then it was years more for slow moving Linux distros to pick up the new glibc version. Unsaid was that much of this project separation comes from glibc being born as (and probably still being) a "portable libc with extra GNU-ish features", not a Linux-specific thing. Honestly, some of this pain might have been avoided had the Bell Labs guys made two libraries - the syscall interface part of `libc`, called say `libos`, and the more articulated language run-time (string/buffered IO/etc./etc) the actual `libc`. Then the kernel could "easily" ship with libos and libc's could vary. To even realize this might be helpful someday likely required foresight beyond reason in the mid-1970s. Then, afterwards, Makefile's and other build system stuff probably wanted to stay with "-lc" in various places and then glibc/others wanted to support that and so it goes. Integration can be hard to un-do.
 - masklinn16 days ago
 IIRC it’s not just laziness, there are things glibc explicitly doesn’t want to expose for various reasons, and since the two projects are essentially unrelated you get the intersection of what both sides are happy with. Traditional unices develop the kernel and the libc together, as a system, so any kernel feature they want to expose they can just do so.
 - adrian_b16 days ago
 You are right, but I did not comment on what might be desirable, but on what is the current status. Because I do not like certain decisions in the design of glibc, I am skeptical about their ability to define good standard APIs for the more recent syscalls, so perhaps it is better that they did not attempt to do this.
- muvlon16 days ago
 I actually love this about Linux. The syscall API is much better than libc (both the one defined by POSIX and libc as it actually exists on different Unixen). No errno (which requires weird and inefficient TLS bullshit), no hooks like atfork/atexit/etc., no locales, no silly non-reentrant functions using global variables, no dlopen/dlclose. Just the actual OS interface. Languages that aren't as braindead as C can have their own wrappers around this and skip all that nonsense. Also, there are syscalls which are basically not possible to directly expose as C functions, because they mess with things that the C runtime considers invariant. An example would be `SYS_clone3`. This is an immensely useful syscall, and glibc uses it for spawning threads these days. But it cannot be called directly from C, you need platform-specific assembly code around it.
 - oguz-ismail216 days ago
 > But it cannot be called directly from C
 No system call can, you need a wrapper like syscall() provided by glibc. glibc also provides a dedicated wrapper for the clone system call which properly sets up the return address for the child thread. No idea what you're angry about
 - muvlon16 days ago
 Sure, you need a tiny bit of asm to do the actual syscall. That's not what I'm talking about. Most syscalls are easy to wrap, clone is slightly harder but doable (as evidenced by glibc). clone3 is for all intents and purposes impossible to write a general C wrapper for. It allows you to create situations such as threads that share virtual memory but not file descriptors, or vice-versa. That is, it can leave the caller in a situation that violates core assumptions by libc.
 - oguz-ismail216 days ago
 You're mixing things up. C the language doesn't know about virtual memory or file descriptions. Those are OS features.
 adrian_b16 days ago
 The C library maintains its own set of file descriptors, which are mapped to the OS file descriptors (because the stdio file descriptors and the OS file descriptors have different types and different behaviors). I do not know whether this is true, but perhaps the previous poster means that using clone3 with certain arguments may break this file descriptor mapping so invoking after that stdio functions may have unexpected results. Also the state kept by the libc malloc may get confused after certain invocations of clone3, because it has memory pages that have been obtained through mmap or sbrk and which may sometimes be returned to the OS. So libc certainly cares about the OS file descriptors and virtual memory mappings, because it maintains its own internal state, which has references to the corresponding OS state. I have not looked to see when an incorrect state can result after a clone3, but it is plausible that such cases may exist, so that glibc allows calling clone3 only with a restricted combination of arguments and it does not provide a wrapper that would allow other combinations of arguments.
 pm21516 days ago
 Yes; this is why QEMU's user-space-emulation clone syscall handling restricts the caller to only those combinations of clone flags which match either "looks like fork()" or "looks like creating a new pthread", because QEMU itself is linked with the host libc and weird clone flag combinations will put the new process/thread into a state the libc isn't expecting.
 oguz-ismail216 days ago
 All fair points. What do other languages' standard libraries do to walk around clone3 then? If two threads share file descriptors but not virtual memory, do they perform some kind of IPC to lock them for synchronizing reads and writes?
 muvlon16 days ago
 > What do other languages' standard libraries do to walk around clone3 then?
 They don't offer generic clone3 wrappers either AFAIK. All the code I've seen that uses it - and a lot of it is not in standard libraries but in e.g. container runtime implementations - has its own special-purpose code around a specific way to call it. My point is not that other standard libraries do it better, but that clone3 as a syscall interface is highly versatile, moreso than it could be as a function in either C or most other languages. That is, the syscall API is the right layer for this feature to be.
jmgao16 days ago
> I was expecting a unified interface across all architectures, with perhaps one or two architecture-specific syscalls to access architecture-specific capabilities; but Linux syscalls are more like Swiss cheese.
There's lots of historical weirdness, mostly around stuff where the kernel went "oops, we need 64-bit time_t or off_t or whatever" and added, for example, getdents64 to old platforms, but new platforms never got the broken 32-bit version. There are some more interesting cases, though, like how until fairly recently (i.e. about a decade ago for the mainline kernel), on x86 (and maybe other platforms?) there weren't individual syscalls for each socket syscall, they were all multiplexed through socketcall.
leni53616 days ago
> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.
Isn't that nolibc.h?
- adrian_b16 days ago
 Nolibc, which is incorporated in the Linux kernel sources (under "tools"), contains generic wrappers for the Linux syscalls and a set of simplified implementations for the most frequently needed libc functions. It is useful for very small executables or for some embedded applications. It is not useful for someone who would want to use the Linux syscalls directly from another programming language than C, bypassing glibc or other libc implementations, except by providing models of generic wrappers for the Linux syscalls. It also does not satisfy the requirement of the parent article, because it does not contain a table of syscalls that could be used for separate compilations. Nolibc implements its functions like this:
<pre><code>long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
{
    return my_syscall3(__NR_ioctl, fd, cmd, arg);
}
</code></pre>
 where the syscall numbers like "__NR_ioctl" are collected from the Linux kernel headers, because nolibc is compiled in the kernel source tree. As explained in the parent article, there is no unique "__NR_ioctl" in the kernel sources. The C headers that are active during each compilation are selected or generated automatically based on the target architecture and on a few other configuration options, so searching for the applicable "__NR_ioctl" can be tedious, hence the value of the parent article and of a couple of other places mentioned by other posters, where syscall tables can be found.
 - remix200016 days ago
 Funny how you chose `ioctl` specifically to illustrate your point, when that's quite uniquely just a syscall inside a syscall… Ideally, high level library devs should abstract ioctl while treating libc as the stable userland kernel ABI, as has always been the case for the BSD's. I think the real problem is GNU libc devs' unwillingness to stabilize it (not sure why, perhaps the menace of HURD still haunting them?)
 - adrian_b16 days ago
 I chose "ioctl" precisely because it has maximum simplicity, in order to show that in "nolibc" it needs externally provided syscall numbers. Some other syscall wrappers from "nolibc" may be somewhat more complex, by doing some processing on the arguments, before invoking a generic syscall wrapper like "my_syscall3", "my_syscall5" etc. (where the number from the name of the generic syscall wrapper refers to the number of syscall arguments).
 - remix200015 days ago
 Ioctls are the single most complex example for API design, cause like, that's another opaque interface inside one opaque interface. Ioctls will be routed to the desired kernel module (driver) depending on the FD, after all. Basically all I'm saying is that a syscall "ABI" is but a red herring for everyone but the [mainline] Linux devs themselves.
jiehong16 days ago
That's nice! But, perhaps this table could also be expanded with other kernels? Windows Kernel Syscalls and Mac OS Kernel Syscalls. Side note: maybe libc should be renamed lib-linux someday.
jesse__16 days ago
I've been thinking about doing this for a little side project for some time. Looking forward to the eventual conclusion :)
duckydude2016 days ago
i was also trying to do something like this. but i got lost with other projects. <a href="https://github.com/sku0x20/no-libc/tree/main" rel="nofollow">https://github.com/sku0x20/no-libc/tree/main</a>
deivid16 days ago
Would it be cheating to use the kernel's nolibc?
- adrian_b16 days ago
 See another comment. Using nolibc is fine when you compile it together with the kernel. The parent article is about a C header that you can use to compile your program independently of the source files of the Linux kernel. Even the presence of the Linux kernel sources on your computer is not enough to enable the compilation of a program that uses directly the syscalls, as the raw sources do not contain any suitable header. You must first compile the kernel with the desired configuration, because header files are selected or auto-generated accordingly. That is enough for nolibc, which lives in the kernel source tree, but it would still be difficult to identify which are the header files that could be used to compile an external program. Moreover, including Linux header files in your program is an extremely bad idea, because they are not stable. It is frequent that a minor version increase of the Linux kernel breaks the "#include" directives of external programs (such as out-of-tree device drivers), because items are moved between headers or some headers disappear and other headers appear.
 - deivid15 days ago
 That makes sense, I guess this was not a problem for the times I needed nolibc. I do agree that trying to extract data/logic from linux is a pain -- I've tried a few times to extract some of the eBPF verifier handling, but end up pulling most of the kernel along.