6 comments

  • ZiiS · 1 hour ago
    A good write-up explaining how assumptions of network and security design have changed so much over the years. Also, you have to give credit nowadays for not overly sensationalizing 'heebie-jeebies level 6'. I certainly continued reusing a connection I assumed was TLS after a cancel, so I was vulnerable to a DoS; but equally, if the next statement was cancelled I would switch to a new connection, no harm no foul.
  • i18nagentai · 42 minutes ago
    What strikes me most about this is how it illustrates the tension between backward compatibility and security in long-lived systems. The cancel key approach made total sense in the context of early Unix networking assumptions, but those assumptions have quietly eroded over decades. The fact that the cancel token is only 32 bits of entropy and sent in cleartext means it was never really designed for adversarial environments -- it was a convenience feature that became load-bearing infrastructure. I wonder if the Postgres community will eventually move toward a multiplexed protocol layer (similar to what HTTP/2 did for HTTP) rather than trying to bolt security onto the existing out-of-band mechanism.
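    For scale, a quick back-of-the-envelope sketch of that 32-bit keyspace (the guess rate below is a made-up illustrative number, not a measurement):

```python
# Back-of-envelope: how weak is a 32-bit cleartext cancel token?
KEYSPACE = 2 ** 32            # 4,294,967,296 possible cancel keys
GUESSES_PER_SEC = 100_000     # hypothetical blind-guess rate over the network

expected_guesses = KEYSPACE / 2                       # on average, half the keyspace
expected_hours = expected_guesses / GUESSES_PER_SEC / 3600

print(f"keyspace: {KEYSPACE}")
print(f"expected time at {GUESSES_PER_SEC}/s: {expected_hours:.1f} hours")
```

    Hours, not years, at a rate a single machine could plausibly sustain; the only real protection is that the token is per-connection and short-lived.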
    • dmurray · 30 minutes ago
      Doesn't it also make sense in the context of modern networking assumptions?

      I've never had to connect to Postgres in an adversarial environment. I've been at work or at home, and I connected to Postgres instances owned by me or my employer. If I tried to connect to my work instance from a coffee shop, the first thing I'd do would be to log in to a VPN. That's your multiplexed protocol layer right there: the security happens at the network layer and your cancel happens at the application layer.

      This is a different situation from *websites*. I connect to websites owned by third parties all the time, and I want my communication there to be encrypted at the application layer.
  • rlpb · 4 hours ago
    TCP has an "urgent data" feature that might have been used for this kind of thing; it's what Ctrl-C in telnet uses, for example. It can bypass any pending send buffer and be received by the server ahead of any unread data.
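    A sketch of what that looks like at the sockets level (Python here for brevity); even with unread in-band data queued, the urgent byte can be picked up first:

```python
import select
import socket

def send_interrupt(sock):
    # One byte of TCP urgent ("out-of-band") data. The urgent pointer
    # flags it so the peer can notice it ahead of unread in-band data;
    # only a single OOB byte is reliably supported by the sockets API.
    sock.send(b"\xff", socket.MSG_OOB)

# Loopback demo: server with unread normal data still sees the OOB byte.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.sendall(b"a long query the server has not read yet...")  # in-band, left unread
send_interrupt(cli)                                           # urgent byte jumps the queue

# Pending urgent data shows up as an "exceptional condition" in select().
select.select([], [], [conn], 5.0)
oob = conn.recv(1, socket.MSG_OOB)   # read the urgent byte first

for s in (cli, conn, srv):
    s.close()
```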
    • mike_hearn · 3 hours ago
      Fun fact: Oracle implements cancellation this way.

      The downside is that sometimes connections are proxied in ways that lose these unusual packets. Looking at you, Docker...
    • ZiiS · 1 hour ago
      Unfortunately there can be many buffers between you and the server which "urgent data" doesn't skip, by design. (There were also lots of implementation problems.)
  • michalc · 5 hours ago
    I think I can understand why this wasn’t addressed for so long: in the vast majority of cases if your db is exposed on a network level to untrusted sources, then you probably have far bigger problems?
    • hrmtst93837 · 2 hours ago
      That's the kind of hand-wave that turns into a CVE later. Network exposure is one thing, but weird signal handling in local tooling can still become a cross-session bug or a nasty security footgun on shared infra, terminals, or jump boxes.

      If you have shared psql sessions in tmux or on a jump box, one bad cancel can trash someone else's work. 'Just firewall it' is how you end up owned by the intern with shell access.
    • pilif · 3 hours ago
      It's also very tricky to do given the current architecture on the server side, where one single-threaded process handles the connection and uses (for all intents and purposes) synchronous I/O.

      In such a scenario, listening (and acting) on cancellation requests on the same connection becomes very hard, so fixing this goes way beyond "just".
  • jtwaleson · 5 hours ago
    From the title I was hoping this would be about something hacky on the server application side, like how it aborts and clears the memory for a running query.

    Still an interesting read. Just wondering: why can't the TCP connection of the query be used to send a cancellation request? Why does it have to be out of band?
    • mike_hearn · 3 hours ago
      Because Postgres is a very old codebase and was written in a style that assumes there are no threads, and thus there's nothing to listen for a cancellation packet whilst work is getting done. A lot of UNIXes had very poor support for threads for a long time, so this kind of multi-process architecture is common in old codebases.

      The TCP URG bit came out of this kind of problem. It triggers a SIGURG signal on UNIX, which interrupts the process. Oracle works this way.

      These days you'd implement cancellation by having one thread handle inbound messages and another thread do the actual work, with shared memory to implement a cooperative cancellation mechanic.

      But we should in general have sympathy here. Very little software and very few protocols properly implement any form of cancellation. HTTP hardly does for normal requests, and even if it did, how many web servers abort request processing if the connection drops?
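      A minimal sketch of that cooperative two-thread design (illustrative Python, not the actual Postgres code): a reader thread flips a shared flag, and the worker polls it between units of work.

```python
import threading
import time

class Session:
    def __init__(self):
        # Shared cancel flag: the message-reader thread sets it,
        # the worker polls it between units of work.
        self.cancelled = threading.Event()

    def run_query(self, steps):
        done = 0
        for _ in range(steps):
            if self.cancelled.is_set():   # cooperative check point
                return ("cancelled", done)
            time.sleep(0.01)              # stand-in for real work
            done += 1
        return ("complete", done)

session = Session()
result = []
worker = threading.Thread(target=lambda: result.append(session.run_query(100)))
worker.start()

time.sleep(0.05)            # the "reader thread" receives a cancel message...
session.cancelled.set()     # ...and flips the shared flag
worker.join()
print(result[0][0])         # -> cancelled, well before all 100 steps ran
```

      The catch, and the reason this is a big refactor for Postgres, is that every long-running loop in the server has to remember to poll the flag.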
      • Someone · 26 minutes ago
        > The TCP URG bit came out of this kind of problem. It triggers a SIGURG signal on UNIX which interrupts the process. Oracle works this way.

        https://datatracker.ietf.org/doc/html/rfc6093:

        "it is strongly recommended that applications do not employ urgent indications. Nevertheless, urgent indications are still retained as a mandatory part of the TCP protocol to support the few legacy applications that employ them. However, it is expected that even these applications will have difficulties in environments with middleboxes."
      • johannes1234321 · 1 hour ago
        It isn't really easy to do. A client may send tons of data over the connection, probably data which is calculated by the client as the client's buffer empties. If the server drains those buffers all the time to check for a cancellation, it may have quite bad consequences.
    • toast0 · 5 hours ago
      I don't know much about Postgres, but as I understand it, it's a pretty standard server application: read a request from the client, work on the request, send the result, read the next request.

      Changing that to poll for a cancellation while working is a big change. Also, the server would need to buffer any pipelined requests while looking for a cancellation request. A second connection is not without wrinkles, but it avoids a lot of network complexity.
    • bob1029 · 5 hours ago
      MSSQL uses a special message over an existing connection:

      https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/dc28579f-49b1-4a78-9c5f-63fbda002d2e
    • hlinnaka · 2 hours ago
      Because then the cancellation request would get queued behind any other data that's in flight from the client to the server. In the worst case the TCP buffers are full, and the client cannot even send the request until the server processes some of the existing data that's in flight.
      • adrian_b · 1 hour ago
        As others have said, TCP allows sending urgent data, precisely for solving this problem.

        At the receiver, a signal handler can be used, which will be invoked with SIGURG when urgent data is received.
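        The receiving side of that can be sketched like this on Linux (Python for brevity; `F_SETOWN` is what tells the kernel which process should get the SIGURG):

```python
import fcntl
import os
import signal
import socket
import time

got_urgent = []

def on_urgent(signum, frame):
    # Async-signal context: just record the event. A real server would
    # only set a flag here for the query loop to check.
    got_urgent.append(True)

signal.signal(signal.SIGURG, on_urgent)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

# Ask the kernel to deliver SIGURG to this process when urgent data
# arrives on the accepted connection.
fcntl.fcntl(conn.fileno(), fcntl.F_SETOWN, os.getpid())

cli.send(b"!", socket.MSG_OOB)   # the peer's "cancel" byte

deadline = time.monotonic() + 5.0
while not got_urgent and time.monotonic() < deadline:
    time.sleep(0.01)             # the signal interrupts this sleep

for s in (cli, conn, srv):
    s.close()
```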
    • CamouflagedKiwi · 4 hours ago
      It's basically got a process per connection; while it's working on a query, that process isn't listening to incoming traffic on the network socket any more.
  • gpderetta · 26 minutes ago
    TLS is not async-signal-safe. But having a dedicated thread whose only responsibility is to send cancel tokens via a TLS connection, and which is woken up by a POSIX semaphore, seems like a small, self-contained change that doesn't require any major refactoring.
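    Something like this sketch, with Python's `threading.Semaphore` standing in for the POSIX semaphore and a plain callback standing in for the TLS write:

```python
import threading
import time

class CancelSender(threading.Thread):
    """One dedicated thread owns the (notionally TLS) connection; the
    rest of the program only ever pokes the semaphore, which is the
    kind of operation that is safe from awkward contexts."""

    def __init__(self, send_token):
        super().__init__(daemon=True)
        self.wakeups = threading.Semaphore(0)
        self.send_token = send_token   # stand-in for the TLS write
        self.stopped = False

    def request_cancel(self):
        self.wakeups.release()         # cheap wake-up, no I/O here

    def run(self):
        while True:
            self.wakeups.acquire()     # sleep until poked
            if self.stopped:
                return
            self.send_token()          # the only place I/O happens

sent = []
sender = CancelSender(lambda: sent.append("cancel"))
sender.start()

sender.request_cancel()
sender.request_cancel()
while len(sent) < 2:                   # wait for both tokens to go out
    time.sleep(0.01)

sender.stopped = True
sender.request_cancel()                # wake it one last time so it can exit
sender.join()
print(sent)                            # -> ['cancel', 'cancel']
```

    All the TLS state stays confined to one thread, which sidesteps the signal-safety problem entirely.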