Better JIT for Postgres

(github.com)

111 points by vladich 9 hours ago

7 comments

sourcegrift 5 hours ago
We have everything optimized, and yet somehow DB queries need to be "interpreted" at runtime. There's no reason for DB queries to not be precompiled.
- jpfr 7 minutes ago
 The "byte-code" coming from the query planner typically has only a handful of steps in a linear sequence: joins, filters, and such. But the individual steps can be very costly, so there is not much to gain from JITing only the query plan execution. JITing begins to make more sense when the individual query plan steps (join, filter, ...) can themselves be specialized, recompiled, improved, or merged by knowing the context of the query plan.
- catlifeonmars 4 hours ago
 This is a neat idea. I want to take it further and precompile the entire DBMS binary for a specific schema.
 - WJW 3 hours ago
 How will you handle ALTER TABLE queries without downtime?
 - catlifeonmars 2 hours ago
 That would definitely present a bit of a challenge, but not all databases need migrations (or migrations without downtime); alternatively, ship the migrations as part of the binary. Ad hoc modifications would still be more difficult, but tbh that’s not necessarily a bug.
- Asm2D 3 hours ago
 Many SQL engines have JIT compilers. The problems related to PostgreSQL are pretty much all described here. It's very difficult to do low-latency queries if you cannot cache the compiled code and reuse it over and over again. And once your JIT is slow, you need logic to decide whether to interpret or compile. I think it would be best to start interpreting the query and start compilation in another thread; once the compilation is finished and the interpreter is still running, stop the interpreter and run the JIT-compiled code. This would give you the best latency, because there would be no waiting for the JIT compiler.
 - aengelke 2 hours ago
 > It's very difficult to do low-latency queries if you cannot cache the compiled code
 This is not too difficult, it just requires a different execution style. Salesforce's Hyper for example very heavily relies on JIT compilation, as does Umbra [1], which some people regard as one of the fastest databases right now. Umbra doesn't cache any IR or compiled code and still has an extremely low start-up latency; an interpreter exists but is practically never used. Postgres is very robust and very powerful, but simply not designed for fast execution of queries. Disclosure: I work in the group that develops Umbra.
 [1]: https://umbra-db.com/
 - chrisaycock 2 hours ago
 > I think it would be the best to start interpreting the query and start compilation in another thread
 This technique is known as a "tiered JIT". It's how production virtual machines operate for high-level languages like JavaScript. There can be many tiers, like an interpreter, baseline compiler, optimizing compiler, etc. The runtime switches into the faster tier once it becomes ready. More info for the interested: https://ieeexplore.ieee.org/document/10444855
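 A rough sketch of the "interpret now, compile in another thread, switch when ready" idea, in Python rather than anything Postgres-specific. Everything here is an assumption made for illustration: `run_tiered` is a hypothetical name, the "interpreter" is a per-row `eval` of an expression string, and the "JIT" is simply building a Python lambda once in a background thread. It only shows the hand-off between tiers at batch boundaries, not how any real engine does it.

```python
import threading
import time


def run_tiered(rows, expr, batch_size=1000, compile_delay=0.0):
    """Filter `rows` by `expr` (a Python expression over `x`).

    Starts in the slow tier and switches to the compiled tier at the next
    batch boundary after the background compile has finished.
    """
    compiled = {}                 # filled in by the background thread
    ready = threading.Event()     # set once the compiled function exists

    def compile_job():
        # Stand-in for a slow JIT compiler (hypothetical delay).
        time.sleep(compile_delay)
        compiled["fn"] = eval("lambda x: " + expr)
        ready.set()

    worker = threading.Thread(target=compile_job)
    worker.start()

    out, tiers = [], []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        if ready.is_set():
            # Fast tier: call the function that was compiled once.
            fn = compiled["fn"]
            out.extend(x for x in batch if fn(x))
            tiers.append("compiled")
        else:
            # Slow tier: re-evaluate the expression text for every row.
            out.extend(x for x in batch if eval(expr, {"x": x}))
            tiers.append("interpreted")

    worker.join()
    return out, tiers


if __name__ == "__main__":
    result, tiers = run_tiered(list(range(5000)), "x % 7 == 0")
    print(len(result), tiers)
```

 Which batches run in which tier depends on thread timing, so only the filtered result is stable from run to run. Switching at batch boundaries keeps the hand-off simple: the executor never has to migrate state in the middle of a row.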
- levkk 2 hours ago
 See prepared statements.
- SigmundA 4 hours ago
 PostgreSQL uses a process-per-connection model and it has no way to serialize a query plan to some form that can be shared between processes, so the time it takes to make the plan, including JIT, is very important. Most other DBs cache query plans, including JITted code, so they are basically precompiled from one request to the next with the same statement.
 - zaphirplane 4 hours ago
 What do you mean? The obvious thing is a shared cache, and if there is one thing the writers of a DB know, it is locking.
 - SigmundA 3 hours ago
 Sharing executable code between processes is not as easy as sharing data. AFAIK, unless something's changed recently, PG shares nothing about plans between processes and can't even share a cached plan between sessions/connections.
 - _flux 2 hours ago
 Write the binary to a file, call it `libquery-id1234.so`, and link that into whichever processes need it?
 - llm_nerd 3 hours ago
 Executable code is literally just data that you mark as executable. It already JIT-compiled the code, and the idea that it can't then share it between processes is incomprehensible. I was actually confused by this submission, as it puts so much emphasis on initial compilation time, when every DB (apparently except for pgsql) caches that result and shares/reuses it until invalidation. Invalidation can occur for a wide variety of reasons (data composition changing, age, etc.), but still the idea of redoing it on every query, where most DBs see the same queries endlessly, is insane.
 - hans_castorp 3 hours ago
 > and it has no way to serialize a query plan to some form that can be shared between processes
 https://www.postgresql.org/docs/current/parallel-query.html
 "PostgreSQL can devise query plans that can leverage multiple CPUs in order to answer queries faster."
 - SigmundA 3 hours ago
 That has nothing to do with plan caching; that's just talking about executing parallel operations. Is that thread-based or process-based in PG? If it's process-based, then they can send small parts of a plan across processes.
 - hans_castorp 3 hours ago
 Ah, didn't see the caching part. Plans for prepared statements are cached though.
 - AlisdairO 38 minutes ago
 Only on a per-connection basis
eru 7 hours ago
> However, standard LLVM-based JIT is notoriously slow at compilation. When it takes tens to hundreds of milliseconds, it may be suitable only for very heavy, OLAP-style queries, in some cases.
I don't know anything here, but this seems like a good case for ahead-of-time compilation? Or at least caching your JIT results? I can imagine that much of the time you are getting more or less the same query again and again?
- olau 7 hours ago
 Yes. Some years ago we ported some code from querying out the data and tallying in Python (how many are in each bucket) to using SQL to do that. It didn't speed up the execution. I was surprised by that, but I guess the Postgres interpreter is roughly the same speed as Python, which when you think about it perhaps isn't that surprising. But Python is truly general purpose, while the core query stuff in SQL is really specialized (we were not using stored procedures). So if PyPy can get a 5x speedup, it seems to me that it should be possible to get the same kind of speedup in Postgres. I guess it needs funding and someone as smart as the PyPy people.
- bob1029 5 hours ago
 At some level the application needs to participate in the performance conversation too.
 https://www.postgresql.org/docs/current/sql-prepare.html
 - masklinn 2 hours ago
 Postgres’s PREPARE is per-connection so it’s pretty limited, and then connection poolers enter the fray and often can’t track SQL-level prepares. And then the issue is not dissimilar to Postgres’s planner issues.
 - SigmundA 4 hours ago
 Unless you cache query plans like other RDBMSs do; then the need for the client to manage that manually goes away, and it's not limited to a single connection. MS SQL still has prepared statements, but they really haven't been used in 20 years, since it gained the ability to cache plans based on statement text.
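 To make "cache plans based on statement text" concrete, here is a minimal sketch of a server-side plan cache keyed on the normalized SQL string. This is not how SQL Server or any other engine implements it; `PlanCache`, `normalize`, and the `planner` callback are hypothetical names, and the whitespace-only normalization is a deliberate simplification (it would also collapse whitespace inside string literals).

```python
from collections import OrderedDict


def normalize(sql):
    """Collapse runs of whitespace so trivially different texts share a key.

    Simplification: this also touches whitespace inside string literals.
    """
    return " ".join(sql.split())


class PlanCache:
    """LRU cache mapping statement text to an already-built plan."""

    def __init__(self, planner, capacity=128):
        self.planner = planner        # expensive: parse + plan (+ JIT)
        self.capacity = capacity
        self.plans = OrderedDict()    # key -> plan, oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, sql):
        key = normalize(sql)
        if key in self.plans:
            self.hits += 1
            self.plans.move_to_end(key)      # mark as most recently used
            return self.plans[key]
        self.misses += 1
        plan = self.planner(key)             # pay the planning cost once
        self.plans[key] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)   # evict least recently used
        return plan


if __name__ == "__main__":
    cache = PlanCache(lambda sql: ("plan", sql), capacity=2)
    cache.get("SELECT * FROM t WHERE id = $1")
    cache.get("SELECT *   FROM t\n WHERE id = $1")
    print(cache.hits, cache.misses)
```

 Because the key is the statement text rather than a client-held handle, any connection that sends the same text reuses the plan, which is the property the per-connection PREPARE lacks. A real cache would also need invalidation on schema or statistics changes, which this sketch leaves out.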
the_biot 3 hours ago
What sort of things are people doing in their SQL queries that make them CPU bound? Admittedly I'm a meat-and-potatoes guy, but I like mine I/O bound. Really amazed to see not one but several generic JIT frameworks though; I had no idea that was a thing.
- martinald 3 hours ago
 Anything jsonb in my experience is quickly CPU bound...
- wreath 2 hours ago
 I think read queries that are always served from cache are CPU bound, because they also involve locking the buffers etc. and there is no I/O involved.
- throwaway140126 3 hours ago
 PostgreSQL is Turing complete, so I guess they do whatever they want?
fabian2k 6 hours ago
The last time I looked into it my impression was that disabling the JIT in PostgreSQL was the better default choice. I had a massive slowdown in some queries, and that doesn't seem to be an entirely unusual experience. It does not seem worth it to me to add such a large variability to query performance by default. The JIT seemed like something that could be useful if you benchmark the effect on your actual queries, but not as a default for everyone.
- pjmlp 6 hours ago
 That is quite strange, given that the big-boy RDBMSs (Oracle, SQL Server, DB2, Informix, ...) have all had JIT capabilities for several decades now.
 - SigmundA 4 hours ago
 The big boys all cache query plans, so the amount of time it takes to compile is not really a concern.
 - aengelke 2 hours ago
 That's not generally correct. Compile-time is a concern for several databases.
 - SigmundA 20 minutes ago
 Most systems submit many of the same queries over and over again. Ad hoc one-off queries can usually accept a higher up-front compile cost because the main results usually take much longer anyway, vs. worrying about an extra 100ms of compile. Maybe it was too strong to say it's not a concern at all, but it's nothing like PG, where every single request needs to replan and potentially JIT unless the client manually prepares and keeps the connection open.
swaminarayan 5 hours ago
Have you tested this under high concurrency with lots of short OLTP queries? I’m curious whether the much faster compile time actually moves the point where JIT starts paying off, or if it’s still mostly useful for heavier queries.
- masklinn 2 hours ago
  > By default, jit_above_cost parameter is set to a very high number (100'000). This makes sense for LLVM, but doesn't make sense for faster providers. It's recommended to set this parameter value to something from ~200 to low thousands for pg_jitter (depending on what specific backend you use and your specific workloads).
larodi 5 hours ago
sadly, no windows version yet AFAICT
asah 7 hours ago
awesome! I wonder if it's possible to point AI at this problem and synthesize a bespoke compiler (per-architecture?) for postgresql expressions?
- kvdveer 7 hours ago
 Two things are holding back current LLM-style AI from being of value here:
 * Latency. LLM responses are measured in the order of 1000s of milliseconds, where this project targets 10s of milliseconds; that's off by almost two orders of magnitude.
 * Determinism. LLMs are inherently non-deterministic. Even with temperature=0, slight variations of the input lead to major changes in output. You really don't want your DB to be non-deterministic, ever.
 - qeternity 5 hours ago
 > LLMs are inherently non-deterministic.
 This isn't true, and certainly not inherently so. Changes to input leading to changes in output does not violate determinism.
 - magicalhippo 3 hours ago
 > This isn't true
 From what I understand, in practice it often is true [1]:
 > Matrix multiplication should be “independent” along every element in the batch — neither the other elements in the batch nor how large the batch is should affect the computation results of a specific element in the batch. However, as we can observe empirically, this isn’t true.
 > In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism.
 [1]: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
 - yomismoaqui 3 hours ago
 Quoting: "But why aren’t LLM inference engines deterministic? One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first."
 From https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
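 For anyone who hasn't run into floating-point non-associativity, a small Python illustration follows. It only demonstrates that the grouping of additions can change the result; it says nothing about how any particular inference engine schedules its reductions. The helper names `left_to_right` and `chunked_sum` are made up for this example, and explicit loops are used instead of the built-in `sum` so the order of operations is exactly what is written.

```python
def left_to_right(values):
    """Add the values strictly in order, one at a time."""
    total = 0.0
    for v in values:
        total = total + v
    return total


def chunked_sum(values, chunk):
    """Add each chunk separately, then add the partial sums.

    A different chunk size is a different grouping of the same additions,
    loosely analogous to a reduction split across a different batch size.
    """
    partials = [
        left_to_right(values[i:i + chunk])
        for i in range(0, len(values), chunk)
    ]
    return left_to_right(partials)


if __name__ == "__main__":
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))   # grouping matters for floats

    vals = [1e16, 1.0, -1e16, 1.0]
    print(left_to_right(vals), chunked_sum(vals, 2))
```

 Each individual ordering is perfectly deterministic. The results only differ when the ordering itself changes between runs.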
 - simonask 7 hours ago
 > 1000s of milliseconds
 Better known as "seconds"...
 - olau 7 hours ago
 The suggestion was not to use an LLM to compile the expression, but to use an LLM to build the compiler.