AWS engineer reports PostgreSQL perf halved by Linux 7.0, fix may not be easy

(phoronix.com)

347 points by crcastle 13 hours ago

16 comments

lfittl 13 hours ago
It's worth reading this follow-up LKML post by Andres Freund (who works on Postgres): https://lore.kernel.org/lkml/yr3inlzesdb45n6i6lpbimwr7b25kqkn37qzlvvzgad5hfd7ut@xv4cihno76wu/
- aftbit 11 hours ago
 > If this somehow does end up being a reproducible performance issue (I still suspect something more complicated is going on), I don't see how userspace could be expected to mitigate a substantial perf regression in 7.0 that can only be mitigated by a default-off non-trivial functionality also introduced in 7.0.
 - cr125rider 1 hour ago
 They said the magic words to get Linus to start flipping tables. Never break userspace. Unusably slow is broken.
- fxtentacle 52 minutes ago
 .. which confirms all of my stereotypes. Looks like the AWS engineer who reported it used an m8g.24xlarge instance with 384 GB of RAM, but somehow didn't know or care to enable huge pages. And once they are enabled, the performance regression disappears.
 - bushbaba 12 minutes ago
 Because such settings aren’t obvious to those not familiar with them. LLMs should make discoverability easier, though.
- justinclift 11 hours ago
 Note that it's not just a single post; there's further information in the rest of the thread. :)
 - adrian_b 6 hours ago
 Yes, and in the following messages the conclusion was that the regression is mitigated when using huge pages.
 - jeltz 5 hours ago
 Which you should always use anyway if you can.
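For anyone wanting to check whether huge pages are set up on a Linux host at all, here is a minimal Python sketch. It parses text in the format of `/proc/meminfo`; the function name `hugepage_summary` and the sample text are hypothetical and only for illustration. PostgreSQL's own `huge_pages` setting, mentioned in the LKML thread, is a separate knob and is not inspected here.

```python
def hugepage_summary(meminfo_text):
    """Extract huge page counters from /proc/meminfo-style text."""
    wanted = {"HugePages_Total", "HugePages_Free", "Hugepagesize"}
    summary = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and key.strip() in wanted and parts:
            # The first token is the number; Hugepagesize is followed by "kB".
            summary[key.strip()] = int(parts[0])
    return summary


# Made-up sample, shaped like /proc/meminfo output.
SAMPLE = """\
MemTotal:       395136000 kB
HugePages_Total:   51200
HugePages_Free:    51200
Hugepagesize:       2048 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(hugepage_summary(f.read()))
    except OSError:
        # Not on Linux: fall back to the sample text.
        print(hugepage_summary(SAMPLE))
```

A `HugePages_Total` of 0 would suggest no explicit huge pages have been reserved on that host.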
- anal_reactor 6 hours ago
 > Maybe we should, but requiring the use of a new low level facility that was introduced in the 7.0 kernel, to address a regression that exists only in 7.0+, seems not great.
 Completely right. This sounds like a communication failure. Maybe Linux maintainers should pick a few applications that have "priority support" and problems with these applications are also problems with Linux itself. Breaking Postgres is a serious regression.
 Reminds me of a situation where Fedora couldn't be updated if you had Wine installed and one side of the argument was "user applications are user problem" while the other was "it's Wine, like come on".
 - falcor84 4 hours ago
 I for one liked the old and simple WE DO NOT BREAK USERSPACE attitude.
 https://linuxreviews.org/WE_DO_NOT_BREAK_USERSPACE
 - gcr 2 hours ago
 Performance regressions are different from ABI incompatibilities. If the kernel refused to do any work that slowed down any userspace program, the pace would go a lot slower.
 - shadowgovt 1 hour ago
 Or be a lot uglier. See: Microsoft replacing its own API surfaces with binary-compatible representations to work around companies like Adobe adding perf improvements like bypassing the kernel-provided kernel object constructors because it saved them a few cycles to just hard-code the objects they wanted and memcpy them into existence.
 - reisse 3 hours ago
 Not sure it is true anymore. I've encountered a few userspace breaks in io_uring, at least.
- jeffbee 12 hours ago
 Funny how "use hugepages" is right there on the table and 99% of users ignore it.
 - bombcar 12 hours ago
 I’m absolutely flabbergasted by the performance left on the table; even by myself - just yesterday I learned Gentoo’s emerge can use git and be a billion times faster.
 - globular-toast 5 hours ago
 The time spent by emerge is utterly dwarfed by the time spent to build the packages, so who cares? Maybe it's different if installing a binary system but I don't think most people are doing that.
 - LtdJorge 1 hour ago
 When using multiple overlays, emerge-webrsync is ungodly slow compared to git.
- TacticalCoder 11 hours ago
 AIUI in that thread they're saying "0.51x" the perf on a 96-core arm64 machine and they're also saying they cannot reproduce it on a 96-core amd64 machine.
 So it's not going to affect everybody both running PostgreSQL and upgrading to the latest kernel. Conditions seem to be: arm64, shitloads of cores, kernel 7.0, current version of PostgreSQL.
 That is not going to be 100% of the installed PostgreSQL DBs out there in the wild when 7.0 lands in a few weeks.
 - torginus 6 hours ago
 It's a huge issue with ARM-based systems that hardly anyone uses or tests things on them (in production).
 Yes, Macs going ARM has been a huge boon, but I've also seen crazy regressions on AWS Graviton (compared to how it's supposed to perform), on .NET (and node as well), which frankly I have no expertise or time to dig into.
 Which was the main reason we ultimately cancelled our migration.
 I'm sure this is the same reason why it's important to AWS.
 - p_l 3 hours ago
 Macs are actually part of the pain point with ARM64 Linux, because ARM64 Linux servers tend to use 64 kB pages while the Mac supports only 4 kB and 16 kB, and it causes non-trivial bugs at times (funnily enough, I first encountered that at a database company...)
 - zamalek 9 hours ago
 It was later reproduced on the same machine without huge pages enabled. PICNIC?
 - anarazel 8 hours ago
 Yes, I did reproduce it (to a much smaller degree, but it's just a 48c/96t machine). But it's an absurd workload in an insane configuration. Not using huge pages hurts way more than the regression due to PREEMPT_LAZY does.
 With what we know so far, I expect that there are just about no real world workloads that aren't already completely falling over that will be affected.
 - pgaddict 1 hour ago
 So why does it happen only without hugepages? Is the extra overhead / TLB pressure enough to trigger the issue in some way? Or is it because the regular pages get swapped out (which hugepages can't be)?
 anarazel 1 hour ago
 I don't fully know, but I suspect it's just that, due to the minor faults and TLB misses, there is terrible contention on the spinlock when using 4k pages, regardless of PREEMPT_LAZY (that's easily reproducible). Which is then made worse by preempting more with the lock held.
 - MBCook 10 hours ago
 So perhaps this is a regression specifically in the arm64 code, or said differently maybe it’s a performance bug that has been there for a long time but covered up by the scheduler part that was removed?
 - adrian_b 6 hours ago
 The following messages concluded that using huge pages mitigates the regression, while not using huge pages reproduces it.
 - db48x 9 hours ago
 Could be either of those, or something else entirely. Or even measurement error.
 - jeltz 5 hours ago
 Turns out the amd machine had huge pages enabled and after disabling those the regression was there on amd too. So arm vs amd was a red herring.
 Of course not a nice regression but you should not run PostgreSQL on large servers without huge pages enabled so this regression will only hurt people who have a bad configuration. That said I think these bad configurations are common out there, especially in containerized environments where the one running PostgreSQL may not have the ability to enable huge pages.
 db48x 4 hours ago
 Yes, I had a good laugh at that. It might technically be a regression, but not one that most people will see in practice. Pretty weird that someone at Amazon is bothering to run those tests without hugepages.
 whizzter 4 hours ago
 Still, that huge a regression affecting multiple platforms doesn't sound too neat. Did they narrow down the root cause?
 - master_crab 11 hours ago
 For production Postgres, I would assume it’s close to no effect?
 If someone is running postgres in a serious backend environment, I doubt they are using Ubuntu or even touching 7.x for months (or years). It’ll be some flavor of Debian or Red Hat still on 6.x (maybe even 5?). Those same users won’t touch 7.x until there has been months of testing by distros.
 - crcastle 11 hours ago
 Ubuntu is used in many serious backend environments. Heroku runs tens of thousands (if not more) instances of Ubuntu on its fleet. Or at least it did through the teens and early 2020s.
 https://devcenter.heroku.com/articles/stack
 - rixed 7 hours ago
 There is serious as in "corporate-serious" and serious as in "engineer-serious".
 - nine_k 10 hours ago
 Do they upgrade to the new LTS the day it is released?
 sakjur 5 hours ago
 Ubuntu's upgrade tools wait until the .1 release for LTSes, so your typical installation would wait at least half a year.
 crcastle 10 hours ago
 Not historically.
 rvnx 9 hours ago
 and they are right. This is because a lot of junior sysadmins believe that newer = better.
 But the reality:
   a) may get irreversible upgrades (e.g. new underlying database structure)
   b) permanent worse performance / regression (e.g. iOS 26)
   c) added instability
   d) new security issues (litellm)
   e) time wasted migrating / debugging
   f) may need rewrite of consumers / users of APIs / sys calls
   g) potential new IP or licensing issues
 etc.
 Some of the few reasons to upgrade something are:
   a) new features provide genuine comfort or performance upgrade (or... some revert)
   b) there is an extremely critical security issue
   c) you do not care about stability because reverting is uneventful and production impact is nil (e.g. Claude Code)
 but 99% of the time, if it ain't broke, don't fix it.
 https://en.wikipedia.org/wiki/2024_CrowdStrike-related_IT_outages
 miki123211 7 hours ago
 On the other hand, I suspect LLMs will dramatically decrease the window between a vulnerability being discovered and that vulnerability being exploited in the wild, especially for open-source projects.
 Even if the vulnerability itself is discovered through other means than by an LLM, it's trivial to ask a SOTA model to "monitor all new commits to project X and decide which ones are likely patching an exploitable vulnerability, and then write a PoC." That's a lot easier than finding the vulnerability itself.
 I won't be surprised if update windows (for open source networked services) shrink to ~10 minutes within a year or two. It's going to be a brutal world.
 mr_toad 3 hours ago
 Too often I see IT departments use this as an excuse to only upgrade when they absolutely have to, usually with little to no testing in advance, which leaves them constantly being back-footed by incompatibility issues.
 The idea of advance testing of new versions of software (that they’ll be forced to use eventually) never seems to occur, or they spend so much time fighting fires they never get around to it.
 gjvc 5 hours ago
 all fair points, on the other hand, as a general rule, isn't it important to stay on currently-supported versions of pieces of software that you run?
 ymmv, but in my experience projects like postgresql which have been reliable, tend to continue to be so.
 - pmontra 8 hours ago
 A customer of mine is running on Ubuntu 22.04 and the plan is to upgrade to 26.04 in Q1 2027. We'll have to add performance regression to the plan.
 - wongogue 5 hours ago
 Are you running ARM servers?
monocasa 11 hours ago
I feel like using spinlocks in user space at all without kernel support like rseq is just asking for weird performance degradations.
- anarazel 51 minutes ago
 I really dislike the use of spinlocks in postgres (and have been replacing a lot of uses over time), but it's not always easy to replace them from a performance angle.
 On x86 a spinlock release doesn't need a memory barrier (unless you do insane things) / lock prefix, but a futex based lock does (because you otherwise may not realize you need to futex wake). Turns out that that increase in memory barriers causes regressions that are nontrivial to avoid.
 Another difficulty is that most of the remaining spinlocks are just a single bit in a larger 8-byte atomic. Futexes still don't support anything but 4 bytes (we could probably get away with using it on a part of the 8 byte atomic with some reordering) and unfortunately postgres still supports platforms with no 8 byte atomics (which I think is supremely silly), and the support for a fallback implementation makes it harder to use futexes.
 The spinlock triggering the contention in the report was just stupid and we only recently got around to removing it, because it isn't used during normal operation.
 Edit: forgot to add that the spinlock contention is not measurable on much more extreme workloads when using huge pages. A 100GB buffer pool with 4KB pages doesn't make much sense.
- jcalvinowens 11 hours ago
 > I feel like using spinlocks in user space at all without kernel support like rseq is just asking for weird performance degradations.
 Yeah, exactly. "Doctor, help, somebody replaced my wooden hammer with a metal one, and now I can't hit myself in the face with it as many times."
 If you use spinlocks in userspace, you're gonna have a bad time.
 - mgaunard 7 hours ago
 Most people looking for performance will reach for the spinlock.
 The expectation is that the kernel should somehow detect applications that are spinning, and avoid preempting them early.
 - IshKebab 6 hours ago
 Well that seems like an unreasonable expectation no? Also isn't the point of spinlocks that they get released before the kernel does anything? Otherwise you could just use a futex... Which maybe you should do anyway...
 https://matklad.github.io/2020/01/04/mutexes-are-faster-than-spinlocks.html
- jeltz 5 hours ago
 PostgreSQL is old and had to support kernels which did not support spinlocks. But, yes, maybe PostgreSQL should stop doing so now that kernels do.
dsr_ 13 hours ago
Nobody sensible runs the latest kernel; nobody running PG in production should be afraid of setting a non-default at either boot time or as a sysctl. So this will, most likely, be another step in building a PG database server (turn off pre-emption if your kernel is 7.0 or later and PG is pre-whatever-version).
At worst it might become a permanent part of building a PG server and a FAQ... but if it affects one thing this badly, it will affect others.
- Meekro 12 hours ago
 > Nobody sensible runs the latest kernel
 From the article: "Linux 7.0 stable is due out in about two weeks. This is also the kernel version powering Ubuntu 26.04 LTS to be released later in April."
 Unfortunately, lots of people will be running it in less than a month. At the moment, it'll take a kernel patch (not a sysctl) to undo this-- hopefully something changes.
 - Neywiny 12 hours ago
 Not nobody but not everybody upgrades to the newest distros immediately. That's the advantage of LTS. I've even found that a lot of programs have poorer support on 24.04 than 22.04 due to security changes, so I'm fine sticking with 22.04 as my main dev system.
 - justinclift 11 hours ago
 > ... not everybody upgrades to the newest distros immediately.
 While that's true, for new deployments the story is often "deploy on the latest release of things available at the time".
 So, there will probably be a substantial deployment of new projects / testing projects using the Linux 7.0 kernel along with the latest available software packages in a few weeks.
 - Maxion 5 hours ago
 I would argue it's mainly inexperienced devs who deploy on the very latest. Once you get some more years under your belt you realize the value of LTS versions, even if you don't get the shiniest shiny.
 yunohn 4 hours ago
 > kernel version powering Ubuntu 26.04 *LTS*
 josh-sematic 1 hour ago
 Yes it’s LTS but the point is that the LTS system has overlapping support so you can wait on an older LTS for a bit before upgrading to a newer one. And it’s somewhat prudent to do so if you value stability highly, because often a few new issues will be discovered and patched after LTS goes live for a bit.
 - stingraycharles 12 hours ago
 This seems to be brushing off a major performance regression just because you personally don’t upgrade for 4 years. I don’t think that’s common at all.
 - Neywiny 1 hour ago
 https://fr.archive.ubuntu.com/stats/stats_of_day-16.html?version=last#:~:text=April%2030%2C%202006,data%20are%20excluded%20from%20statistics)%20:
 No need to think, data backs it up.
 - vasco 6 hours ago
 Someone said "it's fine nobody uses this" and someone else gave the world's biggest slam dunk of "Ubuntu in 1 month" and your reply is that "not everyone does it". How far from the point can you be!
 In the Linux world this is the worst possible scenario: the distro with the largest adoption, LTS.
 - Neywiny 1 hour ago
 22.04 is still potentially more prevalent than 24.04 according to:
 https://fr.archive.ubuntu.com/stats/stats_of_day-16.html?version=last#:~:text=April%2030%2C%202006,data%20are%20excluded%20from%20statistics)%20:
 26.04 will take some time before it's largely adopted.
 - ndsipa_pomu 1 hour ago
 Not trying to downplay the importance of this, but upgrades between LTS versions aren't offered until the first point release, so 26.04.1 (typically six months or so after the release).
 - esafak 11 hours ago
 That's the advantage of LTS? 24.04 is the LTS, not the one you use, 22.04.
 - SoftTalker 11 hours ago
 22.04 is also an LTS release, supported for another year still.
 https://ubuntu.com/about/release-cycle
 We're just now looking at moving production machines to 24.04.
 apelapan 5 hours ago
 If you are on a maintenance contract with Ubuntu, 22.04 is supported until 2032.
 If it ain't broken, don't fix it.
 - cortesoft 11 hours ago
 All even-numbered .04 releases are LTS in Ubuntu.
 - tankenmate 3 hours ago
 Not necessarily;
 ```
 $ grep PREEMPT_DYNAMIC /boot/config-$(uname -r)
 CONFIG_PREEMPT_DYNAMIC=y
 CONFIG_HAVE_PREEMPT_DYNAMIC=y
 CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
 ```
 if your kernel has CONFIG_PREEMPT_DYNAMIC then you can go back to the pre 7.0 default by adding preempt=none to your grub config. I haven't seen any plans by Ubuntu to drop CONFIG_PREEMPT_DYNAMIC from the default kernel config.
 - tankenmate 3 hours ago
 actually i just checked, yeah, ubuntu would have to add `none` back to the kernel and set `CONFIG_PREEMPT_NONE=y` in the config so that it can be selected at boot.
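The `grep` check in the parent comment can also be done programmatically. Below is a small Python sketch that pulls every enabled `PREEMPT`-related option out of kernel config text. The function name `preempt_options` and the sample config are hypothetical; real option sets differ between kernel versions and distros, and this says nothing about which `preempt=` values a given kernel accepts at boot.

```python
def preempt_options(config_text):
    """Return {option: value} for enabled CONFIG_* lines mentioning PREEMPT."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_X is not set" and are skipped.
        if not line.startswith("CONFIG_") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if "PREEMPT" in name:
            options[name] = value
    return options


# Made-up sample in the style of /boot/config-<version>.
SAMPLE_CONFIG = """\
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_LAZY=y
CONFIG_HZ=250
"""

if __name__ == "__main__":
    for name, value in sorted(preempt_options(SAMPLE_CONFIG).items()):
        print(f"{name}={value}")
```

To run it against a real system, read the file that `/boot/config-$(uname -r)` points to and pass its contents in place of `SAMPLE_CONFIG`.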
 - teekert 5 hours ago
 I think most people on enterprise-y systems wait for (at least) 26.04.1. The window is 3 years starting now (when on 24.04, which is supported until ~2029-04-30; it's 1 year when on 22.04), so hardly anyone switches immediately.
 - 999900000999 10 hours ago
 Depends on your shop.
 As someone with a heavy QA/DevOps background I don't think we have enough details.
 Is it only ARM64? How many ARM64 PG DBs are running 96 cores?
 However...
 This is the most popular database in the world. Odds are this will affect a bunch of other lesser known applications.
 - whilenot-dev 7 hours ago
 Please follow the complete thread: https://lore.kernel.org/lkml/xxbnmxqhx4ntc4ztztllbhnral2adogseot2bzu4g5eutxtgza@dzchaqremz32/
 > [...] used huge_pages=on - as that is the only sane thing to do with 10s to 100s of GB of shared memory [...] if I disable huge pages, I actually can reproduce the contention [...]
- bombcar 12 hours ago
 We need some sensible people running the latest and greatest or we won't catch things like this.
- stingraycharles 12 hours ago
 That may be the case, but it’s still not a great situation to be in and one has to wonder: if PostgreSQL is affected, what else is?
 - bombcar 11 hours ago
 That's the big thing - PSQL will be tested, noticed, and fixed (and likely have a version that handles 7.0 by the time it's in common use).
 But other software won't and may not even be noticed, except as a (I hate using the term) enshittification.
 Better to introduce the "correct way" in 7.0 but not regress the old (translate the "correct" into the old if necessary) - and then in 8.0 or some future release implement the regression.
 - stingraycharles 11 hours ago
 Exactly, this is how it’s usually done. As the developer on the mailing list mentions, implementing a new low level construct in 7.0 and a performance regression that requires said new low level construct to mitigate is not great. You need a grace period in which both the old and new approaches are fast.
- Seattle3503 9 hours ago
 If you're running in a docker container you share the host kernel. You might not have a choice.
- cwillu 11 hours ago
 The option to set PREEMPT_NONE was removed for basically all platforms.
harshreality 12 hours ago
Background on PREEMPT_LAZY: https://lwn.net/Articles/994322/
longislandguido 12 hours ago
Anyone check to see if Jia Tan has submitted any kernel patches lately?
- rs_rs_rs_rs_rs 8 hours ago
  They don't need to, there's about a billion bugs they can exploit.
bob1029 6 hours ago
I'm struggling a bit with why we need all these fancy dynamic preemption modes. Is this about hyperscalers shoving more VMs per physical machine? What does a person trying to host a software solution gain from this kernel change?
If a user wants to spin in an infinite loop all day every day, I don't see the problem with that. Even if the spinning will provably never do any useful work.
- ponco 4 hours ago
 More throughput WITHOUT huge tail latency is my understanding. A user above posted this link, https://lwn.net/Articles/994322/, which goes into the background. My mental model is "give the kernel more explicit information" and it will be able to make better decisions.
cperciva 12 hours ago
This makes me feel better about the 10% performance regression I just measured between FreeBSD 14 and FreeBSD 15.0.
- db48x 9 hours ago
  Heh. Did they at least add useful features to balance out that cost?
Deeg9rie9usi 7 hours ago
Once again Phoronix shot out an article without researching further or letting the mail thread in question cool down. The follow-up mails make clear that the issue is more or less a non-issue since the benchmark is wrong.
- adrian_b 6 hours ago
 The follow-up mails conclude that the regression happens only when huge pages are not used.
 While using huge pages whenever possible is the right solution and this should be enough for PostgreSQL, perhaps there are applications that cannot use huge pages and which are affected by the regression.
 So I do not think that it is right to just ignore what happened.
 - Deeg9rie9usi 5 hours ago
 I agree with you. The lurid headlines of phoronix.com just annoy me...
galbar 12 hours ago
It's not a good look to break userspace applications without a deprecation period where both old and new solutions exist, allowing for a transition period.
FireBeyond 13 hours ago
Once upon a time, Linus would shout and yell about how the kernel should never "break" userspace (and I see in some places, some arguments of "It's not broken, it's just a performance regression" - personally I'd argue a 50% hit to performance of a pre-eminent database engine is ... quite the regression).
Now, the kernel engineer who introduced the brand new mechanism (introduced in Linux 7.0) for handling pre-emption says the "fix" is for Postgres to start using this new mechanism (I think the sister comment below links to what one of the Postgres engineers thinks of that, and I'm inclined mostly to agree).
- shakna 7 hours ago
 Freund seems to suggest that hugepages is the right way to run a system under this sort of load - which is the fix.
 > Hah. I had reflexively used huge_pages=on - as that is the only sane thing to do with 10s to 100s of GB of shared memory and thus part of all my benchmarking infrastructure - during the benchmark runs mentioned above.
 > Turns out, if I disable huge pages, I actually can reproduce the contention that Salvatore reported (didn't see whether it's a regression for me though). Not anywhere close to the same degree, because the bottleneck for me is the writes.
 But, they can speak for themselves here [0].
 [0] https://news.ycombinator.com/item?id=47646332
- perching_aix 12 hours ago
 Entertaining perspective - I thought that the whole "it's not an outage it's a (horizontal or vertical) degradation" thing was exclusive to web services, but thinking about it, I guess it does apply even in cases like this.
- MBCook 10 hours ago
 It wouldn’t be the first time one of the other maintainers ran afoul of “Linus’s law”.
 He may simply be waiting until more is known on exactly what’s causing it.
- bear8642 12 hours ago
 > I'd argue a 50% hit to performance [...] is ... quite the regression
 Indeed! Especially if said regression happens to impact anything trade/market related...
- arjie 10 hours ago
 Well, the reason he'd yell about it is that someone did it. If no one ever did it, he'd never yell and we'd never have the rule. So one can only imagine that this is one of those things where someone has to keep holding the line rather than one of those things where you set some rule and it self-holds.
 Doubtless someone will have to do the yelling.
- quietsegfault 11 hours ago
 This was my immediate thought - kernel doesn’t break software, or at least it didn’t use to.
anal_reactor 8 hours ago
Can someone explain to me what's the problem? I have very little knowledge of Linux kernel, but I'm curious. I've tried reading a little, but it's jargon over jargon.
- alienchow 6 hours ago
 I'm not familiar with the jargon either, but based on some reading it comes down to how the latest kernel treats process preempts.
 Postgres uses spinlocks to hold shared memory for very critical processes. Spinlocks are an infinite loop with no sleep to attempt to hold a lock, thus "spinning". Previous kernels allowed spinlocking processes to run with PREEMPT_NONE. This flag tells the kernel to let the locking process complete its work before doing anything. Now the latest kernel removed this functionality and is interrupting spinlocking processes. So if a process that is holding a lock gets interrupted, all other postgres spinlocking processes that need the same lock spin in place for way longer times, leading to performance degradation.
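To make the "infinite loop with no sleep" description concrete, here is a toy spinlock in Python. It is only a sketch of the shape of the technique: PostgreSQL's real spinlocks are written in C on top of atomic instructions, whereas this version borrows `threading.Lock` with a non-blocking acquire as its test-and-set primitive. The names `SpinLock` and `run_workers` are made up for this example.

```python
import threading


class SpinLock:
    """Toy spinlock: retries a non-blocking acquire in a tight loop, never sleeping."""

    def __init__(self):
        self._flag = threading.Lock()
        # Rough, unsynchronized count of failed attempts (wasted spins).
        self.failed_attempts = 0

    def acquire(self):
        # Busy-wait: keep retrying instead of asking the OS to block us.
        while not self._flag.acquire(blocking=False):
            self.failed_attempts += 1

    def release(self):
        self._flag.release()


def run_workers(n_threads=4, n_iters=1000):
    """Increment a shared counter under the spinlock from several threads."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            try:
                total[0] += 1  # the short critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.failed_attempts


if __name__ == "__main__":
    total, spins = run_workers()
    print(f"total={total}, failed attempts={spins}")
```

The failed-attempt count varies from run to run. It grows whenever the thread holding the lock is descheduled while others keep spinning, which is the effect the comment above describes at the kernel level.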
 - anal_reactor 6 hours ago
 Why does it only appear on arm64 and not x86?
 - adrian_b 5 hours ago
 It was not architecture-related. Not using huge pages also reproduced the regression on x86.
 I do not know why using huge pages mitigates the regression, but it could be just because when the application uses huge pages it uses spinlocks much less frequently so the additional delays do not accumulate enough to cause a significant performance reduction.
 - tux3 5 hours ago
 The problem is the spinlock being interrupted by a minor fault (you're touching a page of memory for the first time, and the kernel needs to set it up the first time it's actually used).
 If your pages are 1GB instead of 4kB, this happens much less often.
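A quick back-of-the-envelope calculation shows the scale involved. The Python sketch below counts how many pages cover a 100 GiB region at different page sizes. The 100 GiB figure is borrowed from the "100GB buffer pool" mentioned elsewhere in this thread (treated here as GiB for simplicity), and 2 MiB is included as a common huge page size on x86-64. Treating the page count as an upper bound on first-touch minor faults per process is an assumption of this sketch, not a measurement.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(region_bytes, page_bytes):
    """Number of pages required to cover a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


REGION = 100 * GIB  # assumed buffer pool size

if __name__ == "__main__":
    for label, size in [("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)]:
        print(f"{label:>6} pages: {pages_needed(REGION, size):,}")
    #  4 KiB pages: 26,214,400
    #  2 MiB pages: 51,200
    #  1 GiB pages: 100
```

Going from 4 KiB to 2 MiB pages cuts the number of pages, and so the number of possible first-touch faults, by a factor of 512.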
- tijsvd 5 hours ago
 From what I understand in the follow up: postgres uses shared memory for buffers. This shared memory is read by a new connection while locked.
 In postgres, connections are handled with a process fork, not a new thread. If such a fork first reads memory, even if it already exists, that causes a minor page fault, which goes back to the kernel so it can update memory mapping tables.
 The operation under lock is only a few instructions, but if it takes longer than expected, then that causes lock contention. Regression in the kernel handling minor faults?
 The whole thing is then made worse because it's a spinlock, causing all waiting processes to contend over the cpus which adds to kernel processing.
 Mitigated by using huge pages, which dramatically reduces the number of mapping entries and faults. I reckon that it could also be mitigated in postgres by pre-faulting all shared memory early?
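As an illustration of what "pre-faulting" means mechanically, the Python sketch below touches one byte in every page of an anonymous memory mapping so that each page is mapped up front rather than on first use later. This is not PostgreSQL code, and the helper names `make_region` and `prefault` are hypothetical. Since page tables are per-process, the sketch leaves open in which process the touching would have to happen for it to help forked backends.

```python
import mmap


def make_region(n_pages):
    """Create an anonymous memory mapping spanning n_pages pages."""
    return mmap.mmap(-1, n_pages * mmap.PAGESIZE)


def prefault(region, page_size=mmap.PAGESIZE):
    """Touch one byte per page so every page is mapped before later use.

    Returns the number of pages touched.
    """
    touched = 0
    for offset in range(0, len(region), page_size):
        # Read and write back the same byte: the contents are unchanged,
        # but the first access to an untouched page triggers the fault now.
        region[offset] = region[offset]
        touched += 1
    return touched


if __name__ == "__main__":
    region = make_region(1024)
    print(f"touched {prefault(region)} pages of {mmap.PAGESIZE} bytes")
    region.close()
```

The cost of the faults is not removed by this, only moved to a point where no lock is held.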
cdelsolar 9 hours ago
https://lkml.org/lkml/2012/12/23/75
up2isomorphism 7 hours ago
Not sure why people have to upgrade to the newest major kernel version as soon as it is released.
- conradludgate 7 hours ago
 It's the performance team's job to test these things. Doesn't mean they're going to deploy it immediately.
 Someone should be testing these things and reporting regressions.
- jeltz 4 hours ago
 If nobody tests and reports these things when the version is released the regression would not be fixed when people start using it in production.
- IshKebab 6 hours ago
 Don't make excuses.
carlsborg 4 hours ago
Perhaps in due time we will see workload specific forks of Linux maintained by a team of agents.