<i>"Overall, we find a Postgres server can handle up to 144K of these writes per second. That’s a lot, equivalent to 12 billion writes per day."</i><p>Based on a problem I'm facing with Postgres today, I wonder if this really progresses as linearly as the article wants to make it out.<p>We're in the middle of evaluating Postgres as a replacement for MySQL, and experience notable slow-down for plain multi-row inserts due to index growth as soon as the table reaches just a few dozen million rows. It's an uncomplicated and flat (no constraints or foreign keys etc.) medium width table of about 10-15 columns and just a handful of non-composite btree indices - and/or hash indices; we've tried mixing and matching just to see what happens - but ingestion drops to less than half already before 50m rows. At 100m rows the insertion performance is down to a fraction and from there it just gets worse the larger the table and its indices grow. It's as if there's some specific exponential cut-off point where everything goes awry. However, if we simply remove all indices from the table, Postgres will happily insert hundreds of millions rows at a steady and near identical pace from start to end. The exact same table and indices on MySQL, as closely as we can match between MySQL and Postgres, running on the same OS and hardware, maintains more or less linear insertion performance well beyond 500m rows.<p>Now, there's a lot to say about the whys and why-nots when it comes to keeping tables of this size in an RDBMS and application design relying on it to work out, and probably a fair amount more about tuning Postgres' config, but we're stumped as to why PG's indexing performance falters this early when contrasted against InnoDB/MySQL. 50-100m rows really isn't much. Would greatly appreciate if anyone with insight could shed some light on it and maybe offer a few ideas to test out.<p>(add.: during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom on both memory, CPU and disk I/O)
It scales beyond the needs most people have in most situations.<p>The perennial problem is that "big scale" always means "bigger than I've seen", so on any project larger than what they've encountered before, people assume they need to pull out the big guns. People also worry about things like what happens if they really *do* scale 10 years from now.<p>Neither is a practical concern for nearly anyone who will ever face this decision.<p>And yes, of course, some people have problems that genuinely can't be solved by Postgres. But verify that first; don't assume it.
Yes, you can scale it quite well vertically.<p>But what about horizontally? It would be nice to have high availability, or even to be able to upgrade the OS and Postgres itself without downtime.
Practically trivial to do in 2026, even by hand, and there are a couple of ready-to-use solutions that automate it.<p>If you're running it in Kubernetes with CloudNativePG it's even easier.<p>The one thing Postgres doesn't do well is master-master replication, which is why most of these "does it scale" posts mostly talk about how slow writes are. And they are pretty slow.
Yep, this is what I think about when “scaling” is mentioned. Maybe I’m too distributed-compute-brained, but throwing CPU at a DB isn’t the answer I was hoping for.
I've only played around with it, but you can use Patroni, etcd, and HAProxy to achieve this. It's a pain, though I believe there's a Coolify-style open-source application that does this for you; I can't for the life of me remember its name.
autobase[1] is the one I can think of<p>[1] <a href="https://github.com/autobase-tech/autobase" rel="nofollow">https://github.com/autobase-tech/autobase</a>
You might be thinking of Pigsty?<p>At least I hope you are! Nothing else has been as well battle-tested. Unfortunately, perhaps because of its name, it gets no face time on HN. Its last few mentions here barely received the attention it deserved.
DBOS is amazing when it comes to durable workflows. There are others in the space - the most popular being Temporal - but I'd argue Temporal is also the most complicated. I often say Temporal is like Kubernetes while DBOS is like `docker compose`. (And for those taking me literally: you can use DBOS in Kubernetes!)<p>I don't understand why DBOS isn't nearly as popular as Temporal, but it has made a world of difference for building durable queues and long-running, durable workflows in Python (it supports other languages too).<p>As they show in this article, Postgres scales impressively well (4 billion workflows per day on a db.m7i.24xlarge, enough for most applications), which is why, if you have your PostgreSQL backup/restore strategy dialed in, you should really take a close look at DBOS for your cloud-agnostic or self-hosted durable queues and durable workflows. It's an amazing piece of software, founded by the original author of Ingres (the precursor to Postgres). The story of DBOS itself is captivating - I believe it started from being unable to scale Spark job scheduling.
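To give a flavor, here's a minimal sketch of a durable workflow in DBOS's Python SDK. The business logic and names are invented, and the config surface may differ between versions, so check the current docs:

```python
# Hedged sketch of a DBOS durable workflow; the connection string and
# the checkout/charge/ship logic are made up for illustration.
from dbos import DBOS, DBOSConfig

# DBOS persists workflow state in Postgres itself; this URL is an assumption.
config: DBOSConfig = {
    "name": "checkout-demo",
    "database_url": "postgresql://bench:bench@localhost:5432/bench",
}
DBOS(config=config)

@DBOS.step()
def charge_card(order_id: str) -> None:
    # Each completed step is checkpointed in Postgres, so a crash here
    # resumes the workflow without re-running steps that already finished.
    print(f"charging card for {order_id}")

@DBOS.step()
def ship_order(order_id: str) -> None:
    print(f"shipping {order_id}")

@DBOS.workflow()
def checkout(order_id: str) -> None:
    charge_card(order_id)
    ship_order(order_id)

if __name__ == "__main__":
    DBOS.launch()
    checkout("order-42")
```

The design point is that the checkpoints live in the same Postgres you already operate - no separate workflow server - which is exactly the `docker compose`-vs-Kubernetes contrast I mean.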
Discussions about databases get so interesting not because of the database itself, but because of the people asking infeasible questions of it.