
  • daneel_w · 20 minutes ago
    "Overall, we find a Postgres server can handle up to 144K of these writes per second. That's a lot, equivalent to 12 billion writes per day."

    Based on a problem I'm facing with Postgres today, I wonder whether this really scales as linearly as the article makes it out to.

    We're in the middle of evaluating Postgres as a replacement for MySQL, and we see a notable slowdown in plain multi-row inserts, caused by index growth, as soon as the table reaches just a few dozen million rows. It's an uncomplicated, flat (no constraints, foreign keys, etc.) table of medium width, about 10-15 columns, with just a handful of non-composite btree indices - and/or hash indices; we've tried mixing and matching just to see what happens - but ingestion drops to less than half before we even reach 50M rows. At 100M rows insertion performance is down to a fraction, and from there it only gets worse as the table and its indices grow. It's as if there's some specific cut-off point where everything goes awry. However, if we simply remove all indices from the table, Postgres happily inserts hundreds of millions of rows at a steady, near-identical pace from start to finish. The exact same table and indices on MySQL, matched as closely as we can between the two systems and running on the same OS and hardware, maintain more or less linear insertion performance well beyond 500M rows.

    Now, there's a lot to say about the whys and why-nots of keeping tables of this size in an RDBMS and of application designs that rely on it working out, and probably a fair amount more about tuning Postgres' config, but we're stumped as to why PG's indexing performance falters this early compared to InnoDB/MySQL. 50-100M rows really isn't much. Would greatly appreciate it if anyone with insight could shed some light on this and maybe offer a few ideas to test out.

    (add.: during these stress tests the hardware is nowhere close to over-encumbered, and there's consistent headroom in memory, CPU and disk I/O)
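One workaround the comment itself points at (inserts stay fast with no indices) is the classic bulk-load pattern: drop secondary indices before the load and rebuild them once at the end, so the B-trees are built in one pass instead of being maintained row by row. A minimal sketch of the pattern, using Python's stdlib sqlite3 purely for self-containment - with Postgres you would issue the same `DROP INDEX` / `CREATE INDEX` statements around a `COPY`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, kind TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_events_kind ON events (kind)")

def bulk_load(conn, rows):
    # Drop the secondary index so the load doesn't pay per-row
    # B-tree maintenance (page splits, etc.).
    conn.execute("DROP INDEX idx_events_kind")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Rebuild once at the end: a single bulk index build is far
    # cheaper than millions of incremental updates.
    conn.execute("CREATE INDEX idx_events_kind ON events (kind)")
    conn.commit()

bulk_load(conn, ((i, "click", "x") for i in range(100_000)))
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 100000
```

This doesn't explain the Postgres-vs-InnoDB gap the commenter describes, but it is the standard way to keep ingestion flat when indices are the bottleneck.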
  • jghn · 28 minutes ago
    It scales beyond the needs that most people have in most situations.

    The perennial problem is that "big scale" always means "larger than I've seen", so on any project bigger than what a person has encountered before, they assume they need to pull out the big guns. People also worry about what happens if they really *do* scale ten years from now.

    Neither is a practical concern for nearly anyone who will ever face this decision.

    And then yes, of course, some people have problems that actually can't be solved by Postgres. But verify this first - don't assume.
  • q3k · 49 minutes ago
    Yes, you can scale it quite well vertically.

    But how about horizontally? It would be nice to have high availability, or even to be able to upgrade the OS and Postgres itself without downtime.
    • literalAardvark · 14 minutes ago
      Practically trivial to do in 2026, even by hand, and there are a couple of ready-to-use solutions that automate it.

      If you're running it in Kubernetes with CloudNativePG it's even easier.

      The only thing it doesn't do well is master-master replication, which is why most of these "does it scale" posts mostly talk about how slow writes are. And they are pretty slow.
    • levl289 · 28 minutes ago
      Yep, this is what I think about when "scaling" is mentioned. Maybe I'm too distributed-compute-brained, but throwing CPU at a DB isn't the answer I was hoping for.
      • _3u10 · 15 minutes ago
        So the point of distributed compute is to reduce the compute needed? I've generally found that distributed compute requires more compute than vertical scaling, while getting clobbered by network bandwidth / latency.

        Theoretically it needs 2 to 10x the compute; in practice, 100 to 500x.
        • literalAardvark · 11 minutes ago
          The point of distributed computing is to do computing that you can't do on a vertically scaled system, or to increase availability.

          If you're doing it for other reasons, it's usually a mistake.
    • tuvix · 37 minutes ago
      Only played around with it, but you can use Patroni, etcd and HAProxy to achieve this. It's a pain, but I believe there was some Coolify-style open-source application to do this for you - I can't for the life of me remember its name.
      • jrnkntl · 25 minutes ago
        autobase[1] is the one I can think of

        [1] https://github.com/autobase-tech/autobase
      • subhobroto · 20 minutes ago
        You might be thinking of Pigsty?

        At least I hope you are! Nothing else has been as well battle-tested. Unfortunately, perhaps because of its name, it gets no facetime on HN. Its last few mentions here barely received the attention they deserved.
  • subhobroto · 1 hour ago
    DBOS is amazing when it comes to durable workflows. There are others in the space - the most popular being Temporal - but I'd argue Temporal is also the most complicated. I often say Temporal is like Kubernetes while DBOS is like `docker compose`. (And for those taking me literally: you can use DBOS in Kubernetes!)

    I don't understand why DBOS isn't nearly as popular as Temporal, but it has made a world of difference for building durable queues and long-running, durable workflows in Python (it supports other languages too).

    As they show in this article, Postgres scales impressively well (4 billion workflows per day on a db.m7i.24xlarge, enough for most applications), which is why, if you have your PostgreSQL backup/restore strategy dialed in, you should really take a close look at DBOS to handle your cloud-agnostic or self-hosted durable queues and durable workflows. It's an amazing piece of software, founded by the original author of Ingres (the precursor to Postgres). The story of DBOS itself is captivating - I believe it started from being unable to scale Spark job scheduling.
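The core idea behind durable workflows like these is checkpointing: each step's result is persisted (in DBOS's case, to Postgres) before the workflow moves on, so a replay after a crash skips completed steps instead of re-running their side effects. A toy sketch of that idea - not DBOS's actual API, and using an in-memory dict where a real engine would use a database table:

```python
import json

# Toy checkpoint store standing in for the Postgres table a durable-workflow
# engine would use. Keys are (workflow_id, step_name); values are JSON results.
checkpoints: dict[tuple[str, str], str] = {}

def durable_step(workflow_id: str, step_name: str, fn, *args):
    """Run fn at most once per (workflow, step); on replay, return the saved result."""
    key = (workflow_id, step_name)
    if key in checkpoints:
        return json.loads(checkpoints[key])  # replay: skip re-execution
    result = fn(*args)
    checkpoints[key] = json.dumps(result)    # persist before moving on
    return result

calls = []

def charge(amount):
    calls.append(amount)  # a side effect we must not repeat on replay
    return {"charged": amount}

def workflow(wf_id):
    a = durable_step(wf_id, "charge", charge, 42)
    b = durable_step(wf_id, "receipt", lambda: {"receipt_for": a["charged"]})
    return b

first = workflow("wf-1")
replayed = workflow("wf-1")  # e.g. after a crash and restart
print(first == replayed, calls)  # True [42] -- the charge ran exactly once
```

The sketch shows why the write rate in the article matters: every step checkpoint is (at least) one Postgres write, so workflow throughput is bounded by how many of these small writes the database can absorb.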
  • JasonHEIN · 56 minutes ago
    Discussions about databases get so interesting - not because of the DB itself, but because of the people asking infeasible questions.