I think this article goes a bit overboard with the negative language ('lies', 'fools'), especially since (auto)VACUUM and indexes really don’t have that much to do with each other: the former is indeed critical on PostgreSQL to ensure availability, but something of a niche feature for most other databases, while index maintenance is important regardless of platform.<p>For a certain class of applications ('SQLite level'), there’s not even much of that, though, other than ensuring there are no missing or obsolete indexes, which you can take care of with 15 minutes of quality time with the EXPLAIN statement every now and then.<p>When using a database with persistent index statistics (like SQL Server and Oracle and, yeah, PostgreSQL), it’s important to at least ensure those get updated on a regular basis (but that’s almost always automatic and sufficient unless you're prone to not-usually-done bulk operations) and to optimize or rebuild the underlying tree on a semi-regular basis. This <i>does</i> require some additional non-default setup and monitoring, and can be surprising when you first encounter it.<p>But it’s not exactly an obscure-slash-secret bit of DBA lore either, unlike what's suggested here...
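For what it's worth, that quality time is usually just a matter of running something like the following against the queries you care about (the table and query here are made up for illustration):

    -- Check whether a frequent query actually uses an index.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE customer_id = 42;
    -- A "Seq Scan" on a large table here usually means a missing index;
    -- an index that never shows up in any plan is a candidate for dropping.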
Author here - thank you for the comments. This article is indeed playing a lot on the verge of clickbait, and I did ask about that shortly after publishing.
No worries -- not publishing at all is worse than publishing disliked content (well, to a certain extent), so keep reading that feedback, but don't be too discouraged by it!
It’s very unpleasant to read. I did find the article useful nonetheless.
There is a bunch of AI slop in there ... It does seem like the author probably knows what he's talking about, since there is seemingly good info in the article [1], but there's still a lot of slop<p>Also, I think the end should be at the beginning:<p><i>Know when your indexes are actually sick versus just breathing normally - and when to reach for REINDEX.</i><p><i>VACUUM handles heap bloat. Index bloat is your problem.</i><p>The intro doesn't say that, and just goes on and on about "lies" and stupid stuff like that.<p>This part also feels like AI:<p><i>Yes. But here's what it doesn't do - it doesn't restructure the B-tree.</i><p><i>What VACUUM actually does</i><p><i>What VACUUM cannot do</i><p>I don't necessarily think this is bad, since I know writing is hard for many programmers. But I think we should also encourage people to improve their writing skills.<p>[1] I'm not an SQL expert, but it seems like some of the concrete examples point to some human experience
Author here – it’s actually funny, as you pointed out parts that are my own (TM) attempts to make it a bit lighthearted.<p>An LLM is indeed used for corrections and for improving some sentences, but the rest is my honest attempt at making the writing approachable. If you’re willing to invest the time, you can see my fight with technical writing over time if you go through my blog.<p>(Writing this in the middle of a car wash on my iPhone keyboard ;-)
Yeah, I get accused of being an LLM all the time as well, best to ignore that kind of slop... (which, ironically, goes both ways!)
Yeah my eyes glaze over when I see the familiar tone.<p>If it's not worth writing it sure ain't worth reading.
A better title might have been <i>VACUUM addresses heap bloat; REINDEX addresses index bloat</i><p>Similar to a recent story <i>Go is portable, until it isn't</i> -- the better title is <i>Go is portable until you pull in C dependencies</i><p><a href="https://lobste.rs/s/ijztws/go_is_portable_until_it_isn_t" rel="nofollow">https://lobste.rs/s/ijztws/go_is_portable_until_it_isn_t</a>
This article points out some of the pain associated with index maintenance. It should also point out that ALL indexes on a table suffer from the same issue. If your 20 column table has 7 indexes, then the suggestions should be applied 7x.<p>It is conventional wisdom that indexes are absolutely essential for any relational table that is of at least reasonable size (e.g. thousands of rows) and is accessed more often than daily. Indexes can be a pain to create and maintain, but they can greatly speed up queries and primary key validations. The pain mostly comes from having to figure out what indexes to create and how often to maintain them, rather than doing the actual thing.<p>Indexes also have a performance penalty for any table updates. Creating new rows, updating existing rows, or deleting rows all require updates to each index.<p>But are indexes really required? I am creating a new kind of general purpose data management system (a kind of object store) called Didgets. The tagging mechanism that I invented to allow tags to be attached to each data object uses key-value stores that essentially form a set of columnar stores.<p>I found that these columnar stores could also be used to create regular relational database tables. The data is structured such that indexes are not needed. All the tests that I have run (up to a thousand columns with over 100 million rows) show that query speeds are equal to, or better than, other database systems that are well indexed.<p>The system is still under development, so it is still missing some key features that would make it a drop-in replacement for other databases, but it proves that it is possible to structure relational data such that query speeds can be optimal without needing separate indexing structures that have to be maintained.
I’m not clear on how you’re deviating from a normal columnar/OLAP database?<p>> I found that these columnar stores could also be used to create regular relational database tables.<p>Doesn’t every columnar store do this? Redshift, IQ, Snowflake, ClickHouse, DuckDB etc<p>> but it proves that it is possible to structure relational data such that query speeds can be optimal without needing separate indexing structures that have to be maintained.<p>Doesn’t every columnar database already prove this?
I am not an expert on all the other columnar stores out there; but it is my understanding that they are used almost exclusively for OLAP workloads. By 'regular database tables', I meant those that handle transaction processing (inserts, updates, deletes) along with queries.<p>My system does analytics well, but it is also very fast with changing data.<p>I also think that some of those systems (e.g. Duckdb) also use indexes.
They’re used by OLAP workloads because columnar properties fit better — namely, storing data column-wise obviously makes row-wise operations more expensive, and column-wise operations cheaper; this usually corresponds to point look-ups vs aggregations. This cascades into things like constraint maintenance being more expensive, row-level triggers becoming a psychotic pattern, etc. Column-wise (de-)compression also doubles down on this.<p>They still do all the regular CRUD operations and maintain transactional semantics; they just naturally prefer bulk operations.<p>Redshift is the most pure take on this I’ve seen, to the point that they simply don’t support most constraints or triggers, and data is allocated in 2MB immutable chunks
such that non-bulk-operations undergo ridiculous amounts of write amplification and slow to a crawl. Afaik other OLAP databases are not this extreme, and support reasonable throughput on point-operations (and triggers, constraints, etc) — in the sense that it’s definitely slower, but not comically slower. (Aside: Aurora is also a pure take on transactional workloads, such that bulk aggregations are comically slow)<p>> I also think that some of those systems (e.g. Duckdb) also use indexes.<p>I’m pretty sure they all use indexes, in the same fashion I expect you to (I’m guessing your system doesn’t do table-scans for every single query). Columnar databases just get indexes like zone-maps for “free”, in the sense that it can simply be applied on top of the actual dataset without having to maintain a separate copy of the data ALA row-wise databases do. So it’s an implicit index automatically generated on every column — not user-maintained or specified. I expect your system does exactly the same (because it would be unreasonable not to)<p>> My system does analytics well, but it is also very fast with changing data.<p>Talk more, please & thank you. I expect everything above to be inherent properties/outcomes of the data layout so I’m quite curious what you’ve done
Several of your assumptions are correct.<p>My project Didgets (short for Data Widgets) started out as a file system replacement. I wanted to create an object store that would store traditional file data, but also make file searches much faster and more powerful than other file systems allow, especially on systems with hundreds of millions of files on them. To enhance this, I wanted to be able to attach contextual tags to each Didget that would make searches much more meaningful without needing to analyze file content during the search.<p>To facilitate the file operations, I needed data structures to support them. I decided that these data structures (used/free bitmaps, file records, tags, etc.) should be stored and managed within other Didgets that had special handling. Each tag was basically a key-value pair that mapped the Didget ID (key) to a string, number, or other data type (value).<p>Rather than rely on some external process like Redis to handle tags, I decided to build my own. Each defined tag has a data type and all values for that tag are stored together (like column values in a columnar store). I split the tag handling into two distinct pieces. All the values are deduplicated and reference counted and stored within a 'Values Didget'. The keys (along with pointers to the values) are stored within a 'Links Didget'.<p>This makes analytic functions fast (each unique value is stored only once) and allows for various mapping strategies (one-to-one, one-to-many, many-to-one, or many-to-many). The values and the links are stored within individual blocks that are arranged using hashes and other meta-data constraints. For any given query, usually only a small number of blocks need to be inspected.<p>I expected analytic operations to be very fast, like with other OLAP systems, but I was pleasantly surprised at how fast I could make traditional OLTP operations run on it.<p>I have some short demo videos that show not only what it can do, but also benchmark many operations against other databases. Links to the videos are in my user profile.
Interesting ideas. I'm very interested in database ideas that bring new capabilities or better ways to accomplish old ones.<p>W.r.t. query speeds on your columnar storage engine, you will obviously have much better reads than row-oriented storage engines. This limits your write capabilities, though. Any effort you put into restoring write speeds necessitates an extra step to maintain the columnar stores - which puts you back into the group of databases maintaining indices that you criticize above.<p>I think modern databases are bringing new ideas on how to accelerate both write and query speeds simultaneously with tradeoffs around CAP.
Shorter:<p>* VACUUM does not compact your indexes (much).<p>* VACUUM FULL does. It's slow though.
You also missed the main recommendation of REINDEX INDEX CONCURRENTLY along with optionally pg_squeeze.
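A minimal sketch of that recommendation, with a made-up index name (REINDEX ... CONCURRENTLY needs Postgres 12 or newer):

    -- Rebuild a bloated index without taking a long write-blocking lock.
    REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
    -- For table (heap) bloat, the pg_squeeze extension provides a
    -- squeeze.squeeze_table() function; see its docs for the exact arguments.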
That's too reductive. Vacuum full isn't just slow, it exclusively locks the table for the duration of the vacuum and is basically a no-go when the database is in use.
The article has a section where it estimates index bloat by comparing the number of index reltuples * 40 bytes (?) to the size of the file on disk.<p>This is problematic, first of all because I don't think the math is right (see [0] for a more comprehensive query that takes into account column sizes), and second because it ignores the effects of B-Tree index deduplication in Postgres 13+: [1]<p>In my experience, fast bloat estimation queries can work okay for table bloat, but for index bloat I'd recommend instead looking at the change in page density over time (i.e. track relpages divided by reltuples), or just go direct to running pgstatindex outside business hours.<p>[0]: <a href="https://github.com/pgexperts/pgx_scripts/blob/master/bloat/index_bloat_check.sql" rel="nofollow">https://github.com/pgexperts/pgx_scripts/blob/master/bloat/i...</a>
[1]: <a href="https://www.postgresql.org/docs/current/btree.html#BTREE-DEDUPLICATION" rel="nofollow">https://www.postgresql.org/docs/current/btree.html#BTREE-DED...</a>
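A rough sketch of the page-density approach, reading pg_class directly (tuples per page is just the inverse of the relpages/reltuples ratio mentioned above; the LIMIT is arbitrary). Snapshot it periodically; a falling tuples-per-page value for an index suggests growing bloat:

    SELECT relname,
           relpages,
           reltuples,
           reltuples / NULLIF(relpages, 0) AS tuples_per_page
    FROM pg_class
    WHERE relkind = 'i'            -- indexes only
    ORDER BY relpages DESC
    LIMIT 20;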
Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be. (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
> Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be.<p>It's a question of resource margins. If you have regular and predictable windows of low resource utilization, you can afford to run closer to the sun during busy periods, deferring (and amortizing, to some degree) maintenance costs till later. If you have a 24/7/365 service, you need considerably higher safety margins.<p>Also, there's a lot of terrible advice on the internet, if you haven't noticed.<p>> (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)<p>To be fair, I find Oxide's continual low-info griping against postgres a bit tedious. There's <i>plenty</i> of weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful? Like, would it be useful to criticize Oxide's lack of production hardware availability in 2021 or so?<p>Edit: duplicated word removed
> It's a question of resource margins.<p>What you describe is true and very important (more margin lets you weather more disruption), but it's not the whole story. The problem we had was queueing delays mainly due to I/O contention. The disks had the extra IOPS for the maintenance operation, but the resulting latency for <i>all</i> operations was higher. This meant overall throughput decreased when the maintenance was going on. The customer, finally accepting the problem, thought: "we'll just build enough extra shards to account for the degradation". But it just doesn't work like that. If the degradation is 30%, and you reduce the steady-state load on the database by 30%, that doesn't change the fact that when the maintenance is ongoing, even if the disks have the IOPS for the extra load, latency goes up. Throughput will still degrade. What they wanted was predictability but we just couldn't give that to them.<p>> To be fair, I find oxides' continual low-info griping against postgres a bit tedious. There's plenty weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful?<p>First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).<p>I admit that some of this has been hard for me personally to let go. These issues dominated my professional life for three <i>very</i> stressful years. For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc. And we certainly did make mistakes! But many of those problems were later acknowledged by the community. And many have been improved -- which is great! What remains is me feeling triggered when it feels like users' pain is being casually dismissed.<p>I'm sorry I let my crankiness slip into the comment above. I try to leave out the emotional baggage. Nonetheless, I do feel like it's a problem that, intentionally or otherwise, a lot of the user base has absorbed the idea that it's okay for necessary database maintenance to significantly degrade performance because folks will have some downtime in which to run it.*
> First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).<p>I said oxide, because it's come up so frequently and at such length on the oxide podcast... Without that I probably wouldn't have commented here. It's one thing to comment on bad experiences, but at this point it feels more like bashing. And I feel like an open source focused company should treat other folks working on open source with a bit more, idk, respect (not quite the right word, but I can't come up with a better one right now).<p>I probably shouldn't have commented on this here. But I read the message after just having spent a Sunday morning looking into a problem and I guess that made me more thin-skinned than usual.<p>> For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc.<p>I agree that the wider community sometimes has/had the issue of excusing away postgres problems. While I try to avoid doing that, I certainly have fallen prey to that myself.<p>Leaving fandom-like stuff aside, there's an aspect of having been told over and over we're doing xyz wrong and things would never work that way, and succeeding (to some degree) regardless. While ignoring some common wisdom has been advantageous, I think there's also plenty where we just have been high on our own supply.<p>> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.<p>Was that done in this thread?
I don't agree that we have been "bashing" Postgres. As far as I can tell, Postgres has come up a very small number of times over the years: certainly on the CockroachDB episode[0] (where our experience with Postgres is germane, as it was very much guiding our process for finding a database for Oxide) and then again this year when we talked about our use of statemaps on a Rust async issue[1] (where our experience with Postgres was again relevant because it in part motivated the work that we had used to develop the tooling that we again used on the Rust issue).<p>I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people <i>will</i> have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.<p>[0] <a href="https://oxide-and-friends.transistor.fm/episodes/whither-cockroachdb" rel="nofollow">https://oxide-and-friends.transistor.fm/episodes/whither-coc...</a><p>[1] <a href="https://oxide-and-friends.transistor.fm/episodes/when-async-attacks" rel="nofollow">https://oxide-and-friends.transistor.fm/episodes/when-async-...</a>
I'm certainly very biased (having worked on postgres for way too long), so it's entirely plausible that I've over-observed and over-analyzed the criticism, leading to my description.<p>> I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.<p>I agree that criticism is important and worthwhile! It's helpful though if it's at least somewhat actionable. We can't travel back in time to fix the problems you had in the early 2010s... My experience of the criticism of the last years from the "oxide corner" was that it sometimes felt somewhat unrelated to the context and to today's postgres.<p>> if one views recounting that experience as showing insufficient "respect" to its developers<p>I should really have come up with a better word, but I'm still blanking on choosing a really apt word, even though I know it exists. I could try to blame ESL for it, but I can't come up with a good German word for it either... Maybe "goodwill". Basically believing that the other party is trying to do the right thing.
>> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.<p>> Was that done in this thread?<p>Well, I raised a general problem around 24/7/365 use cases (rooted in my operational experience, reinforced by the more-current words that I was replying to and the OP) and you called it "tedious", "low-info griping". Yes, that seems pretty dismissive.<p>(Is it fair? Though I thought the podcast episodes were fairly specific, they probably glossed over details. They weren't intended to be about those issues per se. I did write a pretty detailed post though:
<a href="https://www.davepacheco.net/blog/2024/challenges-deploying-postgresql-9.2-for-high-availability/" rel="nofollow">https://www.davepacheco.net/blog/2024/challenges-deploying-p...</a>
(Note the prominent caveat at the top about the experience being dated.))<p>You also wrote:<p>> running an, at the time, outdated postgres, on an outdated OS<p>Yes, pointing to the fact that the software is old and the OS is unusual (it was never outdated; it was just not Linux) are common ways to quickly dismiss users' problems. If the problems had been fixed in newer versions, that'd be one thing. Many (if not all) of them hadn't been. But also: the reason we were running an old version was precisely that it was a 24/7/365 service and there was no way to update databases without downtime, especially replicated ones, nor a great way to mitigate risk (e.g., a mode for running the new software without updating the on-disk format so that you can go back if it's a disaster). This should be seen as a <i>signal</i> of the problem, not a reason to dismiss it (as I feel like you're doing here). As for the OS, I can only think of one major issue we hit that was OS-specific. (We did make a <i>major</i> misconfiguration related to the filesystem that certainly made many of our issues much worse.)<p>I get that it sucks to keep hearing about problems from years ago. All of this was on 9.2 - 9.6 -- certainly ancient today. When this comes up, I try to balance sharing my operational experience with the fact that it's dated by just explaining that it's dated. After all, <i>all</i> experience is dated. Readers can ignore it if they want, do some research, or folks in the PostgreSQL world can update me when specific things are no longer a problem. That's how I learned that the single-threaded WAL receiver had been updated, apparently in part because of our work: <a href="https://x.com/MengTangmu/status/1828665449850294518" rel="nofollow">https://x.com/MengTangmu/status/1828665449850294518</a> (full thread: <a href="https://x.com/MengTangmu/status/1828665439234474350" rel="nofollow">https://x.com/MengTangmu/status/1828665439234474350</a>). I'll happily share these updates wherever I would otherwise share my gripes!
Regarding pgstattuple specifically: If this were a 24/7/365 service and you were concerned about the I/O impact of loading the full table or index at any time, you could run this on a replica too. For tables there is pgstattuple_approx, which is much better at managing its impact, but there is no equivalent for indexes today.<p>The REINDEX CONCURRENTLY mentioned in OP could also be run at other times of the day - the main issue is again I/O impact (with potentially some locking concerns at the very end of the reindex concurrently to swap out the index).<p>There are no magic solutions here - other databases have to deal with the same practical limitations, though Postgres sometimes is a bit slow to adopt operational best practices in core (e.g. the mentioned pg_squeeze from OP may finally get an in-core "REPACK CONCURRENTLY" equivalent in Postgres 19, but it's been a long time to get there)
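For reference, a sketch of both calls, assuming the pgstattuple extension is installed and using made-up object names:

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- Sampling-based table bloat estimate, much cheaper than a full scan:
    SELECT * FROM pgstattuple_approx('public.orders');
    -- Index stats read the whole index, so mind the I/O (or run on a replica):
    SELECT avg_leaf_density, leaf_fragmentation
    FROM pgstatindex('public.idx_orders_customer_id');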
Postgres-compatible cloud databases like AlloyDB address this issue.
I'm (genuinely) curious about the overwhelming preference for PostgreSQL on HN. I've always used MySQL for OLTP, and been very happy with it.<p>If you've seriously considered both and then selected PostgreSQL please comment and tell me what drove that decision.<p>Note: I'm only talking about OLTP. I do see that PostgreSQL adds a lot for OLAP.
Personally it's the history for me. MySQL started with MyISAM, not innodb.<p>So if you wanted an actual transactional database back in the day, MySQL was definitely not it. You needed Postgres.<p>InnoDB was not MySQL. It was an add on. So if I had to use MySQL it was with innodb of course but why not just use Postgres. And after the Oracle acquisition... Yes I know MariaDB. But I'm already on Postgres so...
I'm also curious about this, especially if anyone has operated postgres at any kind of scale. At low scale, all databases are fine (assuming you understand what the particular database you're using does and doesn't guarantee).<p>Postgres has some really great features from a developer point of view, but my impression is that it is much tougher from an operations perspective. Not that other databases don't have ops requirements, but mysql doesn't seem to suffer from a lot of the tricky issues, corner cases and footguns that postgres has (eg. Issues mentioned in a sibling thread around necessary maintenance having no suitable window to run at any point in the day). Again I note this is about ops, not development. Mysql has well known dev footguns. Personally I find Dev footguns easier to countenance because they likely present less business risk than operational ones.
I would like to know if I am mistaken in this impression.
Upvoted because educational, despite the AI-ness and clickbait.<p>I’ve worked at orgs that used Postgres in production, but I’ve never been the one responsible for tuning/maintenance. I never knew that Postgres doesn’t merge pages or have a minimum page occupancy. I would have thought it’s
not technically a B-tree if it doesn’t.
> The exclusive lock is only needed during the final swap phase, and its duration can be configured.<p>FYI: even a very short operation that requires an exclusive lock can induce significant downtime if there’s anything else that holds a shared lock for extended periods. In [1], there was:<p>- a wraparound autovacuum (which holds a shared lock for potentially a long time — like hours)<p>- lots of data path operations wanting a shared lock<p>- one operation that should have been very brief that merely <i>tried</i> to take an exclusive lock<p>The result is that the presence of an operation <i>wanting</i> an exclusive lock blocked the data path for the duration of the autovacuum. Major outage.<p>[1] <a href="https://web.archive.org/web/20190320162510/https://www.joyent.com/blog/manta-postmortem-7-27-2015" rel="nofollow">https://web.archive.org/web/20190320162510/https://www.joyen...</a><p>Edit: this was a while ago with v9.2, but I don’t know if any of this behavior has changed.
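One common mitigation, not from the linked postmortem but a general sketch with made-up names: bound how long the exclusive-lock request may wait, so it fails fast instead of stalling everything queued behind it:

    SET lock_timeout = '2s';
    -- If the ACCESS EXCLUSIVE lock isn't granted within 2s, this errors out
    -- and can be retried later, instead of blocking all queued shared-lock requests.
    ALTER TABLE orders ADD COLUMN note text;
    RESET lock_timeout;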
Don't forget to ANALYZE your tables sometimes too.<p>Just recently I was trying to optimize a 12s index scan; it turns out I didn't need to change anything about the query, I just had to update the table statistics. 12s down to 100ms just from running ANALYZE (no vacuum needed).
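A minimal sketch of that fix, with a made-up table name, plus a quick way to check when statistics were last refreshed:

    ANALYZE orders;
    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'orders';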
And make sure your `random_page_cost` is about 1.1 if running on an SSD or if >~98% of your hot pages fit in memory, rather than the default of 4, which makes the planner afraid of using indexes.
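A sketch of how that change is typically applied; trying it per-session first lets you confirm the plan actually improves before making it global (a reload is enough, no restart needed):

    -- Per-session, to test the effect on a specific query plan:
    SET random_page_cost = 1.1;
    -- Cluster-wide, once you're happy with it:
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();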
We added a weekly job to do that during low-activity hours as a preventive measure. It's not often that the planner incorrectly goes for a table scan due to bad statistics, but when it does it's a big issue.<p>So we just do it proactively now.
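For example, if the pg_cron extension is available, the weekly job can live inside the database itself (a plain OS cron job calling psql works just as well; the schedule here is arbitrary):

    -- Run a database-wide ANALYZE every Sunday at 03:00.
    SELECT cron.schedule('0 3 * * 0', 'ANALYZE');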
There are ways to see which indexes get used and which are not.
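For example, Postgres tracks per-index scan counts; a minimal sketch for listing indexes that haven't been used since statistics were last reset:

    SELECT schemaname, relname, indexrelname, idx_scan,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;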
It's surprising to find table scans or incorrect indexes getting used.
Good article, and easy to follow. I learned more than I’d expected from this one.
"When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page. If an entire page becomes empty, PostgreSQL can reclaim it entirely."<p>Is this true? I was of the belief that standard vacuum doesnt move any data even within a page... It merely enables dead tuples to be reused in the future. But I could be mistaken
I think there are two aspects to that:<p>1) When do pages get removed? (file on disk gets smaller)<p>Regular vacuum can truncate the tail of a table if those pages at the end are fully empty. That may or may not happen in a typical workload, and Postgres isn't particular about placing new entries in earlier pages. Otherwise you do need a VACUUM FULL/pg_squeeze.<p>2) Does a regular VACUUM rearrange a single page when it works on it? (i.e. remove empty pockets of data within an 8kb page, which I think the author calls compacting)<p>I think the answer to that is yes, e.g. when looking at the Postgres docs on page layout [0] the following sentence stands out: "Because an item identifier is never moved until it is freed, its index can be used on a long-term basis to reference an item, even when the item itself is moved around on the page to compact free space". That means things like HOT pruning can occur without breaking index references (which modify the versions of the tuple on the same page, but keep the item identifier in the same place), but (I think) during VACUUM, even breaking index references is allowed when cleaning up dead item identifiers.<p>[0]: <a href="https://www.postgresql.org/docs/current/storage-page-layout.html" rel="nofollow">https://www.postgresql.org/docs/current/storage-page-layout....</a><p>Edit: And of course you should trust the parallel comment by anarazel to be the correct answer to this :)
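If you want to see this for yourself, the pageinspect extension can show a page's line pointers and tuple offsets before and after a VACUUM (a sketch with a made-up table name, assuming the extension is available):

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    -- lp_off shows where each tuple's data currently sits within the 8kb page;
    -- compare the same page before and after a VACUUM to see compaction.
    SELECT lp, lp_off, lp_len, t_xmin, t_xmax
    FROM heap_page_items(get_raw_page('orders', 0));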
It's true - otherwise the space couldn't freely be reused, because the gaps for the vacuumed tuples wouldn't allow for any larger tuples to be inserted.<p>See <a href="https://github.com/postgres/postgres/blob/b853e644d78d99ef1779c9bf7bc3944694460aa2/src/backend/storage/page/bufpage.c#L682" rel="nofollow">https://github.com/postgres/postgres/blob/b853e644d78d99ef17...</a>
Ah, the beloved PGSQL and its decades-old, obsolete MVCC model with all its goodies :)
> When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page.<p>No it doesn’t. It just removes unused line pointers and marks the space as free in the FSM.
> > When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page.<p>> No it doesn’t. It just removes unused line pointers and marks the space as free in the FSM.<p>It does:<p><a href="https://github.com/postgres/postgres/blob/b853e644d78d99ef1779c9bf7bc3944694460aa2/src/backend/storage/page/bufpage.c#L682" rel="nofollow">https://github.com/postgres/postgres/blob/b853e644d78d99ef17...</a><p>Which is executed as part of vacuum.