4 comments

  • kwillets 24 minutes ago
    This is a fairly simple idea of indexing characters for each column/offset and compressing the bitmaps. Simple is good, as the overhead of more sophisticated ideas (e.g. suffix sorting) is often prohibitive.

    One suggestion is to index the end-of-string as a character as well; then you don't need negative offsets. But that turns the suffix search into a wildcard type of thing where you have to try all offsets, which is what the '%pat%' searches do already, so maybe it's OK.
  • fabian2k 1 hour ago
    Looks very interesting. I really like trigram indexes for certain use cases, but those use cases essentially come down to running an ILIKE '%something%' on various text content in the DB. So they would fit the described limitations of this index type very well.

    Usually you're quickly steered towards full-text search (tsvector) in Postgres if you want to do something like that. But depending on what kind of search you actually need, trigram indexes can be a better option. If you don't search so much for natural language, but more for specific keywords, the stemming in full-text search can get in the way.

    One piece of information that would be nice here is a comparison of the on-disk index size for both index types.
  • out_of_protocol 53 minutes ago
    Any data on index size for big tables? A comparison (in ms/megabytes) against trigram regarding size/speed?

    UPD:

    > Biscuit is 15.0× faster than B-tree (median) and 5.6× faster than Trigram (median)

    > Trade-off: 3.2× larger index than Trigram, but 5.6× faster queries (median)
  • eatonphil 4 days ago
    Noticed Daniel Lemire talking about it and how they use Roaring Bitmaps.

    https://x.com/lemire/status/2000944944832504025
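kwillets' comment describes the core idea as keeping one bitmap per (offset, character) pair, so a LIKE pattern becomes bitmap intersections. Below is a minimal Python sketch of that idea, assuming plain Python sets as stand-ins for compressed (e.g. Roaring) bitmaps. The class and method names (PositionalCharIndex, prefix, suffix, suffix_via_eos, contains) and the "\0" end-of-string sentinel are hypothetical illustrations, not Biscuit's actual implementation. It shows both ways of answering a suffix query that the comment contrasts: negative (from-the-end) offsets, and the end-of-string-character alternative, which has to try every start offset just like '%pat%' does.

```python
from collections import defaultdict
from functools import reduce

EOS = "\0"  # hypothetical end-of-string sentinel (assumes data never contains "\0")


class PositionalCharIndex:
    """Toy index mapping (offset, char) -> set of row ids.

    Python sets stand in for compressed bitmaps such as Roaring; this is an
    illustrative sketch, not Biscuit's real data layout.
    """

    def __init__(self, rows):
        self.n = len(rows)
        self.fwd = defaultdict(set)  # (offset from start, char) -> row ids; EOS indexed too
        self.rev = defaultdict(set)  # (offset from end: -1, -2, ..., char) -> row ids
        self.max_len = 0
        for rid, s in enumerate(rows):
            self.max_len = max(self.max_len, len(s))
            for i, ch in enumerate(s + EOS):
                self.fwd[(i, ch)].add(rid)
            for i, ch in enumerate(reversed(s), start=1):
                self.rev[(-i, ch)].add(rid)

    def _match_at(self, pat, start, table):
        # Intersect the bitmaps for pat[0] at `start`, pat[1] at `start + 1`, ...
        # dict.get does not trigger defaultdict's factory, so misses yield set().
        sets = [table.get((start + j, ch), set()) for j, ch in enumerate(pat)]
        return reduce(set.intersection, sets, set(range(self.n)))

    def prefix(self, pat):
        """LIKE 'pat%': one intersection at offset 0."""
        return self._match_at(pat, 0, self.fwd)

    def suffix(self, pat):
        """LIKE '%pat' using negative offsets: one intersection."""
        return self._match_at(pat, -len(pat), self.rev)

    def suffix_via_eos(self, pat):
        """LIKE '%pat' without negative offsets: try every start offset,
        requiring the EOS character right after the pattern."""
        hits = set()
        for start in range(self.max_len - len(pat) + 1):
            hits |= self._match_at(pat + EOS, start, self.fwd)
        return hits

    def contains(self, pat):
        """LIKE '%pat%': union of intersections over all start offsets."""
        hits = set()
        for start in range(self.max_len - len(pat) + 1):
            hits |= self._match_at(pat, start, self.fwd)
        return hits


rows = ["postgres", "biscuit", "roaring", "string", "ring"]
idx = PositionalCharIndex(rows)
print(sorted(idx.prefix("ro")))             # [2]
print(sorted(idx.suffix("ing")))            # [2, 3, 4]
print(sorted(idx.suffix_via_eos("ing")))    # [2, 3, 4]
print(sorted(idx.contains("st")))           # [0, 3]
```

Both suffix strategies return the same rows; the difference is cost. The negative-offset version does a single intersection, while the EOS version performs one intersection per possible start offset, which is the "try all offsets" trade-off kwillets mentions.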