5 comments

  • Sesse__9 minutes ago
    As far as I can see, these classes of filters (including xor filters) have some practical issues for many applications: They can become full (refuse new entries altogether), and they need to know all the elements up-front (no incremental inserts). Is there anything more modern than Bloom filters that don&#x27;t have these restrictions?<p>I&#x27;m especially fond of tiny filters; a well-placed 32- or 64-bit Bloom filter can be surprisingly effective in the right context!
  • dang10 hours ago
    Related prior work:<p><i>Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters</i> - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=22742905">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=22742905</a> - March 2020 (25 comments)<p><i>Xor Filters: Faster and Smaller Than Bloom Filters</i> - <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=21840821">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=21840821</a> - Dec 2019 (81 comments)
    • less_less4 hours ago
      See also the paper <i>Ribbon filter: practically smaller than Bloom and Xor</i>: <a href="https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;2103.02515" rel="nofollow">https:&#x2F;&#x2F;arxiv.org&#x2F;abs&#x2F;2103.02515</a>, which is a similar idea though not by the same authors.<p>IIRC, binary fuse filters are faster to construct than ribbon filters, but typically not quite as space-efficient. There are also <i>frayed ribbon filters</i> (by me) which are slower and more complex to construct but more space-efficient. There&#x27;s no paper for those, just a Rust implementation.<p>Ribbon filters are deployed in Mozilla&#x27;s Clubcard for distributing compressed certificate revocation lists: <a href="https:&#x2F;&#x2F;github.com&#x2F;mozilla&#x2F;clubcard" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;mozilla&#x2F;clubcard</a> and <a href="https:&#x2F;&#x2F;jmschanck.info&#x2F;papers&#x2F;20250327-clubcard.pdf" rel="nofollow">https:&#x2F;&#x2F;jmschanck.info&#x2F;papers&#x2F;20250327-clubcard.pdf</a>. CRLs are an almost ideal application of this sort of compressed set tech, since the aggregator runs batch jobs and needs to distribute the set to very many clients. It&#x27;s not perfectly ideal because CRLs require frequent updates and none of these methods support delta updates. There is a straightforward but inelegant workaround, which is to send a compressed set that represents the delta, and query both on the client.
  • pastage6 hours ago
    I recommend the zig library [1], it was a joy to use. Bloom filters was one of the first interesting algorithms I did in class back in university, we upgraded hardware during the lab making the use of bloom filters unnecessary in a lab ment to interactively show its usefulness. I have had this repeated since then, these filters are magic until hardware catches up, having smaller filter is lovely.<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;hexops&#x2F;fastfilter" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;hexops&#x2F;fastfilter</a>
  • djmips11 hours ago
    This feels like a better xor filter implementation.
    • pwagland30 minutes ago
      This is mentioned on the first page of the paper:<p>&gt; Building on theoretical work by Dietzfelbinger and Walzer [8], we propose a novel practical approach, the binary fuse filters. They are conceptually similar to xor filters, and they rely on nearly the same simple code.
    • orlp6 hours ago
      Same author.
  • vgb2k1810 hours ago
    Fast, but not faster than XOR filters. I was wondering if the title was a typo, but the article clarifies they sacrificed some speed for the smaller size.
    • nine_k9 hours ago
      One construction is smaller and faster to build than xor filters, and another is even more compact, though slower:<p>&gt; <i>we build probabilistic filters -- called binary fuse filters -- that are within 13% of the storage lower bound -- without sacrificing query speed. As an additional benefit, the construction of the new binary fuse filters can be more than twice as fast as the construction of xor filters. By slightly sacrificing query speed, we further reduce storage to within 8% of the lower bound.</i>
    • vlovich1237 hours ago
      I think you misread:<p>&gt; that are within 13% of the storage lower bound -- without sacrificing query speed. As an additional benefit, the construction of the new binary fuse filters can be more than twice as fast as the construction of xor filters. By slightly sacrificing query speed, we further reduce storage to within 8% of the lower bound.<p>They are faster to construct (2x) and smaller (within 13% of theoretical limit) while maintaining the same query performance.