Right, it would be wasteful if it's done for each entry separately. But that is not needed. (I should probably write an article about this and a better implementation).<p>The "succinct counting (blocked) Bloom filters" have two components: the regular Bloom filter for querying, and the count storage. The count storage is not "strictly" needed for querying: you will never get a false _negative_ is increments / decrements in the count storage are deferred. So updating the count storage can be done asynchronously. Both increments and decrements, if you want. So these can be buffered, eg 100 increments / decrements at a time. Sure, the false-positive rate is slightly higher than needed if the decrements are deferred, but with a large filter this effect is small.<p>So that means, you can buffer increments / decrements in the count storage. You still want to do the "or" operations in the the Bloom filter part synchronously, so that you don't get false negatives. And then, it is no longer just 30-40 cycles, but 3000-4000 cycles, or 30'000-40'000 cycles, or so. I understand this would not be trivial to implement, but also not very complex. I never really had a real-world use case so far, so I didn't work on a full implementation yet.
Sure, you can batch. But that means your decrements must be _entirely_ buffered, right? Because you cannot remove anything from the main filter until you know its count is zero, and you don't know its count until you've processed all increments. And you need to do that under a lock, or you'll have a race where you could have a false negative in the main filter for a short while.