Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use fast immutable bitset for better bloom filter #716

Closed
johnynek opened this issue Oct 2, 2019 · 6 comments
Closed

use fast immutable bitset for better bloom filter #716

johnynek opened this issue Oct 2, 2019 · 6 comments

Comments

@johnynek
Copy link
Collaborator

johnynek commented Oct 2, 2019

see: https://github.com/typelevel/cats-collections/blob/master/core/src/main/scala/cats/collections/BitSet.scala

@non and I implemented that to see if we could make a fast, sparse, immutable bitset. We could! In fact, for some operations it is faster than mutable, and for the use case of a Monoid where we have to copy inputs, it is much faster.

We could copy that implementation into Algebird and remove use of the bitset implementation we have now (java ewah), and it would also remove one of the dependencies of core taking us down to 1 (typelevel algebra): https://github.com/twitter/algebird/blob/develop/build.sbt#L258

this would be binary incompatible. We could move the old BloomFilter into algebird.legacy and make the javaewah dep provided so people who don't use it don't pick it up in their build.

If we do this, we should add benchmarks to compare, but my guess is this will actually be faster.

cc @anish749 @regadas @nevillelyh

@anish749
Copy link
Contributor

anish749 commented Oct 4, 2019

I had tried out this implementation from cats and have the code in this branch
It does not have for persistence or moving the old implementation to algebird.legacy

It is indeed much faster.

The benchmark results are here. I did this a few months back, and I would need to check some of the finer details again.


// Immutable with Array copy
[info] Benchmark                                               (falsePositiveRate)  (nbrOfElements)   Mode  Cnt    Score     Error  Units
[info] BloomFilterCreateBenchmark.createBloomFilter                          0.001            10000  thrpt    3  171.513 ± 281.008  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                          0.001            50000  thrpt    3   11.661 ± 175.592  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                           0.01            10000  thrpt    3   36.594 ±  87.057  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                           0.01            50000  thrpt    3   25.702 ± 259.042  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                            0.1            10000  thrpt    3  409.528 ± 540.620  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                            0.1            50000  thrpt    3   76.085 ± 264.660  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                0.001            10000  thrpt    3    3.059 ±   7.600  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                0.001            50000  thrpt    3    0.164 ±   0.177  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                 0.01            10000  thrpt    3    7.157 ±   9.560  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                 0.01            50000  thrpt    3    0.341 ±   0.235  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                  0.1            10000  thrpt    3   26.621 ±  40.213  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                  0.1            50000  thrpt    3    1.200 ±   1.451  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                 0.001            10000  thrpt    3    3.738 ±   6.267  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                 0.001            50000  thrpt    3    0.159 ±   0.150  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                  0.01            10000  thrpt    3    7.476 ±  16.579  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                  0.01            50000  thrpt    3    0.335 ±   0.721  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                   0.1            10000  thrpt    3   24.621 ±  44.696  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                   0.1            50000  thrpt    3    1.295 ±   0.839  ops/s


// Cats Immutable
[info] Benchmark                                               (falsePositiveRate)  (nbrOfElements)   Mode  Cnt    Score     Error  Units
[info] BloomFilterCreateBenchmark.createBloomFilter                          0.001            10000  thrpt    3   44.939 ± 227.744  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                          0.001            50000  thrpt    3    5.687 ±  19.081  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                           0.01            10000  thrpt    3   78.949 ± 154.880  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                           0.01            50000  thrpt    3   11.778 ±   8.267  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                            0.1            10000  thrpt    3  189.331 ±  52.266  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                            0.1            50000  thrpt    3   22.157 ±  20.995  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                0.001            10000  thrpt    3   53.450 ±  55.043  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                0.001            50000  thrpt    3    7.222 ±  15.144  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                 0.01            10000  thrpt    3   72.325 ± 100.853  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                 0.01            50000  thrpt    3   10.576 ±  26.579  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                  0.1            10000  thrpt    3  157.641 ± 351.298  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterAggregator                  0.1            50000  thrpt    3   18.624 ±  11.941  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                 0.001            10000  thrpt    3   58.734 ±  57.416  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                 0.001            50000  thrpt    3    8.304 ±   2.001  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                  0.01            10000  thrpt    3   78.388 ±  98.764  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                  0.01            50000  thrpt    3    5.545 ±  75.888  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                   0.1            10000  thrpt    3  183.115 ± 128.414  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilterUsingFold                   0.1            50000  thrpt    3   20.643 ±  52.748  ops/s

@regadas
Copy link
Collaborator

regadas commented Jan 14, 2020

I'm looking a bit into this ... I'll make a draft PR so we can talk about it and I'll post some numbers as well.

@johnynek you want to keep the old impl in legacy right?

@johnynek
Copy link
Collaborator Author

@regadas Ideally, I really hate to leave people high-and-dry.... So, the most compatibility we can retain the better.

That said, the above numbers seem ambiguous: like, using the immutable bitset was only a win for some cases. It would be nice if we could tweak things so that we always have a win.

We can copy the implementation into this repo, I think, especially if we might want to try some optimizations to improve performance.

One thing that we didn't do with old implementations is try to get the size information at the type-level. I think that is a really good idea since it makes sure the bitsizes are aligned and serializations don't need to write out the size. So, that is something worth exploring.

The two ways to do this are to make a module:

abstract class SizedBloomFilter(bits: Int, hashes: Int) {
  sealed abstract class Bloom ...
 
  implicit val monoid: Monoid[Bloom] = ...
}

val sizedBloom = new SizedBloom(1024, 5)
import sizedBloom.Bloom
// now we can use Bloom here and have a path dependent type

or alternatively

sealed abstract class HasSize[A] {
  def sizeOf: Int
}
object HasSize {
    def apply[A](s: Int): HasSize[A] = new HasSize[A] { def sizeOf = s }
   
    sealed trait Size1
    implicit val size1HasSize: HashSize[Size1] = apply(1)
   
  // how we can do:
    sealed trait Bloom[Bits, Hashes] {
      def size(implicit b: HasSize[Bits]) = b.sizeOf
      ...
    }
}

@regadas
Copy link
Collaborator

regadas commented Jan 19, 2020

Sorry for the late reply @johnynek!

So, the most compatibility we can retain the better.

Totally agree, I moved the current impl into a legacy package. It's also in our best interest to keep since we can run benchmarks against it 😄

the above numbers seem ambiguous: like, using the immutable bitset was only a win for some cases. It would be nice if we could tweak things so that we always have a win.

In the numbers below I think i managed to get it on par with the legacy impl in the cases where it didn't perform that well. Even with this we could potentially go forward with since the BloomFilter is simpler and there's only one BitSet impl. Tested also @anish749 createBloomFilterUsingFold and createBloomFilterAggregator and saw the same improvement (I can add them as well later).

I had to cheat in on case ... sumOption to be fast enough I had to mutate the same BitSet (tbh I think it's ok in this case since it's localized).

I believe there's some other minor things we can optimize where and there I'll need to play around a little bit more. I spotted yours and @non work on improving some of them and there's also some new ones like def ++(a: Array[Int]): BitSet that I was thinking in adding but it's already there (I think I'll just copy it as well 😄 )

[info] Benchmark                                                    (falsePositiveRate)  (nbrOfElements)   Mode  Cnt          Score          Error  Units
[info] BloomFilterCreateBenchmark.createBloomFilter                               0.001              100  thrpt    3      12434.779 ±      357.553  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                               0.001             1000  thrpt    3       1183.123 ±       50.355  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                               0.001            10000  thrpt    3        108.720 ±        2.832  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                                0.01              100  thrpt    3      17978.982 ±      760.818  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                                0.01             1000  thrpt    3       1705.537 ±       44.473  ops/s
[info] BloomFilterCreateBenchmark.createBloomFilter                                0.01            10000  thrpt    3        152.526 ±       28.693  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfDenseVsDense                          N/A              N/A  thrpt    3   41105793.914 ±  5011634.200  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfEmptyVsDense                          N/A              N/A  thrpt    3    7843706.236 ±   468487.379  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfEmptyVsEmpty                          N/A              N/A  thrpt    3  285827731.116 ± 31228610.287  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfEmptyVsSparse                         N/A              N/A  thrpt    3    7900630.518 ±   643684.348  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfSparseVsDense                         N/A              N/A  thrpt    3    1264613.293 ±   185974.448  ops/s
[info] BloomFilterDistanceBenchmark.distanceOfSparseVsSparse                        N/A              N/A  thrpt    3    1208178.559 ±  1140412.698  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                         0.001              100  thrpt    3      11678.810 ±      542.307  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                         0.001             1000  thrpt    3       1166.464 ±       64.033  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                         0.001            10000  thrpt    3        113.420 ±        2.142  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                          0.01              100  thrpt    3      16004.431 ±     2822.803  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                          0.01             1000  thrpt    3       1615.137 ±      234.551  ops/s
[info] BloomFilterLegacyCreateBenchmark.createBloomFilter                          0.01            10000  thrpt    3        160.307 ±       30.518  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfDenseVsDense                    N/A              N/A  thrpt    3    2725811.932 ±    99564.870  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfEmptyVsDense                    N/A              N/A  thrpt    3   10708004.851 ±   434568.616  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfEmptyVsEmpty                    N/A              N/A  thrpt    3  282776646.316 ± 24966839.395  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfEmptyVsSparse                   N/A              N/A  thrpt    3      60006.340 ±     2724.431  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfSparseVsDense                   N/A              N/A  thrpt    3      42009.918 ±     2578.536  ops/s
[info] BloomFilterLegacyDistanceBenchmark.distanceOfSparseVsSparse                  N/A              N/A  thrpt    3    5827613.935 ±  2155471.021  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                           0.001              100  thrpt    3    2791972.056 ±   319797.588  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                           0.001             1000  thrpt    3    2738206.605 ±   357713.894  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                           0.001            10000  thrpt    3    2659377.826 ±  2544150.099  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                            0.01              100  thrpt    3    3877831.480 ±  1068871.706  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                            0.01             1000  thrpt    3    3860338.377 ±   327106.629  ops/s
[info] BloomFilterLegacyQueryBenchmark.queryBloomFilter                            0.01            10000  thrpt    3    3880580.772 ±   294212.699  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                 0.001              100  thrpt    3    2781139.309 ±   810642.587  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                 0.001             1000  thrpt    3    2625591.510 ±   539291.673  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                 0.001            10000  thrpt    3    2624722.080 ±   405730.511  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                  0.01              100  thrpt    3    3973993.973 ±   120529.588  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                  0.01             1000  thrpt    3    3595605.220 ±   239115.046  ops/s
[info] BloomFilterQueryBenchmark.queryBloomFilter                                  0.01            10000  thrpt    3    3349576.523 ±  1495983.818  ops/s

@regadas
Copy link
Collaborator

regadas commented Jan 20, 2020

@johnynek I was looking at this and I didn't actually commented the sized bloom filter;

bitsizes are aligned and serializations don't need to write out the size

These are very good points! it's easy to make a mistake and yeah improving on serialization is always a big plus. I guess we should really explore the idea, but before that I just wanted to see how far we could take the new BitSet and see if we could actually replace the existing one.

@regadas
Copy link
Collaborator

regadas commented May 14, 2021

Closed via #840

@regadas regadas closed this as completed May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants