It provides the various analytics on very large sequence of unsigned integers in constant time.
After adding to Cargo.toml, add this line to lib.rs or main.rs.
extern crate wavelet_matrix;See crate document top for further examples.
-
Overall Performance
- https://github.com/sekineh/wavelet-matrix-rs/blob/master/BENCH.md
- Bench is performed on Intel Xeon 2800 MHz
-
Error evaluation of O(1) SUM
Given an unsigned integer sequence T, it provides the following queries.
.len():- Returns the length of
T. - It's almost no overhead, just returning the stored value.
- Returns the length of
.lookup(pos):- Returns the value at the position of T,
T[pos]. - It's slower than array lookup but still is O(1). You might want to save the original Vector for faster lookup.
- Returns the value at the position of T,
Counting is performed in O(1).
.count(start..end, value):- Returns the number of the element
ewhich satisfiese == valueincluded inT[start..end]
- Returns the number of the element
.count_prefix(start..end, value, ignore_bit):- Returns the number of the element
ewhich satisfiese >> ignore_bit == value >> ignore_bitincluded inT[start..end] - This will be useful for counting the number of IPv4 address that satisfies IPv4 prefix such as
192.168.10.0/24. In this case, the ignore_bit will be 8.
- Returns the number of the element
.count_lt(start..end, value):- Returns the number of the element
ewhich satisfiese < valueincluded inT[start..end]
- Returns the number of the element
.count_gt(start..end, value):- Returns the number of the element
ewhich satisfiese > valueincluded inT[start..end]
- Returns the number of the element
.count_range(start..end, val_start..val_end):- Returns the number of the element
ewhich satisfiesval_start <= e < val_endincluded inT[start..end]
- Returns the number of the element
Searching is performed in O(1) per a next index.
.search(start..end, value):- Returns the iterator that find indexes of the element
ewhich satisfiese == valueincluded inT[start..end]
- Returns the iterator that find indexes of the element
.search_prefix(start..end, value, ignore_bit):- Returns the iterator that find indexes of the element
ewhich satisfiese >> ignore_bit == value >> ignore_bitincluded inT[start..end]
- Returns the iterator that find indexes of the element
- [TODO] implement various conditions other than equal.
Ranking is performed in roughly O(1) with regard to the number of elements n.
.max_k(start..end, val_start..val_end, k):- list the
(value, count)pairs in descending order. - values are constrained to the range
val_start..val_end.
- list the
.min_k(start..end, val_start..val_end, k):- list the
(value, count)pairs in ascending order. - values are constrained to the range
val_start..val_end.
- list the
.top_k() is also performed in O(1) in best case, but may take O(n) in the worst case where every value occurs only once!
.top_k(start..end, val_start..val_end, k):- list the
(value, count)pairs in most-frequent-one-first order. - values are constrained to the range
val_start..val_end. - [TODO] clarify the order of same count.
- list the
To achieve O(1) performance regardless of the number of unique values, use .top_k_ranges() instead:
- [EXPERIMENTAL]
.top_k_ranges(start..end, val_start..val_end, k):- list the
(v_start..v_end, count)pairs in most-frequent-one-first order. - unlike
.top_k(),.top_k_ranges()returns the exhaustive range set that covers all of the values. - values are constrained to the range
val_start..val_end. - [TODO] clarify the order of same count.
- list the
.median(start..end):- Returns the median value from
T[start..end]. - same as
.quantile(start..end, start + (end - start) / 2).
- Returns the median value from
.quantile(start..end, k):- Returns the (k+1)th smallest value from
T[start..end].
- Returns the (k+1)th smallest value from
These methods use .top_k_ranges() to enumerate the most relevant value ranges.
They are not as accurate as the method used in Experiment 2.
- [EXPERIMENTAL]
.sum_experiment1(start..end, val_start..val_end, k):- Approximately calculate the sum of
T[start..end]using up tokwavelet tree nodes. - Only values included in the range
val_start..val_endare processed. - To get the exact result, specify
k = m + 1wheremis the number of values which are unique.
- Approximately calculate the sum of
- [EXPERIMENTAL]
.mean_experiment1(start..end, val_start..val_end, k):- Approximately calculate the average of
T[start..end]using up tokwavelet tree nodes. - Only values included in the range
val_start..val_endare processed. - To get the exact result, specify
k = m + 1wheremis the number of values which are unique.
- Approximately calculate the average of
- [EXPERIMENTAL]
.variance_experiment1(start..end, val_start..val_end, k):- Approximately calculate the variance of
T[start..end]using up tokwavelet tree nodes. - Only values included in the range
val_start..val_endare processed. - To get the exact result, specify
k = m + 1wheremis the number of values which are unique.
- Approximately calculate the variance of
Improvement over Experiment 1. They use custom node enumerator to minimize the error.
- [EXPERIMENTAL]
.sum_experiment2(start..end, val_start..val_end, k):
Improvement over Experiment 2. They use Range<u64> to tell how accurate the computed value is.
- [EXPERIMENTAL]
.sum_experiment3(start..end, val_start..val_end, k):
.rank(pos, value): counts value included inT[0..pos].- Note: pos is exclusive. When pos is 0,
.rank()always returns 0.
- Note: pos is exclusive. When pos is 0,
.select(rank, value): return the position of the (rank+1)-th value- Note: When found nothing, it returns
.len()instead of None.
- Note: When found nothing, it returns
-
Add Benchmark.
- Implement same queries using trivial algorithm
- Compare wm's queries against trivial one.
- Make a nice plot.
-
Profiling
-
Optimize underlying rsdic structure.
-
Add travis CI.
-
Add u128 support or arbitrary-length integer support
-
The fastest implementation on the planet
-
A Go package for myriad array operations using wavelet trees
- https://github.com/hillbig/waveletTree
- Basically, the algorithm is deeply derived from the above Go implementation.
-
excellent slides in Japanese
-
The original inventor's pdf: