Replies: 4 comments 8 replies
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
This comment has been hidden.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Status: Completed
Author: @locker
Reviewers
Tickets
Pull requests
index:quantile()
#11226Change log
nil
return value means that the range is too small for splitting.Summary
In the scope of this RFC document a new index API method for finding an arbitrary percentile (the median, in particular) in a key interval is proposed. The method can be used for splitting data ranges for load balancing in a range-based sharding approach.
Context
To implement range-based sharding, we need an efficient way to find a median in an indexed key interval so that we can split it for load balancing. Note, we can't just take the average key value because (a) data in the range can be distributed unevenly and (b) computing the average may be technically difficult to implement (consider a compound key with string, uuid, date-time parts, for example).
With the recent improvements of
index:count()
andindex:select()
with offset, we can now efficiently find the median (or any other percentile) of an interval in a memtx tree index, as demonstrated by the following code snippet.Unfortunately, this approach is unacceptable in case of vinyl because it would require linear scanning of the target range, which has a prohibitive cost. Optimizing
index:count()
andindex:select()
with offset from O(N) to O(logN), like we did in memtx, is technically difficult if not unfeasible because vinyl stores data in layers, which have to be merged in order to serve a query. Instead we're planning to introduce a new index API method, which would find a percentile with a reasonable error.Proposed solution
Synopsis
We propose to introduce the new index method
index:quantile()
. The method will take up to three arguments:level
: quantile level; must be a number in range (0, 1); mandatory argument.begin_key
: beginning of the target key range; optional, assumed to be the empty key if unset.end_key
: end of the target key range; optional, assumed to be the empty key if unset.See the "Range definition" section for details on how the range boundary keys are defined.
The function will return the quantile point, i.e. the max key such that the percentage of tuples in the target range which are less than the key doesn't exceed the quantile level * 100%. Note that the function will return a key (given as a Lua table), not a tuple. There's no guarantee that the corresponding tuple exists in the range, see the "Engine differences" section, but the key is guaranteed to be full (have all indexed parts) and belong in the specified range.
Note
We called the new method "quantile", not "percentile", because it seems to be more convenient to pass a fractional value between 0 and 1 rather than a percentage between 0 and 100 (see the explanation of the difference between a quantile and a percentile).
Example usage
Range definition
The range boundary keys (
begin_key
andend_key
) passed toindex:quantile()
will be specified the same way as inindex:select()
. A boundary key won't have to be full (have all indexed parts) - it may be partial. The target data set will be defined as an intersection of the following queries:Thus for full keys, the interval would be defined as starting from
begin_key
inclusive and ending atend_key
exclusive, which is a common way to define an interval. To understand how a range is defined if a boundary key is partial, we need to recall howindex:select()
works in Tarantool. In Tarantool a partial key is assumed to be equal to any full key if it has the same defined parts. For example,{10}
equals both{10, 1}
and{10, 2}
so a range starting at{10}
and ending at{15}
for two-part integer keys would include{10, 1}
,{10, 2}
,{12, 1}
,{14, 2}
, but not{15, 1}
. The empty key{}
is special: when used with thelt
iterator, it is treated asle
. So passing the empty key{}
for both range boundaries will return the quantile point in the whole space.Corner cases
The function will raise an error if any of the following conditions is true:
The function will return
nil
if there's no tuple in the target range. It may also returnnil
if it can't find the quantile point for implementation-specific reasons (for example, if the specified range is less than a page in case of vinyl). If the function is used for splitting a range,nil
means that the target range is too small for splitting.Engine differences
The implementation and guarantees of the new method will be engine-dependent:
index:count()
andindex:select()
with offset. This means that it will return a quantile point without an error and have the O(logN) complexity where N is the number of tuples stored in the index. The function will be available for the tree index only.Initially, other engines won't support the new index method. In future, we may implement it for memcs, which like memtx stores data ordered by key.
Vinyl implementation
Vinyl stores data in layers, where the last (deepest) layer is the base while more recent layers store incremental updates. Within a layer, contiguous non-intersecting data chunks are stored in runs while each run consists of pages. The page index of each run is stored in memory so that we can quickly find the first page matching given search criteria. To look up a tuple within a page, we have to read it from disk because the row index of each page is stored on disk.
The new method will look up the quantile point in the last (deepest) data storage layer, which should be a good enough approximation, because the last layer is the largest one and should reflect the data distribution of the whole index in most common usage scenarios. If there's no disk layers, we will use the same internal functions to look up the quantile point in the in-memory BPS-tree that are used internally in memtx.
To find the quantile point in a disk layer, we will perform the following steps:
Notes:
Beta Was this translation helpful? Give feedback.
All reactions