-
-
Notifications
You must be signed in to change notification settings - Fork 262
PERCENTILE_CONT and PERCENTILE_DISC functions #8807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The current implementation has one drawback. According to the description, the This can be easily implemented by restricting the For examples: WITH
T(N, P) AS (
SELECT 1, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 2, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 3, 1 FROM RDB$DATABASE
)
SELECT
PERCENTILE_DISC(P) WITHIN GROUP(ORDER BY N)
FROM T;The query is considered correct and executed for the first percentile value, although there should be an error here. For example, Postgres explicitly requires that the percentile be included in the GROUP BY.
But this query is correct because the percentile is included in the GROUP BY. WITH
T(N, P) AS (
SELECT 1, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 2, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 3, 1 FROM RDB$DATABASE
)
SELECT
PERCENTILE_DISC(P) WITHIN GROUP(ORDER BY N)
FROM T
GROUP BY P;The same applies to using these functions as a window. Incorrect examlpe: WITH
T(N, P) AS (
SELECT 1, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 2, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 3, 1 FROM RDB$DATABASE
)
SELECT
N, P,
PERCENTILE_DISC(P) WITHIN GROUP(ORDER BY N) OVER()
FROM T;Correct example: WITH
T(N, P) AS (
SELECT 1, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 2, 0.5 FROM RDB$DATABASE
UNION ALL
SELECT 3, 1 FROM RDB$DATABASE
)
SELECT
N, P,
PERCENTILE_DISC(P) WITHIN GROUP(ORDER BY N) OVER(PARTITION BY P)
FROM T;I have no idea how to implement such a check. I would appreciate any tips. |
PERCENTILE_DISC and PERCENTILE_CONT functions
The
PERCENTILE_CONTandPERCENTILE_DISCfunctions are known as inverse distribution functions.These functions operate on an ordered set. Both functions can be used as aggregate or window functions.
PERCENTILE_DISC
PERCENTILE_DISCis an inverse distribution function that assumes a discrete distribution model.It takes a percentile value and a sort specification and returns an element from the set.
Nulls are ignored in the calculation.
Syntax for the
PERCENTILE_DISCfunction as an aggregate function.Syntax for the
PERCENTILE_DISCfunction as an window function.The first argument
<percent>must evaluate to a numeric value between 0 and 1, because it is a percentile value.This expression must be constant within each aggregate group.
The
ORDER BYclause takes a single expression that can be of any type that can be sorted.The function
PERCENTILE_DISCreturns a value of the same type as the argument inORDER BY.For a given percentile value
P,PERCENTILE_DISCsorts the values of the expression in theORDER BYclause andreturns the value with the smallest
CUME_DISTvalue (with respect to the same sort specification)that is greater than or equal to
P.Analytic Example
PERCENTILE_CONT
PERCENTILE_CONTis an inverse distribution function that assumes a continuous distribution model.It takes a percentile value and a sort specification and returns an element from the set.
Nulls are ignored in the calculation.
Syntax for the
PERCENTILE_CONTfunction as an aggregate function.Syntax for the
PERCENTILE_CONTfunction as an window function.The first argument
<percent>must evaluate to a numeric value between 0 and 1, because it is a percentile value.This expression must be constant within each aggregate group.
The
ORDER BYclause takes a single expression, which must be of numeric type to perform interpolation.The
PERCENTILE_CONTfunction returns a value of typeDOUBLE PRECISIONorDECFLOAT(34)depending on the typeof the argument in the
ORDER BYclause. A value of typeDECFLOAT(34)is returned ifORDER BYcontainsan expression of one of the types
INT128,NUMERIC(38, x)orDECFLOAT(16 | 34), otherwise -DOUBLE PRECISION.The result of
PERCENTILE_CONTis computed by linear interpolation between values after ordering them.Using the percentile value (
P) and the number of rows (N) in the aggregation group, you can computethe row number you are interested in after ordering the rows with respect to the sort specification.
This row number (
RN) is computed according to the formulaRN = (1 + (P * (N - 1)).The final result of the aggregate function is computed by linear interpolation between the values from rows
at row numbers
CRN = CEILING(RN)andFRN = FLOOR(RN).Analytic Example
An example of using both aggregate functions