Community UDFs

This directory contains community-contributed user-defined functions (UDFs) that extend BigQuery for more specialized usage patterns. Each UDF in this directory is automatically synchronized to the fn dataset in the bqutil project, so you can reference it in queries.

For example, to reference the int function in a query:

SELECT bqutil.fn.int(1.684)

UDFs

chisquare_cdf
corr_pvalue
csv_to_struct
day_occurrence_of_month
degrees
find_in_set
freq_table
from_binary
from_hex
get_array_value
getbit
get_value
int
jaccard
json_extract_keys
json_extract_values
json_typeof
kruskal_wallis
last_day
levenshtein
linear_interpolate
linear_regression
mannwhitneyu
median
nlp_compromise_number
nlp_compromise_people
normal_cdf
percentage_change
percentage_difference
pi
pvalue
p_fisherexact
radians
random_int
random_string
random_value
to_binary
to_hex
translate
ts_gen_keyed_timestamps
ts_linear_interpolate
ts_session_group
ts_slide
ts_tumble
t_test
typeof
url_decode
url_encode
url_keys
url_param
url_parse
url_trim_query
week_of_month
y4md_to_date
zeronorm

Documentation

csv_to_struct(strList STRING)

Takes a list of comma-separated key-value pairs and converts it into key-value structs. Input: strList: a string holding a map in the format a:b,c:d,... Output: an array of STRUCT(key, value) elements for that map, one per pair, which can be expanded with UNNEST.

WITH test_cases AS (
  SELECT NULL as s
  UNION ALL
  SELECT '' as s
  UNION ALL
  SELECT ',' as s
  UNION ALL
  SELECT ':' as s
  UNION ALL
  SELECT 'a:b' as s
  UNION ALL
  SELECT 'a:b,c:d' as s
  UNION ALL
  SELECT 'a:b' as s
)
SELECT key, value from test_cases as t, UNNEST(bqutil.fn.csv_to_struct(t.s)) s;

results:

key	value
a	b
a	b
c	d
a	b
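To make the parsing rules concrete, here is a small Python model that reproduces the rows above. It is a sketch, not the UDF's source: skipping empty or malformed pairs such as '' and ':' is an assumption inferred from those inputs producing no rows in the example.

```python
def csv_to_struct(str_list):
    """Hypothetical Python model: 'a:b,c:d' -> [('a', 'b'), ('c', 'd')]."""
    if str_list is None:
        return []
    pairs = []
    for item in str_list.split(","):
        key, sep, value = item.partition(":")
        # Assumption: keep only well-formed pairs with a non-empty key and value.
        if sep and key and value:
            pairs.append((key, value))
    return pairs


inputs = [None, "", ",", ":", "a:b", "a:b,c:d", "a:b"]
rows = [pair for s in inputs for pair in csv_to_struct(s)]
print(rows)  # [('a', 'b'), ('a', 'b'), ('c', 'd'), ('a', 'b')]
```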

day_occurrence_of_month(date_expression ANY TYPE)

Returns n, where the specified date is the nth occurrence of its weekday in the month. The result is an INTEGER value between 1 and 5.

SELECT
  bqutil.fn.day_occurrence_of_month(DATE '2020-07-01'),
  bqutil.fn.day_occurrence_of_month(DATE '2020-07-08');

1 2
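The occurrence number depends only on the day of the month, because each block of seven days contains every weekday exactly once. A minimal Python sketch of that arithmetic (an illustration, not the UDF's SQL):

```python
from datetime import date


def day_occurrence_of_month(d: date) -> int:
    # Days 1-7 hold the 1st occurrence of each weekday, days 8-14 the 2nd, and so on.
    return (d.day - 1) // 7 + 1


print(day_occurrence_of_month(date(2020, 7, 1)), day_occurrence_of_month(date(2020, 7, 8)))  # 1 2
```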

degrees(x ANY TYPE)

Converts a value in radians to degrees.

SELECT bqutil.fn.degrees(3.141592653589793) is_this_pi

180.0

find_in_set(str STRING, strList STRING)

Returns the 1-based position of the first occurrence of str in strList, where strList is a comma-delimited string. Returns NULL if either argument is NULL or if str is not found in strList, and returns 0 if str contains a comma. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3. Input: str: the string to search for. strList: the comma-delimited string to search in. Output: the position of str in strList.

WITH test_cases AS (
  SELECT 'ab' as str, 'abc,b,ab,c,def' as strList
  UNION ALL
  SELECT 'ab' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
  UNION ALL
  SELECT 'mobile' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
  UNION ALL
  SELECT 'mobile,' as str, 'mobile,tablet,mobile/tablet,phone,text' as strList
)
SELECT bqutil.fn.find_in_set(str, strList) from test_cases

results:

f0_
3
NULL
1
0
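The lookup rules can be modeled in a few lines of Python. This is a sketch that reproduces the four results above, including NULL (None) for a missing item; it is not the UDF's implementation.

```python
def find_in_set(s, str_list):
    """Python model of find_in_set: 1-based position of s in a comma-delimited list."""
    if s is None or str_list is None:
        return None
    if "," in s:
        return 0  # a search string containing a comma can never match a single item
    items = str_list.split(",")
    # Not found -> None, matching the NULL in the example output.
    return items.index(s) + 1 if s in items else None


print(find_in_set("ab", "abc,b,ab,c,def"))  # 3
```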

freq_table(arr ANY TYPE)

Constructs a frequency table (histogram) from an array of elements. The frequency table is represented as an array of STRUCT(value, freq).

SELECT bqutil.fn.freq_table([1,2,1,3,1,5,1000,5]) ft

results:

Row	ft.value	ft.freq
1	1	3
	2	1
	3	1
	5	2
	1000	1

from_binary(value STRING)

Returns a number in decimal form from its binary representation.

SELECT
  bqutil.fn.to_binary(x) AS binary,
  bqutil.fn.from_binary(bqutil.fn.to_binary(x)) AS x
FROM
  UNNEST([1, 123456, 9876543210, -1001]) AS x;

results:

binary	x
0000000000000000000000000000000000000000000000000000000000000001	1
0000000000000000000000000000000000000000000000011110001001000000	123456
0000000000000000000000000000001001001100101100000001011011101010	9876543210
1111111111111111111111111111111111111111111111111111110000010111	-1001
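The output above is the 64-bit two's-complement bit pattern, which is why -1001 begins with a run of ones. A Python sketch of the same round trip (illustrative only; the masking approach is an assumption about how to reproduce the format, not the UDF's code):

```python
MASK_64 = (1 << 64) - 1


def to_binary(x: int) -> str:
    # Mask to 64 bits so negative numbers come out in two's complement.
    return format(x & MASK_64, "064b")


def from_binary(s: str) -> int:
    v = int(s, 2)
    # A leading 1 bit means the 64-bit value is negative.
    return v - (1 << 64) if v >= (1 << 63) else v


for x in [1, 123456, 9876543210, -1001]:
    print(to_binary(x), from_binary(to_binary(x)))
```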

from_hex(value STRING)

Returns a number in decimal form from its hexadecimal representation.

SELECT
  bqutil.fn.to_hex(x) AS hex,
  bqutil.fn.from_hex(bqutil.fn.to_hex(x)) AS x
FROM
  UNNEST([1, 123456, 9876543210, -1001]) AS x;

results:

hex	x
0000000000000001	1
000000000001e240	123456
000000024cb016ea	9876543210
fffffffffffffc17	-1001
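The hexadecimal form uses the same 64-bit two's-complement encoding, written as 16 hex digits. A Python sketch that reproduces the table (illustrative, not the UDF's source):

```python
MASK_64 = (1 << 64) - 1


def to_hex(x: int) -> str:
    # 16 lowercase hex digits of the 64-bit two's-complement value.
    return format(x & MASK_64, "016x")


def from_hex(s: str) -> int:
    v = int(s, 16)
    # Values with the top bit set represent negative numbers.
    return v - (1 << 64) if v >= (1 << 63) else v


for x in [1, 123456, 9876543210, -1001]:
    print(to_hex(x), from_hex(to_hex(x)))
```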

get_array_value(k STRING, arr ANY TYPE)

Given a key and a map (an array of key-value structs), returns every value stored under that key as an ARRAY. This is the same as get_value except that it returns an ARRAY, so it can be used when the map has multiple values for a given key.

WITH test AS (
  SELECT ARRAY(
    SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
    UNION ALL
    SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
    UNION ALL
    SELECT STRUCT('a' AS key, 'AAA' AS value) AS s
    UNION ALL
    SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
  ) AS a
)
SELECT bqutil.fn.get_array_value('b', a), bqutil.fn.get_array_value('a', a), bqutil.fn.get_array_value('c', a) from test;

results:

f0_	f1_	f2_
["bbb"]	["aaa","AAA"]	["ccc"]

getbit(target_arg INT64, target_bit_arg INT64)

Given an INTEGER value, returns the value of the bit at a specified position. Bit positions start at 0, the least significant bit.

SELECT bqutil.fn.getbit(23, 2), bqutil.fn.getbit(23, 3), bqutil.fn.getbit(null, 1)

1 0 NULL
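Reading bit n amounts to shifting right by n and masking the lowest bit; 23 is 10111 in binary, so bit 2 is 1 and bit 3 is 0. A Python sketch of that operation (an illustration, not the UDF's code):

```python
def getbit(target, bit):
    if target is None or bit is None:
        return None  # NULL in, NULL out, as in the example
    return (target >> bit) & 1


print(getbit(23, 2), getbit(23, 3), getbit(None, 1))  # 1 0 None
```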

get_value(k STRING, arr ANY TYPE)

Given a key and a list of key-value maps in the form [{'key': 'a', 'value': 'aaa'}], returns the scalar value for that key.

WITH test AS (
  SELECT ARRAY(
    SELECT STRUCT('a' AS key, 'aaa' AS value) AS s
    UNION ALL
    SELECT STRUCT('b' AS key, 'bbb' AS value) AS s
    UNION ALL
    SELECT STRUCT('c' AS key, 'ccc' AS value) AS s
  ) AS a
)
SELECT bqutil.fn.get_value('b', a), bqutil.fn.get_value('a', a), bqutil.fn.get_value('c', a) from test;

results:

f0_	f1_	f2_
bbb	aaa	ccc

int(v ANY TYPE)

Convenience wrapper that can be used to convert values to integers in place of the native CAST(x AS INT64).

SELECT bqutil.fn.int(1) int1
  , bqutil.fn.int(2.5) int2
  , bqutil.fn.int('7') int3
  , bqutil.fn.int('7.8') int4

1, 2, 7, 7

Note that CAST(x AS INT64) rounds the number, while this function truncates it, which is often the behavior users expect.

jaccard(STRING, STRING)

Accepts two strings and returns their Jaccard similarity index: 1.0 when both strings use exactly the same characters, and 0.0 when they share none.

SELECT
       bqutil.fn.jaccard('thanks', 'thaanks'),
       bqutil.fn.jaccard('thanks', 'thanxs'),
       bqutil.fn.jaccard('bad demo', 'abd demo'),
       bqutil.fn.jaccard('edge case', 'no match'),
       bqutil.fn.jaccard('Special. Character?', 'special character'),
       bqutil.fn.jaccard('', '')

1, 0.71, 1.0, 0.25, 0.67, 0.0
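All six results above are consistent with a case-sensitive Jaccard index over the set of characters in each string (size of the intersection divided by size of the union), with 0.0 for two empty strings. The Python sketch below assumes exactly that; it is a model that matches the examples, not the UDF's source.

```python
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a), set(b)  # character sets; duplicates and order are ignored
    union = sa | sb
    if not union:
        return 0.0  # two empty strings
    return len(sa & sb) / len(union)


print(jaccard("thanks", "thanxs"))      # 5 shared / 7 total, about 0.71
print(jaccard("edge case", "no match"))  # 3 shared / 12 total = 0.25
```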

json_extract_keys(STRING)

Returns all keys in the input JSON as an array of strings.

SELECT bqutil.fn.json_extract_keys(
  '{"foo" : "cat", "bar": "dog", "hat": "rat"}'
) AS keys_array

foo
bar
hat

json_extract_values(STRING)

Returns all values in the input JSON as an array of strings.

SELECT bqutil.fn.json_extract_values(
  '{"foo" : "cat", "bar": "dog", "hat": "rat"}'
) AS values_array

cat
dog
rat

json_typeof(json string)

Returns the type of a JSON value as a string. It emulates PostgreSQL's json_typeof.

SELECT
       bqutil.fn.json_typeof('{"foo": "bar"}'),
       bqutil.fn.json_typeof(TO_JSON_STRING(("foo", "bar"))),
       bqutil.fn.json_typeof(TO_JSON_STRING([1,2,3])),
       bqutil.fn.json_typeof(TO_JSON_STRING("test")),
       bqutil.fn.json_typeof(TO_JSON_STRING(123)),
       bqutil.fn.json_typeof(TO_JSON_STRING(TRUE)),
       bqutil.fn.json_typeof(TO_JSON_STRING(FALSE)),
       bqutil.fn.json_typeof(TO_JSON_STRING(NULL))

object, object, array, string, number, boolean, boolean, null

levenshtein(source STRING, target STRING) RETURNS INT64

Returns the edit distance between two strings as an integer (0 = identical, 1 = a single-character difference, and so on).

SELECT
  source,
  target,
  bqutil.fn.levenshtein(source, target) distance,
FROM UNNEST([
  STRUCT('analyze' AS source, 'analyse' AS target),
  STRUCT('opossum', 'possum'),
  STRUCT('potatoe', 'potatoe'),
  STRUCT('while', 'whilst'),
  STRUCT('aluminum', 'alumininium'),
  STRUCT('Connecticut', 'CT')
]);

Row	source	target	distance
1	analyze	analyse	1
2	opossum	possum	1
3	potatoe	potatoe	0
4	while	whilst	2
5	aluminum	alumininium	3
6	Connecticut	CT	10

This function is based on the Levenshtein distance algorithm, which determines the minimum number of single-character edits (insertions, deletions, or substitutions) required to change the source string into the target string.
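The classic way to compute this distance is dynamic programming over prefixes of the two strings, keeping one row of the table at a time. The Python sketch below is that textbook algorithm, not necessarily how the UDF is written; it reproduces every row of the table above, including the case-sensitive 'Connecticut' vs 'CT' result.

```python
def levenshtein(source: str, target: str) -> int:
    # prev[j] = distance between the current source prefix and target[:j]
    prev = list(range(len(target) + 1))
    for i, s_ch in enumerate(source, start=1):
        curr = [i]
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                   # delete s_ch
                curr[j - 1] + 1,               # insert t_ch
                prev[j - 1] + (s_ch != t_ch),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("while", "whilst"))  # 2
```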

linear_interpolate(pos INT64, prev STRUCT<x INT64, y FLOAT64>, next STRUCT<x INT64, y FLOAT64>)

Interpolates the value at the current position from the preceding and following coordinates.

SELECT
  bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(10 AS x, 10.0 AS y)),
  bqutil.fn.linear_interpolate(2, STRUCT(0 AS x, 0.0 AS y), STRUCT(20 AS x, 10.0 AS y))

results:

f0_	f1_
2.0	1.0
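Linear interpolation places the point on the straight line through the two neighbours: y = y0 + (y1 - y0) * (pos - x0) / (x1 - x0). A Python sketch of the formula, using tuples in place of the STRUCT arguments; the last call shows the ts_linear_interpolate variant documented below, assuming timestamps are converted to epoch seconds for the x-axis.

```python
from datetime import datetime, timezone


def linear_interpolate(pos, prev, nxt):
    """Straight-line interpolation between prev=(x0, y0) and nxt=(x1, y1)."""
    (x0, y0), (x1, y1) = prev, nxt
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def epoch(s):
    # Parse 'YYYY-MM-DD HH:MM:SS' as UTC and return seconds since the Unix epoch.
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc).timestamp()


print(linear_interpolate(2, (0, 0.0), (10, 10.0)))  # 2.0
print(linear_interpolate(2, (0, 0.0), (20, 10.0)))  # 1.0
print(linear_interpolate(epoch("2020-01-01 00:30:00"),
                         (epoch("2020-01-01 00:29:00"), 1.0),
                         (epoch("2020-01-01 00:31:00"), 3.0)))  # 2.0
```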

median(arr ANY TYPE)

Returns the median of an array of numbers.

SELECT bqutil.fn.median([1,1,1,2,3,4,5,100,1000]) median_1
  , bqutil.fn.median([1,2,3]) median_2
  , bqutil.fn.median([1,2,3,4]) median_3

3.0, 2.0, 2.5

nlp_compromise_number(str STRING)

Parse numbers from text.

SELECT bqutil.fn.nlp_compromise_number('one hundred fifty seven')
  , bqutil.fn.nlp_compromise_number('three point 5')
  , bqutil.fn.nlp_compromise_number('2 hundred')
  , bqutil.fn.nlp_compromise_number('minus 8')
  , bqutil.fn.nlp_compromise_number('5 million 3 hundred 25 point zero 1')

157, 3.5, 200, -8, 5000325.01

nlp_compromise_people(str STRING)

Extract names out of text.

SELECT bqutil.fn.nlp_compromise_people(
  "hello, I'm Felipe Hoffa and I work with Elliott Brossard - who thinks Jordan Tigani will like this post?"
) names

["felipe hoffa", "elliott brossard", "jordan tigani"]

percentage_change(val1 FLOAT64, val2 FLOAT64)

Calculates the percentage change (increase or decrease) between two numbers.

SELECT bqutil.fn.percentage_change(0.2, 0.4)
  , bqutil.fn.percentage_change(5, 15)
  , bqutil.fn.percentage_change(100, 50)
  , bqutil.fn.percentage_change(-20, -45)

results:

f0_	f1_	f2_	f3_
1.0	2.0	-0.5	-1.125

percentage_difference(val1 FLOAT64, val2 FLOAT64)

Calculates the percentage difference between two numbers.

SELECT bqutil.fn.percentage_difference(0.2, 0.8)
  , bqutil.fn.percentage_difference(4.0, 12.0)
  , bqutil.fn.percentage_difference(100, 200)
  , bqutil.fn.percentage_difference(1.0, 1000000000)

results:

f0_	f1_	f2_	f3_
1.2	1.0	0.6667	2.0
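The four results above match the common definition of percentage difference: the absolute difference divided by the average of the two values, returned as a fraction (the table appears to round to four decimals). A Python sketch under that assumption, written for positive inputs; how the UDF treats negative values is not covered here.

```python
def percentage_difference(val1: float, val2: float) -> float:
    # Assumption: |val1 - val2| divided by the mean of the two (positive) values.
    return abs(val1 - val2) / ((val1 + val2) / 2)


print(percentage_difference(4.0, 12.0))  # 1.0
print(round(percentage_difference(100, 200), 4))  # 0.6667
```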

pi()

Returns the value of pi.

SELECT bqutil.fn.pi() this_is_pi

3.141592653589793

radians(x ANY TYPE)

Converts a value in degrees to radians.

SELECT bqutil.fn.radians(180) is_this_pi

3.141592653589793

random_int(min ANY TYPE, max ANY TYPE)

Generates random integers between the min and max values.

SELECT bqutil.fn.random_int(0,10) randint, COUNT(*) c
FROM UNNEST(GENERATE_ARRAY(1,1000))
GROUP BY 1
ORDER BY 1

random_value(arr ANY TYPE)

Returns a random value from an array.

SELECT
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe']),
  bqutil.fn.random_value(['tino', 'jordan', 'julie', 'elliott', 'felipe'])

'tino', 'julie', 'jordan'

to_binary(x INT64)

Returns the binary representation of a number as a 64-character string (64-bit two's complement, so negative numbers begin with 1).

SELECT
  x,
  bqutil.fn.to_binary(x) AS binary
FROM
  UNNEST([1, 123456, 9876543210, -1001]) AS x;

results:

x	binary
1	0000000000000000000000000000000000000000000000000000000000000001
123456	0000000000000000000000000000000000000000000000011110001001000000
9876543210	0000000000000000000000000000001001001100101100000001011011101010
-1001	1111111111111111111111111111111111111111111111111111110000010111

to_hex(x INT64)

Returns the hexadecimal representation of a number as a 16-character string (64-bit two's complement).

SELECT
  x,
  bqutil.fn.to_hex(x) AS hex
FROM
  UNNEST([1, 123456, 9876543210, -1001]) AS x;

results:

x	hex
1	0000000000000001
123456	000000000001e240
9876543210	000000024cb016ea
-1001	fffffffffffffc17

random_string(length INT64)

Returns a random string of specified length. Individual characters are chosen uniformly at random from the following pool of characters: 0-9, a-z, A-Z.

SELECT
  bqutil.fn.random_string(5),
  bqutil.fn.random_string(7),
  bqutil.fn.random_string(10)

'mb3AP' 'aQG5XYB' '0D5WFVQuq6'

translate(expression STRING, characters_to_replace STRING, characters_to_substitute STRING)

For a given expression, replaces all occurrences of specified characters with specified substitutes. Existing characters are mapped to replacement characters by their positions in the characters_to_replace and characters_to_substitute arguments. If more characters are specified in the characters_to_replace argument than in the characters_to_substitute argument, the extra characters from the characters_to_replace argument are omitted in the return value.

SELECT bqutil.fn.translate('mint tea', 'inea', 'osin')

most tin
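The positional mapping can be modeled with a dictionary from each character to replace to its substitute, or to the empty string when characters_to_substitute is shorter. The Python sketch below is illustrative; letting the first occurrence win when a character is repeated in characters_to_replace is an assumption, not documented behavior.

```python
def translate(expression: str, to_replace: str, to_substitute: str) -> str:
    table = {}
    for i, ch in enumerate(to_replace):
        if ch in table:
            continue  # assumption: the first mapping for a repeated character wins
        # Characters with no counterpart in to_substitute are removed.
        table[ch] = to_substitute[i] if i < len(to_substitute) else ""
    return "".join(table.get(c, c) for c in expression)


print(translate("mint tea", "inea", "osin"))  # most tin
```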

ts_gen_keyed_timestamps(keys ARRAY, tumble_seconds INT64, min_ts TIMESTAMP, max_ts TIMESTAMP)

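Generates an array of (series_key, tumble_val) structs that pairs each key with every timestamp from min_ts to max_ts, inclusive, in steps of tumble_seconds.

SELECT *
FROM
  UNNEST(bqutil.fn.ts_gen_keyed_timestamps(['abc', 'def'], 60, TIMESTAMP '2020-01-01 00:30:00', TIMESTAMP '2020-01-01 00:31:00'))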

series_key	tumble_val
abc	2020-01-01 00:30:00 UTC
def	2020-01-01 00:30:00 UTC
abc	2020-01-01 00:31:00 UTC
def	2020-01-01 00:31:00 UTC

ts_linear_interpolate(pos TIMESTAMP, prev STRUCT<x TIMESTAMP, y FLOAT64>, next STRUCT<x TIMESTAMP, y FLOAT64>)

Interpolates the value at position pos, using the timestamps converted to seconds as the x-axis.

select bqutil.fn.ts_linear_interpolate(
  TIMESTAMP '2020-01-01 00:30:00',
  STRUCT(TIMESTAMP '2020-01-01 00:29:00' AS x, 1.0 AS y),
  STRUCT(TIMESTAMP '2020-01-01 00:31:00' AS x, 3.0 AS y)
)

f0_
2.0

ts_session_group(row_ts TIMESTAMP, prev_ts TIMESTAMP, session_gap INT64)

Compares two timestamps to determine whether they fall within the same session window. If a timestamp is in the same session window as the previous timestamp, the function returns NULL; otherwise it returns the current row's timestamp. The LAST_VALUE(session_group IGNORE NULLS) window function can then be used to stamp every row with the starting timestamp of its session window, as in the example below.

--5 minute (300 seconds) session window
WITH ticks AS (
  SELECT 'abc' as key, 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
  UNION ALL
  SELECT 'abc', 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 4.0, CAST('2020-01-01 01:09:01 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 'abc', 5.0, CAST('2020-01-01 01:24:01 UTC' AS TIMESTAMP)
)
SELECT
  * EXCEPT(session_group),
  LAST_VALUE(session_group IGNORE NULLS)
    OVER (PARTITION BY key ORDER BY ts ASC) AS session_group
FROM (
  SELECT
    *,
    bqutil.fn.ts_session_group(
      ts,
      LAG(ts) OVER (PARTITION BY key ORDER BY ts ASC),
      300
    ) AS session_group
  FROM ticks
)

key	price	ts	session_group
abc	1.0	2020-01-01 01:04:59 UTC	2020-01-01 01:04:59 UTC
abc	2.0	2020-01-01 01:05:00 UTC	2020-01-01 01:04:59 UTC
abc	3.0	2020-01-01 01:05:01 UTC	2020-01-01 01:04:59 UTC
abc	4.0	2020-01-01 01:09:01 UTC	2020-01-01 01:04:59 UTC
abc	5.0	2020-01-01 01:24:01 UTC	2020-01-01 01:24:01 UTC
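The combination of ts_session_group and LAST_VALUE(... IGNORE NULLS) amounts to a single pass over the sorted timestamps: open a new session whenever the gap since the previous row is too large, otherwise carry the current session start forward. A Python sketch of that logic; treating a gap of exactly session_gap seconds as still inside the session is an assumption, since the example does not test that boundary.

```python
from datetime import datetime


def session_starts(timestamps, session_gap):
    """Label each (ascending) timestamp with the start of its session window."""
    starts, current, prev = [], None, None
    for t in timestamps:
        # Assumption: a gap strictly larger than session_gap seconds opens a new session.
        if prev is None or (t - prev).total_seconds() > session_gap:
            current = t
        starts.append(current)
        prev = t
    return starts


ticks = [datetime.fromisoformat(s) for s in [
    "2020-01-01 01:04:59", "2020-01-01 01:05:00", "2020-01-01 01:05:01",
    "2020-01-01 01:09:01", "2020-01-01 01:24:01"]]
for t, start in zip(ticks, session_starts(ticks, 300)):
    print(t, start)
```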

ts_slide(ts TIMESTAMP, period INT64, duration INT64)

Calculates the sliding windows that ts belongs to, where each window lasts duration seconds and a new window starts every period seconds.

-- show a 15 minute window every 5 minutes and a 15 minute window every 10 minutes
WITH ticks AS (
  SELECT 1.0 AS price, CAST('2020-01-01 01:04:59 UTC' AS TIMESTAMP) AS ts
  UNION ALL
  SELECT 2.0, CAST('2020-01-01 01:05:00 UTC' AS TIMESTAMP)
  UNION ALL
  SELECT 3.0, CAST('2020-01-01 01:05:01 UTC' AS TIMESTAMP)
)
SELECT
  price,
  ts,
  bqutil.fn.ts_slide(ts, 300, 900) as _5_15,
  bqutil.fn.ts_slide(ts, 600, 900) as _10_15,
FROM ticks

price	ts	_5_15.window_start	_5_15.window_end	_10_15.window_start	_10_15.window_end
1.0	2020-01-01 01:04:59 UTC	2020-01-01 00:50:00 UTC	2020-01-01 01:05:00 UTC	2020-01-01 00:50:00 UTC	2020-01-01 01:05:00 UTC
		2020-01-01 00:55:00 UTC	2020-01-01 01:10:00 UTC	2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
		2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
2.0	2020-01-01 01:05:00 UTC	2020-01-01 00:55:00 UTC	2020-01-01 01:10:00 UTC	2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
		2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
		2020-01-01 01:05:00 UTC	2020-01-01 01:20:00 UTC
3.0	2020-01-01 01:05:01 UTC	2020-01-01 00:55:00 UTC	2020-01-01 01:10:00 UTC	2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
		2020-01-01 01:00:00 UTC	2020-01-01 01:15:00 UTC
		2020-01-01 01:05:00 UTC	2020-01-01 01:20:00 UTC
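A timestamp belongs to every window [start, start + duration) whose start is a multiple of period and lies within duration seconds before it. The Python sketch below enumerates those windows, assuming (as the example output suggests) that window starts are aligned to the Unix epoch; it reproduces the _5_15 rows for 01:04:59 and the _10_15 row for 01:05:00.

```python
from datetime import datetime, timezone


def ts_slide(ts, period, duration):
    """Return (window_start, window_end) for every sliding window containing ts."""
    t = int(ts.timestamp())
    start = t - t % period  # latest window start at or before ts
    starts = []
    while start > t - duration:  # the window [start, start + duration) still covers ts
        starts.append(start)
        start -= period
    return [(datetime.fromtimestamp(s, timezone.utc),
             datetime.fromtimestamp(s + duration, timezone.utc)) for s in sorted(starts)]


tick = datetime(2020, 1, 1, 1, 4, 59, tzinfo=timezone.utc)
for window_start, window_end in ts_slide(tick, 300, 900):
    print(window_start, window_end)
```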

ts_tumble(input_ts TIMESTAMP, tumble_seconds INT64)

Calculates the tumbling window that input_ts belongs to, returning the start of that window.

SELECT
  bqutil.fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 900) AS min_15,
  bqutil.fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 600) AS min_10,
  bqutil.fn.ts_tumble(TIMESTAMP '2020-01-01 00:17:30', 60) AS min_1

min_15	min_10	min_1
2020-01-01 00:15:00 UTC	2020-01-01 00:10:00 UTC	2020-01-01 00:17:00 UTC
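A tumbling window start is the timestamp rounded down to a multiple of tumble_seconds. The Python sketch below assumes windows are aligned to the Unix epoch, which is consistent with the output above; it is an illustration, not the UDF's source.

```python
from datetime import datetime, timezone


def ts_tumble(input_ts, tumble_seconds):
    """Start of the tumbling window containing input_ts (epoch-aligned windows)."""
    t = int(input_ts.timestamp())
    return datetime.fromtimestamp(t - t % tumble_seconds, timezone.utc)


ts = datetime(2020, 1, 1, 0, 17, 30, tzinfo=timezone.utc)
for seconds in (900, 600, 60):
    print(ts_tumble(ts, seconds))
```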

typeof(input ANY TYPE)

Returns the type of input, or 'UNKNOWN' if the type of input cannot be determined.

SELECT
  bqutil.fn.typeof(""),
  bqutil.fn.typeof(b""),
  bqutil.fn.typeof(1.0),
  bqutil.fn.typeof(STRUCT())

STRING, BINARY, FLOAT64, STRUCT

url_decode(text STRING, method STRING)

Returns text decoded with the JavaScript function named by method: decodeURIComponent, decodeURI, or unescape. If method is NULL, decodeURIComponent is used.

SELECT NULL as method, bqutil.fn.url_decode("https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", NULL) as value
UNION ALL SELECT "decodeURIComponent" as method, bqutil.fn.url_decode("https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", "decodeURIComponent") as value
UNION ALL SELECT "decodeURI" as method, bqutil.fn.url_decode("https://example.com/?id=%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A", "decodeURI") as value
UNION ALL SELECT "unescape" as method, bqutil.fn.url_decode("https%3A//example.com/%3Fid%3D%u3042%u3044%u3046%u3048%u304A", "unescape") as value

method	value
NULL	https://example.com/?id=あいうえお
decodeURIComponent	https://example.com/?id=あいうえお
decodeURI	https://example.com/?id=あいうえお
unescape	https://example.com/?id=あいうえお

url_encode(text STRING, method STRING)

Returns text encoded with the JavaScript function named by method: encodeURIComponent, encodeURI, or escape. If method is NULL, encodeURIComponent is used.

SELECT NULL as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", NULL) as value
UNION ALL SELECT "encodeURIComponent" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "encodeURIComponent") as value
UNION ALL SELECT "encodeURI" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "encodeURI") as value
UNION ALL SELECT "escape" as method, bqutil.fn.url_encode("https://example.com/?id=あいうえお", "escape") as value

method	value
NULL	https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A
encodeURIComponent	https%3A%2F%2Fexample.com%2F%3Fid%3D%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A
encodeURI	https://example.com/?id=%E3%81%82%E3%81%84%E3%81%86%E3%81%88%E3%81%8A
escape	https%3A//example.com/%3Fid%3D%u3042%u3044%u3046%u3048%u304A

url_keys(query STRING)

Returns an array of the query-parameter keys in a URL.

SELECT bqutil.fn.url_keys(
  'https://www.google.com/search?q=bigquery+udf&client=chrome')

["q", "client"]

url_param(query STRING, p STRING)

Returns the value of the specified query-parameter key in a URL.

SELECT bqutil.fn.url_param(
  'https://www.google.com/search?q=bigquery+udf&client=chrome', 'client')

"chrome"

url_parse(urlString STRING, partToExtract STRING)

Returns the specified part of a URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, and PROTOCOL. For example, url_parse('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'.

WITH urls AS (
  SELECT 'http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1' as url
  UNION ALL
  SELECT 'rpc://facebook.com/' as url
)
SELECT bqutil.fn.url_parse(url, 'HOST'), bqutil.fn.url_parse(url, 'PATH'), bqutil.fn.url_parse(url, 'QUERY'), bqutil.fn.url_parse(url, 'REF'), bqutil.fn.url_parse(url, 'PROTOCOL') from urls

results:

f0_	f1_	f2_	f3_	f4_
facebook.com	path1/p.php	k1=v1&k2=v2#Ref1	Ref1	http
facebook.com	NULL	NULL	NULL	rpc

url_trim_query(url STRING, keys_to_trim ARRAY)

Returns a URL with specified keys removed from the URL's query component. The keys to be removed are provided as an ARRAY input argument.

SELECT bqutil.fn.url_trim_query(
  "https://www.example.com/index.html?goods_id=G1002&utm_id=ads&gclid=abc123",
  ["utm_id", "gclid"]
)
UNION ALL SELECT bqutil.fn.url_trim_query(
  "https://www.example.com/index.html?goods_id=G1002&utm_id=ads&gclid=abc123",
  ["utm_id", "gclid", "goods_id"]
)

results:

f0_
https://www.example.com/index.html?goods_id=G1002
https://www.example.com/index.html

week_of_month(date_expression ANY TYPE)

Returns the number of weeks from the beginning of the month to the specified date. The result is an INTEGER value between 0 and 5: values 1 to 5 give the nth week of the month, and 0 means the date falls in the partial week at the start of the month.

SELECT
  bqutil.fn.week_of_month(DATE '2020-07-01'),
  bqutil.fn.week_of_month(DATE '2020-07-08');

0 1

y4md_to_date(y4md STRING)

Converts a STRING formatted as YYYYMMDD to a DATE.

SELECT bqutil.fn.y4md_to_date('20201220')

"2020-12-20"

zeronorm(x ANY TYPE, meanx FLOAT64, stddevx FLOAT64)

Normalize a variable so that it has zero mean and unit variance.

with r AS (
  SELECT 10 AS x
  UNION ALL SELECT 20
  UNION ALL SELECT 30
  UNION ALL SELECT 40
  UNION ALL SELECT 50
),
stats AS (
  SELECT AVG(x) AS meanx, STDDEV(x) AS stddevx
  FROM r
)
SELECT x, bqutil.fn.zeronorm(x, meanx, stddevx) AS zeronorm
FROM r, stats;

results:

Row	x	zeronorm
1	10	-1.2649110640673518
2	20	-0.6324555320336759
3	30	0.0
4	40	0.6324555320336759
5	50	1.2649110640673518

StatsLib: Statistical UDFs

This section describes the subset of community-contributed user-defined functions that extend BigQuery for more specialized statistical analysis. Each UDF described below is automatically synchronized to the fn dataset in the bqutil project, so you can reference it in your queries.

Documentation

corr_pvalue(r FLOAT64, n INT64)

Returns the p value of a correlation coefficient, based on the t-distribution. Input: r: the correlation coefficient. n: the number of samples. Output: the p value of the correlation coefficient.

WITH test_cases AS (
    SELECT  0.9 AS r, 25 n
    UNION ALL
    SELECT -0.5, 40
    UNION ALL
    SELECT 1.0, 50
    UNION ALL
    SELECT -1.0, 50
)
SELECT bqutil.fn.corr_pvalue(r,n) AS p
FROM test_cases

results:

p
1.443229117741041E-9
0.0010423414457657223
0.0
0.0
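The description says only that the p value is "based on the t-distribution". The usual construction, sketched below under the assumption of a two-sided test, converts r into a t statistic with n - 2 degrees of freedom, where F_{n-2} is the CDF of Student's t distribution. When |r| = 1 the statistic is unbounded and p falls to 0, which is consistent with the last two rows above.

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad
p = 2\,\bigl(1 - F_{n-2}(\lvert t \rvert)\bigr)
```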

kruskal_wallis(ARRAY<STRUCT<factor STRING, val FLOAT64>>)

Takes an array of structs in which each struct (point) represents one measurement: a group label (factor) and a measured value (val).

The Kruskal–Wallis test by ranks, Kruskal–Wallis H test (named after William Kruskal and W. Allen Wallis), or one-way ANOVA on ranks is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).

Input: array: struct <factor STRING, val FLOAT64>
Output: struct<H FLOAT64, p FLOAT64, DoF FLOAT64>

DECLARE data ARRAY<STRUCT<factor STRING, val FLOAT64>>;

set data = [
('a',1.0),
('b',2.0),
('c',2.3),
('a',1.4),
('b',2.2),
('c',5.5),
('a',1.0),
('b',2.3),
('c',2.3),
('a',1.1),
('b',7.2),
('c',2.8)
];


SELECT `bqutil.fn.kruskal_wallis`(data) AS results;

results:

results.H	results.p	results.DoF
3.4230769	0.1805877	2

linear_regression(ARRAY<STRUCT<X FLOAT64, Y FLOAT64>>)

Takes an array of STRUCT(X, Y) points and returns a, b, and r, where Y = a + b*X is the least-squares fit (a is the intercept and b the slope) and r is the Pearson correlation coefficient, a goodness-of-fit measure.

Linear regression is a linear approach to modeling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).

Input: array: struct <X FLOAT64, Y FLOAT64>
Output: struct<a FLOAT64, b FLOAT64, r FLOAT64>

DECLARE data ARRAY<STRUCT<X FLOAT64, Y FLOAT64>>;
set data = [ (5.1,2.5), (5.0,2.0), (5.7,2.6), (6.0,2.2), (5.8,2.6), (5.5,2.3), (6.1,2.8), (5.5,2.5), (6.4,3.2), (5.6,3.0)];
SELECT `bqutil.fn.linear_regression`(data) AS results;

results:

results.a	results.b	results.r
-0.4353361094588436	0.5300416418798544	0.632366563565354
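The three outputs can be reproduced with ordinary least squares written in terms of centered sums: slope b = Sxy / Sxx, intercept a = mean(Y) - b * mean(X), and r = Sxy / sqrt(Sxx * Syy). The Python sketch below is the textbook computation, not the UDF's source; on the example data it returns the same a, b, and r as the table above.

```python
from math import sqrt


def linear_regression(points):
    """Least-squares fit Y = a + b*X plus the Pearson correlation r."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return a, b, r


data = [(5.1, 2.5), (5.0, 2.0), (5.7, 2.6), (6.0, 2.2), (5.8, 2.6),
        (5.5, 2.3), (6.1, 2.8), (5.5, 2.5), (6.4, 3.2), (5.6, 3.0)]
print(linear_regression(data))
```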

chisquare_cdf(H FLOAT64, dof FLOAT64)

Takes H and dof and returns the probability value p.

chisquare_cdf returns the null-hypothesis probability of the Kruskal-Wallis (KW) test: the upper-tail probability (1 minus the CDF) of a chi-square distribution with dof degrees of freedom, evaluated at H, where dof is the degrees of freedom of the KW problem.

Input: H FLOAT64, dof FLOAT64
Output: p FLOAT64

SELECT `bqutil.fn.chisquare_cdf`(.3,2) AS results;

results:

results
0.8607079764250578
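For an even number of degrees of freedom the chi-square upper tail has a closed form, P(X >= h) = exp(-h/2) * sum over i < dof/2 of (h/2)^i / i!, which makes the example easy to check: with dof = 2 it is simply exp(-0.15) for h = 0.3. The Python sketch covers only even dof (odd dof needs the incomplete gamma function) and is a check on the value, not the UDF's implementation. It also reproduces the p value reported for the Kruskal-Wallis example from its H and DoF.

```python
from math import exp, factorial


def chisquare_sf_even_dof(h: float, dof: int) -> float:
    """Upper-tail probability P(X >= h) for chi-square with even, positive dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form only covers even, positive dof")
    half = h / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(dof // 2))


print(chisquare_sf_even_dof(0.3, 2))        # about 0.8607, as in the example above
print(chisquare_sf_even_dof(3.4230769, 2))  # about 0.1806, the Kruskal-Wallis p value
```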

p_fisherexact(a FLOAT64, b FLOAT64, c FLOAT64, d FLOAT64)

Computes the p value of the Fisher exact test (https://en.wikipedia.org/wiki/Fisher%27s_exact_test), implemented in JavaScript.

Input: a, b, c, d: the values of the 2x2 contingency table [[a, b], [c, d]] (type: FLOAT64).
Output: the p value of the test (type: FLOAT64).

Example

WITH mydata as (
SELECT
    90.0        as a,
    27.0        as b,
    17.0        as c,
    50.0  as d
)
SELECT
    `bqutil.fn.p_fisherexact`(a,b,c,d) as pvalue
FROM
   mydata

Output:

pvalue
8.046828829103659E-12
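Fisher's exact test fixes the row and column totals and uses the hypergeometric distribution for the top-left cell. The Python sketch below computes the standard two-sided p value (sum of the probabilities of all tables no more likely than the observed one) for integer counts. Whether the UDF uses this two-sided definition is an assumption, so no exact output is claimed for the example above; the check uses the classic table [[3, 1], [1, 3]], whose two-sided p is 34/70.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of a table with top-left cell k and the same margins.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.4857
print(fisher_exact_two_sided(90, 27, 17, 50))
```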

mannwhitneyu(x ARRAY, y ARRAY, alt STRING)

Computes the U statistic and the p value of the Mann–Whitney U test (https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test). This test is also called the Mann–Whitney–Wilcoxon (MWW) test, the Wilcoxon rank-sum test, or the Wilcoxon–Mann–Whitney test.

Input: x, y: one-dimensional arrays of samples (type: ARRAY); alt: the alternative hypothesis, one of 'two-sided', 'less', or 'greater'.
Output: a struct of type struct<U FLOAT64, p FLOAT64>, where U is the test statistic and p is the p value of the test.

Example

WITH mydata AS (
  SELECT
    [2, 4, 6, 2, 3, 7, 5, 1.] AS x,
    [8, 10, 11, 14, 20, 18, 19, 9. ] AS y
)
SELECT `bqutil.fn.mannwhitneyu`(y, x, 'two-sided') AS test
FROM mydata

Output:

test.U	test.p
0.0	9.391056991171487E-4
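For the two-sided case, the example output is consistent with the large-sample normal approximation: U counts the pairs in which one sample beats the other, and the p value comes from a z score with a continuity correction. The Python sketch below makes several assumptions that the document does not confirm: it reports U as min(U1, U2), applies no tie correction, and handles only 'two-sided'.

```python
from math import erfc, sqrt


def mannwhitneyu_two_sided(x, y):
    n1, n2 = len(x), len(y)
    # U1 counts pairs where an x value beats a y value; ties count as half.
    u1 = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    u = min(u1, n1 * n2 - u1)  # assumption: report the smaller of U1 and U2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # assumption: no tie correction
    z = (mean_u - u - 0.5) / sd_u  # continuity correction
    p = erfc(z / sqrt(2))  # two-sided tail of the standard normal
    return u, min(p, 1.0)


x = [2, 4, 6, 2, 3, 7, 5, 1.0]
y = [8, 10, 11, 14, 20, 18, 19, 9.0]
print(mannwhitneyu_two_sided(y, x))  # U = 0.0, p close to 9.39E-4
```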

t_test(ARRAY,ARRAY)

Runs Student's t-test, a well-known test for comparing two populations, and returns the t statistic (t_value) and the degrees of freedom (dof). Example taken from here: Sample

Sample Query:

DECLARE pop1 ARRAY<FLOAT64>;
DECLARE pop2 ARRAY<FLOAT64>;

SET pop1 = [13.3,6.0,20.0,8.0,14.0,19.0,18.0,25.0,16.0,24.0,15.0,1.0,15.0];
SET pop2 = [22.0,16.0,21.7,21.0,30.0,26.0,12.0,23.2,28.0,23.0] ;

SELECT `bqutil.fn.t_test`(pop1, pop2) AS actual_result_rows;

Results:

Row	actual_result_rows.t_value	actual_result_rows.dof
1	2.8957935572829476	21
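Recomputing the example suggests how the two outputs are formed: the t_value above matches the unequal-variance (Welch) standard error sqrt(s1²/n1 + s2²/n2) built from the sample variances, while dof is reported as n1 + n2 - 2. The Python sketch below reproduces both numbers under that reading; it is an inference from the example, not the UDF's documented formula.

```python
from math import sqrt
from statistics import mean, variance


def t_test(pop1, pop2):
    n1, n2 = len(pop1), len(pop2)
    # Standard error from the sample variances of each population (Welch form).
    se = sqrt(variance(pop1) / n1 + variance(pop2) / n2)
    t_value = abs(mean(pop1) - mean(pop2)) / se
    dof = n1 + n2 - 2  # as reported in the example output
    return t_value, dof


pop1 = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
pop2 = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]
print(t_test(pop1, pop2))  # about (2.8958, 21)
```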

normal_cdf(x FLOAT64, mean FLOAT64, stdev FLOAT64)

Returns the cumulative distribution function (CDF) of the normal distribution with the given mean and standard deviation (stdev), evaluated at x.

Sample Query:

SELECT `bqutil.fn.normal_cdf`(1.1, 1.7, 2.0) as normal_cdf;

Results:

Row	normal_cdf
1	0.3820885778110474
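The normal CDF can be written with the complementary error function: Phi((x - mean) / stdev) = 0.5 * erfc(-(x - mean) / (stdev * sqrt(2))). In the example, (1.1 - 1.7) / 2.0 = -0.3 and Phi(-0.3) is about 0.38209. A Python sketch of the formula (an illustration, not the UDF's source):

```python
from math import erfc, sqrt


def normal_cdf(x: float, mean: float, stdev: float) -> float:
    # erfc keeps precision in the lower tail better than 0.5 * (1 + erf(z)).
    return 0.5 * erfc(-(x - mean) / (stdev * sqrt(2)))


print(normal_cdf(1.1, 1.7, 2.0))  # about 0.38209
```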
