聚合函数
编辑聚合函数编辑
用于从一组输入值计算*单个*结果的函数。Elasticsearch SQL 仅在分组(隐式或显式) alongside 支持聚合函数。
通用编辑
AVG
编辑
概要
输入:
输出:double
数值
说明:返回输入值的平均值(算术平均数)。
SELECT AVG(salary) AS avg FROM emp; avg --------------- 48248.55
SELECT AVG(salary / 12.0) AS avg FROM emp; avg --------------- 4020.7125
COUNT
编辑
概要
输入:
字段名称、通配符 ( |
输出:数值
说明:返回输入值的总数(计数)。
SELECT COUNT(*) AS count FROM emp; count --------------- 100
COUNT(ALL)
编辑
概要
输入:
输出:数值
说明:返回所有*非空*输入值的总数(计数)。COUNT(<field_name>)
和 COUNT(ALL <field_name>)
等效。
SELECT COUNT(ALL last_name) AS count_all, COUNT(DISTINCT last_name) count_distinct FROM emp; count_all | count_distinct ---------------+------------------ 100 |96
SELECT COUNT(ALL CASE WHEN languages IS NULL THEN -1 ELSE languages END) AS count_all, COUNT(DISTINCT CASE WHEN languages IS NULL THEN -1 ELSE languages END) count_distinct FROM emp; count_all | count_distinct ---------------+--------------- 100 |6
COUNT(DISTINCT)
编辑
概要
输入:
输出:数值。如果此字段仅包含 null
值,则该函数将返回 null
。否则,该函数将忽略此字段中的 null
值。
说明:返回输入值中*不同的非空*值的总数。
SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM emp; unique_hires | hires ----------------+--------------- 99 |100
SELECT COUNT(DISTINCT DATE_TRUNC('YEAR', hire_date)) unique_hires, COUNT(DATE_TRUNC('YEAR', hire_date)) AS hires FROM emp; unique_hires | hires ---------------+--------------- 14 |100
FIRST/FIRST_VALUE
编辑
概要
输入:
输出:与输入类型相同
说明:返回按 ordering_field_name
列排序的 field_name
输入列的第一个非 null
值(如果存在)。如果未提供 ordering_field_name
,则仅使用 field_name
列进行排序。例如:
a | b |
---|---|
100 |
1 |
200 |
1 |
1 |
2 |
2 |
2 |
10 |
null |
20 |
null |
null |
null |
SELECT FIRST(a) FROM t
将导致
FIRST(a) |
1 |
和
SELECT FIRST(a, b) FROM t
将导致
FIRST(a, b) |
100 |
SELECT FIRST(first_name) FROM emp; FIRST(first_name) -------------------- Alejandro
SELECT gender, FIRST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name) ------------+-------------------- null | Berni F | Alejandro M | Amabile
SELECT FIRST(first_name, birth_date) FROM emp; FIRST(first_name, birth_date) -------------------------------- Remzi
SELECT gender, FIRST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST(first_name, birth_date) --------------+-------------------------------- null | Lillian F | Sumant M | Remzi
FIRST_VALUE
是名称别名,可以用来代替 FIRST
,例如:
SELECT gender, FIRST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | FIRST_VALUE(first_name, birth_date) --------------+-------------------------------------- null | Lillian F | Sumant M | Remzi
SELECT gender, FIRST_VALUE(SUBSTRING(first_name, 2, 6), birth_date) AS "first" FROM emp GROUP BY gender ORDER BY gender; gender | first ---------------+--------------- null |illian F |umant M |emzi
FIRST
不能在 HAVING 子句中使用。
LAST/LAST_VALUE
编辑
概要
输入:
输出:与输入类型相同
说明:它是 FIRST/FIRST_VALUE
的反义词。返回按 ordering_field_name
列降序排序的 field_name
输入列的最后一个非 null
值(如果存在)。如果未提供 ordering_field_name
,则仅使用 field_name
列进行排序。例如:
a | b |
---|---|
10 |
1 |
20 |
1 |
1 |
2 |
2 |
2 |
100 |
null |
200 |
null |
null |
null |
SELECT LAST(a) FROM t
将导致
LAST(a) |
200 |
和
SELECT LAST(a, b) FROM t
将导致
LAST(a, b) |
2 |
SELECT LAST(first_name) FROM emp; LAST(first_name) ------------------- Zvonko
SELECT gender, LAST(first_name) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name) ------------+------------------- null | Patricio F | Xinglin M | Zvonko
SELECT LAST(first_name, birth_date) FROM emp; LAST(first_name, birth_date) ------------------------------- Hilari
SELECT gender, LAST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST(first_name, birth_date) -----------+------------------------------- null | Eberhardt F | Valdiodio M | Hilari
LAST_VALUE
是名称别名,可以用来代替 LAST
,例如:
SELECT gender, LAST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender; gender | LAST_VALUE(first_name, birth_date) -----------+------------------------------------- null | Eberhardt F | Valdiodio M | Hilari
SELECT gender, LAST_VALUE(SUBSTRING(first_name, 3, 8), birth_date) AS "last" FROM emp GROUP BY gender ORDER BY gender; gender | last ---------------+--------------- null |erhardt F |ldiodio M |lari
LAST
不能在 HAVING
子句中使用。
MAX
编辑
概要
输入:
输出:与输入类型相同
说明:返回字段 field_name
中输入值的最大值。
SELECT MAX(salary) AS max FROM emp; max --------------- 74999
SELECT MAX(ABS(salary / -12.0)) AS max FROM emp; max ----------------- 6249.916666666667
类型为 text
或 keyword
的字段上的 MAX
将转换为 LAST/LAST_VALUE
,因此不能在 HAVING
子句中使用。
MIN
编辑
概要
输入:
输出:与输入类型相同
说明:返回字段 field_name
中输入值的最小值。
SELECT MIN(salary) AS min FROM emp; min --------------- 25324
类型为 text
或 keyword
的字段上的 MIN
将转换为 FIRST/FIRST_VALUE
,因此不能在 HAVING
子句中使用。
SUM
编辑
概要
输入:
输出:整数输入为 bigint
,浮点数为 double
说明:返回字段 field_name
中输入值的总和。
SELECT SUM(salary) AS sum FROM emp; sum --------------- 4824855
SELECT ROUND(SUM(salary / 12.0), 1) AS sum FROM emp; sum --------------- 402071.3
统计编辑
KURTOSIS
编辑
概要
输入:
输出:double
数值
说明:
量化字段 field_name
中输入值的分布形状。
SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM emp; min | max | k ---------------+---------------+------------------ 25324 |74999 |2.0444718929142986
KURTOSIS
不能用于标量函数或运算符之上,而只能直接用于字段。因此,例如,不允许以下操作,并且会返回错误
SELECT KURTOSIS(salary / 12.0), gender FROM emp GROUP BY gender
MAD
编辑
概要
输入:
输出:double
数值
说明:
测量字段 field_name
中输入值的可变性。
SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM emp; min | max | avg | mad ---------------+---------------+---------------+--------------- 25324 |74999 |48248.55 |10096.5
SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, AVG(salary/ 12.0) AS avg, MAD(salary / 12.0) AS mad FROM emp; min | max | avg | mad ------------------+-----------------+---------------+----------------- 2110.3333333333335|6249.916666666667|4020.7125 |841.3750000000002
PERCENTILE
编辑
概要
输入:
数字字段。如果此字段仅包含 |
|
数值表达式(必须是常量,而不是基于字段)。如果为 |
|
百分位数算法的可选字符串文字。可能的值: |
|
配置百分位数算法的可选数字文字。为 |
输出:double
数值
说明:
返回字段 field_name
中输入值的第 n 个百分位数(由 numeric_exp
参数表示)。
SELECT languages, PERCENTILE(salary, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th ---------------+----------------- null |74482.4 1 |71122.8 2 |70271.4 3 |71926.0 4 |69352.15 5 |56371.0
SELECT languages, PERCENTILE(salary / 12.0, 95) AS "95th" FROM emp GROUP BY languages; languages | 95th ---------------+------------------ null |6206.866666666667 1 |5926.9 2 |5855.949999999999 3 |5993.833333333333 4 |5779.345833333333 5 |4697.583333333333
SELECT languages, PERCENTILE(salary, 97.3, 'tdigest', 100.0) AS "97.3_TDigest", PERCENTILE(salary, 97.3, 'hdr', 3) AS "97.3_HDR" FROM emp GROUP BY languages; languages | 97.3_TDigest | 97.3_HDR ---------------+-----------------+--------------- null |74720.036 |74992.0 1 |72316.132 |73712.0 2 |71792.436 |69936.0 3 |73326.23999999999|74992.0 4 |71753.281 |74608.0 5 |61176.16000000001|56368.0
PERCENTILE_RANK
编辑
概要
输入:
数字字段。如果此字段仅包含 |
|
数值表达式(必须是常量,而不是基于字段)。如果为 |
|
百分位数算法的可选字符串文字。可能的值: |
|
配置百分位数算法的可选数字文字。为 |
输出:double
数值
说明:
返回字段 field_name
中输入值的第 n 个百分位等级(由 numeric_exp
参数表示)。
SELECT languages, PERCENTILE_RANK(salary, 65000) AS rank FROM emp GROUP BY languages; languages | rank ---------------+----------------- null |73.65766569962062 1 |73.7291625157734 2 |88.88005607010643 3 |79.43662623295829 4 |85.70446389643493 5 |96.79075152940749
SELECT languages, PERCENTILE_RANK(salary/12, 5000) AS rank FROM emp GROUP BY languages; languages | rank ---------------+------------------ null |66.91240875912409 1 |66.70766707667076 2 |84.13266895048271 3 |61.052992625621684 4 |76.55646443990001 5 |94.00696864111498
SELECT languages, ROUND(PERCENTILE_RANK(salary, 65000, 'tdigest', 100.0), 2) AS "rank_TDigest", ROUND(PERCENTILE_RANK(salary, 65000, 'hdr', 3), 2) AS "rank_HDR" FROM emp GROUP BY languages; languages | rank_TDigest | rank_HDR ---------------+---------------+--------------- null |73.66 |80.0 1 |73.73 |73.33 2 |88.88 |89.47 3 |79.44 |76.47 4 |85.7 |83.33 5 |96.79 |95.24
SKEWNESS
编辑
概要
输入:
输出:double
数值
说明:
量化字段 field_name
中输入值的非对称分布。
SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM emp; min | max | s ---------------+---------------+------------------ 25324 |74999 |0.2707722118423227
SKEWNESS
不能用于标量函数之上,而只能直接用于字段。因此,例如,不允许以下操作,并且会返回错误
SELECT SKEWNESS(ROUND(salary / 12.0, 2), gender FROM emp GROUP BY gender
STDDEV_POP
编辑
概要
输入:
输出:double
数值
说明:
返回字段 field_name
中输入值的总体标准差。
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev FROM emp; min | max | stddev ---------------+---------------+------------------ 25324 |74999 |13765.125502787832
SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_POP(salary / 12.0) AS stddev FROM emp; min | max | stddev ------------------+-----------------+----------------- 2110.3333333333335|6249.916666666667|1147.093791898986
STDDEV_SAMP
编辑
概要
输入:
输出:double
数值
说明:
返回字段 field_name
中输入值的样本标准差。
SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_SAMP(salary) AS stddev FROM emp; min | max | stddev ---------------+---------------+------------------ 25324 |74999 |13834.471662090747
SELECT MIN(salary / 12.0) AS min, MAX(salary / 12.0) AS max, STDDEV_SAMP(salary / 12.0) AS stddev FROM emp; min | max | stddev ------------------+-----------------+----------------- 2110.3333333333335|6249.916666666667|1152.872638507562
SUM_OF_SQUARES
编辑
概要
输入:
输出:double
数值
说明:
返回字段 field_name
中输入值的平方和。
SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq FROM emp; min | max | sumsq ---------------+---------------+---------------- 25324 |74999 |2.51740125721E11
SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, SUM_OF_SQUARES(salary / 24.0) AS sumsq FROM emp; min | max | sumsq ------------------+------------------+------------------- 1055.1666666666667|3124.9583333333335|4.370488293767361E8
VAR_POP
编辑
概要
输入:
输出:double
数值
说明:
返回字段 field_name
中输入值的总体方差。
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM emp; min | max | varpop ---------------+---------------+---------------- 25324 |74999 |1.894786801075E8
SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_POP(salary / 24.0) AS varpop FROM emp; min | max | varpop ------------------+------------------+------------------ 1055.1666666666667|3124.9583333333335|328956.04185329855
VAR_SAMP
编辑
概要
输入:
输出:double
数值
说明:
返回字段 field_name
中输入值的 样本方差。
SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_SAMP(salary) AS varsamp FROM emp; min | max | varsamp ---------------+---------------+---------------- 25324 |74999 |1.913926061691E8
SELECT MIN(salary / 24.0) AS min, MAX(salary / 24.0) AS max, VAR_SAMP(salary / 24.0) AS varsamp FROM emp; min | max | varsamp ------------------+------------------+---------------- 1055.1666666666667|3124.9583333333335|332278.830154847