title: 'Aggregate Functions'
sidebar_label: 'Aggregate Functions'
sidebar_position: 7
:::info
Spice is built on Apache DataFusion and uses the PostgreSQL dialect, even when querying datasources with different SQL dialects. Note, when using a data accelerator like DuckDB, function support is specific to each acceleration engine, and not all functions are supported by all acceleration engines.
:::
Aggregate functions operate on a set of values to compute a single result.
Filter clause
Aggregate functions support the SQL FILTER (WHERE ...) clause to restrict which input rows contribute to the aggregate result.
Example:
Note: When no rows pass the filter, COUNT returns 0 while SUM/AVG/MIN/MAX return NULL.
General Functions
array_agg
Returns an array created from the expression elements. If ordering is required, elements are inserted in the specified order.
This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the argument expression.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
avg
Returns the average of numeric values in the specified column.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
Aliases
bit_and
Computes the bitwise AND of all non-null input values.
Arguments
- expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bit_or
Computes the bitwise OR of all non-null input values.
Arguments
- expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bit_xor
Computes the bitwise exclusive OR of all non-null input values.
Arguments
- expression: Integer expression to operate on. Can be a constant, column, or function, and any combination of operators.
bool_and
Returns true if all non-null input values are true, otherwise false.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
bool_or
Returns true if any non-null input value is true, otherwise false.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
count
Returns the number of non-null values in the specified column. To include null values in the total count, use count(*).
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
first_value
Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
grouping
Returns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.
Arguments
- expression: Expression to evaluate whether data is aggregated across the specified column. Can be a constant, column, or function.
Example
last_value
Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
max
Returns the maximum value in the specified column.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
mean
Alias of avg.
median
Returns the median value in the specified column.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
min
Returns the minimum value in the specified column.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
string_agg
Concatenates the values of string expressions and places separator values between them. If ordering is required, strings are concatenated in the specified order. This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the first argument expression.
Arguments
- expression: The string expression to concatenate. Can be a column or any valid string expression.
- delimiter: A literal string used as a separator between the concatenated values.
Example
sum
Returns the sum of all values in the specified column.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
var
Returns the statistical sample variance of a set of numbers.
Arguments
- expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
Aliases
var_pop
Returns the statistical population variance of a set of numbers.
Arguments
- expression: Numeric expression to operate on. Can be a constant, column, or function, and any combination of operators.
Aliases
var_population
Alias of var_pop.
var_samp
Alias of var.
var_sample
Alias of var.
Statistical Functions
corr
Returns the coefficient of correlation between two numeric values.
Arguments
- expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
covar
Alias of covar_samp.
covar_pop
Returns the sample covariance of a set of number pairs.
Arguments
- expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
covar_samp
Returns the sample covariance of a set of number pairs.
Arguments
- expression1: First expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression2: Second expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
Aliases
nth_value
Returns the nth value in a group of values.
Arguments
- expression: The column or expression to retrieve the nth value from.
- n: The position (nth) of the value to retrieve, based on the ordering.
Example
regr_avgx
Computes the average of the independent variable (input) expression_x for the non-null paired data points.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_avgy
Computes the average of the dependent variable (output) expression_y for the non-null paired data points.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_count
Counts the number of non-null paired data points.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_intercept
Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_r2
Computes the square of the correlation coefficient between the independent and dependent variables.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_slope
Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_sxx
Computes the sum of squares of the independent variable.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_sxy
Computes the sum of products of paired data points.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
regr_syy
Computes the sum of squares of the dependent variable.
Arguments
- expression_y: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- expression_x: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
stddev
Returns the standard deviation of a set of numbers.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
Aliases
stddev_pop
Returns the population standard deviation of a set of numbers.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
stddev_samp
Alias of stddev.
Approximate Functions
approx_distinct
Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
approx_median
Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(0.5) WITHIN GROUP (ORDER BY x).
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
Example
approx_percentile_cont
Returns the approximate percentile of input values using the t-digest algorithm.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
- percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
- centroids: Number of centroids to use in the t-digest algorithm. Default is 100. A higher number results in more accurate approximation but requires more memory.
Example
An alternate syntax is also supported:
approx_percentile_cont_with_weight
Returns the weighted approximate percentile of input values using the t-digest algorithm.
Arguments
- expression: The expression to operate on. Can be a constant, column, or function, and any combination of operators.
- weight: Expression to use as weight. Can be a constant, column, or function, and any combination of arithmetic operators.
- percentile: Percentile to compute. Must be a float value between 0 and 1 (inclusive).
- centroids: Number of centroids to use in the t-digest algorithm. Default is 100. A higher number results in more accurate approximation but requires more memory.
Example
An alternative syntax is also supported: