CORR()
Correlation coefficient
DOUBLE PRECISION
CORR ( <expr1>, <expr2> )
Parameter | Description |
---|---|
exprN |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The CORR
function return the correlation coefficient for a pair of numerical expressions.
The function CORR(<expr1>, <expr2>)
is equivalent to
COVAR_POP(<expr1>, <expr2>) / (STDDEV_POP(<expr2>) * STDDEV_POP(<expr1>))
This is also known as the Pearson correlation coefficient.
In a statistical sense, correlation is the degree to which a pair of variables are linearly related.A linear relation between variables means that the value of one variable can to a certain extent predict the value of the other.The correlation coefficient represents the degree of correlation as a number ranging from -1 (high inverse correlation) to 1 (high correlation).A value of 0 corresponds to no correlation.
If the group or window is empty, or contains only NULL
values, the result will be NULL
.
CORR
Examplesselect
corr(alength, aheight) AS c_corr
from measure
COVAR_POP()
Population covariance
DOUBLE PRECISION
COVAR_POP ( <expr1>, <expr2> )
Parameter | Description |
---|---|
exprN |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function COVAR_POP
returns the population covariance for a pair of numerical expressions.
The function COVAR_POP(<expr1>, <expr2>)
is equivalent to
(SUM(<expr1> * <expr2>) - SUM(<expr1>) * SUM(<expr2>) / COUNT(*)) / COUNT(*)
If the group or window is empty, or contains only NULL
values, the result will be NULL
.
COVAR_POP
Examplesselect
covar_pop(alength, aheight) AS c_covar_pop
from measure
COVAR_SAMP()
Sample covariance
DOUBLE PRECISION
COVAR_SAMP ( <expr1>, <expr2> )
Parameter | Description |
---|---|
exprN |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function COVAR_SAMP
returns the sample covariance for a pair of numerical expressions.
The function COVAR_SAMP(<expr1>, <expr2>)
is equivalent to
(SUM(<expr1> * <expr2>) - SUM(<expr1>) * SUM(<expr2>) / COUNT(*)) / (COUNT(*) - 1)
If the group or window is empty, contains only 1 row, or contains only NULL
values, the result will be NULL
.
COVAR_SAMP
Examplesselect
covar_samp(alength, aheight) AS c_covar_samp
from measure
STDDEV_POP()
Population standard deviation
DOUBLE PRECISION
or NUMERIC
depending on the type of expr
STDDEV_POP ( <expr> )
Parameter | Description |
---|---|
expr |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function STDDEV_POP
returns the population standard deviation for a group or window.NULL
values are skipped.
The function STDDEV_POP(<expr>)
is equivalent to
SQRT(VAR_POP(<expr>))
If the group or window is empty, or contains only NULL
values, the result will be NULL
.
STDDEV_POP
Examplesselect
dept_no
stddev_pop(salary)
from employee
group by dept_no
STDDEV_SAMP()
Sample standard deviation
DOUBLE PRECISION
or NUMERIC
depending on the type of expr
STDDEV_POP ( <expr> )
Parameter | Description |
---|---|
expr |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function STDDEV_SAMP
returns the sample standard deviation for a group or window.NULL
values are skipped.
The function STDDEV_SAMP(<expr>)
is equivalent to
SQRT(VAR_SAMP(<expr>))
If the group or window is empty, contains only 1 row, or contains only NULL
values, the result will be NULL
.
STDDEV_SAMP
Examplesselect
dept_no
stddev_samp(salary)
from employee
group by dept_no
VAR_POP()
Population variance
DOUBLE PRECISION
or NUMERIC
depending on the type of expr
VAR_POP ( <expr> )
Parameter | Description |
---|---|
expr |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function VAR_POP
returns the population variance for a group or window.NULL
values are skipped.
The function VAR_POP(<expr>)
is equivalent to
(SUM(<expr> * <expr>) - SUM (<expr>) * SUM (<expr>) / COUNT(<expr>)) / COUNT (<expr>)
If the group or window is empty, or contains only NULL
values, the result will be NULL
.
VAR_POP
Examplesselect
dept_no
var_pop(salary)
from employee
group by dept_no
VAR_SAMP()
Sample variance
DOUBLE PRECISION
or NUMERIC
depending on the type of expr
VAR_SAMP ( <expr> )
Parameter | Description |
---|---|
expr |
Numeric expression.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function VAR_POP
returns the sample variance for a group or window.NULL
values are skipped.
The function VAR_SAMP(<expr>)
is equivalent to
(SUM(<expr> * <expr>) - SUM(<expr>) * SUM (<expr>) / COUNT (<expr>)) / (COUNT(<expr>) - 1)
If the group or window is empty, contains only 1 row, or contains only NULL
values, the result will be NULL
.
VAR_SAMP
Examplesselect
dept_no
var_samp(salary)
from employee
group by dept_no
Linear regression functions are useful for trend line continuation.The trend or regression line is usually a pattern followed by a set of values.Linear regression is useful to predict future values.To continue the regression line, you need to know the slope and the point of intersection with the y-axis.As set of linear functions can be used for calculating these values.
In the function syntax, y is interpreted as an x-dependent variable.
The linear regression aggregate functions take a pair of arguments, the dependent variable expression (y) and the independent variable expression (x), which are both numeric value expressions.Any row in which either argument evaluates to NULL
is removed from the rows that qualify.If there are no rows that qualify, then the result of REGR_COUNT
is 0
(zero), and the other linear regression aggregate functions result in NULL
.
REGR_AVGX()
Average of the independent variable of the regression line
DOUBLE PRECISION
REGR_AVGX ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_AVGX
calculates the average of the independent variable (x) of the regression line.
The function REGR_AVGX(<y>, <x>)
is equivalent to
SUM(<exprX>) / REGR_COUNT(<y>, <x>) <exprX> :== CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
REGR_AVGY()
Average of the dependent variable of the regression line
DOUBLE PRECISION
REGR_AVGY ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_AVGY
calculates the average of the dependent variable (y) of the regression line.
The function REGR_AVGY(<y>, <x>)
is equivalent to
SUM(<exprY>) / REGR_COUNT(<y>, <x>) <exprY> :== CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END
REGR_COUNT()
Number of non-empty pairs of the regression line
DOUBLE PRECISION
REGR_COUNT ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_COUNT
counts the number of non-empty pairs of the regression line.
The function REGR_COUNT(<y>, <x>)
is equivalent to
COUNT(*) FILTER (WHERE <x> IS NOT NULL AND <y> IS NOT NULL)
REGR_INTERCEPT()
Point of intersection of the regression line with the y-axis
DOUBLE PRECISION
REGR_INTERCEPT ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_INTERCEPT
calculates the point of intersection of the regression line with the y-axis.
The function REGR_INTERCEPT(<y>, <x>)
is equivalent to
REGR_AVGY(<y>, <x>) - REGR_SLOPE(<y>, <x>) * REGR_AVGX(<y>, <x>)
REGR_INTERCEPT
ExamplesForecasting sales volume
with recursive years (byyear) as (
select 1991
from rdb$database
union all
select byyear + 1
from years
where byyear < 2020
),
s as (
select
extract(year from order_date) as byyear,
sum(total_value) as total_value
from sales
group by 1
),
regr as (
select
regr_intercept(total_value, byyear) as intercept,
regr_slope(total_value, byyear) as slope
from s
)
select
years.byyear as byyear,
intercept + (slope * years.byyear) as total_value
from years
cross join regr
BYYEAR TOTAL_VALUE ------ ------------ 1991 118377.35 1992 414557.62 1993 710737.89 1994 1006918.16 1995 1303098.43 1996 1599278.69 1997 1895458.96 1998 2191639.23 1999 2487819.50 2000 2783999.77 ...
REGR_R2()
Coefficient of determination of the regression line
DOUBLE PRECISION
REGR_R2 ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The REGR_R2 function calculates the coefficient of determination, or R-squared, of the regression line.
The function REGR_R2(<y>, <x>)
is equivalent to
POWER(CORR(<y>, <x>), 2)
REGR_SLOPE()
Slope of the regression line
DOUBLE PRECISION
REGR_SLOPE ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_SLOPE
calculates the slope of the regression line.
The function REGR_SLOPE(<y>, <x>)
is equivalent to
COVAR_POP(<y>, <x>) / VAR_POP(<exprX>) <exprX> :== CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
REGR_SXX()
Sum of squares of the independent variable
DOUBLE PRECISION
REGR_SXX ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_SXX
calculates the sum of squares of the independent expression variable (x).
The function REGR_SXX(<y>, <x>)
is equivalent to
REGR_COUNT(<y>, <x>) * VAR_POP(<exprX>) <exprX> :== CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <x> END
REGR_SXY()
Sum of products of the independent variable and the dependent variable
DOUBLE PRECISION
REGR_SXY ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_SXY
calculates the sum of products of independent variable expression (x) times dependent variable expression (y).
The function REGR_SXY(<y>, <x>)
is equivalent to
REGR_COUNT(<y>, <x>) * COVAR_POP(<y>, <x>)
REGR_SYY()
Sum of squares of the dependent variable
DOUBLE PRECISION
REGR_SYY ( <y>, <x> )
Parameter | Description |
---|---|
y |
Dependent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
x |
Independent variable of the regression line.It may contain a table column, a constant, a variable, an expression, a non-aggregate function or a UDF.Aggregate functions are not allowed as expressions. |
The function REGR_SYY
calculates the sum of squares of the dependent variable (y).
The function REGR_SYY(<y>, <x>)
is equivalent to
REGR_COUNT(<y>, <x>) * VAR_POP(<exprY>) <exprY> :== CASE WHEN <x> IS NOT NULL AND <y> IS NOT NULL THEN <y> END