The Firebird 5.0 Language Reference | Aggregate Functions As Window Functions

Aggregate Functions as Window Functions

All aggregate functions, including those using a FILTER clause, can be used as window functions by adding the OVER clause.

Imagine a table EMPLOYEE with columns ID, NAME, DEPARTMENT and SALARY, and the need to show each employee with their respective salary and the portion of the total payroll that their salary represents.

A normal query could achieve this, as follows:

select
    id,
    department,
    salary,
    salary / (select sum(salary) from employee) portion
  from employee
  order by id;

Results

id  department  salary  portion
--  ----------  ------  ----------
1   R & D        10.00      0.2040
2   SALES        12.00      0.2448
3   SALES         8.00      0.1632
4   R & D         9.00      0.1836
5   R & D        10.00      0.2040

The query is repetitive and lengthy to run, especially if EMPLOYEE happens to be a complex view.

The same query could be specified in a much faster and more elegant way using a window function:

select
    id,
    department,
    salary,
    salary / sum(salary) OVER () portion
  from employee
  order by id;

Here, sum(salary) over () is the sum of SALARY over all rows of the query (the EMPLOYEE table).
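The FILTER clause mentioned at the start of this section combines with OVER in the same way. The following is a minimal sketch, assuming the EMPLOYEE table from the examples above; the alias sales_total is chosen here for illustration only.

```sql
-- Sketch: assumes the EMPLOYEE table used in the examples above.
select
    id,
    department,
    salary,
    -- sum of the SALES salaries only, repeated on every row
    sum(salary) filter (where department = 'SALES') over () sales_total
  from employee
  order by id;
```

With the sample data shown earlier, sales_total would be 20.00 on every row, because only the two SALES salaries (12.00 and 8.00) pass the filter.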

Partitioning

Like aggregate functions, which may operate alone or in relation to a group, window functions may also operate on a group, which is called a “partition”.

Syntax

<window function>(...) OVER (PARTITION BY <expr> [, <expr> ...])

Aggregation over a group could produce more than one row, so the result set generated by a partition is joined with the main query using the same expression list as the partition.

Continuing the EMPLOYEE example, instead of getting the portion of each employee’s salary over the all-employees total, we would like to get the portion based on the employees in the same department:

select
    id,
    department,
    salary,
    salary / sum(salary) OVER (PARTITION BY department) portion
  from employee
  order by id;

Results

id  department  salary  portion
--  ----------  ------  ----------
1   R & D        10.00      0.3448
2   SALES        12.00      0.6000
3   SALES         8.00      0.4000
4   R & D         9.00      0.3103
5   R & D        10.00      0.3448
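To illustrate the "joined with the main query" description, the partitioned query can be sketched without window functions, as a join against a grouped derived table. This is only an illustration of the concept, not a description of how the engine executes the window function; the aliases e, d and dept_total are assumptions made for the example.

```sql
-- Sketch: the PARTITION BY query rewritten as a join on the partition key.
select
    e.id,
    e.department,
    e.salary,
    e.salary / d.dept_total portion
  from employee e
  join (
    -- one row per department with its salary total
    select department, sum(salary) dept_total
      from employee
      group by department
  ) d on d.department = e.department
  order by e.id;
```

The two forms differ when DEPARTMENT is NULL: PARTITION BY places such rows together in one partition, while the equality join above drops them.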

Ordering

The ORDER BY sub-clause can be used with or without partitions. The ORDER BY clause within OVER specifies the order in which the window function will process rows. This order does not have to be the same as the order in which rows appear in the output.

There is an important concept associated with window functions: for each row there is a set of rows in its partition called the window frame. By default, when specifying ORDER BY, the frame consists of all rows from the beginning of the partition to the current row, plus any rows with the same ORDER BY value as the current row (its peers). Without ORDER BY, the default frame consists of all rows in the partition.

As a result, for standard aggregate functions, the ORDER BY clause produces partial aggregation results as rows are processed.

Example

select
    id,
    salary,
    sum(salary) over (order by salary) cumul_salary
  from employee
  order by salary;

Results

id  salary  cumul_salary
--  ------  ------------
3     8.00          8.00
4     9.00         17.00
1    10.00         37.00
5    10.00         37.00
2    12.00         49.00

Here, cumul_salary returns the running (accumulated) total of the SUM function. It may appear strange that 37.00 is repeated for ids 1 and 5, but that is how it should work. Rows with equal ORDER BY keys are grouped together, and the aggregation is computed once for the group (summing both 10.00 values). To avoid this, you can add the ID field to the end of the ORDER BY clause.
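The tie-breaking suggestion can be sketched as follows, assuming the same EMPLOYEE data. Adding ID as a second ordering key makes every row its own peer group.

```sql
-- Sketch: ID added as a tie-breaker so that equal salaries are accumulated one row at a time.
select
    id,
    salary,
    sum(salary) over (order by salary, id) cumul_salary
  from employee
  order by salary, id;
```

With the sample data, cumul_salary would be 8.00, 17.00, 27.00, 37.00 and 49.00: ids 1 and 5 now get 27.00 and 37.00 instead of sharing 37.00.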

It’s possible to use multiple windows with different orders, and ORDER BY parts like ASC/DESC and NULLS FIRST/LAST.

With a partition, ORDER BY works the same way, but at each partition boundary the aggregation is reset.
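A minimal sketch of the reset at partition boundaries, assuming the EMPLOYEE table from the earlier examples (the alias dept_running is illustrative):

```sql
-- Sketch: running total that restarts for each department.
select
    id,
    department,
    salary,
    sum(salary) over (partition by department order by salary, id) dept_running
  from employee
  order by department, salary, id;
```

With the sample data, the R & D rows (ids 4, 1, 5) would show 9.00, 19.00 and 29.00, and the SALES rows (ids 3, 2) would start again at 8.00 and end at 20.00.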

All aggregation functions can use ORDER BY, except for LIST().

Window Frames

A window frame specifies which rows to consider for the current row when evaluating the window function.

The frame comprises three pieces: unit, start bound, and end bound. The unit can be RANGE or ROWS, which defines how the bounds will work.

The bounds are:

UNBOUNDED PRECEDING
<expr> PRECEDING
CURRENT ROW
<expr> FOLLOWING
UNBOUNDED FOLLOWING

With RANGE, the ORDER BY should specify exactly one expression, and that expression should be of a numeric, date, time, or timestamp type. For <expr> PRECEDING, expr is subtracted from the ORDER BY expression, and for <expr> FOLLOWING, expr is added. For CURRENT ROW, the expression is used as-is.

All rows inside the current partition that are between the bounds are considered part of the resulting window frame.

With ROWS, ORDER BY expressions are not limited by number or type. For this unit, <expr> PRECEDING and <expr> FOLLOWING relate to the row position within the current partition, and not the values of the ordering keys.
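As a sketch of position-based bounds, the following query sums each row together with its immediate neighbours. It assumes the EMPLOYEE table from the earlier examples; the alias neighbour_sum is illustrative, and ID is added to the ordering so the row positions are deterministic.

```sql
-- Sketch: ROWS counts physical rows, regardless of the salary values.
select
    id,
    salary,
    sum(salary) over (
      order by salary, id
      rows between 1 preceding and 1 following
    ) neighbour_sum
  from employee
  order by salary, id;
```

With the sample data, neighbour_sum would be 17.00, 27.00, 29.00, 32.00 and 22.00. The first and last rows only have one neighbour, so their frames contain two rows instead of three.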

Both UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING work identically with RANGE and ROWS. UNBOUNDED PRECEDING starts at the first row of the current partition, and UNBOUNDED FOLLOWING ends at the last row of the current partition.

The frame syntax with <window-frame-start> specifies the start-frame, with the end-frame being CURRENT ROW.
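A short sketch of the start-only form, assuming the EMPLOYEE table from the earlier examples. Both columns are expected to return the same values, since the short form implies CURRENT ROW as the end of the frame; the aliases are illustrative.

```sql
-- Sketch: frame given by its start only, compared with the explicit BETWEEN form.
select
    id,
    salary,
    sum(salary) over (order by salary, id rows 1 preceding) short_form,
    sum(salary) over (
      order by salary, id
      rows between 1 preceding and current row
    ) explicit_form
  from employee
  order by salary, id;
```

With the sample data, both columns would contain 8.00, 17.00, 19.00, 20.00 and 22.00.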

Some window functions discard frames:

ROW_NUMBER, LAG and LEAD always work as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
DENSE_RANK, RANK, PERCENT_RANK and CUME_DIST always work as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
FIRST_VALUE, LAST_VALUE and NTH_VALUE respect frames, but the RANGE unit behaviour is identical to ROWS.

Example Using Frame

When the ORDER BY clause is used, but a frame clause is omitted, the default frame covers the partition up to the current row (including its peers). When combined with SUM, this results in a running total:

select
  id,
  salary,
  sum(salary) over (order by salary) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |       8.00 |
|  4 |   9.00 |      17.00 |
|  1 |  10.00 |      37.00 |
|  5 |  10.00 |      37.00 |
|  2 |  12.00 |      49.00 |

On the other hand, if we apply a frame for the entire partition, we get the total for the entire partition.

select
  id,
  salary,
  sum(salary) over (
    order by salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |      49.00 |
|  4 |   9.00 |      49.00 |
|  1 |  10.00 |      49.00 |
|  5 |  10.00 |      49.00 |
|  2 |  12.00 |      49.00 |

This example only demonstrates how the frame works; the same result would be simpler to produce with sum(salary) over ().

We can use a range frame to compute the count of employees with salaries between (an employee’s salary - 1) and (their salary + 1) with this query:

select
  id,
  salary,
  count(*) over (
    order by salary
    RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) range_count
from employee
order by salary;

Result:

| id | salary | range_count |
|---:|-------:|------------:|
|  3 |   8.00 |           2 |
|  4 |   9.00 |           4 |
|  1 |  10.00 |           3 |
|  5 |  10.00 |           3 |
|  2 |  12.00 |           1 |

Named Windows

The WINDOW clause can be used to explicitly name a window, for example to avoid repetitive or confusing expressions.

A named window can be used:

in the OVER clause to reference a window definition, e.g. OVER window_name
as a base window of another named or inline (OVER) window, if it is not a window with a frame (ROWS or RANGE clauses)

Note

A window with a base window cannot have PARTITION BY, nor override the ordering (ORDER BY) of a base window.
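Both uses can be sketched in one query, assuming the EMPLOYEE table from the earlier examples. The window name w_dept and the column aliases are chosen for illustration. The named window has no frame, so it can serve as a base window, and the inline window adds an ordering without repeating the partitioning.

```sql
-- Sketch: w_dept is referenced directly and as the base of an inline window.
select
    id,
    department,
    salary,
    salary / sum(salary) over w_dept portion,
    rank() over (w_dept order by salary desc) dept_rank
  from employee
  window w_dept as (partition by department)
  order by id;
```

With the sample data, ids 1 and 5 would share dept_rank 1 in R & D and id 4 would get 3, while in SALES id 2 would rank 1 and id 3 would rank 2.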

Ranking Functions

The ranking functions compute the ordinal rank of a row within the window partition.

These functions can be used with or without partitioning and ordering. However, using them without ordering almost never makes sense.

The ranking functions can be used to create different types of counters. Consider SUM(1) OVER (ORDER BY SALARY) as an example of what they can do, each of them differently. Following is an example query, also comparing with the SUM behavior.

select
    id,
    salary,
    dense_rank() over (order by salary),
    rank() over (order by salary),
    row_number() over (order by salary),
    sum(1) over (order by salary)
  from employee
  order by salary;

Results

id  salary  dense_rank  rank  row_number  sum
--  ------  ----------  ----  ----------  ---
 3    8.00           1     1           1    1
 4    9.00           2     2           2    2
 1   10.00           3     3           3    4
 5   10.00           3     3           4    4
 2   12.00           4     5           5    5

The difference between DENSE_RANK and RANK is that only RANK leaves a gap after duplicate rows (relative to the window ordering). DENSE_RANK continues assigning sequential numbers after the duplicate salary. On the other hand, ROW_NUMBER always assigns sequential numbers, even when there are duplicate values.

`CUME_DIST()`

Relative rank (or, cumulative distribution) of a row within a window partition

Result type

DOUBLE PRECISION

Syntax

CUME_DIST () OVER <window_name_or_spec>

CUME_DIST is calculated as the number of rows preceding the current row or peers of it (including the current row itself), divided by the number of rows in the partition.

In other words, for a window without PARTITION BY, CUME_DIST() OVER <window_name_or_spec> is equivalent to COUNT(*) OVER <window_name_or_spec> / CAST(COUNT(*) OVER () AS DOUBLE PRECISION)
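The equivalence can be checked with a sketch like the following, assuming the EMPLOYEE table from the earlier examples. The cast is included because dividing one integer count by another would otherwise be an integer division; the aliases are illustrative.

```sql
-- Sketch: CUME_DIST next to a manual calculation for an unpartitioned window.
select
    id,
    salary,
    cume_dist() over (order by salary) cd,
    count(*) over (order by salary)
      / cast(count(*) over () as double precision) cd_manual
  from employee
  order by salary;
```

With the sample data, both columns would contain 0.2, 0.4, 0.8, 0.8 and 1. For a partitioned window, the denominator would need the same PARTITION BY instead of an empty OVER ().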

`CUME_DIST` Examples

select
  id,
  salary,
  cume_dist() over (order by salary)
from employee
order by salary;

Result

id salary cume_dist
-- ------ ---------
 3   8.00       0.2
 4   9.00       0.4
 1  10.00       0.8
 5  10.00       0.8
 2  12.00         1

`DENSE_RANK()`

Rank of rows in a partition without gaps

Result type

BIGINT

Syntax

DENSE_RANK () OVER <window_name_or_spec>

Rows with the same window_order values get the same rank within the partition window_partition, if specified. The dense rank of a row is equal to the number of different rank values in the partition preceding the current row, plus one.

`DENSE_RANK` Examples

select
  id,
  salary,
  dense_rank() over (order by salary)
from employee
order by salary;

Result

id salary dense_rank
-- ------ ----------
 3  8.00           1
 4  9.00           2
 1 10.00           3
 5 10.00           3
 2 12.00           4

`NTILE()`

Distributes the rows of the current window partition into the specified number of tiles (groups)

Result type

BIGINT

Syntax

NTILE ( number_of_tiles ) OVER <window_name_or_spec>

Table 1. Arguments of `NTILE`
Argument	Description
number_of_tiles	Number of tiles (groups). Restricted to a positive integer literal, a named parameter (PSQL), or a positional parameter (DSQL).

`NTILE` Examples

select
  id,
  salary,
  rank() over (order by salary),
  ntile(3) over (order by salary)
from employee
order by salary;

Result

ID SALARY RANK NTILE
== ====== ==== =====
 3   8.00    1     1
 4   9.00    2     1
 1  10.00    3     2
 5  10.00    3     2
 2  12.00    5     3

`PERCENT_RANK()`

Relative rank of a row within a window partition.

Result type

DOUBLE PRECISION

Syntax

PERCENT_RANK () OVER <window_name_or_spec>

PERCENT_RANK is calculated as the RANK() of the current row minus 1, divided by the number of rows in the partition minus 1.

In other words, PERCENT_RANK() OVER <window_name_or_spec> is equivalent to (RANK() OVER <window_name_or_spec> - 1) / CAST(COUNT(*) OVER() - 1 AS DOUBLE PRECISION)
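A sketch of this equivalence for an unpartitioned window, assuming the EMPLOYEE table from the earlier examples (the aliases are illustrative):

```sql
-- Sketch: PERCENT_RANK next to the manual formula given above.
select
    id,
    salary,
    percent_rank() over (order by salary) pr,
    (rank() over (order by salary) - 1)
      / cast(count(*) over () - 1 as double precision) pr_manual
  from employee
  order by salary;
```

With the sample data, both columns would contain 0, 0.25, 0.5, 0.5 and 1.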

`PERCENT_RANK` Examples

select
  id,
  salary,
  rank() over (order by salary),
  percent_rank() over (order by salary)
from employee
order by salary;

Result

id salary rank percent_rank
-- ------ ---- ------------
 3   8.00    1            0
 4   9.00    2         0.25
 1  10.00    3          0.5
 5  10.00    3          0.5
 2  12.00    5            1

`RANK()`

Rank of each row in a partition

Result type

BIGINT

Syntax

RANK () OVER <window_name_or_spec>

Rows with the same values of window-order get the same rank within the partition window-partition, if specified. The rank of a row is equal to the number of rows in the partition preceding the current row (not counting its peers), plus one.

`RANK` Examples

select
  id,
  salary,
  rank() over (order by salary)
from employee
order by salary;

Result

id salary rank
-- ------ ----
 3  8.00     1
 4  9.00     2
 1 10.00     3
 5 10.00     3
 2 12.00     5

`ROW_NUMBER()`

Sequential row number in the partition

Result type

BIGINT

Syntax

ROW_NUMBER () OVER <window_name_or_spec>

Returns the sequential row number in the partition, where 1 is the first row in each of the partitions.

`ROW_NUMBER` Examples

select
  id,
  salary,
  row_number() over (order by salary)
from employee
order by salary;

Result

id salary row_number
-- ------ ----------
 3  8.00           1
 4  9.00           2
 1 10.00           3
 5 10.00           4
 2 12.00           5
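The restart of numbering in each partition can be sketched as follows, assuming the EMPLOYEE table from the earlier examples. ID is added to the ordering so that the numbering of equal salaries is deterministic; the alias rn is illustrative.

```sql
-- Sketch: number employees within each department, highest salary first.
select
    id,
    department,
    salary,
    row_number() over (
      partition by department
      order by salary desc, id
    ) rn
  from employee
  order by department, rn;
```

With the sample data, the R & D rows would be numbered 1, 2, 3 (ids 1, 5, 4), and the SALES rows would start again at 1 (ids 2, 3).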

Navigational Functions

The navigational functions get the simple (non-aggregated) value of an expression from another row of the query, within the same partition.

Important

FIRST_VALUE, LAST_VALUE and NTH_VALUE also operate on a window frame. For navigational functions, Firebird applies a default frame from the first to the current row of the partition, not to the last. In other words, it behaves as if the following frame is specified:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This is likely to produce strange or unexpected results for NTH_VALUE and especially LAST_VALUE, so make sure to specify an explicit frame with these functions.

Example of Navigational Functions

select
    id,
    salary,
    first_value(salary) over (order by salary),
    last_value(salary) over (order by salary),
    nth_value(salary, 2) over (order by salary),
    lag(salary) over (order by salary),
    lead(salary) over (order by salary)
  from employee
  order by salary;

Results

id  salary  first_value  last_value  nth_value     lag    lead
--  ------  -----------  ----------  ---------  ------  ------
3     8.00         8.00        8.00     <null>  <null>    9.00
4     9.00         8.00        9.00       9.00    8.00   10.00
1    10.00         8.00       10.00       9.00    9.00   10.00
5    10.00         8.00       10.00       9.00   10.00   12.00
2    12.00         8.00       12.00       9.00   10.00  <null>
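In the results above, last_value simply repeats the current salary and nth_value is NULL on the first row, which follows from the default frame. A sketch with an explicit frame covering the whole partition, assuming the same EMPLOYEE table (the window name w and the aliases are illustrative):

```sql
-- Sketch: explicit frame so LAST_VALUE and NTH_VALUE see the whole partition.
select
    id,
    salary,
    last_value(salary) over w last_salary,
    nth_value(salary, 2) over w second_salary
  from employee
  window w as (
    order by salary
    rows between unbounded preceding and unbounded following
  )
  order by salary;
```

With the sample data, last_salary would be 12.00 and second_salary 9.00 on every row.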

`FIRST_VALUE()`

First value of the current partition

Result type

The same type as expr

Syntax

FIRST_VALUE ( <expr> ) OVER <window_name_or_spec>

Table 1. Arguments of `FIRST_VALUE`
Argument	Description
expr	Expression. May contain a table column, constant, variable, expression, scalar function. Aggregate functions are not allowed as an expression.

`LAG()`

Value from row in the current partition with a given offset before the current row

Result type

The same type as expr

Syntax

LAG ( <expr> [, <offset> [, <default>]])
  OVER <window_name_or_spec>

Table 1. Arguments of `LAG`
Argument	Description
expr	Expression. May contain a table column, constant, variable, expression, scalar function. Aggregate functions are not allowed as an expression.
offset	The offset in rows before the current row to get the value identified by expr. If offset is not specified, the default is `1`. offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to `BIGINT`. `offset` cannot be negative (use `LEAD` instead).
default	The default value to return if offset points outside the partition. Default is `NULL`.

The LAG function provides access to the row in the current partition with a given offset before the current row.

If offset points outside the current partition, default will be returned, or NULL if no default was specified.

`LAG` Examples

Suppose you have a RATE table that stores the exchange rate for each day. To trace the change of the exchange rate over the past five days, you can use the following query.

select
  bydate,
  cost,
  cost - lag(cost) over (order by bydate) as change,
  100 * (cost - lag(cost) over (order by bydate)) /
    lag(cost) over (order by bydate) as percent_change
from rate
where bydate between dateadd(-4 day to current_date)
and current_date
order by bydate

Result

bydate     cost   change percent_change
---------- ------ ------ --------------
27.10.2014  31.00 <null>         <null>
28.10.2014  31.53   0.53         1.7096
29.10.2014  31.40  -0.13        -0.4123
30.10.2014  31.67   0.27         0.8598
31.10.2014  32.00   0.33         1.0419
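The default argument can be sketched with the EMPLOYEE table from the earlier examples. Here 0.00 is returned instead of NULL for the first row, which has no preceding row; the alias prev_salary is illustrative.

```sql
-- Sketch: LAG with an explicit offset and a default value.
select
    id,
    salary,
    lag(salary, 1, 0.00) over (order by salary, id) prev_salary
  from employee
  order by salary, id;
```

With the sample data, prev_salary would be 0.00, 8.00, 9.00, 10.00 and 10.00.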

See also

LEAD()

`LAST_VALUE()`

Last value from the current partition

Result type

The same type as expr

Syntax

LAST_VALUE ( <expr> ) OVER <window_name_or_spec>

Table 1. Arguments of `LAST_VALUE`
Argument	Description
expr	Expression. May contain a table column, constant, variable, expression, scalar function. Aggregate functions are not allowed as an expression.

`LEAD()`

Value from a row in the current partition with a given offset after the current row

Result type

The same type as expr

Syntax

LEAD ( <expr> [, <offset> [, <default>]])
  OVER <window_name_or_spec>

Table 1. Arguments of `LEAD`
Argument	Description
expr	Expression. May contain a table column, constant, variable, expression, scalar function. Aggregate functions are not allowed as an expression.
offset	The offset in rows after the current row to get the value identified by expr. If offset is not specified, the default is `1`. offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to `BIGINT`. `offset` cannot be negative (use `LAG` instead).
default	The default value to return if offset points outside the partition. Default is `NULL`.

The LEAD function provides access to the row in the current partition with a given offset after the current row.

If offset points outside the current partition, default will be returned, or NULL if no default was specified.
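A minimal sketch of LEAD with an explicit offset, assuming the EMPLOYEE table from the earlier examples (the alias salary_two_ahead is illustrative):

```sql
-- Sketch: look two rows ahead; no default is given, so NULL is returned past the end.
select
    id,
    salary,
    lead(salary, 2) over (order by salary, id) salary_two_ahead
  from employee
  order by salary, id;
```

With the sample data, salary_two_ahead would be 10.00, 10.00, 12.00, NULL and NULL, because the last two rows have no row two positions after them.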

See also

LAG()

`NTH_VALUE()`

The Nth value starting from the first or the last row of the current frame

Result type

The same type as expr

Syntax

NTH_VALUE ( <expr>, <offset> )
  [FROM {FIRST | LAST}]
  OVER <window_name_or_spec>

Table 1. Arguments of `NTH_VALUE`
Argument	Description
expr	Expression. May contain a table column, constant, variable, expression, scalar function. Aggregate functions are not allowed as an expression.
offset	The offset in rows from the start (`FROM FIRST`), or the last (`FROM LAST`) to get the value identified by expr. offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to `BIGINT`. `offset` cannot be zero or negative.

The NTH_VALUE function returns the Nth value starting from the first (FROM FIRST) or the last (FROM LAST) row of the current frame; see also the note on frames for navigational functions. Offset 1 with FROM FIRST is equivalent to FIRST_VALUE, and offset 1 with FROM LAST is equivalent to LAST_VALUE.
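A sketch of FROM LAST, assuming the EMPLOYEE table from the earlier examples. An explicit frame over the whole partition is used so that the last row of the frame is the last row of the partition; the alias second_highest is illustrative.

```sql
-- Sketch: second value counted from the end of the frame.
select
    id,
    salary,
    nth_value(salary, 2) from last over (
      order by salary, id
      rows between unbounded preceding and unbounded following
    ) second_highest
  from employee
  order by salary, id;
```

With the sample data, second_highest would be 10.00 on every row, the salary of id 5, which is the second row from the end in this ordering.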

Aggregate Functions Inside Window Specification

It is possible to use aggregate functions (but not window functions) inside the OVER clause. In that case, the aggregate function is applied first to determine the windows, and only then are the window functions applied on those windows.

Note	When using aggregate functions inside `OVER`, all columns not used in aggregate functions must be specified in the `GROUP BY` clause of the `SELECT`.

Using an Aggregate Function in a Window Specification

select
  code_employee_group,
  avg(salary) as avg_salary,
  rank() over (order by avg(salary)) as salary_rank
from employee
group by code_employee_group
