FirebirdSQL logo
 Aggregate FunctionsSystem Packages 

Aggregate Functions as Window Functions

All aggregate functions — including FILTER clause — can be used as window functions, by adding the OVER clause.

Imagine a table EMPLOYEE with columns ID, NAME and SALARY, and the need to show each employee with their respective salary and the percentage of their salary over the payroll.

A normal query could achieve this, as follows:

select
    id,
    department,
    salary,
    salary / (select sum(salary) from employee) portion
  from employee
  order by id;
Results
id  department  salary  portion
--  ----------  ------  ----------
1   R & D        10.00      0.2040
2   SALES        12.00      0.2448
3   SALES         8.00      0.1632
4   R & D         9.00      0.1836
5   R & D        10.00      0.2040

The query is repetitive and lengthy to run, especially if EMPLOYEE happens to be a complex view.

The same query could be specified in a much faster and more elegant way using a window function:

select
    id,
    department,
    salary,
    salary / sum(salary) OVER () portion
  from employee
  order by id;

Here, sum(salary) over () is computed with the sum of all SALARY from the query (the EMPLOYEE table).

Partitioning

Like aggregate functions, that may operate alone or in relation to a group, window functions may also operate on a group, which is called a “partition”.

Syntax
<window function>(...) OVER (PARTITION BY <expr> [, <expr> ...])

Aggregation over a group could produce more than one row, so the result set generated by a partition is joined with the main query using the same expression list as the partition.

Continuing the EMPLOYEE example, instead of getting the portion of each employee’s salary over the all-employees total, we would like to get the portion based on the employees in the same department:

select
    id,
    department,
    salary,
    salary / sum(salary) OVER (PARTITION BY department) portion
  from employee
  order by id;
Results
id  department  salary  portion
--  ----------  ------  ----------
1   R & D        10.00      0.3448
2   SALES        12.00      0.6000
3   SALES         8.00      0.4000
4   R & D         9.00      0.3103
5   R & D        10.00      0.3448

Ordering

The ORDER BY sub-clause can be used with or without partitions.The ORDER BY clause within OVER specifies the order in which the window function will process rows.This order does not have to be the same as the order rows appear in the output.

There is an important concept associated with window functions: for each row there is a set of rows in its partition called the window frame.By default, when specifying ORDER BY, the frame consists of all rows from the beginning of the partition to the current row and rows equal to the current ORDER BY expression.Without ORDER BY, the default frame consists of all rows in the partition.

As a result, for standard aggregate functions, the ORDER BY clause produces partial aggregation results as rows are processed.

Example
select
    id,
    salary,
    sum(salary) over (order by salary) cumul_salary
  from employee
  order by salary;
Results
id  salary  cumul_salary
--  ------  ------------
3     8.00          8.00
4     9.00         17.00
1    10.00         37.00
5    10.00         37.00
2    12.00         49.00

Then cumul_salary returns the partial/accumulated (or running) aggregation (of the SUM function).It may appear strange that 37.00 is repeated for the ids 1 and 5, but that is how it should work.The ORDER BY keys are grouped together, and the aggregation is computed once (but summing the two 10.00).To avoid this, you can add the ID field to the end of the ORDER BY clause.

It’s possible to use multiple windows with different orders, and ORDER BY parts like ASC/DESC and NULLS FIRST/LAST.

With a partition, ORDER BY works the same way, but at each partition boundary the aggregation is reset.

All aggregation functions can use ORDER BY, except for LIST().

Window Frames

A window frame specifies which rows to consider for the current row when evaluating the window function.

The frame comprises three pieces: unit, start bound, and end bound.The unit can be RANGE or ROWS, which defines how the bounds will work.

The bounds are:

  • UNBOUNDED PRECEDING

  • <expr> PRECEDING

  • CURRENT ROW

  • <expr> FOLLOWING

  • UNBOUNDED FOLLOWING

  • With RANGE, the ORDER BY should specify exactly one expression, and that expression should be of a numeric, date, time, or timestamp type.For <expr> PRECEDING, expr is subtracted from the ORDER BY expression, and for <expr> FOLLOWING, expr is added.For CURRENT ROW, the expression is used as-is.

    All rows inside the current partition that are between the bounds are considered part of the resulting window frame.

  • With ROWS, ORDER BY expressions are not limited by number or type.For this unit, <expr> PRECEDING and <expr FOLLOWING relate to the row position within the current partition, and not the values of the ordering keys.

Both UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING work identical with RANGE and ROWS.UNBOUNDED PRECEDING start at the first row of the current partition, and UNBOUNDED FOLLOWING ends at the last row of the current partition.

The frame syntax with <window-frame-start> specifies the start-frame, with the end-frame being CURRENT ROW.

Some window functions discard frames:

  • ROW_NUMBER, LAG and LEAD always work as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  • DENSE_RANK, RANK, PERCENT_RANK and CUME_DIST always work as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  • FIRST_VALUE, LAST_VALUE and NTH_VALUE respect frames, but the RANGE unit behaviour is identical to ROWS.

Example Using Frame

When the ORDER BY clause is used, but a frame clause is omitted, the default considers the partition up to the current row.When combined with SUM, this results in a running total:

select
  id,
  salary,
  sum(salary) over (order by salary) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |       8.00 |
|  4 |   9.00 |      17.00 |
|  1 |  10.00 |      37.00 |
|  5 |  10.00 |      37.00 |
|  2 |  12.00 |      49.00 |

On the other hand, if we apply a frame for the entire partition, we get the total for the entire partition.

select
  id,
  salary,
  sum(salary) over (
    order by salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |      49.00 |
|  4 |   9.00 |      49.00 |
|  1 |  10.00 |      49.00 |
|  5 |  10.00 |      49.00 |
|  2 |  12.00 |      49.00 |

This example is to demonstrate how this works;the result of this example would be simpler to produce with sum(salary) over().

We can use a range frame to compute the count of employees with salaries between (an employee’s salary - 1) and (their salary + 1) with this query:

select
  id,
  salary,
  count(*) over (
    order by salary
    RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) range_count
from employee
order by salary;

Result:

| id | salary | range_count |
|---:|-------:|------------:|
|  3 |   8.00 |           2 |
|  4 |   9.00 |           4 |
|  1 |  10.00 |           3 |
|  5 |  10.00 |           3 |
|  2 |  12.00 |           1 |

Named Windows

The WINDOW clause can be used to explicitly name a window, for example to avoid repetitive or confusing expressions.

A named window can be used

  1. in the OVER clause to reference a window definition, e.g. OVER window_name

  2. as a base window of another named or inline (OVER) window, if it is not a window with a frame (ROWS or RANGE clauses)

    Note

    A window with a base windows cannot have PARTITION BY, nor override the ordering (ORDER BY) of a base window.

Ranking Functions

The ranking functions compute the ordinal rank of a row within the window partition.

These functions can be used with or without partitioning and ordering.However, using them without ordering almost never makes sense.

The ranking functions can be used to create different type of counters.Consider SUM(1) OVER (ORDER BY SALARY) as an example of what they can do, each of them differently.Following is an example query, also comparing with the SUM behavior.

select
    id,
    salary,
    dense_rank() over (order by salary),
    rank() over (order by salary),
    row_number() over (order by salary),
    sum(1) over (order by salary)
  from employee
  order by salary;
Results
id  salary  dense_rank  rank  row_number  sum
--  ------  ----------  ----  ----------  ---
 3    8.00           1     1           1    1
 4    9.00           2     2           2    2
 1   10.00           3     3           3    4
 5   10.00           3     3           4    4
 2   12.00           4     5           5    5

The difference between DENSE_RANK and RANK is that there is a gap related to duplicate rows (relative to the window ordering) only in RANK.DENSE_RANK continues assigning sequential numbers after the duplicate salary.On the other hand, ROW_NUMBER always assigns sequential numbers, even when there are duplicate values.

CUME_DIST()

Relative rank (or, cumulative distribution) of a row within a window partition

Result type

DOUBLE PRECISION

Syntax
CUME_DIST () OVER <window_name_or_spec>

CUME_DIST is calculated as the number of rows preceding or peer of the current row divided by the number of rows in the partition.

In other words, CUME_DIST() OVER <window_name_or_spec> is equivalent to COUNT(*) OVER <window_name_or_spec> / COUNT(*) OVER()

CUME_DIST Examples

select
  id,
  salary,
  cume_dist() over (order by salary)
from employee
order by salary;
Result
id salary cume_dist
-- ------ ---------
 3   8.00       0.2
 4   9.00       0.4
 1  10.00       0.8
 5  10.00       0.8
 2  12.00         1

DENSE_RANK()

Rank of rows in a partition without gaps

Result type

BIGINT

Syntax
DENSE_RANK () OVER <window_name_or_spec>

Rows with the same window_order values get the same rank within the partition window_partition, if specified.The dense rank of a row is equal to the number of different rank values in the partition preceding the current row, plus one.

DENSE_RANK Examples

select
  id,
  salary,
  dense_rank() over (order by salary)
from employee
order by salary;
Result
id salary dense_rank
-- ------ ----------
 3  8.00           1
 4  9.00           2
 1 10.00           3
 5 10.00           3
 2 12.00           4

NTILE()

Distributes the rows of the current window partition into the specified number of tiles (groups)

Result type

BIGINT

Syntax
NTILE ( number_of_tiles ) OVER <window_name_or_spec>
Table 1. Arguments of NTILE
Argument Description

number_of_tiles

Number of tiles (groups).Restricted to a positive integer literal, a named parameter (PSQL), or a positional parameter (DSQL).

NTILE Examples

select
  id,
  salary,
  rank() over (order by salary),
  ntile(3) over (order by salary)
from employee
order by salary;
Result
ID SALARY RANK NTILE
== ====== ==== =====
 3   8.00    1     1
 4   9.00    2     1
 1  10.00    3     2
 5  10.00    3     2
 2  12.00    5     3

PERCENT_RANK()

Relative rank of a row within a window partition.

Result type

DOUBLE PRECISION

Syntax
PERCENT_RANK () OVER <window_name_or_spec>

PERCENT_RANK is calculated as the [fblangref50-windowfuncs-rank] minus 1 of the current row divided by the number of rows in the partition minus 1.

In other words, PERCENT_RANK() OVER <window_name_or_spec> is equivalent to (RANK() OVER <window_name_or_spec> - 1) / CAST(COUNT(*) OVER() - 1 AS DOUBLE PRECISION)

PERCENT_RANK Examples

select
  id,
  salary,
  rank() over (order by salary),
  percent_rank() over (order by salary)
from employee
order by salary;
Result
id salary rank percent_rank
-- ------ ---- ------------
 3   8.00    1            0
 4   9.00    2         0.25
 1  10.00    3          0.5
 5  10.00    3          0.5
 2  12.00    5            1

RANK()

Rank of each row in a partition

Result type

BIGINT

Syntax
RANK () OVER <window_name_or_spec>

Rows with the same values of window-order get the same rank with in the partition window-partition, if specified.The rank of a row is equal to the number of rank values in the partition preceding the current row, plus one.

RANK Examples

select
  id,
  salary,
  rank() over (order by salary)
from employee
order by salary;
Result
id salary rank
-- ------ ----
 3  8.00     1
 4  9.00     2
 1 10.00     3
 5 10.00     3
 2 12.00     5

ROW_NUMBER()

Sequential row number in the partition

Result type

BIGINT

Syntax
ROW_NUMBER () OVER <window_name_or_spec>

Returns the sequential row number in the partition, where 1 is the first row in each of the partitions.

ROW_NUMBER Examples

select
  id,
  salary,
  row_number() over (order by salary)
from employee
order by salary;
Result
id salary rank
-- ------ ----
 3  8.00     1
 4  9.00     2
 1 10.00     3
 5 10.00     4
 2 12.00     5

Navigational Functions

The navigational functions get the simple (non-aggregated) value of an expression from another row of the query, within the same partition.

Important

FIRST_VALUE, LAST_VALUE and NTH_VALUE also operate on a window frame.For navigational functions, Firebird applies a default frame from the first to the current row of the partition, not to the last.In other words, it behaves as if the following frame is specified:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

This is likely to produce strange or unexpected results for NTH_VALUE and especially LAST_VALUE, so make sure to specify an explicit frame with these functions.

Example of Navigational Functions
select
    id,
    salary,
    first_value(salary) over (order by salary),
    last_value(salary) over (order by salary),
    nth_value(salary, 2) over (order by salary),
    lag(salary) over (order by salary),
    lead(salary) over (order by salary)
  from employee
  order by salary;
Results
id  salary  first_value  last_value  nth_value     lag    lead
--  ------  -----------  ----------  ---------  ------  ------
3     8.00         8.00        8.00     <null>  <null>    9.00
4     9.00         8.00        9.00       9.00    8.00   10.00
1    10.00         8.00       10.00       9.00    9.00   10.00
5    10.00         8.00       10.00       9.00   10.00   12.00
2    12.00         8.00       12.00       9.00   10.00  <null>

FIRST_VALUE()

First value of the current partition

Result type

The same as type as expr

Syntax
FIRST_VALUE ( <expr> ) OVER <window_name_or_spec>
Table 1. Arguments of FIRST_VALUE
Argument Description

expr

Expression.May contain a table column, constant, variable, expression, scalar function.Aggregate functions are not allowed as an expression.

LAG()

Value from row in the current partition with a given offset before the current row

Result type

The same as type as expr

Syntax
LAG ( <expr> [, <offset [, <default>]])
  OVER <window_name_or_spec>
Table 1. Arguments of LAG
Argument Description

expr

Expression.May contain a table column, constant, variable, expression, scalar function.Aggregate functions are not allowed as an expression.

offset

The offset in rows before the current row to get the value identified by expr.If offset is not specified, the default is 1.offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to BIGINT.offset cannot be negative (use LEAD instead).

default

The default value to return if offset points outside the partition.Default is NULL.

The LAG function provides access to the row in the current partition with a given offset before the current row.

If offset points outside the current partition, default will be returned, or NULL if no default was specified.

LAG Examples

Suppose you have RATE table that stores the exchange rate for each day.To trace the change of the exchange rate over the past five days you can use the following query.

select
  bydate,
  cost,
  cost - lag(cost) over (order by bydate) as change,
  100 * (cost - lag(cost) over (order by bydate)) /
    lag(cost) over (order by bydate) as percent_change
from rate
where bydate between dateadd(-4 day to current_date)
and current_date
order by bydate
Result
bydate     cost   change percent_change
---------- ------ ------ --------------
27.10.2014  31.00 <null>         <null>
28.10.2014  31.53   0.53         1.7096
29.10.2014  31.40  -0.13        -0.4123
30.10.2014  31.67   0.27         0.8598
31.10.2014  32.00   0.33         1.0419

LAST_VALUE()

Last value from the current partition

Result type

The same as type as expr

Syntax
LAST_VALUE ( <expr> ) OVER <window_name_or_spec>
Table 1. Arguments of LAST_VALUE
Argument Description

expr

Expression.May contain a table column, constant, variable, expression, scalar function.Aggregate functions are not allowed as an expression.

LEAD()

Value from a row in the current partition with a given offset after the current row

Result type

The same as type as expr

Syntax
LEAD ( <expr> [, <offset [, <default>]])
  OVER <window_name_or_spec>
Table 1. Arguments of LEAD
Argument Description

expr

Expression.May contain a table column, constant, variable, expression, scalar function.Aggregate functions are not allowed as an expression.

offset

The offset in rows after the current row to get the value identified by expr.If offset is not specified, the default is 1.offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to BIGINT.offset cannot be negative (use LAG instead).

default

The default value to return if offset points outside the partition.Default is NULL.

The LEAD function provides access to the row in the current partition with a given offset after the current row.

If offset points outside the current partition, default will be returned, or NULL if no default was specified.

NTH_VALUE()

The Nth value starting from the first or the last row of the current frame

Result type

The same as type as expr

Syntax
NTH_VALUE ( <expr>, <offset> )
  [FROM {FIRST | LAST}]
  OVER <window_name_or_spec>
Table 1. Arguments of NTH_VALUE
Argument Description

expr

Expression.May contain a table column, constant, variable, expression, scalar function.Aggregate functions are not allowed as an expression.

offset

The offset in rows from the start (FROM FIRST), or the last (FROM LAST) to get the value identified by expr.offset can be a column, subquery or other expression that results in a positive integer value, or another type that can be implicitly converted to BIGINT.offset cannot be zero or negative.

The NTH_VALUE function returns the Nth value starting from the first (FROM FIRST) or the last (FROM LAST) row of the current frame, see also note on frame for navigational functions.Offset 1 with FROM FIRST is equivalent to FIRST_VALUE, and offset 1 with FROM LAST is equivalent to LAST_VALUE.

Aggregate Functions Inside Window Specification

It is possible to use aggregate functions (but not window functions) inside the OVER clause.In that case, first the aggregate function is applied to determine the windows, and only then the window functions are applied on those windows.

Note

When using aggregate functions inside OVER, all columns not used in aggregate functions must be specified in the GROUP BY clause of the SELECT.

Using an Aggregate Function in a Window Specification
select
  code_employee_group,
  avg(salary) as avg_salary,
  rank() over (order by avg(salary)) as salary_rank
from employee
group by code_employee_group