FirebirdSQL logo

Partitioning

Like aggregate functions, that may operate alone or in relation to a group, window functions may also operate on a group, which is called a “partition”.

Syntax
<window function>(...) OVER (PARTITION BY <expr> [, <expr> ...])

Aggregation over a group could produce more than one row, so the result set generated by a partition is joined with the main query using the same expression list as the partition.

Continuing the EMPLOYEE example, instead of getting the portion of each employee’s salary over the all-employees total, we would like to get the portion based on the employees in the same department:

select
    id,
    department,
    salary,
    salary / sum(salary) OVER (PARTITION BY department) portion
  from employee
  order by id;
Results
id  department  salary  portion
--  ----------  ------  ----------
1   R & D        10.00      0.3448
2   SALES        12.00      0.6000
3   SALES         8.00      0.4000
4   R & D         9.00      0.3103
5   R & D        10.00      0.3448

Ordering

The ORDER BY sub-clause can be used with or without partitions.The ORDER BY clause within OVER specifies the order in which the window function will process rows.This order does not have to be the same as the order rows appear in the output.

There is an important concept associated with window functions: for each row there is a set of rows in its partition called the window frame.By default, when specifying ORDER BY, the frame consists of all rows from the beginning of the partition to the current row and rows equal to the current ORDER BY expression.Without ORDER BY, the default frame consists of all rows in the partition.

As a result, for standard aggregate functions, the ORDER BY clause produces partial aggregation results as rows are processed.

Example
select
    id,
    salary,
    sum(salary) over (order by salary) cumul_salary
  from employee
  order by salary;
Results
id  salary  cumul_salary
--  ------  ------------
3     8.00          8.00
4     9.00         17.00
1    10.00         37.00
5    10.00         37.00
2    12.00         49.00

Then cumul_salary returns the partial/accumulated (or running) aggregation (of the SUM function).It may appear strange that 37.00 is repeated for the ids 1 and 5, but that is how it should work.The ORDER BY keys are grouped together, and the aggregation is computed once (but summing the two 10.00).To avoid this, you can add the ID field to the end of the ORDER BY clause.

It’s possible to use multiple windows with different orders, and ORDER BY parts like ASC/DESC and NULLS FIRST/LAST.

With a partition, ORDER BY works the same way, but at each partition boundary the aggregation is reset.

All aggregation functions can use ORDER BY, except for LIST().

Window Frames

A window frame specifies which rows to consider for the current row when evaluating the window function.

The frame comprises three pieces: unit, start bound, and end bound.The unit can be RANGE or ROWS, which defines how the bounds will work.

The bounds are:

  • UNBOUNDED PRECEDING

  • <expr> PRECEDING

  • CURRENT ROW

  • <expr> FOLLOWING

  • UNBOUNDED FOLLOWING

  • With RANGE, the ORDER BY should specify exactly one expression, and that expression should be of a numeric, date, time, or timestamp type.For <expr> PRECEDING, expr is subtracted from the ORDER BY expression, and for <expr> FOLLOWING, expr is added.For CURRENT ROW, the expression is used as-is.

    All rows inside the current partition that are between the bounds are considered part of the resulting window frame.

  • With ROWS, ORDER BY expressions are not limited by number or type.For this unit, <expr> PRECEDING and <expr FOLLOWING relate to the row position within the current partition, and not the values of the ordering keys.

Both UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING work identical with RANGE and ROWS.UNBOUNDED PRECEDING start at the first row of the current partition, and UNBOUNDED FOLLOWING ends at the last row of the current partition.

The frame syntax with <window-frame-start> specifies the start-frame, with the end-frame being CURRENT ROW.

Some window functions discard frames:

  • ROW_NUMBER, LAG and LEAD always work as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  • DENSE_RANK, RANK, PERCENT_RANK and CUME_DIST always work as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  • FIRST_VALUE, LAST_VALUE and NTH_VALUE respect frames, but the RANGE unit behaviour is identical to ROWS.

Example Using Frame

When the ORDER BY clause is used, but a frame clause is omitted, the default considers the partition up to the current row.When combined with SUM, this results in a running total:

select
  id,
  salary,
  sum(salary) over (order by salary) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |       8.00 |
|  4 |   9.00 |      17.00 |
|  1 |  10.00 |      37.00 |
|  5 |  10.00 |      37.00 |
|  2 |  12.00 |      49.00 |

On the other hand, if we apply a frame for the entire partition, we get the total for the entire partition.

select
  id,
  salary,
  sum(salary) over (
    order by salary
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) sum_salary
from employee
order by salary;

Result:

| id | salary | sum_salary |
|---:|-------:|-----------:|
|  3 |   8.00 |      49.00 |
|  4 |   9.00 |      49.00 |
|  1 |  10.00 |      49.00 |
|  5 |  10.00 |      49.00 |
|  2 |  12.00 |      49.00 |

This example is to demonstrate how this works;the result of this example would be simpler to produce with sum(salary) over().

We can use a range frame to compute the count of employees with salaries between (an employee’s salary - 1) and (their salary + 1) with this query:

select
  id,
  salary,
  count(*) over (
    order by salary
    RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) range_count
from employee
order by salary;

Result:

| id | salary | range_count |
|---:|-------:|------------:|
|  3 |   8.00 |           2 |
|  4 |   9.00 |           4 |
|  1 |  10.00 |           3 |
|  5 |  10.00 |           3 |
|  2 |  12.00 |           1 |

Named Windows

The WINDOW clause can be used to explicitly name a window, for example to avoid repetitive or confusing expressions.

A named window can be used

  1. in the OVER clause to reference a window definition, e.g. OVER window_name

  2. as a base window of another named or inline (OVER) window, if it is not a window with a frame (ROWS or RANGE clauses)

    Note

    A window with a base windows cannot have PARTITION BY, nor override the ordering (ORDER BY) of a base window.