Firebird Null Guide | Aggregate Functions

Aggregate functions

The aggregate functions — COUNT, SUM, AVG, MAX, MIN and LIST — don’t handle NULL in the same way as ordinary functions and operators.Instead of returning NULL as soon as a NULL operand is encountered, they only take non-NULL fields into consideration while computing the outcome.That is, if you have this table:

MyTable
ID	Name	Amount
1	John	37
2	Jack	`NULL`
3	Jim	5
4	Joe	12
5	Josh	`NULL`

...the statement select sum(Amount) from MyTable returns 54, which is 37 + 5 + 12.Had all five fields been summed, the result would have been NULL.For AVG, the non-NULL fields are summed and the sum divided by the number of non-NULL fields.

There is one exception to this rule: COUNT(*) returns the count of all rows, even rows whose fields are all NULL.But COUNT(FieldName) behaves like the other aggregate functions in that it only counts rows where the specified field is not NULL.

Another thing worth knowing is that COUNT(*) and COUNT(FieldName) never return NULL: if there are no rows in the set, both functions return 0.COUNT(FieldName) also returns 0 if all FieldName fields in the set are NULL.The other aggregate functions return NULL in such cases.Be warned that SUM even returns NULL if used on an empty set, which is contrary to common logic (if there are no rows, the average, maximum and minimum are undefined, but the sum is known to be zero).

Now let’s put all that knowledge in a table for your easy reference:

Table 1. Aggregate function results with different column states
Function	Results
Function	Empty set	All-`NULL` set or column	Other sets or columns
`COUNT(*)`	0	Total number of rows	Total number of rows
`COUNT(Field)`	0	0	Number of rows where `Field` is not `NULL`
`MAX`, `MIN`	`NULL`	`NULL`	Max or min value found in the column
`SUM`	`NULL`	`NULL`	Sum of non-`NULL` values in the column
`AVG`	`NULL`	`NULL`	Average of non-`NULL` values in the column.This equals `SUM(Field) / COUNT(Field)`.^[1]
`LIST`^[2]	`NULL`	`NULL`	Comma-separated string concatenation of non-`NULL` values in the column

1. If Field is of an integer type, AVG is always rounded towards 0. For instance, 6 non-null INT records with a sum of -11 yield an average of -1, not -2.

2. LIST was added in Firebird 2.1

The `GROUP BY` clause

A GROUP BY clause doesn’t change the aggregate function logic described above, except that it is now applied to each group individually rather than to the result set as a whole.Suppose you have a table Employee, with fields Dept and Salary which both allow NULLs, and you run this query:

SELECT Dept, SUM(Salary) FROM Employee GROUP BY Dept

The result may look like this (the row where Dept is <null> may be at the top or bottom, depending on your Firebird version):

DEPT                     SUM

=====================

<null> 219465.19000 266643.00100 155262.50110 130442.81115 13480000.00120 <null>121 110000.00123 390500.00

First notice that the people whose department is unknown (`NULL`) are grouped together, although you can't say that they have the same _value_ in the Dept field.
But the alternative would have been to give each of those records a "`group`" of their own.
Not only would this possibly add a huge number of lines to the output, but it would also defeat the purpose of __group__ing: those lines wouldn't be aggregates, but simple "```SELECT Dept, Salary```" rows.
So it makes sense to group the `NULL` depts by their state and the rest by their value.

Anyway, the `Dept` field is not what interests us most.
What does the aggregate `SUM` column tell us?
That all salaries are non-`NULL`, except in department 120?
No.
All we can say is that in every department except 120, there is at least one employee with a known salary in the database.
Each department _may_ contain `NULL` salaries;
in dept. 120 _all_ the salaries are `NULL`.

You can find out more by throwing in one or more `COUNT()` columns.
For instance, if you want to know the number of `NULL` salaries in each group, add a column "```COUNT({asterisk}) – COUNT(Salary)```".

Counting frequencies

A GROUP BY clause can be used to report the frequencies with which values occur in a table.In that case you use the same field name several times in the query statement.Let’s say you have a table TT with a column A whose contents are { 3, 8, NULL, 6, 8, -1, NULL, 3, 1 }.To get a frequencies report, you could use:

SELECT A, COUNT(A) FROM TT GROUP BY A

which would give you this result:

A            COUNT
============ ============
          -1            1
           1            1
           3            2
           6            1
           8            2
      <null>            0

Oops — something went wrong with the NULL count, but what? Remember that COUNT(FieldName) skips all NULL fields, so with COUNT(A) the count of the <null> group can only ever be 0.Reformulate your query like this:

SELECT A, COUNT(*) FROM TT GROUP BY A

and the correct value will be returned (in casu 2).

The `HAVING` clause

HAVING clauses can place extra restrictions on the output rows of an aggregate query — just like WHERE clauses do in record-by-record queries.A HAVING clause can impose conditions on any output column or combination of columns, aggregate or not.

As far as NULL is concerned, the following two facts are worth knowing (and hardly surprising, I would guess):

Rows for which the HAVING condition evaluates to NULL won’t be included in the result set.(“Only true is good enough.”)
“HAVING <col> IS [NOT] NULL” is a legal and often useful condition, whether <col> is aggregate or not.(But if <col> is non-aggregate, you may save the engine some work by changing HAVING to WHERE and placing the condition before the “GROUP BY” clause.This goes for any condition on non-aggregate columns.)

For instance, adding the following clause to the example query from the “GROUP BY” paragraph:

...HAVING Dept IS NOT NULL

will prevent the first row from being output, whereas this one:

...HAVING SUM(Salary) IS NOT NULL

suppresses the sixth row (the one with Dept = 120).

Conditional statements and loops

`IF` statements

If the test expression of an IF statement resolves to NULL, the THEN clause is skipped and the ELSE clause — if present — executed.In other words, NULL and false have the same effect in this context.So in situations where you would logically expect false but NULL is returned, no harm will be done.However, we’ve already seen examples of NULL being returned where you would expect true, and that does affect the flow of the code!

Below are some examples of the seemingly paradoxical (but perfectly correct) results you can get if NULLs creep into your IF statements.

Tip	If you use Firebird 2 or higher, you can avoid all the pitfalls discussed here, simply by using `[NOT] DISTINCT` instead of the ‘`=`’ and “`<>`” operators!

Equals (‘=’)
```
if (a = b) then
  MyVariable = 'Equal';
else
  MyVariable = 'Not equal';
```
If a and b are both NULL, MyVariable will yet be “Not equal” after executing this code.The reason is that the expression “a = b” yields NULL if at least one of them is NULL.With a NULL test expression, the THEN block is skipped and the ELSE block executed.
Not equals (‘<>’)
```
if (a <> b) then
  MyVariable = 'Not equal';
else
  MyVariable = 'Equal';
```
Here, MyVariable will be “Equal” if a is NULL and b isn’t, or vice versa.The explanation is analogous to that of the previous example.

So how should you set up equality tests that do give the logical result under all circumstances, even with NULL operands?In Firebird 2 you can use DISTINCT, as already shown (see Testing DISTINCTness). With earlier versions, you’ll have to write some more code.This is discussed in the section [nullguide-testing-equality], later on in this guide.For now, just remember that you have to be very careful with IF conditions that may resolve to NULL.

Another aspect you shouldn’t forget is the following: a NULL test expression may behave like false in an IF condition, but it doesn’t have the value false.It’s still NULL, and that means that its inverse will also be NULL — not “true”.As a consequence, inverting the test expression and swapping the THEN and ELSE blocks may change the behaviour of the IF statement.In binary logic, where only true and false can occur, such a thing could never happen.

To illustrate this, let’s refactor the last example:

Not not equals (“not (.. <> ..)”)
```
if (not (a <> b)) then
  MyVariable = 'Equal';
else
  MyVariable = 'Not equal';
```
In the original version, if one operand was NULL and the other wasn’t (so they were intuitively unequal), the result was “Equal”.Here, it’s “Not equal”.The explanation: one operand is NULL, therefore “a <> b” is NULL, therefore “not(a <> b)” is NULL, therefore ELSE is executed.While this result is correct where the original had it wrong, there’s no reason to rejoice: in the refactored version, the result is also “Not equal” if both operands are NULL — something that the original version “got right”.

Of course, as long as no operand in the test expression can ever be NULL, you can happily formulate your IF statements like above.Also, refactoring by inverting the test expression and swapping the THEN and ELSE blocks will always preserve the functionality, regardless of the complexity of the expressions — as long as they aren’t NULL.What’s especially treacherous is when the operands are almost always non-NULL, so in the vast majority of cases the results will be correct.In such a situation those rare NULL cases may go unnoticed for a long time, silently corrupting your data.

`CASE` expression

Firebird introduced the CASE construct in version 1.5, with two syntactic variants.The first one is called the simple syntax:

case <expression>
  when <exp1> then <result1>
  when <exp2> then <result2>
  ...
  [else <defaultresult>]
end

This one works more or less like a Pascal case or a C switch construct: <expression> is compared to <exp1>, <exp2> etc., until a match is found, in which case the corresponding result is returned.If there is no match and there is an ELSE clause, <defaultresult> is returned.If there is no match and no ELSE clause, NULL is returned.

It is important to know that the comparisons are done with the ‘=’ operator, so a null <expression> will not match a null <expN>.If <expression> is NULL, the only way to get a non-NULL result is via the ELSE clause.

It is OK to specify NULL (or any other valid NULL expression) as a result.

The second, or searched syntax is:

case
  when <condition1> then <result1>
  when <condition2> then <result2>
  ...
  [else <defaultresult>]
end

Here, the <conditionN>s are tests that give a ternary boolean result: true, false, or NULL.Once again, only true is good enough, so a condition like “A = 3” — or even “A = null” — is not satisfied when A is NULL.Remember though that “IS [NOT] NULL” never returns NULL: if A is NULL, the condition “A is null” returns true and the corresponding <resultN> will be returned.In Firebird 2+ you can also use “IS [NOT] DISTINCT FROM” in your conditions — this operator too will never return NULL.

`WHILE` loops

When evaluating the condition of a WHILE loop, NULL has the same effect as in an IF statement: if the condition resolves to NULL, the loop is not (re)entered — just as if it were false.Again, watch out with inversion using NOT: a condition like

while ( Counter > 12 ) do

will skip the loop block if Counter is NULL, which is probably what you want, but:

while ( not Counter > 12 ) do

will also skip if Counter is NULL.Maybe this is also exactly what you want — just be aware that these seemingly complementary tests both exclude NULL counters.

`FOR` loops

To avoid any possible confusion, let us emphasise here that FOR loops in Firebird PSQL have a totally different function than WHILE loops, or for loops in general programming languages.Firebird FOR loops have the form:

for <select-statement> into <var-list> do <code-block>

and they will keep executing the code block until all the rows from the result set have been retrieved, unless an exception occurs or a BREAK, LEAVE or EXIT statement is encountered.Fetching a NULL, or even row after row filled with NULLs, does not terminate the loop!

Aggregate Functions

Aggregate functions

The GROUP BY clause

=====================

Counting frequencies

The HAVING clause

Conditional statements and loops

IF statements

CASE expression

WHILE loops