Building Regular Expressions
In this section are the elements and rules for building SQL regular expressions.
In this section are the elements and rules for building SQL regular expressions.
Within regular expressions, most characters represent themselves.The only exceptions are the special characters below:
[ ] ( ) | ^ - + * % _ ? { }
... and the escape character, if it is defined.
A regular expression that contains no special characters or escape characters matches only strings that are identical to itself (subject to the collation in use).That is, it functions just like the ‘=
’ operator:
'Apple' similar to 'Apple' -- true
'Apples' similar to 'Apple' -- false
'Apple' similar to 'Apples' -- false
'APPLE' similar to 'Apple' -- depends on collation
The known SQL wildcards ‘_
’ and ‘%
’ match any single character and a string of any length, respectively:
'Birne' similar to 'B_rne' -- true
'Birne' similar to 'B_ne' -- false
'Birne' similar to 'B%ne' -- true
'Birne' similar to 'Bir%ne%' -- true
'Birne' similar to 'Birr%ne' -- false
Notice how ‘%
’ also matches the empty string.
A bunch of characters enclosed in brackets define a character class.A character in the string matches a class in the pattern if the character is a member of the class:
'Citroen' similar to 'Cit[arju]oen' -- true
'Citroen' similar to 'Ci[tr]oen' -- false
'Citroen' similar to 'Ci[tr][tr]oen' -- true
As can be seen from the second line, the class only matches a single character, not a sequence.
Within a class definition, two characters connected by a hyphen define a range.A range comprises the two endpoints and all the characters that lie between them in the active collation.Ranges can be placed anywhere in the class definition without special delimiters to keep them apart from the other elements.
'Datte' similar to 'Dat[q-u]e' -- true
'Datte' similar to 'Dat[abq-uy]e' -- true
'Datte' similar to 'Dat[bcg-km-pwz]e' -- false
The following predefined character classes can also be used in a class definition:
[:ALPHA:]
Latin letters a..z and A..Z.With an accent-insensitive collation, this class also matches accented forms of these characters.
[:DIGIT:]
Decimal digits 0..9.
[:ALNUM:]
Union of [:ALPHA:]
and [:DIGIT:]
.
[:UPPER:]
Uppercase Latin letters A..Z.Also matches lowercase with case-insensitive collation and accented forms with accent-insensitive collation.
[:LOWER:]
Lowercase Latin letters a..z.Also matches uppercase with case-insensitive collation and accented forms with accent-insensitive collation.
[:SPACE:]
Matches the space character (ASCII 32).
[:WHITESPACE:]
Matches horizontal tab (ASCII 9), linefeed (ASCII 10), vertical tab (ASCII 11), formfeed (ASCII 12), carriage return (ASCII 13) and space (ASCII 32).
Including a predefined class has the same effect as including all its members.Predefined classes are only allowed within class definitions.If you need to match against a predefined class and nothing more, place an extra pair of brackets around it.
'Erdbeere' similar to 'Erd[[:ALNUM:]]eere' -- true
'Erdbeere' similar to 'Erd[[:DIGIT:]]eere' -- false
'Erdbeere' similar to 'Erd[a[:SPACE:]b]eere' -- true
'Erdbeere' similar to [[:ALPHA:]] -- false
'E' similar to [[:ALPHA:]] -- true
If a class definition starts with a caret, everything that follows is excluded from the class.All other characters match:
'Framboise' similar to 'Fra[^ck-p]boise' -- false
'Framboise' similar to 'Fr[^a][^a]boise' -- false
'Framboise' similar to 'Fra[^[:DIGIT:]]boise' -- true
If the caret is not placed at the start of the sequence, the class contains everything before the caret, except for the elements that also occur after the caret:
'Grapefruit' similar to 'Grap[a-m^f-i]fruit' -- true
'Grapefruit' similar to 'Grap[abc^xyz]fruit' -- false
'Grapefruit' similar to 'Grap[abc^de]fruit' -- false
'Grapefruit' similar to 'Grap[abe^de]fruit' -- false
'3' similar to '[[:DIGIT:]^4-8]' -- true
'6' similar to '[[:DIGIT:]^4-8]' -- false
Lastly, the already mentioned wildcard ‘_
’ is a character class of its own, matching any single character.
A question mark (‘?
’) immediately following a character or class indicates that the preceding item may occur 0 or 1 times to match:
'Hallon' similar to 'Hal?on' -- false
'Hallon' similar to 'Hal?lon' -- true
'Hallon' similar to 'Halll?on' -- true
'Hallon' similar to 'Hallll?on' -- false
'Hallon' similar to 'Halx?lon' -- true
'Hallon' similar to 'H[a-c]?llon[x-z]?' -- true
An asterisk (‘*
’) immediately following a character or class indicates that the preceding item may occur 0 or more times to match:
'Icaque' similar to 'Ica*que' -- true
'Icaque' similar to 'Icar*que' -- true
'Icaque' similar to 'I[a-c]*que' -- true
'Icaque' similar to '_*' -- true
'Icaque' similar to '[[:ALPHA:]]*' -- true
'Icaque' similar to 'Ica[xyz]*e' -- false
A plus sign (‘+
’) immediately following a character or class indicates that the preceding item must occur 1 or more times to match:
'Jujube' similar to 'Ju_+' -- true
'Jujube' similar to 'Ju+jube' -- true
'Jujube' similar to 'Jujuber+' -- false
'Jujube' similar to 'J[jux]+be' -- true
'Jujube' sililar to 'J[[:DIGIT:]]+ujube' -- false
If a character or class is followed by a number enclosed in braces (‘{
’ and ‘}
’), it must be repeated exactly that number of times to match:
'Kiwi' similar to 'Ki{2}wi' -- false
'Kiwi' similar to 'K[ipw]{2}i' -- true
'Kiwi' similar to 'K[ipw]{2}' -- false
'Kiwi' similar to 'K[ipw]{3}' -- true
If the number is followed by a comma (‘,
’), the item must be repeated at least that number of times to match:
'Limone' similar to 'Li{2,}mone' -- false
'Limone' similar to 'Li{1,}mone' -- true
'Limone' similar to 'Li[nezom]{2,}' -- true
If the braces contain two numbers separated by a comma, the second number not smaller than the first, then the item must be repeated at least the first number and at most the second number of times to match:
'Mandarijn' similar to 'M[a-p]{2,5}rijn' -- true
'Mandarijn' similar to 'M[a-p]{2,3}rijn' -- false
'Mandarijn' similar to 'M[a-p]{2,3}arijn' -- true
The quantifiers ‘?
’, ‘*
’ and ‘+
’ are shorthand for {0,1}
, {0,}
and {1,}
, respectively.
Regular expression terms can be OR’ed with the ‘|
’ operator.A match is made when the argument string matches at least one of the terms:
'Nektarin' similar to 'Nek|tarin' -- false
'Nektarin' similar to 'Nektarin|Persika' -- true
'Nektarin' similar to 'M_+|N_+|P_+' -- true
One or more parts of the regular expression can be grouped into subexpressions (also called subpatterns) by placing them between parentheses (‘(
’ and ‘)
’).A subexpression is a regular expression in its own right.It can contain all the elements allowed in a regular expression, and can also have quantifiers added to it.
'Orange' similar to 'O(ra|ri|ro)nge' -- true
'Orange' similar to 'O(r[a-e])+nge' -- true
'Orange' similar to 'O(ra){2,4}nge' -- false
'Orange' similar to 'O(r(an|in)g|rong)?e' -- true
IS [NOT] DISTINCT FROM
<operand1> IS [NOT] DISTINCT FROM <operand2>
Two operands are considered DISTINCT (different) if they have a different value or if one of them is NULL
and the other non-null.They are considered NOT DISTINCT (equal) if they have the same value or if both of them are NULL
.
IS [NOT] DISTINCT FROM
always returns TRUE or FALSE and never UNKNOWN
(NULL
) (unknown value).Operators ‘=
’ and ‘<>
’, conversely, will return UNKNOWN
(NULL
) if one or both operands are NULL
.
Operand values |
Result of various predicates |
|||
---|---|---|---|---|
|
|
|
|
|
Same value |
|
|
|
|
Different values |
|
|
|
|
Both |
|
|
|
|
One |
|
|
|
|
SELECT ID, NAME, TEACHER
FROM COURSES
WHERE START_DAY IS NOT DISTINCT FROM END_DAY;
-- PSQL fragment
IF (NEW.JOB IS DISTINCT FROM OLD.JOB)
THEN POST_EVENT 'JOB_CHANGED';
IS [NOT]
<value> IS [NOT] { TRUE | FALSE | UNKNOWN }
The IS
predicate with Boolean literal values checks if the expression on the left side matches the Boolean value on the right side.The expression on the left side must be of type BOOLEAN
, otherwise an exception is raised.
The IS [NOT] UNKNOWN
is equivalent to IS [NOT] NULL
.
Note
|
The right side of the predicate only accepts the literals |
-- Checking FALSE value
SELECT * FROM TBOOL WHERE BVAL IS FALSE;
ID BVAL
============= =======
2 <false>
-- Checking UNKNOWN value
SELECT * FROM TBOOL WHERE BVAL IS UNKNOWN;
ID BVAL
============= =======
3 <null>
IS [NOT] NULL
<value> IS [NOT] NULL
Since NULL
is not a value, these operators are not comparison operators.The IS [NOT] NULL
predicate tests that the expression on the left side has a value (IS NOT NULL) or has no value (IS NULL).
Search for sales entries that have no shipment date set for them:
SELECT * FROM SALES
WHERE SHIP_DATE IS NULL;
Note
|
Note regarding the
IS predicatesThe |
This group of predicates includes those that use subqueries to submit values for all kinds of assertions in search conditions.Existential predicates are so called because they use various methods to test for the existence or non-existence of some condition, returning TRUE
if the existence or non-existence is confirmed or FALSE
otherwise.
EXISTS
[NOT] EXISTS (<select_stmt>)
The EXISTS
predicate uses a subquery expression as its argument.It returns TRUE
if the subquery result contains at least one row, otherwise it returns FALSE
.
NOT EXISTS
returns FALSE
if the subquery result contains at least one row, otherwise it returns TRUE
.
Note
|
The subquery can specify multiple columns, or |
Find those employees who have projects.
SELECT *
FROM employee
WHERE EXISTS(SELECT *
FROM employee_project ep
WHERE ep.emp_no = employee.emp_no)
Find those employees who have no projects.
SELECT *
FROM employee
WHERE NOT EXISTS(SELECT *
FROM employee_project ep
WHERE ep.emp_no = employee.emp_no)
IN
<value> [NOT] IN (<select_stmt> | <value_list>) <value_list> ::= <value_1> [, <value_2> ...]
The IN
predicate tests whether the value of the expression on the left side is present in the set of values specified on the right side.The set of values cannot have more than 65535 items.The IN
predicate can be replaced with the following equivalent forms:
(<value> = <value_1> [OR <value> = <value_2> ...]) <value> = { ANY | SOME } (<select_stmt>)
When the IN
predicate is used in the search conditions of DML queries, the Firebird optimizer can use an index on the searched column, if a suitable one exists.Lists that are known to be constant are pre-evaluated as invariants and cached as a binary search tree, making comparisons faster if the condition needs to be tested for many rows or if the value list is long.
In its second form, the IN
predicate tests whether the value of the expression on the left side is present — or not present, if NOT IN
is used — in the result of the subquery on the right side.
The subquery must specify only one column, otherwise the error “count of column list and variable list do not match” will occur.
Queries using an IN
predicate with a subquery can be replaced with a similar query using the EXISTS
predicate.For example, the following query:
SELECT
model, speed, hd
FROM PC
WHERE
model IN (SELECT model
FROM product
WHERE maker = 'A');
can be replaced with a similar one using the EXISTS
predicate:
SELECT
model, speed, hd
FROM PC
WHERE
EXISTS (SELECT *
FROM product
WHERE maker = 'A'
AND product.model = PC.model);
However, a query using NOT IN
with a subquery does not always give the same result as its NOT EXISTS
counterpart.The reason is that EXISTS
always returns TRUE or FALSE, whereas IN
returns NULL
in one of these two cases:
when the test value is NULL
and the IN ()
list is not empty
when the test value has no match in the IN ()
list and at least one list element is NULL
It is in only these two cases that IN ()
will return NULL
while the EXISTS
predicate will return FALSE
(“no matching row found”).In a search or, for example, an IF (…)
statement, both results mean “failure”, and it makes no difference to the outcome.
For the same data, NOT IN ()
will return NULL
, while NOT EXISTS
will return TRUE
, leading to opposite results.
As an example, suppose you have the following query:
-- Looking for people who were not born
-- on the same day as any famous New York citizen
SELECT P1.name AS NAME
FROM Personnel P1
WHERE P1.birthday NOT IN (SELECT C1.birthday
FROM Celebrities C1
WHERE C1.birthcity = 'New York');
Now, assume that the NY celebrities list is not empty and contains at least one NULL birthday.Then for every citizen who does not share his birthday with a NY celebrity, NOT IN
will return NULL
, because that is what IN
does.The search condition is thereby not satisfied and the citizen will be left out of the SELECT
result, which is wrong.
For citizens whose birthday does match with a celebrity’s birthday, NOT IN
will correctly return FALSE
, so they will be left out too, and no rows will be returned.
If the NOT EXISTS
form is used:
-- Looking for people who were not born
-- on the same day as any famous New York citizen
SELECT P1.name AS NAME
FROM Personnel P1
WHERE NOT EXISTS (SELECT *
FROM Celebrities C1
WHERE C1.birthcity = 'New York'
AND C1.birthday = P1.birthday);
non-matches will have a NOT EXISTS
result of TRUE
and their records will be in the result set.
Tip
|
If there is any chance of |
Find employees with the names “Pete”, “Ann” and “Roger”:
SELECT *
FROM EMPLOYEE
WHERE FIRST_NAME IN ('Pete', 'Ann', 'Roger');
Find all computers that have models whose manufacturer starts with the letter “A”:
SELECT
model, speed, hd
FROM PC
WHERE
model IN (SELECT model
FROM product
WHERE maker STARTING WITH 'A');
SINGULAR
[NOT] SINGULAR (<select_stmt>)
The SINGULAR
predicate takes a subquery as its argument and evaluates it as TRUE
if the subquery returns exactly one row, otherwise the predicate is evaluated as FALSE
.The subquery may list several output columns since the rows are not returned anyway, they are only tested for (singular) existence.For brevity, people usually specify ‘SELECT *
’.The SINGULAR
predicate can return only two values: TRUE
or FALSE
.
Find those employees who have only one project.
SELECT *
FROM employee
WHERE SINGULAR(SELECT *
FROM employee_project ep
WHERE ep.emp_no = employee.emp_no)
A quantifier is a logical operator that sets the number of objects for which this condition is true.It is not a numeric quantity, but a logical one that connects the condition with the full set of possible objects.Such predicates are based on logical universal and existential quantifiers that are recognised in formal logic.
In subquery expressions, quantified predicates make it possible to compare separate values with the results of subqueries;they have the following common form:
<value expression> <comparison operator> <quantifier> <subquery>
ALL
<value> <op> ALL (<select_stmt>)
When the ALL
quantifier is used, the predicate is TRUE
if every value returned by the subquery satisfies the condition in the predicate of the main query.
Show only those clients whose ratings are higher than the rating of every client in Paris.
SELECT c1.*
FROM Customers c1
WHERE c1.rating > ALL
(SELECT c2.rating
FROM Customers c2
WHERE c2.city = 'Paris')
Important
|
If the subquery returns an empty set, the predicate is Nevertheless, it aligns perfectly with formal logic: if the set is empty, the predicate is true for every row in the set. |
ANY
and SOME
<value> <op> {ANY | SOME} (<select_stmt>)
The quantifiers ANY
and SOME
are identical in their behaviour.Both are specified in the SQL standard, and they be used interchangeably to improve the readability of operators.When the ANY
or the SOME
quantifier is used, the predicate is TRUE
if any of the values returned by the subquery satisfies the condition in the predicate of the main query.If the subquery returns no rows at all, the predicate is automatically considered as FALSE
.
Show only those clients whose ratings are higher than those of one or more clients in Rome.
SELECT *
FROM Customers
WHERE rating > ANY
(SELECT rating
FROM Customers
WHERE city = 'Rome')