FirebirdSQL logo

UTF8 Collations

The following table shows the possible collations for the UTF8 character set.

Table 1. Collations for Character Set UTF8
Collation Characteristics

UCS_BASIC

Collation works according to the position of the character in the table (binary).

UNICODE

Collation works according to the UCA algorithm (Unicode Collation Algorithm) (alphabetical).

UTF8

The default, binary collation, identical to UCS_BASIC, which was added for SQL compatibility

UNICODE_CI

Case-insensitive collation, works without taking character case into account.

UNICODE_CI_AI

Case-insensitive, accent-insensitive collation, works alphabetically without taking character case or accents into account.

Example

An example of collation for the UTF8 character set without taking into account the case or accentuation of characters (similar to COLLATE PXW_CYRL in the earlier example).

...
ORDER BY NAME COLLATE UNICODE_CI_AI

Character Indexes

The maximum length for an index key equals one quarter of the page size, i.e. from 1,024 — for page size 4,096 — to 8,192 bytes — for page size 32,768.The maximum length of an indexed string is 9 bytes less than that quarter-page limit.

Calculating Maximum Length of an Indexed String Field

The following formula calculates the maximum length of an indexed string (in characters):

max_char_length = FLOOR((page_size / 4 - 9) / N)

where N is the number of bytes per character in the character set.

The table below shows the maximum length of an indexed string (in characters), according to page size and character set, calculated using this formula.

Table 1. Maximum Index Lengths by Page Size and Character Size

Page Size

Bytes per character

1

2

3

4

6

4,096

1,015

507

338

253

169

8,192

2,039

1,019

679

509

339

16,384

4,087

2,043

1,362

1,021

681

32,768

8,183

4,091

2,727

2,045

1,363

Note

With case-insensitive collations (“_CI”), one character in the index key will occupy not 4, but 6 (six) bytes, so the maximum key length for a page of — for example — 4,096 bytes, will be 169 characters.