
Analyse Index Pages Only

The command to analyse only user indices in the database is:

gstat employee -index >employee.gst

The output from this command lists the user tables in alphabetical order. No tables are analysed; the report simply lists the table names, with all applicable indices shown beneath the appropriate table name.

Once the analysis has been completed, the results can be interpreted as follows. The following example shows the output from a single index in a database.

CONFIGREVISIONSTORE (213)
    Index PK_CONFIGREVISIONSTORE (0)
        Depth: 3, leaf buckets: 174, nodes: 62372
        Average data length: 2.58, total dup: 0, max dup: 0
        Fill distribution:
             0 - 19% = 15
            20 - 39% = 0
            40 - 59% = 55
            60 - 79% = 68
            80 - 99% = 36

The above extract from the report begins by displaying the table name, CONFIGREVISIONSTORE, and the table ID, 213, as described above.

Following the table’s details — and only the name and ID are displayed — the index details are shown. As above, the index name and its ID are displayed. This time, the ID refers to the index’s position in the list of all indices created on the table. ID zero is the first index created, ID 1 the next, and so on. The output from gstat may not list the indices in ID order, and if any indices were created but subsequently dropped, there may be gaps in the ID sequence.

The next two lines, after the index name and ID, show the overall statistics for this index.

Depth

This statistic displays the number of pages that have to be accessed to get at an index entry. In this example, we have to read three separate pages into the buffer cache before we can use the index details to access the row we want in the table. This is often referred to as index indirection.

Depth: 3

On disk, there is a top level Index Root Page which is created at the same time as the table.This page holds a list of pointers to the top (apex) page for each index — one page per index.For any given index, this page holds a list of pointers to either:

  • intermediate level pages if depth is greater than 1, or,

  • to the leaf pages for the actual index data if depth = 1.

The leaf pages store the location of the data that have been indexed. The index depth is the number of levels you have to step down from the index’s apex page to get to the leaf pages. Neither the Index Root Page nor the index’s apex page is counted in the depth.

As a rule of thumb, a depth of 2 or less indicates an efficient index. If the depth is 3 or more, the index will most likely not be operating at its best. The solution in this situation is to use gbak to increase the database page size by taking a backup and restoring it, as follows:

tux> # Shutdown the database
tux> gfix -shut -tran 60 employee

tux> # Backup the database
tux> gbak -backup employee /backups/employee.fbk

tux> # Find current page size
tux> gstat employee -header | grep -i "page size"
     page size             4096

tux> # Restore database with a bigger page size
tux> gbak -recreate overwrite -page 8192 /backups/employee.fbk employee

tux> # Check new page size
tux> gstat employee -header | grep -i "page size"
     page size             8192

tux> # Open the database
tux> gfix -online normal employee

Once the above has been carried out, you should find that the depth of the index is 2 or less. If this is not the case, simply repeat the process above using an even bigger page size.
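The effect of the page size on depth can be sketched with a back-of-envelope model: an index page holds roughly page size divided by average node size entries, and the depth is roughly the number of levels one apex page needs to fan out to every node. This is only an illustration; the per-node overhead below is an assumed figure, not a documented Firebird value.

```python
import math

def estimate_depth(nodes, page_size, avg_data_length, node_overhead=13):
    """Very rough index depth estimate. Each page is assumed to hold
    page_size / (compressed key length + an assumed per-node overhead)
    entries; the depth is the number of levels needed to fan out from
    a single apex page to all nodes."""
    entries_per_page = page_size // (avg_data_length + node_overhead)
    return max(1, math.ceil(math.log(nodes, entries_per_page)))

# The example index: 62372 nodes, average data length 2.58 bytes.
print(estimate_depth(62372, 4096, 2.58))
print(estimate_depth(62372, 8192, 2.58))
```

Whatever the exact overhead, a bigger page size increases the fan-out per page and therefore can only keep the depth the same or reduce it, which is why the backup and restore with a larger page size helps.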

Warning

The above command to restore the backup overwrites the original database file. This works by deleting the original file and recreating it, so you really need to be sure that your database backup actually works, and that the backup file produced is usable, before attempting to overwrite a database. See the gbak manual for more details.

Leaf buckets

This statistic informs us of the number of leaf pages that this particular index uses. A page and a bucket are synonymous, but page tends to be the more modern term in wide use.

leaf buckets: 174

In our example index, we see that there are 174 pages in the database holding the details of the indexed values for this table — all of these pages contain pointers to the data.

The number of leaf pages should match the sum of the page counts in each histogram band of the fill distribution, shown below.
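Using the figures from the example report (the fill distribution is shown in full further below), this relationship is easy to verify:

```python
# Fill distribution bands for the example index PK_CONFIGREVISIONSTORE.
fill_distribution = {
    "0 - 19%": 15,
    "20 - 39%": 0,
    "40 - 59%": 55,
    "60 - 79%": 68,
    "80 - 99%": 36,
}
leaf_buckets = 174

# The histogram accounts for every leaf page exactly once.
total_pages = sum(fill_distribution.values())
print(total_pages)  # 174
```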

Nodes

This is the total number of records in the table that have been indexed. However, because gstat doesn’t work in a transaction-aware manner, this figure may include rows that have been deleted (and not yet garbage-collected), and it may count records more than once if they have been modified in such a way that the indexed column(s) have changed.

nodes: 62372

Because of the above, it is advisable to carry out a sweep, or a database backup and restore, prior to running gstat to ensure that the statistics gathered are accurate and reflect the true state of the database.

Average data length

This statistic indicates the average length of the key column(s) in bytes.

Average data length: 2.58

This is most likely less than the actual sum of the column sizes as Firebird uses index compression to reduce the amount of data held in an index leaf page.
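Firebird’s exact compression scheme is not described here, but a generic prefix compression sketch (storing, for each key in sorted order, only the suffix that differs from the previous key) shows why the average stored data length can be far smaller than the declared column size. The sample keys are invented for illustration:

```python
def stored_lengths(sorted_keys):
    """For each key in sorted order, return the number of characters
    that must actually be stored: the suffix remaining after the
    common prefix shared with the previous key is stripped."""
    lengths = []
    prev = ""
    for key in sorted_keys:
        common = 0
        while common < min(len(prev), len(key)) and prev[common] == key[common]:
            common += 1
        lengths.append(len(key) - common)
        prev = key
    return lengths

keys = ["ANDERSON", "ANDREWS", "ANDRUS", "BAKER"]
print(stored_lengths(keys))                   # [8, 4, 2, 5]
print(sum(stored_lengths(keys)) / len(keys))  # 4.75 bytes on average
```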

Duplicates

Duplicates are not permitted in a primary key or unique index. Other indexes do permit duplicates, and these statistics report the number of duplicates the index holds. The following isql query shows the details of duplicates for an indexed column in a different table from the one used so far, which has no duplicates.

SQL> SELECT IDX, COUNT(*)
CON> FROM NORMAN_TEST
CON> GROUP BY IDX;

         IDX        COUNT
============ ============
           1           10
           2            4
           3            1

From the above we see a total of 15 rows, of which 14 carry a duplicated value (all those with a 1 or 2 in the IDX column). The following is the extract from the gstat report for this table:

Index NORMANX (0)
        Depth: 1, leaf buckets: 1, nodes: 15
        Average data length: 0.27, total dup: 12, max dup: 9

Total dup is the total number of duplicates in the index. Note from the above that only 12 duplicates are listed, but we already know that there are 14 duplicate rows in the index. How is this possible?

The first occurrence of a 1 and the first occurrence of a 2 are not counted by gstat as duplicates. Only the second and subsequent copies are considered duplicates.

Note

In my opinion this is not quite correct behaviour. In the table above there are 15 rows and only three unique values in the IDX column, which is indexed. My index therefore holds 14 duplicate values rather than just 12.

You can, however, use the total dup value to extract the number of unique values in the index by subtracting it from the nodes value.

Max dup reports the number of duplicates in the longest chain of duplicates. In other words, for the above index, 9 index entries share the same value in the indexed column. We can see this to be true: the value 1 appears in 10 rows, so 9 of those entries are counted as duplicates.

If max dup is getting close to total dup, it is reasonable to conclude that the index’s selectivity is so poor that it may never be used in queries.
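All of these figures can be cross-checked with a little arithmetic on the row counts returned by the isql query above:

```python
# Rows per indexed value in NORMAN_TEST, from the isql query above.
counts = {1: 10, 2: 4, 3: 1}

nodes = sum(counts.values())                     # one index entry per row
# gstat counts every copy after the first as a duplicate ...
total_dup = sum(n - 1 for n in counts.values())
# ... and max dup is the duplicate count of the longest chain.
max_dup = max(counts.values()) - 1
# The number of unique values is nodes minus total dup.
unique_values = nodes - total_dup

print(nodes, total_dup, max_dup, unique_values)  # 15 12 9 3
```

These match the gstat extract exactly: nodes 15, total dup 12, max dup 9, and three unique values in the indexed column.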

Fill distribution

The remainder of the report for our original example index shows how the pages are used within the index.

Fill distribution:
             0 - 19% = 15
            20 - 39% = 0
            40 - 59% = 55
            60 - 79% = 68
            80 - 99% = 36

The figures represent a histogram of how the space in the index’s pages is being utilised. Each bar of the histogram covers a band of fill percentages, and its value is the number of pages in the whole index that are filled to within that band.

The example index’s fill distribution is shown above, and from these figures we see that the vast majority of the pages are filled to between 40 and 99%. The individual number at the end of each line shows how many pages fall within that band. The example shows that:

  • 15 pages have been filled to less than 20%; and

  • 0 pages have been filled to between 20% and 39%; and

  • 55 pages have been filled to between 40% and 59%; and

  • 68 pages have been filled to between 60% and 79%; and

  • 36 pages are filled to between 80% and 99%.

The sum of all these pages should add up to the figure shown above for leaf buckets.

This index shows reasonably good space usage, as the majority of pages are well filled. Ideally, you would like to see all the pages filled to between 80 and 99%. If, on the other hand, the report showed that the pages were all lightly filled — say, less than 60% — the index would be a good candidate for a rebuild.

Be sure to consider the total number of nodes before starting a rebuild — if there are only a few nodes in the index, then rebuilding will not help the space usage as there may not be enough records to actually fill the index pages.
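The "mostly lightly filled" test can be expressed as a simple heuristic. The cut-off of more than half the pages being under 60% full is an illustrative choice, not an official rule:

```python
fill_distribution = {
    "0 - 19%": 15,
    "20 - 39%": 0,
    "40 - 59%": 55,
    "60 - 79%": 68,
    "80 - 99%": 36,
}

# Bands entirely below the 60% fill guideline.
light_bands = ("0 - 19%", "20 - 39%", "40 - 59%")
total = sum(fill_distribution.values())
light = sum(fill_distribution[b] for b in light_bands)

# Flag a rebuild candidate when most pages are under 60% full.
rebuild_candidate = light / total > 0.5
print(f"{light}/{total} pages under 60% full; rebuild candidate: {rebuild_candidate}")
```

For the example index this reports 70 of 174 pages under 60% full, so it is not flagged, consistent with the "reasonably good space usage" verdict above.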

Selecting Tables To Analyse

To limit the analysis to a specific list of tables, rather than all user tables, you can use the -t switch to specify the ones you wish to analyse. Note that specifying table names in this manner also analyses all indices associated with those tables, unless the -data or -index switch is also specified.

gstat employee -t EMPLOYEE JOB COUNTRY >employee.gst

The resulting output is interpreted as described above.

If a table has been created by a user wishing to preserve the letter case of the table name, rather than having it converted to uppercase, for example:

tux> isql myMusic
Database:  mymusic

SQL> CREATE TABLE "MyMusic_Artists" (
CON> art_id integer,
CON> art_name ....);

SQL> COMMIT;

... then you must supply the table names in exactly the same letter case as the name of the table within the database:

gstat mymusic -t MyMusic_Titles MyMusic_Artists > MyMusic.gst

If you supply a non-existent table name, or get the name in the wrong case, gstat will output an error and not analyse any table.

Including The System Tables & Indices

Normal use of gstat doesn’t include the system tables and indices in the output. Calling gstat with the -system switch causes these tables to be included in the analysis.

gstat employee -system >employee.gst

The interpretation of the results for the various system tables and indices is exactly as described above for user tables and indices.

Record & Version Details

When you run gstat with either the default switches, or with -data or -t, and add the -r switch, you get additional information in the report showing the average record length and average version details for the table(s) in question:

Average record length: 96.55, total records: 62372
    Average version length: 0.00, total versions: 0, max versions: 0

Average record length

Simply the average record length, in bytes, of all the records in the table. If this figure is 0.00, you can be reasonably sure that all your records have been deleted, or that you have no records in the table.

Total records

The total number of records in the table. The value may include records in currently active transactions and may include records which have been deleted.

tux> # In session 1.
tux> gstat test -r -t NORMAN

...
Analyzing database pages ...
NORMAN (142)
    Primary pointer page: 268, Index root page: 269
    Average record length: 9.00, total records: 15
    Average version length: 0.00, total versions: 0, max versions: 0
    Data pages: 1, data page slots: 1, average fill: 10%

tux> isql test -user norman -password secret
Database:  test

SQL> SELECT COUNT(*) FROM NORMAN;

       COUNT
============
          15

At this point, we can see that there are 15 records in the NORMAN table and that the average length of these 15 records is 9.00 bytes. Next, we start another isql session and delete all the records from the NORMAN table.

tux> # In session 2.
tux> isql test -user norman -password secret
Database:  test

SQL> DELETE FROM NORMAN;
SQL> COMMIT;
SQL> shell;

Still in the second session, we execute gstat to fetch statistics for the NORMAN table; the results are shown below.

tux> gstat test -r -t NORMAN

...
Analyzing database pages ...
NORMAN (142)
    Primary pointer page: 268, Index root page: 269
    Average record length: 0.00, total records: 15
    Average version length: 9.00, total versions: 15, max versions: 1
    Data pages: 1, data page slots: 1, average fill: 16%
...

tux> # Return to isql.
tux> exit

Comparing the report above with the one taken before we deleted the records, we can see straight away that:

  • The average record length indicates that there are no records in the table, but the total record count shows that there are (still) 15. This is a good indicator that a session has deleted all the records but garbage collection has yet to run.

  • The versioning details have all changed: there are now statistics for average version length, total versions and max versions.

  • The average fill for the page(s) in this table has risen from 10% to 16%, even though everything has been deleted. The extra space is being used by the back versions of the deleted records.

Continuing in the second session, if we execute a full table scan of the NORMAN table we will not see any results, but we will garbage collect the back versions.

SQL> SELECT * FROM NORMAN;

SQL> shell;

tux> gstat test -r -t NORMAN

...
Analyzing database pages ...
NORMAN (142)
    Primary pointer page: 268, Index root page: 269
    Average record length: 0.00, total records: 0
    Average version length: 0.00, total versions: 0, max versions: 0
    Data pages: 0, data page slots: 0, average fill: 0%

Everything has now returned to zero. There are no back versions, no current versions, and the page is no longer filled.

Average version length

This is similar to the average record length, but for the back versions of the records. For example, if you have deleted a number of records and updated others, the old — back — versions of these records will be reported here. If the figure is 0.00, then garbage collection has taken place and removed the back versions — see above for an example.

Total versions

The same as total records above, but counting only the back versions. If the figure is 0, then garbage collection has taken place and removed the back versions — see above for an example.

Max versions

If a record has been updated many times, the max versions statistic shows you the number of back versions of the record (or records) in question. In a table where all the rows have been updated 7 times, but one has been updated 20 times, this statistic will report a value of 20. If the figure is 0, then garbage collection has taken place and removed the back versions — see above for an example.
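The way these two statistics relate to the per-record version chains can be sketched as follows; the 100-record table is a hypothetical example matching the figures in the paragraph above:

```python
# Hypothetical table: 99 records updated 7 times each, one updated 20 times.
back_versions = [7] * 99 + [20]

total_versions = sum(back_versions)  # every back version in the table
max_versions = max(back_versions)    # the longest single version chain

print(total_versions, max_versions)  # 713 20
```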