HP 3000 Manuals

DBEFile Organization [ ALLBASE/SQL Performance Guidelines ] MPE/iX 5.0 Documentation


ALLBASE/SQL Performance Guidelines

DBEFile Organization 

Many performance issues depend on understanding the structure of
ALLBASE/SQL DBEFiles.  DBEFiles are disk files that store data and
indexes for both user and system tables.  The following paragraphs
outline the structure of DBEFiles and the format of data they contain.

Page Organization 

Data is read from and written to DBEFiles in 4096-byte blocks known as
pages.  Each page read from or written to disk constitutes one I/O
operation.  The number of I/Os for a given query is an important measure
of performance.  When you create a DBEFile for the DBEnvironment, you
specify its size as a number of pages, and the size is recorded in the
system catalog.

Page Table Pages 

The first page in a DBEFile is known as a page table page, so called
because it contains a table of the contents of the following 252 pages in
the file.  Before each group of 252 data or index pages, a new page table
page is included.  Page table pages are composed of entries that indicate
whether the following pages are allocated, full, free, or in some other
state.  These pages are accessed during serial scans, and they are
modified when data is added to or dropped from a table.

Page table page entries are 16 bytes long; each page table page contains
one entry for each of the following 252 pages.  The page table page also
contains a 48-byte header and a 16-byte trailer at the end for a total of
4096 bytes, as the following diagram shows:

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     |                                    |
     |       Page Table Entries           |
     |         (252 X 16 bytes)           |
     |                    ________________|
     |                   |  TRAILER (16)  |
     |___________________|________________|

Rows of Data on Pages 

Data from tables and indexes is stored on DBEFile pages as rows or
tuples.  A given page can store data for only one table or index, though
the DBEFile as a whole can contain pages for many different objects.  A
DBEFile of type TABLE can only contain data for tables, and a DBEFile of
type INDEX can only contain index data.  A DBEFile of type MIXED may
include pages for both tables and indexes.  A page table page entry
indicates which object has data stored on that particular page.

Structure of a Page 

Understanding how data is arranged on these pages can assist you in
deciding how to reduce the number of I/Os required for particular
database operations.  Since table data and index data are stored somewhat
differently, the two types of storage are presented in separate sections
below.

Storage of Table Data on DBEFile Pages 

Tuples from user tables are stored on DBEFile pages as in the following
diagram.

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     | Tuple | Tuple | Tuple | Tuple |    |
     |Header_|_Body__|Header_|_Body__|... |
     |                                    |
     |       ____________ ________________|
     | <--- | SLOT TABLE |  TRAILER (16)  |
     |______|____________|________________|

The tuples in a TABLE DBEFile are rows from user tables.  The tuple
header contains descriptive information about the columns in a tuple, and
the tuple body contains the actual data.

The format of a tuple header is as follows:

      _____________________________________________________
     | LEN | Column Descriptor | Column Descriptor | ...
     |_____|___________________|___________________|______

The tuple header contains the following fields:

   *   Length of the header (2 bytes)
   *   A 2-byte column descriptor for each column in the tuple.  Each
       column descriptor has bits that indicate the following:
          *   Whether or not the column contains a null value.
          *   Data type of the column.
          *   Byte offset to the end of the column within the tuple.

The total length of a tuple header is given by the formula:

     Length = 2*(NCols + 1) 

where NCols is the number of columns in the tuple.

At the beginning of the data area on a page, following the page header
and before the first tuple on the page, there is a tuple header which
contains information about the first tuple.  This header is also known as
the shared header because it can be shared with other tuples on the page.
For all tuples other than the first one, a header is only stored with the
tuple body if it is different from the shared header.

To illustrate the use of the shared header, suppose that a page holds 100
rows of 4 columns each.  In this case, header data requires 10 bytes per
tuple.  If each header were different from the shared header, the total
space used on the page by all tuple headers would be 1000 bytes.  But if
all 100 rows used the same header, then header data would only occupy a
total of 10 bytes, leaving more room for tuples.  A page with a shared
tuple header looks like the following: 

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     | Tuple | Tuple | Tuple | Tuple |    |
     |Header_|_Body__|_Body__|_Body__|... |
     |                                    |
     |       ____________ ________________|
     | <--- | SLOT TABLE |  TRAILER (16)  |
     |______|____________|________________|

A flag is set in the slot table to indicate whether or not the shared
tuple header is used by a particular row.  The slot table is further
described below. 
Each tuple can be uniquely identified by a tuple ID (TID), which has the
following format:

      __________________________________________
     | Version | DBEFile No | Page No | Slot No |
     |_________|____________|_________|_________|

For individual rows, the TID is usually expressed as three numbers
separated by colons (example:  10:5:8), where the first number is the
DBEFile number; the second is the page number; and the third is the slot
number.  Version numbers are not used for identifying rows of data, but
they are used in identifying other objects in the DBEnvironment, such as
tables.  The DBEFile number and page number indicate the file page in
which the tuple is found.  The slot number, further described below,
indicates an offset on the page where the row is located.

Slot Table.     

At the end of the page is a slot table, which contains an entry for each
tuple on the page.  When ALLBASE/SQL uses a TID, the slot number found in
the last field in the TID is an index into this table.  Each slot table
entry has the following format:

      __________________________________________
     |FL|FL|FL|FL| Byte offset to tuple on page |
     |__|__|__|__|______________________________|

Each entry is 16 bits long; the first four bits are flags (FL), and the
remaining 12 bits are the byte offset to the tuple within the page,
counting from the beginning of the page.  There is a maximum of 256
entries in a slot table, which means that a maximum of 256 rows can be
stored on a page.

Indirect Rows 

An indirect row is created when a row's length increases during an
update, and the page that currently holds the row does not have enough
space to store the new information.  In cases like this, the updated row
is stored on another page, and the original page is updated with the TID
of the new row on the new page.  An indirect row can only be accessed by
first fetching one page to find the address of the row and then fetching
a second page to obtain the row itself.

An indirect row may be created in any of the following circumstances:

   *   A variable length data field is updated with a longer value than
       the one previously stored.
   *   A NULL column is updated to contain a non-null value.
   *   An ALTER TABLE statement adds a column to a table and supplies a
       default value that must be added to each row.

Hash Storage 

Hash structures in ALLBASE/SQL are tables that you define to be hashed
according to a specific key at the time you create them.  In addition to
the key, you specify a number of primary pages, which become the hashed
locations of the data in the DBEFile or DBEFiles holding the table.  Rows
are inserted based on the value in the hash key; a row is said to hash to
a specific page, which means that the primary page location is calculated
from the value of the key, which must be unique.  A search array is
maintained on each page.  This structure contains slot numbers in key
order.  ALLBASE/SQL does a binary search using this array to arrive at a
specific row quickly.  When a row is inserted, the array is updated.

If there is not enough room for an inserted row on a hash page, the row
is placed on an overflow page.  Overflow pages are linked to the primary
page and to one another using the NXT and PRV pointers on the page.  A
large number of overflow pages means slower access to data.  To avoid
this, choose a good hash key with a uniform distribution of values within
the table.  Evenly distributed key values result in hashing to an evenly
distributed set of pages.  A key with a skewed distribution of values
would result in hashing to a skewed set of pages.  If the hash key does
not have a uniform distribution, then the time required to access some
rows will be much slower than the time required to access other rows.

Except for the search array and the NXT and PRV pointers, the format of a
hash data page is similar to that of an ordinary data DBEFile page, as
shown in the following:

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     | NXT  | PRV | Search  |Shared| Tuple|
     |______|_____|_Array___|Header|_Body_|
     | Tuple | Tuple | Tuple | Tuple |    |
     |_Body__|_Body__|_Body__|_Body__|... |
     |                                    |
     |       ____________ ________________|
     | <--- | SLOT TABLE |  TRAILER (16)  |
     |______|____________|________________|

Page Compression 

When a row of data is inserted on a DBEFile page, it enters the region
known as the free area on the page.  The free area is all the space that
is marked as available for data.  When a row is deleted, its space does
not immediately return to the free area.  Thus, additional inserts
following a deletion will first fill up all the space in the free area on
the page.  If the free area becomes full at insert or update time, the
space left over from deletions is returned to the page's free area.  This
process is known as page compression.

Storage of Index Data on DBEFile Pages 

Index entries are stored just as data entries are, with the following
exceptions:

   *   Index pages are either leaf pages or non-leaf pages.
   *   Leaf pages actually point to rows on DBEFile pages.  Each leaf
       page tuple contains a key value and the TID of a data row
       containing that key.
   *   Non-leaf pages contain tuples with key values and pointers to
       other non-leaf pages or to leaf pages.

A leaf index page for a common type of index looks like the following:

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     | NXT  | PRV | Shared  | Tuple |Tuple|
     |______|_____|_Header__|__Body_|_Body|
     | Tuple | Tuple | Tuple | Tuple |    |
     |_Body__|_Body__|_Body__|_Body__|..._|
     |                                    |
     |       ____________ ________________|
     | <--- | SLOT TABLE |  TRAILER (16)  |
     |______|____________|________________|

The NXT and PRV fields point to the next and previous leaf pages in the
index.  These pointers make an index scan in key order extremely fast.
The shared header describes the tuple's length and characteristics.

The tuple body on an index leaf page has two parts:

      __________________________
     |     KEY    |  Data TID   |
     |____________|_____________|

The key is the actual index key value as it appears in the table, and the
data TID is the address of the row pointed to by this index entry.

How Indexes are Used 

The basic index structure in ALLBASE/SQL is a B-tree or balanced tree.
The B-tree consists of a root page and a number of non-leaf and leaf
pages.  Typically, the root page points to a non-leaf page that contains
entries for particular ranges of key values, and each non-leaf page
points to still other pages containing entries for progressively narrower
ranges of values.  The lowest level in the tree is called the leaf page,
which contains pointers to (TIDs of) specific rows in DBEFiles.

The following diagram shows how a B-tree provides access to data.  After
deciding to use a particular index, ALLBASE/SQL accesses the root page
(1).  Then it reads non-leaf pages (2) until it obtains a leaf page (3)
which contains the TID of a qualifying row.  Finally, it accesses the
data page containing the row (4).  Assume that the values of C1 in the
following are percentages between 1 and 100:

     isql=> SELECT C1, C2 FROM T1 WHERE C1 = 79;
                             _______
                            |       |
                            | Root  |    (1)
                            |_______|
                                |
                                |
                ________________|_________________
               |                |                 |
             __|____         ___|___           ___|___
            |  Non  |       |  Non  |         |  Non  |
            |  Leaf |       |  Leaf |         |  Leaf |    (2)
            |_______|       |_______|         |_______|
             1-35             36-77              | 78-100
                                                 |
                                 ________________|_________________
                                |                |                 |
                              __|_____        ___|___           ___|___
     Index entry: key value  |        |      |       |         |       |
                  plus TID   |79|1:6:1| (3)  |  Leaf |         | Leaf  |
                             |________|      |_______|         |_______|
                                | 78-84        85-92             93-100
                                |
                             ___V__
                            | TID  |
                  __________| 1:6:1| Values in row: C1:79, C2:ABCD
                 |          |______|
                 |
      __________ | _______________________
     |           |                        |  DBEFile 1
     |___________v________________________|
     | Tuple |79|ABCD|22|XYZZ|49|NTGH|    |  Data Page 6  (4)
     |Header_|___1___|___2___|___3___|... |
     |                                    |  Tuple 1
     |       ____________ ________________|
     | <--- | .... 3 2 1 |  TRAILER (16)  |
     |______|_Slot Table_|________________|

How PCRs are Stored 

A PCR (parent-child relationship) is a special kind of B-tree that
supports a referential relationship between two tables--the parent table
and the child table.  This kind of index has entries that point to the
rows in the referring table and different entries that point to the rows
in the table referred to.  The table referred to must also have a unique
index defined on it.

In the leaf pages of a PCR, an entry for the parent table (that is, the
table referred to) precedes the entries for child tables (tables
referring to the parent).  Keys are distinguished by adding a 0 to the
end of a parent key, and a 1 to the end of a child key.

The leaf index page in a PCR would have tuples like the following:

      ____________________________________
     |      HEADER  (48)                  |
     |____________________________________|
     | NXT  | PRV | Shared  | Tuple  Tuple|
     |______________Header_____Body___Body|
     |79|0|1:6:1|79|1|4:2:3|79|1|7:8:7|...|  Index entry:
     |_Parent___|__Child___|__Child___|___|  Key (79) followed
     |                                    |  by 0 plus TID of parent and
     |       ____________ ________________|  key (79) followed by
     | <--- | SLOT TABLE |  TRAILER (16)  |  1 plus TID for each child
     |______|____________|________________|

Page Splitting 

As rows are inserted into a table, index entries are also inserted into
the B-tree.  As the pages of a B-tree fill up, new branches of the tree
are created through a process known as page splitting.  In page
splitting, two new index pages are allocated, and the index entries on
the old page are copied to the new pages--half the entries to each new
page; then the new entry is inserted, and the old page is freed for
reuse.  The process is shown for a typical case in the following two
figures.  (1) shows a portion of an index before splitting; (2) shows the
index after splitting.

                             _______
                            |  Non  |
                            |  Leaf |    (1)
                            |_______|
                                |
                                |
                ________________|_________________
               |                |                 |
             __|____         ___|___           ___|___
            |       |       |       |         | Full  |   Attempt to
            | Leaf  |       | Leaf  |         | Leaf  |   insert key: 96
            |_______|       |_______|         |_______|
             78-85            86-93             94-100

                             _______
                            |  Non  |
                            |  Leaf |    (2)
                            |_______|
                                |
                                |
                ________________|________________________________
               |                |                 |              |
             __|____         ___|___           ___|___        ___|___
            |       |       |       |         |       |      |       |
            | Leaf  |       | Leaf  |         | Leaf  |      | Leaf  |
            |_______|       |_______|         |_______|      |_______|
             78-85            86-93             94-96          97-100

When the key value being inserted is greater than the largest value
already in the index or smaller than the lowest value, page splitting is
one-way.  This means that only one new page is allocated, and half the
entries from the old page are moved to it, after which the new key value
is added to the new page.  In the previous example, an attempt to insert
a value of 110 on a full leaf page where the highest value is 100 would
result in one-way page splitting.



MPE/iX 5.0 Documentation