KSAM FILE STRUCTURE [ KSAM/3000 Reference Manual ] MPE/iX 5.0 Documentation
KSAM/3000 Reference Manual
KSAM FILE STRUCTURE
A KSAM/3000 file is two physical files: a data file and a key file. The
data file portion of a KSAM file contains all the data in the file and
contains nothing but the data. Data records are written to the data file
in the order in which they are received from a program. (The last record
added is always written to the end of the file.) This chronological
order is not necessarily in sequence by key value. At the time the file
is opened, you can specify that records must be written in primary key
sequence, but the default mode is to write records in any order.
The key file portion of a KSAM file contains the key entries that
maintain the sequence of the data records. As a data record is written
to the data file, a key entry is added for each key defined for the file,
and the sequential connections between key entries maintained. This
means that if there is a primary key and two alternate keys, three key
entries are added with each new data record, and three sets of pointers
are updated to reflect the new key sequence of each key.
The structure of the data file is like that of any MPE file. Data
records may be fixed or variable in length. If fixed, each record is the
size specified when the file is created (default size is equivalent to
one 128-word disc sector). If variable, the actual size of each record
is included in the record itself, and the maximum size of any record is
used to determine the blocking. By default, data records are blocked one
record per block.
The structure of the key file is more complex. The key file is organized
so that locating a particular key requires the least number of accesses.
For this purpose, the key files are organized in a particular structure
known as a "B-Tree".1
Described in "Organization and Maintenance of Large Ordered Indexes"
Bayer and McCreight, Acta Informatica, Springer Verlag,1972, pp 173-189.
B-tree structure has two main advantages:
* The number of file accesses is limited to the number of levels in the
tree. If there are two levels, no more than two reads of the key
file are needed to locate a particular key.
* The key file is balanced. This means that each level pointer
associated with a particular key value points to approximately as
many higher key values as lower key values at the next level of the
tree.
B-tree structure in general is discussed below, followed by a discussion
of how KSAM key files use this structure.
B-TREE STRUCTURE
In a B-tree, there is always one root level block that points to blocks
at a lower level. At the lowest level, the blocks are called leaves and
they do not reference another level. In a two-level structure (see
Figure B-1), the blocks at the second level all "leaves". If the tree
has more than two levels, intermediate blocks (nodes or branches) are
referenced by a higher level and themselves reference a lower level.
Unless this lower level is a leaf, it also references a lower level.
This continues until the lowest (leaf) level is reached.
The notion of higher and lower level does not refer to the key values,
The root block key values are always central and point to blocks with
lower values and blocks with higher values. Thus if there are two
entries at the root or a branch level, there will be three pointers to
the next level: one for key values less than the first key value, one
for key values less than the second key value but greater than the first,
and one for key values greater than the second.
Within each block, values are stored in ascending order. Although not
all blocks are filled with values, each block in a tree is the same size.
Figure B-1 illustrates a simple 2-level tree with one root block and
three leaf blocks. The root is a single block and each leaf is a block
of the same size. (This example uses the KSAM minimum key block size
consisting of four key entries per block.)
Figure B-1. Two-Level B-Tree Structure
ADDING OR DELETING KEYS.. When a key block is full and new keys are
added, the root level block may need to be split, causing a new root
block to be introduced and adding a new level to the tree. This process
is illustrated in Figure B-2 where the addition of one new key to a
partially filled block does not affect the tree structure, but the
addition of a second key to the full block causes the block to split.
Again, this example assumes the minimum key block size for the sake of
simplicity. Note that all key file maintenance is performed in the KSAM
extra data segment where space is allocated for one more key than the key
block size. This allows the addition of a key to a "full" block. Before
the block is read back into the key file, it is split so that the key
block size is maintained.
Figure B-2. Split Causes New Level in Tree
When the root block and all the leaves are full, another split becomes
necessary. Figure B-3 illustrates a split caused when a new key is added
to a full two-level tree structure, forcing it to a three-level
structure.
Figure B-3. Tree Growth from Two to Three Levels
Note that key blocks must always be defined with an even number of keys.
As a result, when a key is added to a full block, there will be a middle
value to form a block at a new level. This maintains the balance
essential to B-tree structure.
As records are deleted from the data file, two blocks at the same level
(brothers) may be merged into one block. If sufficient records are
deleted, the root block may be merged into a higher level, thereby
contracting the number of levels in the key structure.
KSAM KEY FILE STRUCTURE
A KSAM key file consists of three types of information:
* Control--contains general control information such as the KSAM
filename, and the number of keys defined for the file,
* Key descriptor--contains general key information for each key such as
the starting location in the data record of the key field, and the
location in the key file of the root key entry.
* Key entries--Each key entry contains information about a key
associated with a data record. This information consists of:
* the key value
* a pointer into a data record in the data file with the same key
value.
* pointers into other records within the key file.
The control and key descriptor information is contained in two blocks
(physical records) at the beginning of each file. Regardless of the
number of keys in the file, each block is 128 words (l sector) long.
Thus, every key file is preceded by two sectors of control and key
descriptor information.
The key entries are also organized into blocks of a fixed size. However,
the exact number of blocks and the size of each block is based on a
variety of factors, such as the key size, the number of keys in the file,
the number of key values for each key, the key blocking factor, and so
forth. (Calculation of key block size is discussed later in this
section.) These key entry blocks are organized into the Btree structure
discussed above. A separate key structure is maintained for each key
defined for the file, Thus there may be up to 16 separate tree structures
in a single KSAM file.
Refer to Figure B-4 for a simplilied diagram of a KSAM key file with two
keys each organized into a two level tree structure. For a detailed
description of the three types of block, refer to Figure B-5 and Figure
B-6.
CONTROL BLOCK.. This 128-word block contains information on the data
file associated with the key file, and keeps track of the number and type
of access to the key file. It also specifies the number of keys (primary
and alternate) defined for the KSAM file. The name of the data file and
the number of keys are essential for associating the key file with the
data file. The number of keys determines how many entries are in the Key
Descriptor Block. (Refer to Figure B-5.)
KEY DESCRIPTOR BLOCK.. This 128-word block contains one 8-word entry for
each key defined for the KSAM file. The first entry describes the
primary key, the next entry describes the first alternate key (if there
is one), and each subsequent entry describes any additional alternate
keys. The first word of each entry points to the root block for that
key; another important item is the location of the key in the data file
record. (Refer to Figure B-5.)
Figure B-4. KSAM Key File Structure With Two Keys
Figure B-5. Control Block and Key Descriptor Block
KEY ENTRY BLOCKS.. Each block in the key file contains, in addition to
the key values, pointers that link the key blocks to each other and
pointers that link each key value to an associated data record.
Preceding these entries, the first item in every key block is the address
of the block on disc; the next item is the number of keys in the key
block.
All key block access for search and modification is performed in the KSAM
extra data segment. The disc address in each key block insures that the
block is returned to its correct location on disc from the extra data
segment.
Figure B-6 illustrates the general layout of all key entry blocks. Each
key value is followed by a pointer to a data record and a pointer to the
block at the next level with higher key values. The first, pointer in
each block points to a block at the next level with lower key values.
These pointers are set to zero for key blocks that have no next level
(the leaves on a tree structure).
Figure B-6. Key Entry Block Structure
RELATION OF KEY TO DATA FILE
The purpose of the KSAM key file is to maintain the order of data records
in the data file. In order to maintain sequential order for each key,
the keys blocks are connected through pointers. In addition to these
pointers, each key entry must also contain a pointer linking the key
value to the data record containing the corresponding key value.
When the KSAM file is created, each key is defined by its starting
location in the data record, its length, and its type. The location is
specified as the character position where the key value starts; the
length is the maximum number of characters used by the key value; its
type is the type of the value such as, an integer, a character string, or
a double-word integer.
Thus, if the primary key is defined as a character string that starts in
character position 3 and is 20 characters long, then KSAM expects that
each data record will contain such a value in that location. Whatever is
placed in the defined location is treated as the primary key and
determines the order in which data records are sequenced.
The order in which records are physically written to the date file is
called chronological sequence. This sequence may or may not also be a
key sequence. If the records were written to the file in primary key
sequence, then this sequence and the chronological sequence are the same.
If there is an alternate key for the file, however, it is very unlikely
that alternate key sequence is the same as the chronological sequence.
NOTE Key sequence in KSAM files is always in ascending order by key
value.
Refer to Figure B-7 for a simplified diagram of the relation between the
primary keys in the key file and the associated data records in the data
file. (A similar diagram could be set up for the alternate key.) The
diagram shows the pointer in each key entry pointing directly to a record
in the data file.
When a data record is to be located by key value, the root block for the
appropriate key is searched first, using a binary search method. If the
key is in the root block, the search is over. If it is not, the key
value is between two root block values or it is less than the lowest
value or greater than the highest. Using the pointer in the appropriate
location, a block at the next level is located. This block is then
searched for the selected key. Again, if the key is found, the search is
over. If the key is not found at this level, the appropriate pointer to
the next level is used and the search continues.
When the selected key value is found, the pointer to the data file
associated with that key value is used to locate the record in the data
file.
Figure B-7. Data File/Key File Relation
MPE/iX 5.0 Documentation