![]() |
![]() |
ALLBASE/SQL Database Administration Guide: HP 9000 Computer Systems > Chapter 3 Physical Design![]() Calculating Storage for Database Objects |
|
A good database storage design requires the following:
The numbers derived from the calculations described in the following sections are used to assign a value for a number of pages in the PAGES= clause of the CREATE TABLE and CREATE DBEFILE statements. DBEFiles are used to store table and index data. They are composed of 4096-byte pages. DBEFiles can be from 2 to 524,287 pages. The number of pages in an ALLBASE/SQL file is determined when it is created. In the case of expandable DBEFiles, the maximum size and the size of an increment are determined when the DBEFile is created. All DBEFiles must be associated with a DBEFileSet before they can be used to store data. A DBEFileSet is a logical grouping of one or more DBEFiles. Figure 3-1 “DBEFiles in DBEFileSets” shows a DBEFileSet (represented by the dotted lines because it is a logical construct) which contains three DBEFiles (represented by solid lines because they are physical constructs). The amount of storage available in a DBEFileSet is the sum of the pages of all the DBEFiles in that DBEFileSet. The DBEFileSet in Figure 3-1 “DBEFiles in DBEFileSets” shows a total of 200 pages. When a table in the DBEFileSet illustrated needs more than 200 pages to store data, additional DBEFiles can be added to the DBEFileSet to accommodate more data. You can specify the type of data that a DBEFile can contain. DBEFiles can be of type MIXED, TABLE, or INDEX. MIXED DBEFiles can store either table or index data. TABLE DBEFiles can store only table data. INDEX DBEFiles can store only index data. As shown in Figure 3-2 “Data Stored in DBEFiles within a DBEFileSet”, DBEFile1 can be of type TABLE, while DBEFile2 and DBEFile3 must be of type MIXED. All the DBEFiles for a given table and its indexes must be contained in one DBEFileSet. Table and index data can span more than one DBEFile within a DBEFileSet. Figure 3-2 shows the relationship between DBEFileSets, DBEFiles, and data. Note that in the figure the tables and indexes are stored in one or more DBEFiles, but they are all contained within a single DBEFileSet. You will need the following information to calculate the number of DBEFile pages needed by each table and index:
Use the following procedure to calculate the necessary number of DBEFile pages needed for your tables:
The examples presented below show worst-case calculations. They assume that each row has its own tuple header. Refer to the section "Defining Null Values for Columns" in the "Logical Design" chapter for more information about tuple headers. The row length is dependent on the data type and size of the columns in the table. Column size is calculated in bytes. INTEGER, SMALLINT, REAL, and FLOAT columns are a fixed number of bytes. The size of BINARY, VARBINARY, DECIMAL, CHAR, and VARCHAR columns depends on the size given in the column definition. Refer to Table 3-1 for the storage requirements of the various data types. Table 3-1 Data Type Storage Requirements
As you begin calculating row lengths, try to use physical space as efficiently as possible. Tables can share DBEFiles, but only one table can store rows in a given DBEFile page. A row whose columns add up to a length greater than 2000 bytes will potentially waste page space. For example, if you have a table with a row length of 2050 bytes, a DBEFile page of 4096 bytes with a 100 byte overhead would only be able to hold one 2050 byte row. The remaining 1950 bytes would be left empty. The Parts table used in the examples in this chapter has three columns with the following characteristics:
Use the following calculation to determine the row length for the table:
The calculations in the following examples assume 30,000 rows will be stored in the Parts table. The following formula is used to calculate the number of rows per page (NRP): ![]() where RL is row length, and NC is the number of columns in the table, divided into the number of bytes that can fit on a page (a set value of 4030). This formula includes overhead for column values that can be NULL as well as overhead for the size of VARCHAR and VARBINARY data. Using the values for the Parts table, the number of rows that can fit on a page is calculated as follows: ![]() The value for NRP is rounded down to the next integer because a partial row cannot be stored on a page. The result is 62 rows per page. The following formula is used to calculate the number of DBEFile pages (NDP) needed to hold all rows in the table: ![]() where NR is the number of rows in the table, and NRP is the number of rows per page. Using the previous value for NRP of 62 and an assumed value of 30,000 rows in the Parts table, the number of pages needed for the table is calculated as follows: ![]() The value for NDP is rounded up to the next integer because DBEFiles are not created with partial pages. The result is 484 data pages. For every 252 pages in a DBEFile, ALLBASE/SQL creates a page table page as a directory to store information about the next 252 pages. It contains pointers to data and keeps track of which pages are empty and which tables contain rows in which pages. The directory overhead (DO) can be calculated with this formula: ![]() where NDP is the number of data pages which is divided by a set value of 252. Using the previous value for NDP of 484 pages, the directory overhead is calculated as follows: ![]() The value for directory overhead is rounded up to the next integer because DBEFiles cannot have partial page table pages. The result is 2 page table pages. The total number of DBEFile pages needed to store 30,000 rows in the Parts table is calculated as follows:
Perform the calculation for each table to get the total estimated number of DBEFile pages needed for the DBEnvironment. Be sure to leave enough space in the DBEFiles for minor expansion of the tables, so you do not need to add DBEFiles to DBEFileSets frequently. For indexes, ALLBASE/SQL uses a doubly linked balanced tree (B-tree) structure, which can have several levels between its initial node, or root page, and the leaf node, or leaf page, containing the pointer to the requested row. (For basic information about B-tree indexes, refer to "Designing Indexes" in the "Logical Design" chapter.) Calculate the following values to determine the number of DBEFile pages needed for an index:
The total number of index pages needed is the sum of the number of leaf pages, non-leaf pages, and overhead pages. The index key length (KL) is calculated by adding 10 to the sum of length of the columns (SLC) on which the index is created:
The 10 bytes includes 8 bytes for the data TID entry and 2 bytes needed for the slot table entry. Consult the preceding table, "Data Type Storage Requirements," for the column length of each data type. If the index is defined upon a CHAR(16) column, an INTEGER column, and a CHAR(20) column, the index key length (KL) is calculated as follows:
The size of the index header (IH) can be calculated with the following formula:
NIC is the number of columns upon which the index is defined. The constant 1 represents the byte needed for the TID. An additional 2 bytes is needed for overhead. If the index is defined upon 3 columns then the size of the index header is determined as follows:
The formula used to calculate the number of rows per leaf page (RLP) depends on whether the index header can be shared. If the columns on which the index has been defined allow NULL values, or if their data type is VARCHAR, then the index header cannot be shared. Use the following formula to calculate the number of rows per leaf page (RLP) when the index header can be shared : ![]() 2 is 2 bytes for the slot table entry; 8 is 8 bytes for the data TID entry. If the index header cannot be shared, use the following formula: ![]() 4006 is the number of bytes available in a page. For a conservative estimate, assume that the leaf pages are 2/3 full. If the result is a fraction, round down to the nearest integer because a partial row cannot be stored in a page. For example, if the index key length is 50 and the index header cannot be shared, the number of rows per leaf page is calculated as follows: ![]() If the index header can be shared, calculate the number of rows per non-leaf page (RNLP) with the following formula: ![]() Where 2 is 2 bytes for the slot table entry; 8 is 8 bytes for the data TID entry; 8 is the next data TID pointer. Use the following formula if the index header cannot be shared: ![]() The value is multiplied by 1/2, because we assume that the non-leaf pages are half full. If the result is a fraction, round down to the nearest integer because a partial row cannot be stored in a page. For example, if the index key length is 50 and the index header cannot be shared, then the calculation is as follows: ![]() Each row of table data is pointed to by a row in a leaf page. To calculate the number of leaf pages (LP), divide the number of rows your table will contain (RT) by the number of rows per leaf page (RLP). If the result is a fraction, round down to the nearest integer because partial pages do not exist. The formula is as follows: ![]() If the table will contain 651090 rows of data, and the value of rows per leaf page is 44, the number of leaf pages is calculated as follows: ![]() To determine the total number of non-leaf pages, you must calculate the number of non-leaf pages at each level in the B-tree. Start at the lowest non-leaf level, that is, the level just above the leaf pages, and move up the B-tree until the level has only one page. At each level, use the following formula: ![]() NLP(n) is the number of non-leaf pages a level n, RL(n) is the number of rows needed at level n, and RNLP is the number of rows per non-leaf page. If the result is a fraction, round down to the nearest integer because partial pages do not exist. Since the rows of non-leaf pages point to the leaves at the next level down, the value of RL decreases with each higher level. In the example that follows, assume that the rows per non-leaf page is 33 and the number of leaf pages is 14797. First, calculate the number of pages at the lowest non-leaf level. The rows at this level point to the leaf pages. Since there are 14797 leaf pages, the RL(0) is 14797. The calculation is as follows: ![]() Next, calculate the number of non-leaf pages at the next level up. The rows at this level point to 448 non-leaf pages. The calculation is as follows: ![]() At the next level up, the page rows point to 13 non-leaf pages. After rounding up to the nearest integer, the result of the following calculation is 1, indicating that the highest level has been reached: ![]() The total number of non-leaf pages is the sum of the non-leaf pages at each level:
For every 252 pages in the B-tree, a directory overhead page (DO) is required: ![]() A directory overhead page is also referred to as a page table page. Using the values from the previous examples, the number of overhead pages is calculated as follows: ![]() Remember the following when allocating DBEFile storage:
The following factors affect how tables should be grouped in DBEFileSets:
Grouping tables in separate DBEFileSets increases traceability of storage for particular tables. For example, if you store all tables in the SYSTEM DBEFileSet, you will not be able to tell which DBEFiles hold system catalog data and which hold data for a particular table. As a result, all tables and the system catalog would have to be taken into consideration when storage space is added to the DBEnvironment. All maintenance functions such as the UPDATE STATISTICS statement would take longer if all tables are stored in a single DBEFileSet. Therefore, you should create a DBEFileSet for each table or group of tables that you want to maintain separately. You may want to place tables that are used infrequently in the same DBEFileSet to use space more efficiently. You can add DBEFiles to the DBEFileSet to accommodate the space requirements of several tables. Hash tables must be created in separate DBEFiles. It may be useful to create separate DBEFileSets for your hash tables. Large tables should have their own DBEFileSet. If a small table is in the same DBEFileSet as a large one, the performance of sequential scans on the small table will not be as good as if the small and large tables are separated. You may want to place related tables in the same DBEFileSet. In the sample database, one DBEFileSet is created for the internal parts information in the Parts and Inventory tables. Another DBEFileSet is for the vendor information in the Vendors and SupplyPrice tables. A third DBEFileSet contains the order data of the Orders and OrderItems tables. Since related tables are often updated simultaneously, DBEFiles can be added to the DBEFileSet to accommodate the growth of multiple tables. Figure 3-3 “DBEFileSets in the Sample DBEnvironment” shows how the sample DBEnvironment is divided into three DBEFileSets. When a DBEFile is created, it can be specified as one of three types:
You can control performance by selecting DBEFile types carefully and by locating DBEFiles on the appropriate devices. The characteristics of the transactions to be processed determine if you should create a separate INDEX DBEFile. If your applications access indexes frequently, placing the indexes in a separate DBEFile (and possibly on a different, faster device) may improve performance. If your applications access the indexes infrequently, having index and table data share the same DBEFile uses disk space more efficiently. The default file type is MIXED. If table and index data are stored together in a MIXED DBEFile, the disk drive does not have to search multiple DBEFiles when ALLBASE/SQL uses an index. However, this may not be true for large tables that span several DBEFiles, or for tables with multiple indexes. You can improve performance by placing table and index data in different DBEFiles and locating the DBEFiles on different devices. Then, when an index is used during query processing, each disk drive accesses either index or table data and reads and updates are distributed over multiple devices rather than concentrated on a single disk drive. To ensure that table and index data will be stored in different DBEFiles, create separate DBEFiles of type TABLE and INDEX, and do not create any MIXED DBEFiles. If there are no MIXED DBEFiles, ALLBASE/SQL must place all index data in the INDEX DBEFile and table data in the TABLE DBEFile. If space of the appropriate type is not available, an error is generated. If you use separate TABLE and INDEX DBEFiles in a DBEFileSet, you should use the results of the calculations presented above in "Disk Space for Tables" to provide enough space in the TABLE DBEFiles to contain all the tables in the DBEFileSet. Similarly, use the results of the calculations presented above in "Disk Space for Indexes" to provide enough space in the INDEX DBEFiles for all the indexes defined on all tables in the DBEFileSet. Figure 3-4 shows how the Orders and OrderItems tables in the sample database are stored in one TABLE DBEFile, and the indexes stored in an INDEX DBEFile. Note that the OrderFS DBEFileSet does not contain any MIXED DBEFiles. You can use the MOVEFILE command in SQLUtil to locate INDEX DBEFiles on separate devices from the TABLE DBEFiles in the same DBEFileSet. Since you can only move physical files (i.e. DBEFiles), you must keep tables physically separate to be able to place them on different devices. Simply storing tables in different DBEFiles does not ensure that they are physically separate. You may want to create separate DBEFileSets for two tables that are accessed frequently at the same time by users or applications. Then tables can be associated with different DBEFileSets and located on different disk drives to minimize disk drive workload. In Figure 3-5, some of the DBEFiles are located on different disks even though they belong to the same DBEFileSet. You specify a DBEFile size in the PAGES clause of the CREATE DBEFILE statement. Once you have determined the storage requirements for each table and index and you have determined which tables and indexes will be assigned to which DBEFileSets, you can estimate how big the DBEFiles should be. Determining DBEFile size involves a trade-off between convenience (that is, how often you have to add additional DBEFiles) and use of direct access storage space. Remember the following when choosing a size for your DBEFiles:
Initially, DBEFiles should be large enough to hold the estimated number of rows for all the tables and indexes in the DBEFileSet. The DBEFile in the OrderFS DBEFileSet should be large enough to contain the data for both the Orders and the OrderItems tables. The formulas discussed earlier in this chapter were used to arrive at the following page requirements assuming 30,000 rows per table: Table 3-2 Page Requirements for Table Data
A DBEFile of type TABLE with at least 645 pages should be created to contain the data for the two tables. You should create the DBEFiles slightly larger than the estimate to make room for minor growth. DBEFiles can be added to make room for significant growth. The Orders table has two indexes and the OrderItems table has one index. Again, using the formulas on the previous pages, the following numbers are calculated: Table 3-3 Page Requirements for Index Data
A DBEFile of type INDEX with at least 1290 pages will accommodate 30,000 keys for each of the three indexes. The DBEFiles should be created slightly larger to make room for minor growth. Again, additional DBEFiles of type INDEX can be added to accommodate significant growth. The amount of disk space used by a hash structure consists of the primary pages you define when you create the structure, page table pages used as overhead by ALLBASE/SQL, and any overflow pages used for rows which do not fit on the primary page pointed to by their hash key. You can use the following formula to estimate the approximate number of primary pages: ![]() Page Size is the amount of space available on a page for data. After subtracting overhead from a 4096-byte page, about 3900 bytes are free for tuples. Fill Factor is the percentage of each primary page that should be filled. Suppose you will have a table with 2000 rows that are 350 bytes long and you wish to allow 30% space for additional growth: ![]() Rounding up to the next whole page, the result is 257 primary pages. For non-integer keys, you should round up again to the next prime number of pages, which will yield the best results. When ALLBASE/SQL cannot find room for a new row on a primary page, it places the data on an overflow page. Overflow pages can be in the same DBEFile as the primary pages, or they can be in any other TABLE or MIXED DBEFile within the same DBEFileSet. When designing the hash structure, you can create a DBEFile that accommodates both primary pages and overflow pages. Specify a DBEFile size that is larger than the number of primary pages; the extra pages will then be available only for use by the hash structure. Some overflow is accommodated by the fill factor described above. You must also estimate the amount of overflow space needed due to variation in key values. For an integer key with sequential values, no additional pages are needed. For a non-integer key, add 20% of the number of primary pages for overflow. When deciding how large to make the DBEFiles for your hash structures, start by calculating the total amount of space needed, as follows:
In addition to primary pages and overflow pages, allow one additional page for every 252 primary pages for page directory overhead. You can allocate the space for primary hash pages over several DBEFiles. Using this approach, you create several smaller DBEFiles that add up to the total number of primary pages plus any overflow. These files can then be moved, if you wish, to different devices. The total number of DBEFiles allocated for primary pages in a hash structure cannot exceed 16. Remember to allow one directory page for every 252 primary pages in each DBEFile. Remember that DBEFiles used by hash structures are considered bound, that is, unavailable for any other purpose. The DBEFiles for a hash structure are allocated in the reverse of the order in which they were added to the DBEFileSet in which the table is created. If you create the hash structure in the same transaction as the one in which you create the DBEFiles for the hash and add them to the DBEFileSet, you can be sure that other transactions will not use the new files. The primary pages for a hash table can be spread over no more than 16 DBEFiles. Overflow pages for a hash table can be placed in any non-bound non-index DBEFile with space for them. DBEFiles containing the hash primary pages are considered bound and therefore unavailable for use by any other table. When UPDATE STATISTICS is executed, all DBEFiles bound to hash primary pages are listed as 100% full to indicate that no more new pages can be allocated from these DBEFiles. (However, data may still be inserted if the allocated pages are not yet full, and the last bound DBEFile may contain overflow pages if primary pages did not use all of it.) Given a number of DBEFiles with specific sizes and a number of primary pages, it is possible to determine the physical location of a logical page number within a hash structure. This information may be useful in performance tuning. DBEFile pages are allocated for use by the hash structure in the reverse of the order in which they were added to their DBEFileSet. For example, suppose you have created three DBEFiles, each with 100 pages, and added them to DBEFileSet HashFS in the following order: File1, File2, and File3. Then, in the same transaction, you create a hash structure in HashFS with 261 primary pages. The distribution of logical pages is shown in Table 3-4 “Logical Page Number and DBEFile Location in Hash Structure”: Table 3-4 Logical Page Number and DBEFile Location in Hash Structure
Page 0 in each DBEFile is allocated as a page table page for directory information. A new page table page must be allocated every 252 pages if the file is large enough. The files in the above example do not have additional page table pages. If you are using large DBEFiles, you should include additional page table pages in your calculations. Integrity constraints make use of index structures to enforce the constraint condition. In addition, each constraint definition is stored in the system catalog. Unique constraints make use of unique indexes. When you define a primary key or specify the UNIQUE option in creating a table, ALLBASE/SQL will create a unique B-tree index. The storage required for this index is the same as for unique indexes, described in a previous section. Unique indexes for unique constraints are stored in the same DBEFileSet as the table on which the primary or unique key is defined. In addition to the unique index that is built on a table that has a PRIMARY key, each referential constraint you define on the referencing table uses a separate index structure known as a parent-child relationship (PCR). A PCR is different from a standard B-tree index in that it contains pointers to both the referencing and the referenced tables. The data for a PCR is stored in the same DBEFileSet as the referenced table. You can get a size estimate for a PCR by using the formulas discussed in the section, "Calculating Storage for Indexes." Be sure to include the keys from both the referenced table and the referencing table when calculating the index length of the PCR. When a unique constraint is defined as you create a new table using the HASH ON CONSTRAINT clause, the table is built as a hash structure, and space is allocated as shown in the section, "Calculating Storage for Hash Structures." |