Physical data design means the arrangement of data inside tables, which
reside in DBEFiles, and the arrangement of DBEFiles in DBEFileSets.
This section includes the following topics:

- Creating DBEFileSets
- Creating DBEFiles
- Creating Tables
- Initial Table Loads
- Unloading Data
- Unloading and Reloading to Remove Indirect Rows
- Unloading and Reloading to Remove Overflow Pages
- Tips on Deletions from Tables

Additional information relating to physical design is found in the
section "Load Balancing" in the chapter "Guidelines on System
Administration." You may also find it useful to review the
"Physical Design" chapter in the ALLBASE/SQL Database Administration Guide. Creating DBEFileSets |  |
DBEFileSets are logical groupings of DBEFiles. DBEFileSets are also
the logical home of tables, which are always created in a particular
DBEFileSet. The following guidelines offer suggestions on how to
distribute tables among DBEFileSets.

Avoiding the SYSTEM DBEFileSet for User Data

Although you can create user-defined tables in the SYSTEM DBEFileSet,
it is recommended that you create new DBEFileSets to keep user
data separate from system catalog data to facilitate maintenance
and performance tuning. Here are the reasons why:

- System catalog data is accessed frequently. If user data is kept
in the SYSTEM DBEFileSet, then the pages used for system data
become intermingled with the pages used for user data. If
ALLBASE/SQL needs to access a system page that physically
resides near the end of the DBEFileSet, then it must first look
through all the page table pages that come before it. The
average time to access a particular system page increases because
of the extra data stored within the DBEFileSet.
- Sometimes ALLBASE/SQL uses pages in the SYSTEM DBEFileSet for
temporary space. If this space is not available, processing
terminates unsuccessfully. Mixing user and
system data makes it more likely that temporary space will not
be available when needed. See the section "Temporary Space in
SYSTEM" in the chapter "Guidelines on System Administration."
- Grouping tables in separate DBEFileSets allows you to identify
the space requirements for particular tables. If you store all
tables in the SYSTEM DBEFileSet and the DBEFileSet runs out
of space, it will be difficult to tell how fast each table is
growing. It will also be harder to determine which table's update or insert
operation caused the system to run out of space.
In particular, you should avoid creating hash structures in the
SYSTEM DBEFileSet.

Placing Large Tables in Separate DBEFileSets

You should place tables larger than 1000 pages in their own DBEFileSets.
When more than one table is stored in the DBEFileSet, the pages of
DBEFiles in the DBEFileSet become a mixture of pages from the
different tables. In doing a sequential scan, if ALLBASE/SQL needs
to access a data page that
physically resides near the end of the DBEFileSet, it must
first look through all the page table pages that come before it. The
average time to access a data page thus increases because of the
extra data stored within the DBEFileSet. The time increases severely
when several large tables are stored in the DBEFileSet.
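
As a rough illustration, the following statements give a large table its
own DBEFileSet. The names and sizes are illustrative, and the exact
CREATE DBEFILE option syntax should be verified against the ALLBASE/SQL
Reference Manual:

    CREATE DBEFILESET OrderFS;
    -- 2024 pages = 8 x 253, so no page table segment is wasted.
    CREATE DBEFILE OrderF1 WITH PAGES = 2024, NAME = 'ORDERF1',
        TYPE = TABLE;
    ADD DBEFILE OrderF1 TO DBEFILESET OrderFS;
    -- The large table is the only table created in this DBEFileSet.
    CREATE PUBLIC TABLE PurchDB.Orders
        (OrderNumber INTEGER NOT NULL,
         OrderDate   CHAR(8) NOT NULL) IN OrderFS;
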
To determine the size of a DBEFileSet, run SQLMON and go to the
Static DBEFile screen.

Maintenance functions such as UPDATE STATISTICS also will take longer
if many large tables are stored in a single DBEFileSet. Therefore,
create a separate DBEFileSet for each table or group of tables that
you want to maintain separately.

Gathering Small Tables into DBEFileSets

For smaller tables, there may be a performance advantage in grouping them
together into DBEFiles of 1000 pages or less. When each small table
occupies its own DBEFileSet, you can expect some of the following
performance problems:

- I/O might increase, for several reasons:

  - Each page table page has little information on it. More pages
would have to be read in from disk because the data physically
resides on more separate pages.
  - Because each page table page has little information on it,
it will not stay in the data buffer pool. Pages with more
information have more locality of reference and are more
likely to remain in the buffer pool (thus reducing real I/O).
If the data buffer pool were large enough, the actual increased
I/O might be negligible.
- Each DBEFileSet requires at least one DBEFile, which is an actual
operating system file. Opening a file to read from it is a slow
operation. Putting each table into its own DBEFileSet increases
the number of files, and therefore increases the average
time needed to access the data in these tables.
- Operating systems have limits on the number of files that can
be open simultaneously. To avoid exceeding the limit, successive
close and open file operations may be required, which will degrade performance.
ALLBASE/SQL imposes its own limit of 50 files that may be open
simultaneously.
- The operating system must search for the correct file descriptor,
which may take longer with more files open.
- Disk space usage would increase, since many virtually empty
page table pages would exist.

When small tables are placed in a single DBEFileSet, disk
space is used efficiently and I/O is minimized. For example,
if you have fifty tables that average 5 pages, they will all fit
comfortably into a single DBEFile and use only one page table page.
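
A minimal sketch of the grouping idea, with illustrative names and sizes
(verify the statement syntax against the ALLBASE/SQL Reference Manual):

    CREATE DBEFILESET SmallFS;
    -- One 253-page DBEFile comfortably holds many small tables.
    CREATE DBEFILE SmallF1 WITH PAGES = 253, NAME = 'SMALLF1',
        TYPE = TABLE;
    ADD DBEFILE SmallF1 TO DBEFILESET SmallFS;
    -- The small lookup tables all share the same DBEFileSet:
    CREATE PUBLIC TABLE PurchDB.States
        (Code CHAR(2) NOT NULL, Name CHAR(20)) IN SmallFS;
    CREATE PUBLIC TABLE PurchDB.Units
        (Code CHAR(4) NOT NULL, Name CHAR(20)) IN SmallFS;
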
Creating DBEFiles

DBEFiles are the physical repositories of ALLBASE/SQL data on the
operating system. Since you have great flexibility in defining DBEFile
types, sizes, and assignment to DBEFileSets, your choices often affect
the overall performance of your system.
Here are some suggestions for creating DBEFiles:

- Create DBEFiles in sizes that are multiples of 253 pages to minimize
the space used by page table pages. Each 253-page segment contains one
page table page plus 252 pages of data following the page table page.
(A 1012-page DBEFile, for example, holds exactly four such segments.)
- Create separate TABLE and INDEX DBEFiles, especially for
large tables, so that they can be placed on separate disks to
improve I/O performance.
- Create a MIXED DBEFile in its own DBEFileSet for a small table
(less than 1000 pages) with a single index. (In this case, data and index rows
will be on alternating pages, which will result in faster processing and
better space utilization).
- Use dynamic DBEFile expansion to avoid running out of DBEFile space
at run time. In the CREATE DBEFILE statement, specify the
initial size, the maximum size, and the increment size.
- Use pseudo-mapped I/O to shorten the I/O path length and eliminate
operating system buffering of database
pages. This feature is appropriate only for large DBEFiles containing
large tables accessed randomly. Standard I/O is more appropriate for
small DBEFiles or DBEFiles that are frequently accessed sequentially.
When appropriate, use pseudo-mapped I/O with memory-resident
data buffers. File system prefetching is not available
when using pseudo-mapped I/O.
Restrictions on pseudo-mapped files:

  - They cannot be dynamically expanded.

  - You cannot use concurrent backup (that is, SQLUtil STOREONLINE does
not work with pseudo-mapped files).
For additional information on pseudo-mapped files, refer
to the appendix "Using Pseudo-Mapped DBEFiles" in the ALLBASE/SQL
Database Administration Guide.

Avoiding Extra DBEFile Space

When sequential scans are made frequently for queries involving tables
in a particular DBEFileSet, extra space should be minimized. When
sequential scans are made, every page table page in the DBEFileSet
is read to determine if pages might qualify for the query.
When a large amount of extra space exists, then extra I/O is required
to fetch the page table pages for empty DBEFile space.
One way to minimize the extra space in your DBEFiles is
by specifying dynamic space expansion in the CREATE DBEFILE statement,
as in the sketch below.
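
Here is a hedged sketch of such a statement. The option names INCREMENT
and MAXPAGES are assumptions for illustration only; check the CREATE
DBEFILE syntax in the ALLBASE/SQL Reference Manual for the exact
keywords:

    CREATE DBEFILE GrowF1 WITH PAGES = 506,  -- initial size (2 x 253)
        INCREMENT = 253,   -- assumed keyword: grow one segment at a time
        MAXPAGES = 5060,   -- assumed keyword: upper bound on expansion
        NAME = 'GROWF1', TYPE = TABLE;
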
Creating Tables

Here are some general suggestions on table creation (a sketch combining
several of them follows the list):

- Create PUBLIC and PUBLICROW tables for maximum concurrency.
PUBLICREAD can improve performance by reducing internal page locking.
Use PRIVATE for special purposes. Note that PRIVATE is the default;
usually, you want to specify something else.
- Choose data types that suit the programming language so as
to avoid data conversions.
- Native language (NLS) users should be aware that indexes and
predicates that use USASCII (NATIVE-3000) columns perform faster
than indexes and predicates that use native language columns.
If there is a choice, use USASCII for index columns in your tables,
create indexes on USASCII
columns, and use USASCII columns in your predicates.
- Integer values can be 5-15% more efficient than packed decimal values.
- When adding a column with ALTER TABLE and adding a default value,
every row in the table is immediately updated to add the new
value. This causes all rows to become indirect rows. It may be
better to unload data, drop and recreate the table with the new
column and default, then reload the data.
- If a column is to be used consistently in the WHERE clause, it
should have an index.
- Wherever possible, load data for non-hash tables
in sorted order. This is
essential for tables with clustering indexes. If a multi-column
index exists, the order of its columns should be the same as the
order of the columns in the load file. Hash tables should not
be loaded in sorted order.
- Consider combining tables that are constantly joined
together. It is a good idea to revisit normalization
issues when actually creating tables.
- For large tables, place data and indexes on separate disks using
the SQLUtil MOVEFILE command. In general,
spread the load of database and log files over your disks, and
keep data and log files on separate devices.
Refer to the section "Load Balancing" in the "Guidelines on System
Administration" chapter for more information.
Avoiding NULL and Variable Length Data

Avoid NULLs and variable length data types (VARCHAR and
VARBINARY) for the following reasons:
- Nulls always require additional CPU to check for null values.
Using NOT NULL can save as much as 5% in CPU
because of the overhead of checking for nulls.
- Use of NULL and VARCHAR or VARBINARY may cause wasted space due
to the inability to use a shared tuple header. (A separate
header must be stored for each tuple that has a different size.)
However, the use of nulls or variable length columns
may actually save space if there are only a few such rows
per page. If the difference between the maximum size of the row and average
size of data stored in the row is greater than the size of the
header, then use of NULL or variable size columns may be more efficient.
- Updating a NULL, VARCHAR, or VARBINARY column may cause the data
to move to a different page, leaving an indirect tuple behind and
increasing the number of I/Os required to access the data tuple.
You can create a table with default values instead of
nulls so that when columns are updated they will not have to move
to another page because of tuple size increase (see the sketch after
this list).
- Use an appropriate (not an unreasonably long) length for variable
length columns. The maximum length can affect the performance of
BULK SELECT and cursor operations.
- Indexes whose keys will be updated
should never include NULL columns as keys, and rarely should
they include VARCHAR or VARBINARY columns.
Variable length keys can cause long index chains and rearranging when
index values are updated.
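
For example, this hedged sketch uses fixed-length NOT NULL columns with
defaults in place of nullable or VARCHAR columns, so updating a row does
not grow it and force it onto another page (the names are illustrative):

    CREATE PUBLIC TABLE PurchDB.PartStatus
        (PartNumber INTEGER  NOT NULL,
         Status     CHAR(10) DEFAULT 'UNKNOWN' NOT NULL, -- not nullable
         Remarks    CHAR(40) DEFAULT ' '       NOT NULL) -- not VARCHAR(40)
        IN PartFS;
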
Using INTEGER Rather than SMALLINT Data

Since SQLCore only performs 4-byte arithmetic, all SMALLINT
values are automatically converted to INTEGERs before processing
by SQLCore. These conversions do not affect the optimizer's choice
of a scan plan, but they may be costly in CPU for very large databases.
To avoid the conversion, consider defining your SMALLINT columns
as INTEGER columns. The drawback, of course, is that INTEGERs
require twice as much storage space as SMALLINTs.
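
For a column that appears often in arithmetic or comparisons, the
trade-off looks like this (names are illustrative):

    CREATE PUBLIC TABLE PurchDB.LineItems
        (OrderNumber INTEGER NOT NULL,
         -- INTEGER rather than SMALLINT: two extra bytes per row,
         -- but no 2-byte-to-4-byte conversion on every use.
         Quantity    INTEGER NOT NULL) IN OrderFS;
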
Initial Table Loads

For optimal performance, use the following ISQL commands before
starting the load operation (a combined sketch appears after the
descriptions):

SET LOAD_BUFFER
Use this statement to enlarge the load buffer beyond the default size
of 16,384 bytes.

SET AUTOLOCK ON
Avoid lock contention by locking the table in exclusive mode when
the load is performed.
For PUBLIC and PUBLICROW tables,
locking the table in exclusive mode avoids the overhead of
obtaining locks as pages of data are added to the table.
If you are loading a table that has an index defined on it,
locking the table in exclusive mode also provides optimal logging.

SET AUTOCOMMIT ON and SET AUTOSAVE
Together these commands
cause the number of rows specified with AUTOSAVE
to be automatically committed
when the load buffer is full.
Should the load operation subsequently fail, you can insert the remaining
rows with the LOAD PARTIAL command.

SET SESSION DML ATOMICITY AT ROW LEVEL
Setting the DML atomicity to row level guarantees
that savepoints will not be generated during a load,
reducing logging overhead
when running in non-archive mode.

SET SESSION UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED
By deferring constraints, you avoid problems caused by the order
in which dependent values are inserted into a table when foreign
or primary key constraints exist.
Constraint checking is deferred until the end of the load operation.
You may wish to load a table first, then add constraints to it
using the ALTER TABLE statement. When the constraint is added,
the data in the table is checked for consistency, and if inconsistent
data is found, an error is generated. In this case, it is your
responsibility to remove the inconsistent tuples before attempting
to add the constraint.
Deferring constraints and setting atomicity
to the row level together have the effect of reducing logging overhead
greatly in non-archive mode because of the use of no-log pages. However,
logging is still performed for the allocation of new
pages, and the non-archive log file must be large enough to include a record
for each page.
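
Put together, an ISQL load session might look like the hedged sketch
below. The statement names come from this section; the buffer size, row
count, file name, table name, and the exact argument syntax (especially
for SET LOAD_BUFFER, SET AUTOSAVE, and LOAD) are assumptions to verify
against the ISQL reference:

    SET LOAD_BUFFER 65536;   -- enlarge the load buffer (default 16,384 bytes)
    SET AUTOLOCK ON;         -- lock the table in exclusive mode
    SET AUTOCOMMIT ON;
    SET AUTOSAVE 1000;       -- commit automatically as rows accumulate
    SET SESSION DML ATOMICITY AT ROW LEVEL;
    SET SESSION UNIQUE, REFERENTIAL, CHECK CONSTRAINTS DEFERRED;
    LOAD FROM INTERNAL OrdersFile TO PurchDB.Orders;
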
Additional tips for improving load performance:

- For subsequent unload and load operations, use the INTERNAL option of
the ISQL UNLOAD and LOAD commands.
LOAD INTERNAL is 3-5 times faster than LOAD EXTERNAL.
- Using archive logging during loading requires additional log file space,
but it actually requires fewer I/Os than non-archive logging.
However, the saving of I/Os must be weighed against the time
required to back up log files so they can be reused.
- Set the number of log buffers to at least 120 while loading.
- Set data buffer pages to 200 to reduce the size of the linked list
and the expense of maintaining it.
- Add indexes after data is loaded. For clustering indexes, sort the
data before the load.
- If you wish to use rules tied to INSERT operations, create the
rules after loading the table. Otherwise, the operation of the
rule will degrade the performance of the load. Note, however,
that the operation of the rule is not retroactive, so you must
ensure the consistency of your data at the time the rule is
created.
Unloading Data

UNLOAD operations are subject to the following performance considerations:

- To recluster data to improve the cluster count, unload the table
with an ORDER BY clause, delete all rows from the table, then reload
the data in the new sorted order.
This process is described more thoroughly earlier in this
chapter under "Reclustering a Table." Also, see the next item.
- For large tables, you can UNLOAD INTERNAL to tape. The tape must
be labeled. It is important to use a blocking factor that is as close to
16K bytes as possible, since ISQL uses 16K byte chunks. If your block
size is less than 16K, the unload and load will take longer and use
more tape.
- For UNLOAD operations on large amounts of data that you intend
to sort, consider unloading the data first and then sorting
with an external utility rather than using
the ORDER BY as part of the UNLOAD. In this case, you need to use
UNLOAD EXTERNAL.
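
A hedged sketch of the reclustering procedure from the first item above.
It assumes UNLOAD can unload the result of a SELECT with an ORDER BY;
the names and the exact UNLOAD and LOAD argument forms should be
verified against the ISQL reference:

    UNLOAD TO INTERNAL OrdersFile
        FROM SELECT * FROM PurchDB.Orders ORDER BY OrderNumber;
    DELETE FROM PurchDB.Orders;   -- empty the table
    COMMIT WORK;
    -- Rows now arrive in clustering-index order:
    LOAD FROM INTERNAL OrdersFile TO PurchDB.Orders;
    COMMIT WORK;
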
Unloading and Reloading to Remove Indirect Rows

The existence of indirect rows increases the amount of I/O that
must be performed to obtain data. Indirect rows also consume more
disk space, since they frequently are not able to use the shared
tuple header. Therefore, indirect rows should
be avoided. Use the following procedure to remove indirect rows
from a table (a hedged sketch follows the steps):

1. Unload the data from the table using the INTERNAL option.
2. Drop any referential constraints on other tables that refer to
the unloaded table.
3. Drop the table. This will also drop any B-tree indexes
that have been defined on the table.
4. Issue the CREATE TABLE statement to recreate the table. Do not
include constraints.
5. Issue the COMMIT WORK statement to reclaim space.
6. Load the data back into the table using the INTERNAL option.
7. Use the CREATE INDEX statement to recreate any
B-tree indexes that may have been dropped, and use the ALTER TABLE
statement to add back constraints (including referential
constraints) that may have been dropped.
8. Do an UPDATE STATISTICS for the table to update system catalog
information about the table and its indexes and constraints.
9. Issue the COMMIT WORK statement.
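
The steps above, as a hedged sketch with illustrative names; the UNLOAD,
LOAD, and constraint syntax should be verified against the reference
manuals:

    UNLOAD TO INTERNAL OrdersFile FROM PurchDB.Orders;        -- step 1
    ALTER TABLE PurchDB.Invoices DROP CONSTRAINT InvOrderFK;  -- step 2
    DROP TABLE PurchDB.Orders;            -- step 3: drops B-tree indexes too
    CREATE PUBLIC TABLE PurchDB.Orders    -- step 4: no constraints yet
        (OrderNumber INTEGER NOT NULL,
         OrderDate   CHAR(8) NOT NULL) IN OrderFS;
    COMMIT WORK;                          -- step 5: reclaim the old space
    LOAD FROM INTERNAL OrdersFile TO PurchDB.Orders;           -- step 6
    CREATE INDEX OrderNumIdx ON PurchDB.Orders (OrderNumber);  -- step 7
    -- Step 7 continued: re-add dropped constraints with ALTER TABLE ... ADD.
    UPDATE STATISTICS FOR TABLE PurchDB.Orders;                -- step 8
    COMMIT WORK;                          -- step 9
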
Unloading and Reloading to Remove Overflow Pages

Since many key values can hash to the same page address, it is
possible for a page in a hash structure to become full. When
this happens, a new row must be inserted on an overflow page.
Overflow pages increase the amount of I/O that must be performed
to obtain table data. The larger the number of overflow pages,
the slower the average access to any tuple.
Observe the AVGLEN column in SYSTEM.HASH
or the AVGOVERFLOW field of the SQLMON Static Hash screen
to see the average
number of page accesses required to retrieve a particular row. As the
number of overflow pages increases, so will this number. Increasing
the capacity of the table should reduce AVGLEN, and thus improve
performance. Use the following procedure (a hedged sketch of the key
step follows the list):

1. Unload the data from the table using the INTERNAL option.
2. Drop any referential constraints on other tables that refer
to the unloaded table.
3. Drop the table. This will also drop any B-tree indexes that have
been defined on the table.
4. Issue a COMMIT WORK statement. This makes the space occupied by
the table and its related indexes and constraints available for reuse.
5. If necessary, create additional TABLE or MIXED DBEFiles and
add them to the DBEFileSet that will contain the redefined table.
6. Issue the CREATE TABLE statement, specifying a larger number of
primary pages than when you previously created the table. The
CREATE TABLE statement should not include any constraints.
7. Load the data back into the table using the INTERNAL option.
8. Use the CREATE INDEX statement to recreate any
B-tree indexes that may have been dropped, and use the ALTER TABLE
statement to add back constraints (including referential
constraints) that may have been dropped.
9. Issue an UPDATE STATISTICS statement for the table to update system catalog
information about the table and its indexes and constraints.
10. Issue the COMMIT WORK statement.
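
The essential difference from the previous procedure is step 6:
recreate the hash table with more primary pages. A hedged sketch,
assuming a hash specification of the form shown here (verify the HASH
clause and the SYSTEM.HASH columns against the reference manuals):

    -- Watch AVGLEN first; a rising value signals overflow pages:
    SELECT * FROM SYSTEM.HASH;
    -- After unloading, dropping, and committing as in the steps above:
    CREATE PUBLIC TABLE PurchDB.Parts
        (PartNumber CHAR(16) NOT NULL,
         PartName   CHAR(30) NOT NULL)
        UNIQUE HASH ON (PartNumber) PAGES = 4001  -- more primary pages
        IN PartFS;
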
Tips on Deletions from Tables

When deleting all rows from a table, it is more efficient to
use the TRUNCATE TABLE statement instead of the DELETE statement.
On a large table, the DELETE statement
may cause the log to run out of space,
whereas the TRUNCATE TABLE statement incurs minimal logging.
The performance of the TRUNCATE TABLE statement degrades when
the table is hashed. Dropping the table
and recreating it may be faster than using the TRUNCATE TABLE statement.
If the table specified by the TRUNCATE TABLE statement is included in
a referential constraint, it may be more efficient to
drop the constraint, issue the TRUNCATE
TABLE statement, and re-add the constraint, as in the sketch below.
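
A hedged sketch of that last tip, with illustrative table and constraint
names (and assuming PurchDB.Orders carries the key the constraint
references):

    ALTER TABLE PurchDB.LineItems DROP CONSTRAINT LineOrderFK;
    TRUNCATE TABLE PurchDB.Orders;   -- minimal logging, unlike DELETE
    ALTER TABLE PurchDB.LineItems ADD CONSTRAINT LineOrderFK
        FOREIGN KEY (OrderNumber) REFERENCES PurchDB.Orders;
    COMMIT WORK;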