ALLBASE/SQL Performance Guidelines
DBA Guidelines
Several aspects of database administration affect performance. To
improve performance in these areas, you need DBA authority.
Avoiding Runtime Revalidation of Sections
When you run an application, ALLBASE/SQL automatically and transparently
checks each section for validity before executing it. When a section is
found to be invalid, ALLBASE/SQL revalidates it (if possible), then
executes it. You may notice a slight delay as the revalidation takes
place. Revalidation also can reduce concurrency, since revalidating a
permanent section involves updates to certain system catalog tables, and
this requires obtaining exclusive page locks on the system catalog until
the transaction is committed. To avoid problems of runtime revalidation,
you can re-preprocess the program or use the VALIDATE statement to
revalidate the affected modules prior to run time. Complete syntax for
the VALIDATE statement is presented in the ALLBASE/SQL Reference Manual.
You may find it appropriate to use the VALIDATE statement after executing
an UPDATE STATISTICS statement or any DDL statement, since these
statements invalidate stored sections. If you issue such statements
during a period of low activity for the DBEnvironment (at night, for
example), the DBEnvironment experiences minimal performance degradation.
     * Use the UPDATE STATISTICS statement carefully, since it invalidates
       sections. When sections are invalid, they must be revalidated at
       run time before execution. For best performance, after the initial
       statistics for a table have been established, run UPDATE STATISTICS
       during periods of low DBEnvironment activity, then use the VALIDATE
       statement to revalidate modules and procedures, as shown in the
       sketch that follows.
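The following minimal sketch shows this sequence, assuming a sample table
name (PurchDB.Parts) and that all modules and procedures in the
DBEnvironment are to be revalidated; check the ALLBASE/SQL Reference
Manual for the complete UPDATE STATISTICS and VALIDATE syntax:

     UPDATE STATISTICS FOR TABLE PurchDB.Parts;
     VALIDATE ALL MODULES;
     VALIDATE ALL PROCEDURES;
     COMMIT WORK;

The COMMIT WORK releases the system catalog locks acquired by these
statements.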
Developing Application Programs
The following tips relate to preprocessing of application programs:
     * During development, frequent re-preprocessing of applications locks
       system catalog pages for extended periods of time. Therefore it is
       a good idea to maintain separate development and production
       DBEnvironments on your system. Use the ISQL INSTALL command to move
       finished module files into your production DBEnvironments, as in
       the sketch following this list. The INSTALL command locks the same
       system catalog pages as the preprocessor, but it does so only once.
* Avoid using DDL (including the UPDATE STATISTICS and CREATE INDEX
statements) when running in multi-user mode, since DDL places
exclusive locks on system catalog resources and can reduce
throughput significantly. It is good practice to restrict DDL to
single-user applications that run during off-peak hours. You can
set the DDL Enabled flag to NO to achieve significantly higher
throughput in a multiuser environment if there is no data
definition going on.
* Section caching can result in some performance improvement. Note
that the first connect to a DBEnvironment is slower when DDL is
disabled, since extra caching is being done. See "Section Caching
and Directory Caching" later in this chapter for more information.
     * Use the TERMINATE USER statement to abort a user program or ISQL
       session if necessary. DBCore sets itself critical while accessing
       the DBEnvironment and allows a program to abort only at certain
       strategic points. If DBCore is filling the tuple buffer, it
       completes that operation before allowing a break. TERMINATE USER
       allows the program to abort immediately.
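As a hedged sketch, installing a finished module file into a production
DBEnvironment from ISQL might look like the following; the DBEnvironment
name and module file name are placeholders, and the exact CONNECT and
INSTALL syntax should be checked in the ISQL documentation:

     isql=> CONNECT TO 'ProdDBE';
     isql=> INSTALL MyAppModFile;
     isql=> COMMIT WORK;

Because INSTALL locks the system catalog pages only once, it disturbs
production users far less than re-preprocessing the application against
the production DBEnvironment.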
Load Balancing
Load balancing is an effort to spread out I/O in your system among the
available disk devices for the most efficient operation. Large systems
can benefit significantly from an analysis of system load followed by
reallocating the DBEFiles for specific DBEFileSets or within a given
DBEFileSet. In general, the DBEFiles on which the most I/O is performed
should be placed on the fastest devices available.
For best performance, separate data DBEFiles, index DBEFiles, and log
files across the available disks on your system:
* Log files should be put on different devices from data DBEFiles
for improved performance and insurance against a possible head
crash. For the same reasons, dual log files should be put on
different devices.
* On MPE/iX, an easy way to scatter files over several devices is to
use different volumesets, which specify different devices. Any
files over 10 megabytes in size are scattered among devices by
default on MPE/iX.
* To move DBEFiles to different devices, use the SQLUtil MOVEFILE
command for DBEFiles and the MOVELOG command for log files.
* Indexes on large tables should be physically separated from data
DBEFiles for performance reasons. For best results, put the index
DBEFile on a different device by using the SQLUtil MOVEFILE
command. Doing this can have the effect of reducing the movement
of the disk heads. For small tables, the performance gain that
results from having the indexes and data on separate spindles is
probably not as apparent.
Placing Concurrently Used Objects on Different Drives.
Any two database objects that are used concurrently (that is, have I/O at
the same time) should be located on different disk drives to minimize
disk head contention. This includes any two tables that show I/O at the
same time, any two B-tree indexes that show I/O at the same time, or a
table that shows I/O at the same time the B-tree index defined on it
shows I/O. When objects are accessed simultaneously, the disk heads may
move back and forth between the objects. This movement slows access to
data. To avoid this problem, use the SQLUtil MOVEFILE command to place
the competing objects on different drives.
Remember that the mapping of database objects to DBEFileSets is not
one-to-one, and that MOVEFILE moves DBEFiles, not DBEFileSets. In order
to achieve the goal of placing different objects on different drives, you
must ensure that they are in different DBEFiles. To this end, it is
useful to create separate INDEX and TABLE DBEFiles, and to define a
different DBEFileSet for each table or index that may need to be
separated from other objects.
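For example, the following sketch creates a DBEFileSet with separate
TABLE and INDEX DBEFiles and places a new table in it; all object names,
sizes, and system file names are placeholders, and the exact CREATE
DBEFILE, ADD DBEFILE, and CREATE TABLE syntax should be verified in the
ALLBASE/SQL Reference Manual:

     CREATE DBEFILESET OrdersSet;
     CREATE DBEFILE OrdersDataF1 WITH PAGES = 5000,
            NAME = 'ORDDATA1', TYPE = TABLE;
     CREATE DBEFILE OrdersIndexF1 WITH PAGES = 2000,
            NAME = 'ORDINDX1', TYPE = INDEX;
     ADD DBEFILE OrdersDataF1 TO DBEFILESET OrdersSet;
     ADD DBEFILE OrdersIndexF1 TO DBEFILESET OrdersSet;
     CREATE PUBLIC TABLE PurchDB.Orders
            (OrderNumber INTEGER NOT NULL,
             OrderDate   CHAR(8)) IN OrdersSet;

With the table's index pages confined to the INDEX DBEFile, that DBEFile
can later be moved to its own device with the SQLUtil MOVEFILE command.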
Calculating Shared Memory Allocation
As you adjust the various configurable parameters in ALLBASE/SQL, be sure
you do not exceed the amount of shared memory available on your system.
You can derive an approximate figure for the total shared memory used (in
kilobytes) from the following formula:
SM = (4.2*DBP) + (0.5*LBP) + (0.16*NTXN) + (4.1*CBP)
where
SM = Total Shared Memory in kilobytes
DBP = DataBufferPages [ Limits vary with different systems ]
LBP = LogBufferPages [ 24-1024; default 24 ]
NTXN = Number of Concurrent Transactions [ 2-240; default 2 ]
CBP = Control Block Pages [ 17-800 (512 on Series 300); default 37 ]
The result of this equation must be within the amount of shared memory
you have available. This equation also gives an idea of the relative
effect of changing different parameters on the total shared memory size.
For the default parameters, you can use the following:
SM = 4.2*DBP + 164
For the maximum parameters, you can use the following:
SM = 4.2*DBP + 3830
Use these equations to arrive at a first approximation.
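For example, with the default values for every parameter except
DataBufferPages, and DataBufferPages set to 300, the approximate
requirement is:

     SM = (4.2*300) + (0.5*24) + (0.16*2) + (4.1*37)
        = 1260 + 12 + 0.3 + 151.7
        = approximately 1424 kilobytes

which agrees with the default-parameter form given above
(4.2*300 + 164 = 1424).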
Choosing a Number of Data Buffer Pages
Data buffers are used to cache data pages touched by ALLBASE/SQL. Up to
the limit of buffer space, a data page is retained in memory until the
space it occupies is needed for another page that is on disk. You can
set the number of data buffer pages your DBEnvironment uses.
The following chart shows the minimum, default, and maximum numbers of
data buffer pages for ALLBASE/SQL on different hardware configurations:
Minimum, Default and Maximum Data Buffer Pages
--------------------------------------------------------------------------------------------------
| | | | |
| System | Minimum | Default | Maximum |
| | | | |
--------------------------------------------------------------------------------------------------
| | | | |
| HP-UX Series 700/800 | 15 | 100 | None |
| | | | |
--------------------------------------------------------------------------------------------------
| | | | |
| HP-UX Series 300/400 | 15 | 15 | 240 |
| | | | |
--------------------------------------------------------------------------------------------------
| | | | |
| MPE/iX | 15 | 100 | 16,383 |
| | | | |
--------------------------------------------------------------------------------------------------
How many data buffer pages should you allocate? There is no explicit
equation to find the ideal buffer pool size. The following general
suggestions may help, though this is not an exhaustive list:
* Total shared memory should not exceed free real memory available
on the system. Available real memory can be determined only at
run time since it depends on the total number of processes on the
system and on the type of application programs running on the
system. The DBA may be able to make estimates. Excessive shared
memory causes page faults.
* You never get a performance benefit by defining more page space
than will stay in real memory. If the data buffers force paging
of virtual space, too many buffers will degrade performance.
     * On MPE/iX when not using pseudo-mapped files, use about 6 to 12
       data buffer pages per active transaction. Shared memory only holds
       buffers temporarily, since real buffering occurs in the operating
       system. With pseudo-mapped files, and on HP-UX, try to use a large
       buffer pool within the limits of the first two suggestions.
       However, a larger pool in general slows down checkpoints if the
       fraction of dirty pages remains the same, so there is a trade-off
       between checkpoint time and response time.
     * When possible, with pseudo-mapped files on MPE/iX and on HP-UX,
       the buffer pool should accommodate all frequently accessed pages.
       In particular, if a small table or index is frequently accessed,
       the buffer pool should be large enough to hold all of its pages.
     * If a table is being accessed without much locality of reference
       (that is, with almost random access) and its total size is orders
       of magnitude larger than any practical buffer pool size, then
       increasing the size of the buffer pool does not help much for that
       particular table. For example, if a table occupies 100 megabytes,
       a 2000 page buffer pool does not work much better than a 1000 page
       pool.
     * As a qualification to the previous suggestion, on HP-UX and with
       pseudo-mapped files on MPE/iX: if the table has a 2000 page index
       that is used frequently, then a 2000 page buffer pool performs
       better than a 1000 page pool.
When should you increase the number of buffer pages?  There is not much
advantage in increasing the buffer pool size on MPE/iX if you are not
using pseudo-mapped files.  However, if you are using pseudo-mapped files
on MPE/iX, or if you are on HP-UX, and there is a high hit ratio,
increasing the number of buffer pages can help performance.  If the set
of applications does not have locality of data and index pages, it may
not matter how many data buffers there are.  Once you have enough data
buffer pages, performance is not very sensitive to adding more.
How do you keep a small, frequently used group of pages in memory
consistently? The trick is to cause other pages to be swapped out first.
ALLBASE/SQL employs a least recently used (LRU) algorithm to discard
older pages that have not been used recently to make room for new pages.
In order to retain a special group of pages in memory, there must be
other pages that are available for use by each read or write operation
that does not use the special group. Furthermore, these other pages must
be allowed to age sufficiently that they become less recently used than
the special group. If you use a large enough number of pages outside the
special group, each page will age to the point of being least recently
used before any of the special group is swapped out. (This point is
further clarified in the example that follows.)
Use the following suggestions to determine how many data buffer pages is
enough:
     * Examine the performance-crucial transactions.  (Usually only a few
       transactions are performance-sensitive or dominate the load.)
       Look at the tables used in these transactions.
* INSERT actions tend to cluster at the end of a table, so if
there is insert activity, add a buffer page for each active
user (i.e., the instantaneously active count).
* If a table or index is small enough to fit in real memory
and is heavily accessed, add the number of pages in the
table or index.
* If an index is large, but the number of high level index
pages is small and the table is heavily accessed, add a
number of pages corresponding to the number of index levels
above the leaf level.
          * If a table is too big to fit in real memory and access to it
            is random, add a buffer page for aging for each transaction;
            this gives the LRU algorithm pages that can age out ahead of
            the small tables' pages.
* Estimate the table access pattern in your transactions. If
small tables are accessed in 50% of them or more, you
should gain by having a large enough buffer pool to hold
the small tables.
* The right number of buffers is approximately the number needed to
hold the small tables and the aged pages of the large tables.
Basic Example.
The dominant transaction in a sample application has three actions:
* Update a small table.
* Update a large table.
* Insert to a third table.
Assume that the tables for the sample occupy the following space in
DBEFiles:
* The small table and its index use 50 pages: one root index page,
10 leaf index pages, and 39 data pages.
* The large table uses 50,000 pages: 12,000 index pages (including
one root page, 100 pages of second level index, and 11,899 leaf
pages), and 38,000 data pages.
Also assume the following:
* Each insert into the third table requires 40 bytes, and the pages
of the third table fill up about every 100 transactions.
* There are about 10 active transactions in the database at any
instant.
At a given instant, the buffer pool in this scenario will come to look
like the following:
------------------------------------------------------------------------------------------
| Source       | Operation | Pages Already                | Pages Requiring              |
|              |           | In Data Buffer               | Physical I/O                 |
------------------------------------------------------------------------------------------
| Small table  | Update    | 11 (index root and leaf)     |                              |
|              |           | 10 (data)                    | 40                           |
------------------------------------------------------------------------------------------
| Large table  | Update    | 1 (index root)               |                              |
|              |           | 10 (2nd level index pages    | 100 (total index 2nd level   |
|              |           | active per transaction)      | pages)                       |
|              |           | 10 (leaf pages active per    | 12,000 (total index leaf     |
|              |           | transaction)                 | pages)                       |
|              |           | 10 (data pages active per    | 50,000 (total data pages)    |
|              |           | transaction)                 |                              |
------------------------------------------------------------------------------------------
| Third table  | Insert    | 10 (data pages active per    |                              |
|              |           | transaction)                 |                              |
------------------------------------------------------------------------------------------
| Total in     |           | 62                           |                              |
| Cache        |           |                              |                              |
------------------------------------------------------------------------------------------
Based on the numbers of pages given above, and assuming the default
number of data buffer pages (100), the application will require an
average of 7 I/O's per transaction, as shown in the following table:
------------------------------------------------------------------
| Source       | Operation            | Number of I/O's          |
------------------------------------------------------------------
| Small table  | Read Data            | 1                        |
|              | Write Data           | 1                        |
------------------------------------------------------------------
| Large table  | Read 2nd level index | 1                        |
|              | Read leaf index      | 1                        |
|              | Read data            | 1                        |
|              | Write data           | 1                        |
------------------------------------------------------------------
| Third table  | Write data           | .01                      |
------------------------------------------------------------------
| Commit       | Write Log            | 1                        |
------------------------------------------------------------------
| Total        |                      | 7.01                     |
------------------------------------------------------------------
There is a 25% chance that a random access to the small table will want a
page that is already in the buffer pool. There is a 10% chance that the
large index 2nd level page is already in the buffer pool. These reduce
the I/O count to about 6.4 instead of 7, as shown in the following table:
------------------------------------------------------------------
| Source       | Operation            | Number of I/O's          |
------------------------------------------------------------------
| Small table  | Read Data            | .75                      |
|              | Write Data           | .75                      |
------------------------------------------------------------------
| Large table  | Read 2nd level index | .9                       |
|              | Read leaf index      | 1                        |
|              | Read data            | 1                        |
|              | Write data           | 1                        |
------------------------------------------------------------------
| Third table  | Write data           | .01                      |
------------------------------------------------------------------
| Commit       | Write Log            | 1                        |
------------------------------------------------------------------
| Total        |                      | 6.41                     |
------------------------------------------------------------------
First Threshold for Performance Gain.
System performance will improve with the addition of more data buffers.
The first threshold of performance gain is when there are enough data
buffers to prevent the pages of the small table from being swapped out.
This will happen only if the small table's pages never become least
recently used.  Suppose that, from the previous calculation, we want 62
pages to stay in memory, plus about 10 insert pages.  In each
transaction, accessing the large table will require touching a root page,
two more index pages, and a data page. If the two non-root index pages
and the data page are read each time, we touch three pages that are new.
We need enough pages in memory so that pages from the large table will
always be considered less recently used than the pages from the small
table. A minimum estimate would be 120 pages for that; a more
comfortable estimate would allow 160 pages. The buffer pool in this
scenario (using the 120 page minimum) will come to look like the
following:
------------------------------------------------------------------------------------------
| Source       | Operation | Pages Already                | Pages Requiring              |
|              |           | In Data Buffer               | Physical I/O                 |
------------------------------------------------------------------------------------------
| Small table  | Update    | 11 (index root & leaf)       |                              |
|              |           | 40 (all pages in buffer)     |                              |
------------------------------------------------------------------------------------------
| Large table  | Update    | 1 (index root)               |                              |
|              |           | 40 (2nd level index          | 100 (total index 2nd level   |
|              |           | pages--active and "aging")   | pages)                       |
|              |           | 40 (leaf pages--active and   | 12,000 (total index leaf     |
|              |           | "aging")                     | pages)                       |
|              |           | 40 (data pages--active and   | 50,000 (total data pages)    |
|              |           | "aging")                     |                              |
------------------------------------------------------------------------------------------
| Third table  | Insert    | 10 (data pages active per    |                              |
|              |           | transaction)                 |                              |
------------------------------------------------------------------------------------------
| Total in     |           | 182                          |                              |
| Cache        |           |                              |                              |
------------------------------------------------------------------------------------------
Here is the resultant pattern of I/O's:
------------------------------------------------------------------------
| Source       | Operation            | Number of I/O's                |
------------------------------------------------------------------------
| Small table  | Read Data            | 0                              |
|              | Write Data           | 0 (deferred until checkpoint)  |
------------------------------------------------------------------------
| Large table  | Read 2nd level index | .9                             |
|              | Read leaf index      | 1                              |
|              | Read data            | 1                              |
|              | Write data           | 1                              |
------------------------------------------------------------------------
| Third table  | Write data           | .01                            |
------------------------------------------------------------------------
| Commit       | Write Log            | 1                              |
------------------------------------------------------------------------
| Total        |                      | 4.91                           |
------------------------------------------------------------------------
Second Threshold for Performance Gain.
The next performance gain comes from fitting all the pages in the second
level of the large table's index into the data buffer cache. There are
100 pages in the second index level, so the buffer pool in this scenario
will come to look like the following:
------------------------------------------------------------------------------------------
| Source       | Operation | Pages Already                | Pages Requiring              |
|              |           | In Data Buffer               | Physical I/O                 |
------------------------------------------------------------------------------------------
| Small table  | Update    | 11 (index root & leaf)       |                              |
|              |           | 40                           |                              |
------------------------------------------------------------------------------------------
| Large table  | Update    | 1 (index root)               |                              |
|              |           | 100 (2nd level index pages)  |                              |
|              |           | 100 (leaf pages--active and  | 12,000 (total index leaf     |
|              |           | "aging")                     | pages)                       |
|              |           | 100 (data pages--active and  | 50,000 (total data pages)    |
|              |           | "aging")                     |                              |
------------------------------------------------------------------------------------------
| Third table  | Insert    | 10 (data pages active per    |                              |
|              |           | transaction)                 |                              |
------------------------------------------------------------------------------------------
| Total in     |           | 362                          |                              |
| Cache        |           |                              |                              |
------------------------------------------------------------------------------------------
Here is the resultant pattern of I/O's:
------------------------------------------------------------------------
| Source       | Operation            | Number of I/O's                |
------------------------------------------------------------------------
| Small table  | Read Data            | 0                              |
|              | Write Data           | 0 (deferred until checkpoint)  |
------------------------------------------------------------------------
| Large table  | Read 2nd level index | 0                              |
|              | Read leaf index      | 1                              |
|              | Read data            | 1                              |
|              | Write data           | 1                              |
------------------------------------------------------------------------
| Third table  | Write data           | .01                            |
------------------------------------------------------------------------
| Commit       | Write Log            | 1                              |
------------------------------------------------------------------------
| Total        |                      | 4.01                           |
------------------------------------------------------------------------
This analysis predicts one plateau of performance at about 180-200 pages
in the buffer pool and a second plateau at about 360-400 pages.  The next
plateau would require something like 25,000 pages, which is impractical
for most installations.
In practice, there is little penalty for extra pages, so use a generous
allocation (that is, 200 or 400 pages or more), if you can afford to.
NOTE The previous calculations assume only the pages involved in one
transaction type involving a small and a large table. If you have
other transactions contending for the database, be sure to allow
adequate buffer space for them too.
Cautions.
* The size of the page pool combined with other demands on memory
should not exceed available real memory.
* On a Series 300 system, where real memory is in the 2 MB - 8 MB
range, the tradeoff deserves more attention.
* Checkpoints may take longer if the buffer pool is larger. A
checkpoint writes to disk all dirty pages in the buffer pool. If
there is a consistent fraction of dirty pages, the checkpoint
takes longer with a large buffer pool. However, if the large
buffer pool is essentially filled with read-only data, the
checkpoint may not take much longer. A rough estimate can be
obtained by looking at the buffer pool size derived in the
analysis above.
Choosing a Number of Log Buffer Pages
The number of log buffers is independent of the number of data buffers.
The log buffer pages are only 512 bytes (in contrast with the 4096 byte
pages for the data buffers). If the log buffer pool becomes full before
any transaction completes, then at least two I/Os will be needed.
You should provide enough log buffers to allow all active transactions to
have log space. Additional buffers will not help performance, but adding
pages probably doesn't hurt either.
In the example transaction above, we assume that the data in the log
takes 2 pages: before and after images of the changed data cause about
600 bytes to be logged, the insert adds 100 bytes, and we allow 100 bytes
for log overhead, which rounds up to 1024 bytes, or two 512-byte log
pages.  This implies that
20 log buffer pages should be sufficient for 10 transactions. If there
are occasional bursts of more than the average, 40 or 50 pages might be a
good idea.
Choosing the Number and Size of Log Files
ALLBASE/SQL gives you great flexibility in choosing the number and size
of log files. The following suggestions may be useful in making choices
for your system.
Nonarchive Log Guidelines.
The size of nonarchive log files determines the frequency of automatic
system checkpoints, and it therefore determines rollback recovery time.
The total size of the log (all files taken together) determines the
largest transaction that can be accommodated in the DBEnvironment.
     * Total log space should be large enough to accommodate the changes
       from the largest transaction.  Otherwise, transactions that exceed
       the available log space will abort every time they are attempted.
* An automatic checkpoint is done every time a log file gets full.
By choosing a specific file size, you can force a checkpoint to
take place at specific intervals, thereby controlling the amount
of time required for rollback recovery.
* In general, a large log file is good for nonarchive logging.
However, a checkpoint can cause significant loading to be put on
the system when the log file is large. When a checkpoint occurs,
all data pages relating to the transactions currently held in the
log file must be forced to disk if they have not already been
swapped out. If very large log files are used, the checkpoint
process could flood the I/O subsystem and cause interference with
other higher priority processing. If such flooding occurs, use a
smaller log file or use the CHECKPOINT statement to force a
checkpoint at regular intervals.
* The transaction that runs out of log space triggers a checkpoint,
and this causes all transactions to wait until the buffers are
flushed.
* Use a large log buffer size to avoid frequent flushing of the
buffer to disk.
     * Large nonarchive log files can reduce the number of system
       checkpoints.  Use the SYSTEM.COUNTER view to display the number of
       system checkpoints, as in the sketch following this list.
     * Checkpoints should be postponed as long as possible, provided that
       rollback recovery time stays within acceptable limits.  Checkpoints
       flush dirty pages, which forfeits the savings in writes that you
       would otherwise get when the same page is updated many times
       between flushes.  Checkpoints also increase disk contention.
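As a hedged sketch, the following statements force a checkpoint and then
display the system counters, including the count of system checkpoints;
the exact CHECKPOINT syntax and the SYSTEM.COUNTER column layout are
documented in the ALLBASE/SQL Reference Manual:

     CHECKPOINT;
     SELECT * FROM System.Counter;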
Archive Log Guidelines.
* The transaction that runs out of log space triggers a checkpoint,
and this causes the transaction to wait until the buffers are
flushed. Other transactions may continue if logging is in archive
mode.
* In archive mode, use at least two log files, so that one can be
backed up while the other is being used.
* In archive mode, use large log files to minimize the frequency of
log backups and checkpoints.
Sorting Operations
The sort method in ALLBASE/SQL is memory-intensive, using the tournament
sort algorithm for sorting and merging. It sorts in two phases:
     * Run generation.  ALLBASE/SQL reads the input rows, sorts them in
       the tournament tree, and generates runs (a run is a sorted list of
       tuples).  The number of runs generated depends on the input data
       and on the sort memory size.  For large sort operations, the
       sorted runs (intermediate results) may be written to scratch files
       created dynamically in user-defined TempSpaces and purged after
       sorting is completed.  If all input tuples fit in sort memory, the
       merge phase is avoided entirely and no scratch files are used.
* Merge Phase. In the second phase, the sorted runs are merged.
The merge width depends on the size of the sort memory. If the
number of generated runs is less than the merge width, then only
one merge is done. Otherwise several merge passes may be
required.
Using TempSpaces.
Create TempSpaces on your system to accommodate the scratch files needed
for sort operations.  You use the CREATE TEMPSPACE statement to define a
group (MPE/iX) or directory (HP-UX) in which ALLBASE/SQL may open scratch
files.  Scratch files store intermediate results of operations such as
sorts for queries that use ORDER BY and similar clauses, and for index
creation.  It is recommended that the DBA create at least one TempSpace
in the DBEnvironment.  If you do not create a TempSpace, ALLBASE/SQL
opens scratch files in the current group (on MPE/iX) or in the /tmp
directory (HP-UX).
CREATE TEMPSPACE does not itself use any disk space; scratch files are
not created at CREATE TEMPSPACE time.  Instead, they are dynamically
created and purged during sort operations, so the disk space used under
the defined TempSpaces rises and falls as ALLBASE/SQL creates and purges
scratch files.
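The following is a minimal sketch of defining a TempSpace; the name,
size, and location are placeholders, and the exact clause names for the
file-size limit (MaxTempFileSize) and the location (a group on MPE/iX or
a directory on HP-UX) should be taken from the CREATE TEMPSPACE syntax in
the ALLBASE/SQL Reference Manual:

     CREATE TEMPSPACE SortTemp1
            WITH MAXTEMPFILESIZE = 512,
            LOCATION = 'SORTTEMP.MYACCT';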
Tips for TempSpace Usage.
* If large sorting operations or index creation are expected, use
CREATE TEMPSPACE to define TempSpace locations. Be sure to
specify a MaxTempFileSize large enough for your needs.
* On MPE/iX, create one TempSpace per volumeset to spread scratch
files across volumesets and avoid filling up a single volumeset.
On HP-UX, create one TempSpace for each disk partition that is
available for scratch files.
* ALLBASE/SQL opens a new scratch file after reaching the limit
specified in the CREATE TEMPSPACE statement. If the sort is large
and the default is small, ALLBASE/SQL may have to open many small
scratch files. To avoid this, specify a large MaxTempFileSize
value. For small sorts, ALLBASE/SQL does not create large scratch
files even if MaxTempFileSize is large; hence, it is safe to
specify MaxTempFileSize generously.
* If multiple TempSpaces are defined in the DBEnvironment,
ALLBASE/SQL opens scratch files by rotation in each TempSpace.
* When doing large sorts on HP-UX systems, choose a MaxTempFileSize
value that is large enough to prevent the creation of more scratch
files than permitted on your HP-UX system. The default limit of
open files for one process (configurable in the kernel file)
is 60 files. Thus, the total number of files open by
ALLBASE/SQL--including scratch files and other files opened by the
system--cannot exceed this limit. If the limit is reached,
ALLBASE/SQL returns DBERR 3061 together with an HP-UX file system
error number. If this error appears, drop and recreate your
TempSpaces with larger MaxTempFileSize values.
Disk Space for Sorting.
All intermediate sort results are written to scratch files, not to any
DBEFileSet in the DBEnvironment. Sorting may require 100% to 200% of the
input data size in temporary disk files for intermediate results. If a
join and an ORDER BY are both involved, the amount could be as much as
300%. However, scratch files are purged as soon as the sort operation is
finished.
Final sort output for queries with GROUP BY, ORDER BY or DISTINCT is
written to the SYSTEM DBEFileSet, requiring space worth 100% of the
sorted output. For CREATE INDEX, the final output is directly written to
the DBEFileSet containing the index, requiring space worth the size of
the index.
Summary:
* CREATE INDEX: 100-200% of the size of the input key data in
TempSpace files and 100% in the INDEX DBEFileSet.
* DML statements using a sort: 100-300% of the size of the input
data in TempSpace files and 100% in the SYSTEM DBEFileSet.
Controlling the Use of Temporary Space.
The order in which you carry out database operations may determine the
amount of temporary space required. For example, if you create a large
table, load it, then create an index, you will probably use scratch file
space for sorting; but if you create the table, then create the index,
and load after index creation, you will not use scratch files. In the
latter case, of course, the load will be 3 to 10 times slower because of
the need to create index entries during loading. The size of the index
will also be about 50% larger if you are loading unsorted data.
Memory Utilization in Sorting.
Sorting is memory-intensive, and the amount of memory required is
estimated from the number of input tuples. This in turn is estimated
based on table statistics, so it is important to have up-to-date
statistics (especially for large tables) when executing an ORDER BY,
GROUP BY, or CREATE INDEX operation.
Performance Hints for Large Sorts.
     * Issue the UPDATE STATISTICS statement after loading a large table
       and before issuing the CREATE INDEX statement.  Issue another
       UPDATE STATISTICS after CREATE INDEX to get correct statistics for
       the index, as in the sketch following this list.
* Make sure you have enough disk space in the TempSpaces to hold the
scratch files.
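For example, after loading a hypothetical PurchDB.Orders table, the
sequence might look like the following sketch (table, column, and index
names are illustrative):

     UPDATE STATISTICS FOR TABLE PurchDB.Orders;
     CREATE INDEX OrderDateIndex ON PurchDB.Orders (OrderDate);
     UPDATE STATISTICS FOR TABLE PurchDB.Orders;
     COMMIT WORK;

The first UPDATE STATISTICS gives the sort an accurate estimate of the
number of input tuples; the second records the statistics for the new
index.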
Use of the Sort/Merge Join Algorithm.
ALLBASE/SQL uses two join methods: one is nested-loop join, and the
other is sort/merge join. You cannot directly select a join method; the
optimizer chooses one depending on the query and the statistics in the
system catalog for the tables involved. However, it may still be useful
to understand the differences in the algorithms when designing your
databases.
The nested loop method scans the second table once for each qualifying
row in the first table. This can require considerable I/O and CPU time
when the table is larger than the buffer cache, but it may be the optimal
approach for joins involving small tables.
Sort/merge join can be viewed in two phases: the sort and the merge/join
phase.
     * The sort phase.  For the sort/merge join to work, the tuples from
       the two or more tables being joined are sorted in ascending order.
       If the scan on a table is an index scan on the joined column (that
       is, there is a join on col1 and the index being picked is on col1,
       or on col1, col2), then the sort can be skipped, since the tuples
       returned from the scan are already sorted (see the example at the
       end of this discussion).
* The merge/join phase. Multiple tables are merged or joined
together without scanning a table over and over as happens in a
nested loop join. This saves a significant amount of I/O and CPU
time, especially for large tables.
Sort/merge join can be of help for result tables that are larger than the
buffer cache, since the nested loop method with multiple scan passes
would result in more I/O's. Sort/merge join also helps for tables that
have indexes that are larger than the buffer cache size.
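For example, in the hedged sketch below (table and column names are from
the sample PurchDB database and are illustrative), if the optimizer picks
a sort/merge join on VendorNumber and an index on VendorNumber is used
for the scans, the sort phase can be skipped; remember that you cannot
force either join method:

     SELECT O.OrderNumber, V.VendorName
       FROM PurchDB.Orders O, PurchDB.Vendors V
      WHERE O.VendorNumber = V.VendorNumber;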
Temporary Space in the SYSTEM DBEFileSet.
Query results from queries using ORDER BY and GROUP BY clauses are stored
in temporary pages in the SYSTEM DBEFileSet. Be sure to include enough
TABLE or MIXED DBEFile space in SYSTEM to accommodate your largest sorted
query. As your system catalog grows, you should monitor the DBEFile
space in SYSTEM occasionally (after doing an UPDATE STATISTICS on at
least one system view) to make sure enough temporary space is still
available. As needed, add DBEFiles to SYSTEM to supply temporary pages.
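For example, the following sketch adds a TABLE-type DBEFile to the SYSTEM
DBEFileSet to supply additional temporary pages; the DBEFile name, size,
and system file name are placeholders, and the exact syntax is in the
ALLBASE/SQL Reference Manual:

     CREATE DBEFILE SysTempF1 WITH PAGES = 500,
            NAME = 'SYSTMPF1', TYPE = TABLE;
     ADD DBEFILE SysTempF1 TO DBEFILESET SYSTEM;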
Section Caching and Directory Caching.
Section caching retains a number of sections in memory between
transactions so that they do not need to be read in again for the next
execution. Up to 12 sections (4 on Series 300 and 400 computers) are
retained. Directory caching stores in memory the DBCore directory, which
contains the locations of tables and indexes. Directory caching is
turned on when you set the DDL Enabled flag to NO by using ALTDBE in
SQLUtil. Section caching is more efficient when DDL Enabled is set to
NO.
Section caching is subject to the following rules:
* When there are 12 or fewer sections in memory (four for Series 300
and 400 computers), the system does not try to delete any
permanent sections at the end of a transaction.
* When there are more than 12 sections in memory (four for Series
300 and 400 computers), the system deletes only those sections
that are not opened. Sections are considered opened if they are
cursor sections and have been opened by the OPEN CursorName
statement.
* Dynamic sections are not deleted even if the cache limit is
exceeded.
* More than 12 (or 4) sections remain in memory if all of them are
opened cursors. The only limit is the amount of user heap space
available.
This feature is helpful for an application that executes the same
sections again and again, especially if it has a small number of
sections.
Setting Limits for Section Caching
By default, ALLBASE/SQL allocates shared memory for up to 4 cached
sections on Series 300 and 400 systems, and up to 12 on Series 700, 800,
and 900 (MPE/iX) systems.  You can increase this number by setting an
environment variable (on HP-UX) or a JCW (on MPE/iX).  On HP-UX, the
variable name is HPSQLsectcache; on MPE/iX, the JCW is called
HPSQLSECTCACHE.  The value can be set anywhere from 4 to 128.
Using Multi-connect Functionality
It is possible to establish a maximum of 32 simultaneous database
connections. When your application must access more than one
DBEnvironment, there is no need to release one before connecting to
another. Performance is greatly improved by using this method rather
than connecting to and releasing each DBEnvironment sequentially.
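A hedged sketch of this approach follows; the DBEnvironment and
connection names are placeholders, and the exact CONNECT ... AS, SET
CONNECTION, and DISCONNECT syntax is given in the references below:

     CONNECT TO 'PartsDBE' AS 'Parts';
     CONNECT TO 'SalesDBE' AS 'Sales';
     SET CONNECTION 'Parts';
     SET CONNECTION 'Sales';
     DISCONNECT ALL;

The application does its work against whichever connection is current;
SET CONNECTION switches among the open connections without the cost of
releasing and re-connecting.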
For more detailed information, refer to the chapter "Transaction
Management with Multiple DBEnvironment Connections" in the ALLBASE/SQL
Release F.0 Application Programming Bulletin and the section "Using
Multiple Connections and Transactions with Timeouts" in the ALLBASE/SQL
Reference Manual chapter "Using ALLBASE/SQL."
Using Timeouts to Tune Performance
When an application requests a database resource that is unavailable, it
is placed on a wait queue. If the amount of time the application waits
is longer than the timeout value specified for a given DBEnvironment
connection, an error occurs and the transaction is rolled back. Your
strategy for specifying timeout values and handling timeout errors
depends on the specific needs of your application and on your business
procedures. By default, the timeout value is infinite.
For example, a transaction may require resources that are locked by
another transaction that consumes a great deal of time to execute. In
this case, you could specify a reasonable amount of time for the
transaction to wait before a timeout occurs. Such a strategy might be
essential in a distributed environment. For more detailed information,
refer to the chapter "Transaction Management with Multiple DBEnvironment
Connections" in the ALLBASE/SQL Release F.0 Application Programming
Bulletin and the section "Setting Timeout Values" in the ALLBASE/SQL
Reference Manual chapter "Using ALLBASE/SQL."
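As a hedged sketch, assuming the timeout statement described in the
references above (check there for the exact syntax), an application could
limit its wait for locked resources to 30 seconds:

     SET USER TIMEOUT 30 SECONDS;

If the timeout expires, the application receives an error, the
transaction is rolled back, and the application can retry the transaction
or report the condition.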