![]() |
![]() |
|
|
![]() |
![]() |
ALLBASE/SQL Performance and Monitoring Guidelines: HP 9000 Computer Systems > Chapter 1 Basic Concepts in ALLBASE/SQL Performance![]() Optimization |
|
Optimization is the process by which ALLBASE/SQL finds the most appropriate access path to data for the execution of a particular SQL statement. An access path is one of the following ways of getting at data:
Usually the most appropriate path is the one requiring the fewest I/O operations needed to read requested data. Optimization also includes decisions about join order, join method (nested loop or sort/merge), and sort operations (indexed or not). Optimization is primarily done on the basis of elements in the WHERE clause of an SQL statement. The WHERE clause may contain one or many components, known as factors, which limit the amount of data that will be manipulated by the statement. Information about these factors can potentially be used to improve the performance of a query. The following example has two EQUAL factors:
Assume that C1 is defined as CHAR(40) NOT NULL and C2 is defined as INTEGER NOT NULL. To retrieve the data, the query processor must scan the table to retrieve the rows in which C1 is Jones and C2 is 123534. There are several ways of doing this:
Assuming there are B-tree indexes on both columns, what would the best approach be? In order to answer this, we need to consider several variables:
If the table occupies only 3 pages in the DBEFile, the most that would be required to access the data without using an index is 3 I/Os (plus some I/Os to access system information; we ignore this hereafter). If the index occupies only 2 pages, the same query might need to access as few as 5 pages, but it could conceivably access many more than this, if the cluster count is high. There is clearly no advantage in using an index in this case. As the size of the table increases, the use of an index may become more advantageous. Assume that the table has 10,000 pages, that data is stored in sorted order and that key values are found on 100 data pages. In this case, 10,000 I/Os would be needed to do the query with a serial scan. If a B-tree index on the table has 3 levels and if key values are found on two index leaves and 100 data pages, then only 104 I/Os would be needed for the same query. The index scan clearly makes sense now. How can the optimizer tell whether to use a B-tree index on C1 or on C2, assuming that both exist? Index predicate selectivity and cluster count are the most important factors; the size of the index and the characteristics of the key also determine the choice. As an example of selectivity, suppose the index on C1 resulted in the selection of 200 rows, but the index on C2 resulted in 2000. Other things being equal, the optimizer would select C1 as the index with the higher selectivity, that is, the index that retrieves the smallest number of rows for its predicate. An index on C1 would be considerably larger than one on C2. Since key values of C1 are 40 bytes, and key values in C2 are 4 bytes, the B-tree on C1 would be ten times the size of the B-tree on C2, and the number of I/Os needed to retrieve the same number of key values using C1 would be ten times the number needed for the index on C2. In this case, if selectivity is the same, the optimizer would choose C2. The cluster count of an index indicates the number of times ALLBASE/SQL has to access a different data page to retrieve the next row during an index scan. Assume the table is sorted in the order of keys in C1. In this case, the cluster count for a B-tree on C1 would be equal to the number of pages on which rows are found. Let's say this is 100 pages for C1. Assume the same selectivity for C2, but random distribution of key values over the pages of the DBEFile in which data is stored. Each row retrieved with C2 would require a new I/O, and the cluster count could be closer to the number of rows in the table. In the present case, C1 is chosen. Cluster count is described in more detail in the "Guidelines on Logical and Physical Design" chapter. To monitor the cluster count, run SQLMON and go to the Static Cluster screen. The optimizer makes all its choices in a similar fashion, weighing the relative cost of each scan plan and eventually choosing the most appropriate one. |
![]() |
||
![]() |
![]() |
![]() |
|||||||||
|