ALLBASE/SQL Performance and Monitoring Guidelines: HP 3000 MPE/iX Computer Systems
Chapter 2: Guidelines on Logical and Physical Design

Logical Data Design
Logical data design concerns normalization of data, followed by the creation of tables, views, indexes, and authorization schemes. When designing logical objects in ALLBASE/SQL, know your queries beforehand. The following paragraphs present ways of adjusting your design to arrive at the best performance for the functionality you need to implement. The following topics are discussed:

   Normalization and denormalization
   Combining tables
   Horizontal and vertical partitioning
   Calculated data and views
   Index design, clustering, cluster counts, and hashing
   Updating statistics
   Authorization schemes
Normalization results in tables with a relatively small number of columns and a minimum amount of redundancy within a given table. In general, this yields an efficient design, but there are some drawbacks. The following table shows some of the pros and cons of normalization.

Pros and Cons of Normalization
In some cases, you can improve performance by selectively combining (denormalizing) tables.
For example, consider the following two tables in third normal form:
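A minimal sketch of two such tables, assuming an employee/department design (all table and column names here are illustrative):

    CREATE TABLE TableA
       (EmpNo    INTEGER NOT NULL,
        EmpName  CHAR(20),
        DeptNo   INTEGER NOT NULL);

    CREATE TABLE TableB
       (DeptNo   INTEGER NOT NULL,
        Location CHAR(20));

Each non-key column depends only on its table's key: EmpName and DeptNo on EmpNo in Table A, and Location on DeptNo in Table B.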
If you have a continual need to find the location of the employee, you might consider combining the two tables. In this case, you will incur extra maintenance if the location of the department changes, since it now appears in every tuple of Table C:
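Using the illustrative names above, the combined table might look like this; note that Location is now repeated in every employee row:

    CREATE TABLE TableC
       (EmpNo    INTEGER NOT NULL,
        EmpName  CHAR(20),
        DeptNo   INTEGER NOT NULL,
        Location CHAR(20));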
When should you not combine tables, but stick with third normal form? Retain the third normal form for tables in which you must maintain several rows of data which are fully dependent on one particular key:
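A sketch of such a pair, assuming a customer table (Table D) and an order table (Table E) with illustrative names:

    CREATE TABLE TableD
       (CustNo    INTEGER NOT NULL,
        CustName  CHAR(20),
        Address   CHAR(40));

    CREATE TABLE TableE
       (OrderNo   INTEGER NOT NULL,
        CustNo    INTEGER NOT NULL,
        OrderDate DATE);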
Since Table E may contain many orders for each customer number, the amount of redundancy would be unacceptable if you were to include Table D's columns in each row of Table E.

Combining tables can also increase concurrency between batch processes and on-line activities, since there is less contention for I/O resources within the system. Look for tables that are consistently being joined, and consider combining them or duplicating some columns to avoid the need for the join, weighing the benefit against the extra maintenance described above.
Large tables that contain historical data may be good candidates for division into two tables with the same column definition. To identify potential candidates for horizontal partitioning, review all your queries, looking for table scans with a low number of rows returned. Review other processes which would be impacted by dividing the table in two. Note especially the impact on processes which need all rows. Tune for the most critical or most frequently used processes. As an example, consider a table containing daily stock prices. If the current year's prices are the only ones that are accessed frequently in looking up prices, then earlier data can be moved to a different table. Suppose the table is created as follows:
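A plausible definition, with assumed column names, might be:

    CREATE TABLE Securities
       (Symbol     CHAR(8)      NOT NULL,
        PriceDate  DATE         NOT NULL,
        ClosePrice DECIMAL(9,2));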
You can define a different table with the same column definition for the historical data:
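Continuing the sketch above:

    CREATE TABLE SecuritiesHist
       (Symbol     CHAR(8)      NOT NULL,
        PriceDate  DATE         NOT NULL,
        ClosePrice DECIMAL(9,2));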
Then use a Type 2 INSERT to transfer the rows:
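A Type 2 INSERT takes its rows from a SELECT rather than from a VALUES clause. Assuming the illustrative tables above and the default date format:

    INSERT INTO SecuritiesHist
       SELECT Symbol, PriceDate, ClosePrice
         FROM Securities
        WHERE PriceDate BETWEEN '1992-01-01' AND '1992-12-31';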
Finally, delete the 1992 data from the Securities table:
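Continuing the sketch:

    DELETE FROM Securities
     WHERE PriceDate BETWEEN '1992-01-01' AND '1992-12-31';

Afterward, run UPDATE STATISTICS on both tables so the optimizer sees the new row counts.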
If some columns are infrequently used, or if they require special security, consider moving them to a table of their own. Be sure to include the primary key values in every partition. Vertical partitioning makes the rows smaller and more efficient to process, since more rows fit on each page. A disadvantage of having more rows per page is potentially reduced concurrency under page-level locking; row-level locking may help offset this disadvantage.

You can store calculated data in tables instead of performing the calculations when data is retrieved. In this case, the overhead of the calculation occurs at INSERT or UPDATE time rather than at query time. Stored procedures are a useful way of recalculating whenever data values change. Some calculations can be done using ALLBASE/SQL functions such as SUM, MIN, AVG, etc.

You can create a view to return the results of a calculation, but this may degrade performance in some cases. Specifically, do not create views that contain aggregate functions or GROUP BY clauses. For better performance, define the view without aggregates or GROUP BY clauses, and then apply the aggregation and grouping to the SELECT done on the view.
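A minimal sketch of this recommendation, assuming an Orders table with CustNo and Amount columns (all names here are illustrative):

    CREATE VIEW OrderAmounts (CustNo, Amount) AS
       SELECT CustNo, Amount
         FROM Orders;

    SELECT CustNo, SUM(Amount)
      FROM OrderAmounts
     GROUP BY CustNo;

The grouping and aggregation are applied to the SELECT on the view rather than placed in the view definition, as the guideline above recommends.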
The use of indexes affects performance directly: an index scan is generally faster than a serial scan when a query retrieves only a small fraction of a table's rows. Indexes are fast for retrievals, but slow for inserts, deletes, and some updates, so weigh the retrieval benefit against the update cost when designing them.
If you are using unique or primary key constraints, ALLBASE/SQL automatically creates unique indexes which can be used for data retrieval. If you are using referential constraints, ALLBASE/SQL creates a PCR (parent-child relationship), which is an index on the two tables in the referential relationship. These may be less efficient than an index you create yourself.
Indexes for large tables require special consideration. Periodic review of your indexes ensures the greatest efficiency.
Applications that frequently access rows in index order perform better if the rows are physically stored on disk in index order, because I/O is minimized. For example, a query that needs to read all rows having a certain value is faster if those rows reside on one page instead of on many pages.

To maintain table data in index order automatically, ALLBASE/SQL allows B-tree indexes and referential constraints to be defined as clustering when they are created. When a clustering index has been defined on a table, rows added to the table are placed physically near other rows with similar key values whenever there is space available on the page. If space cannot be found on the desired page, the row is inserted onto the first page in the DBEFileSet that has space.

A clustering index does a good job of maintaining index order when the number of insertions is approximately equal to, or less than, the number of deletions on the table. Under these conditions, space for a newly inserted row can usually be found on a page having rows with similar keys. But when the number of insertions is greater than the number of deletions, space will usually not be found on the desired page. In that case, time and I/O are wasted in calculating, locating, and examining the desired page even though the row must be inserted on some other page in the DBEFileSet. Thus you pay a price at INSERT time for the attempt to optimize the placement of the row using a clustering index.

If space does not usually exist on table pages, a clustering index should not be defined on the table, even if your applications frequently access the data in index order. Use a non-clustering index instead, and maintain the table data in index order manually: monitor the cluster count of the index, and perform UNLOAD operations followed by sorting and reloading. This process is explained more fully later in this chapter, in the sections "Monitoring the Cluster Count" and "Reclustering a Table."

Consider using a clustering index when accessing a subset of rows with the same key value, or when retrieving a sequential range (including an ORDER BY, GROUP BY, DISTINCT, or UNION). Note the following:
If a table rarely has rows deleted from it, and frequently has rows added to it, do not define a clustering index; extra time and I/O will be wasted looking for free space when rows are inserted. If you decide to use clustering, sort your data before loading the table, and create the clustering index after sorting the data and loading the table.

To measure how well clustered a B-tree index is (whether it is defined as clustering or non-clustering), ALLBASE/SQL uses the cluster count value, which is stored in the CCOUNT field of SYSTEM.INDEX or SYSTEM.CONSTRAINTINDEX when you do an UPDATE STATISTICS. The CCOUNT is the number of data page accesses made during a scan of the entire table through the index; it increases by one each time the next row is not stored on the same page as the last row seen. The same page can be accessed multiple times if the data is not physically stored in index order.

The best case is a CCOUNT value equal to the number of pages in the table. If the CCOUNT is higher than this, an index scan over the entire table may require more I/O than is minimally necessary. The worst case is a CCOUNT value equal to the number of rows in the table.

To monitor the cluster count with SQLMON, invoke the Static Cluster screen and examine the CCOUNT, TOT ROWS, and UNLOAD/LOAD SUGGESTD fields. You can also query the system catalog to examine the cluster count for all B-tree indexes:
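For example (CCOUNT is the field named above; the NAME and TABLENAME column names are assumed):

    SELECT NAME, TABLENAME, CCOUNT
      FROM SYSTEM.INDEX;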
Use the following query to examine the cluster count for all unique or referential constraints:
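For example (again, the column names other than CCOUNT are assumed):

    SELECT CONSTRAINTNAME, CCOUNT
      FROM SYSTEM.CONSTRAINTINDEX;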
Run these queries after doing an UPDATE STATISTICS on the tables of interest, to make sure the cluster count is up to date. For indexes used in sorting or for retrieving large numbers of rows, the cluster count is especially important; a low cluster count may indicate good performance. Note that a clustering index does not necessarily result in a good cluster count.

If the UNLOAD/LOAD SUGGESTD value on the SQLMON Static Cluster screen is high, or if the CCOUNT in SYSTEM.INDEX or SYSTEM.CONSTRAINTINDEX grows to twice the number of pages in the table, you should recluster the table. You may be able to improve performance by reclustering the table using the following procedure:
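One possible sequence, sketched here for the Securities table from the earlier example (the ISQL UNLOAD and LOAD commands are shown in an assumed form, and the file name SecData is illustrative):

   1. Unload the table to a file:  UNLOAD TO EXTERNAL SecData FROM Securities;
   2. Sort the unloaded file on the index key with an external sort utility.
   3. Empty the table:  DELETE FROM Securities;
   4. Reload the sorted rows:  LOAD FROM EXTERNAL SecData TO Securities;
   5. Refresh the statistics:  UPDATE STATISTICS FOR TABLE Securities;

Dropping B-tree indexes before the reload and re-creating them afterward may also speed up the load.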
For some tables, hashing offers better performance than B-tree indexes. Hashing works best with a fairly uniform distribution of key values, which spreads the corresponding rows evenly over the number of pages available. A key with a skewed distribution concentrates rows on a correspondingly skewed set of pages; the worst case is a key whose hash values cluster tightly in a narrow range of primary pages, leaving the others empty. You must define the hash at the same time you create the table, and the hash key must be unique, as in the sketch that follows.
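A minimal sketch of a hashed table, with illustrative names and an illustrative page count (the UNIQUE HASH ON clause and its PAGES option are shown in an assumed form):

    CREATE TABLE Accounts
       (AcctNo  INTEGER NOT NULL,
        Balance DECIMAL(11,2))
       UNIQUE HASH ON (AcctNo) PAGES = 101
       IN AcctFileSet;

The number of primary pages is fixed when the table is created, so choose the PAGES value with expected growth in mind.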
It is important to consider whether a B-tree index is an appropriate choice for a given table. The larger the table and the lower the level of update activity, the greater the benefit of a B-tree index. A B-tree index never hurts a read-only application; indeed, B-tree indexes are usually defined to improve the read access requirements of an application. B-tree indexes can, however, degrade the performance of an application that updates the table on which the B-tree is defined, because whenever a row is inserted or deleted, the B-tree index must also be updated. In addition, if the application updates a key column in any row, the B-tree index must also be updated. The needs of the applications that access a table must be carefully considered before a B-tree index is defined on it. If the mix of transactions involving the table is primarily read-intensive, a B-tree index can be defined to improve performance. If the mix is primarily update-intensive, weigh the update overhead against the retrieval benefit before defining the index.
In order for the optimizer to make the best choice among available indexes, use the UPDATE STATISTICS statement on tables after creating indexes and after doing many inserts or deletes. After updating statistics, use the VALIDATE statement to revalidate any stored sections that were invalidated by the UPDATE STATISTICS statement. While revalidation is taking place, performance may be impaired due to obtaining exclusive locks on the system catalog. To avoid concurrency problems, use the VALIDATE statement after using the UPDATE STATISTICS statement. Preferably, you should use both statements during periods of low activity.
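For example (PurchDB.Parts is the sample table used in the ALLBASE/SQL manuals, and the VALIDATE form shown is one assumed variant; check the options available in your environment):

    UPDATE STATISTICS FOR TABLE PurchDB.Parts;
    VALIDATE ALL MODULES;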
The creation of authorization schemes can also affect performance, since ALLBASE/SQL must look up authorizations before carrying out specific commands. The more complex the authorization scheme, the more system table pages must be read by the query processor.

The only authorization verified at run time for non-dynamic queries is RUN authority, so it helps to keep a simple hierarchy for RUN authorization. In general, keep user group hierarchies simple (one or two levels) unless it is absolutely necessary to use more. If the user running an application is the owner of the module or has DBA authority, the authorization check is faster.

The overhead of authorization checking is higher for dynamic queries, because the first execution of a dynamic query in a transaction requires the query to be re-authorized. This makes it all the more important to keep the authorization hierarchy simple. In designing authorization schemes, weigh the needs of security and ease of maintenance against your requirements for performance.