HP 3000 Manuals

Handling Large IMAGE Tables [ Information Access Server: Database Administration ] MPE/iX 5.0 Documentation
Information Access Server: Database Administration

Handling Large IMAGE Tables 

Access Server performs slowly when a large IMAGE table is loaded.  For
example, the serial load of a table from a 60,000-record dataset can take
from 10 minutes to an hour, depending on the model HP 3000 you're using
and the demand on system resources at the time of the load.

There are two major ways to alleviate this problem:

   *   Users, in accessing a large IMAGE table, can first define a query
       so that the loaded table is considerably smaller.

   *   For situations where view tables make sense (when a user routinely
       needs a well-defined subset of a large table), the DBA can
       structure the Where Clause of the view table definition to
       optimize performance.

These methods are discussed in detail below.

When Does a Load Occur? 

There are three cases when a load of the complete IMAGE table takes
place:

   *   The user goes to the Display Table screen without specifying a
       query.

   *   The user initiates a sort operation on the Sort Table screen
       without specifying a query.

   *   The user initiates a summarize operation on the Summarize Table
       screen without specifying a query.  (Unlike local tables, there is
       no need to sort columns in a remote table before summarizing
       them.)

When quick performance is important to your users, they will want to
avoid these situations if at all possible.

Using the Where Clause to Reduce Table Size 

Below is a discussion of the role of the Where Clause in reducing the
size of the table to be loaded, and how you can structure the Where
Clause to optimize your view table definitions.

Serial Reads Using the MATCH Operator.   

If your users consistently access the same part of a large IMAGE table,
the performance will be precisely the same whether the user builds a
"view" by drawing the tables together in Access PC or in the Host Batch
Facility, or accesses a view table you have defined using the Add View
Table screen.  But defining a view table saves users the unnecessary
effort of reconstructing the same view of the data every time they need
to reference it.

Suppose, for example, that in the database SAMPL2 (from the tutorial in
the Information Access:  Learning the Administrator Utility manual), the
INVENTORY detail dataset contained 20,000 records.  Suppose, further,
that you have a group of users interested only in paper products, another
group interested only in desk products, and so on.

You might first configure the entire dataset as one IMAGE table
TOTAL-INVENTORY.

Then you might configure four view tables, one for each product line,
using the MATCH operator and the wildcard character @.  For paper
products, for example, the view table PAPER-INVENTORY could be configured
thus:

     Item Clause:  PRODUCT-NBR,WAREHOUSE,BACKORDERFLG,ON-HAND-QTY 

     Using Clause:  TOTAL-INVENTORY 

     Where Clause:  PRODUCT-NBR MATCH "PAP@" 

The other three view tables would look identical except that you would
replace PAP (in the Where Clause) with DES, FIL, or PEN.

If a user now accesses the view table PAPER-INVENTORY, it is likely to be
a good deal smaller than the original table (perhaps 5,000 instead of
20,000 records).  And the load need not occur until the user reduces it
further with a Define Query.


NOTE  But suppose that, infrequently, the same user only wants to see
      records for envelopes, for which PRODUCT-NBR is PAP006.  In this
      case, since a chained read can be set up on the Define Query
      screen, it would be much quicker for the user to select the IMAGE
      table TOTAL-INVENTORY and define a query against it (= PAP006),
      than to select and pare down the view table PAPER-INVENTORY with
      the same query.  Chained reads are discussed below.


Chained Reads.   

The disadvantage in using the MATCH operator is the time it takes to
perform serial reads on a large dataset.  If you can take advantage of
chained reads on search items in an IMAGE database or keys in a KSAM
file, you can speed up the search considerably.

For example, the WAREHOUSE item is a search item in the INVENTORY
dataset.  Dividing the IMAGE table TOTAL- INVENTORY into several view
tables, one for each warehouse, is easily done and highly efficient.

Suppose, as in the DBA tutorial, that one of your warehouses is in
Fresno.  You could set up a view table called FRESNO- INVENTORY this way:

       Item Clause:  PRODUCT-NBR,BACKORDERFLG,ON-HAND-QTY 

       Using Clause:  TOTAL-INVENTORY 

       Where Clause:  WAREHOUSE = "Fresno" 

View tables for the other warehouses would differ only in the Where
Clause.

The chained read on the value of WAREHOUSE happens very quickly.
Moreover, the size of the table to be loaded is considerably smaller than
the original IMAGE table.  The result is a reduction in the time your
users need to wait for their information.  (Note:  But for long chains,
e.g.  where 50% of the records are chained, chained reads may actually
take longer than serial reads.  The shorter the chain, the better.)

Structuring for Chained or KSAM Keyed Reads.   

To optimize performance when dealing with large IMAGE datasets or KSAM
files, construct Where Clauses to use chained or keyed reads rather than
serial reads wherever possible.

The following guidelines indicate how to do this:

   *   A Where Clause of the form

                         DATA-ITEM = constant 

       results in a chained or keyed read if the data item is a search
       item of a detail dataset.  If, however, the relational operator is
       anything other than the equal sign (that is, <, >, <=, >=, or
       MATCH), a serial read always occurs.

   *   A Where Clause of the form

                      DATA-ITEM RELOP constant 

       results in a serial read if the data item is not a search or key
       item.  (RELOP indicates any of the relational operators =, <, >,
       <=, >=, or MATCH).

   *   A Where Clause of the form

                     DATA-ITEM1 RELOP DATA-ITEM2 

       results in a serial read if DATA-ITEM1 and DATA-ITEM2 are members
       of the same dataset.

       If the data items are members of different datasets, we are in a
       PRODUCTION or JOIN situation.  For a PRODUCTION, chained reads
       will be used if RELOP is an equal sign and at least one of the
       data items is a search item.  For a JOIN, a chained read is
       triggered by the existence of search items in the Using Clause.  A
       Where Clause that can use a chained read will also increase the
       efficiency of JOIN processing.

   *   The use of AND and OR in a Where Clause can impact the ability of
       Access Server to use a chained read.  In the examples below,
       assume that all data items are from the same IMAGE dataset.

               DATA-ITEM1 = constant1 AND DATA-ITEM2 = constant2 

       results in a chained read if either or both of the data items is a
       search item.

               DATA-ITEM1 = constant1 OR DATA-ITEM1 = constant2 

       results in two chained reads on the same path if DATA-ITEM1 is a
       search item.

               DATA-ITEM1 = constant1 OR DATA-ITEM2 = constant2 

       always results in a serial read, even if both DATA-ITEM1 and
       DATA-ITEM2 are search items.  Using two chained reads on different
       data items would result in duplicate records if the chains
       intersect.

       In clauses containing both AND and OR, you should use parentheses,
       if possible, to group all clauses known to require serial reads,
       and AND with a clause known to use a chained read.  The result
       will be that the required data is located using a chained read.
       Set it up like this:

                (chained-read clause)   AND (serial-read clause) 

       When nested levels of view tables are used (that is, when TABLE1
       specifies TABLE2 in its Using Clause, TABLE2 specifies TABLE3 in
       its Using Clause, and so forth), then if ANY of the tables can use
       a chained read, ALL processing to satisfy the nested table
       specification will be chained.  This occurs because nested tables
       generate a logical AND of selection criteria.


NOTE  Mixing ANDs and ORs in a Where Clause can cause complexity beyond
      the software's capability to recognize chained read opportunities.
      It is better to formulate a Where Clause so that a search item is
      only ANDed with other items, or where only the same search item is
      ORed on different equality conditions.  The user can then query on
      the result to arrive at the desired item selection.

      For example,

                 (I1=C1 AND I2=C2) OR (I1=C3 AND I2=C4) 

      will result in a serial read even though I1 and I2 are both search
      items.  In this case, performance may be dramatically improved by
      taking advantage of the chained read resulting from

                             I1=C1 OR I1=C3 

      and then let the user perform a Query On Result using the longer
      search formulation.  Note that the second Where Clause will result
      in all of the records, and possibly more, than would result from
      the first clause.


KSAM and IMSAM Keyed Reads 

Both KSAM and IMSAM keys can be used for direct comparison in the same
way as IMAGE search items for JOINs and queries, such as:

         "CUST-NAME = 'JOHN SMITH'"

However, because KSAM and IMSAM store key values in sorted order, other
means of keyed access are available.

Partial Match.   

If a wildcard match is given for a KSAM or IMSAM key, and if the wildcard
appears at the end, then a keyed read can be used for the retrieval.

In Access PC, wildcards use the == operator and the wildcard characters
are * and ?.  For the Host Batch Facility, the MATCH operator is used.
For ROMAN-8 installations, the wildcard characters are @ and ?.  For
ISO-7 installations, the wildcard characters are * and ?  because @ is
one of the ISO-7 mapped characters.

For example, if PRODUCT-NAME is a KSAM or IMSAM keyed read, then at the
interactive screen on the PC, the user could get all product names
starting with "HP26" by entering:

           OP PRODUCT-NAME
           == HP26*

In PC batch, the syntax would be:

         SEARCH "PRODUCT-NAME == 'HP26*'"

In the Host Batch Facility, if ISO-7 is not configured, the syntax would
be:

         SEARCH "PRODUCT-NAME MATCH 'HP26@'"

Ranges.   

KSAM and IMSAM keys can be used to quickly retrieve a range of values.
In Access PC, the special operator >< is used to indicate a range.  In PC
batch and in the Host Batch Facility, the <, <=, >, and >= operators are
used with an AND.

For example, if PRODUCT-NAME is a KSAM or IMSAM key field, then at the
Define Query screen in Access PC the user could get all product names
from 2400 to 2699 by entering:

         OP PRODUCT-NAME
         >< 2400...2699

In PC batch or the Host Batch Facility, the syntax would be:

     SEARCH "(PRODUCT-NAME >= '2400')AND(PRODUCT-NAME <= '2699')"

Inequalities.   

If a direct compare, partial match, or range keyed read is not available,
then a KSAM or IMSAM key can be used to gain some speed in retrievals
with a simple <, <=, >, or >= operator.  Such a retrieval would be useful
for exception reporting.  For example,

          (SHIPMENT-VALUE > 10000)

could quickly find all very large orders if SHIPMENT-VALUE were a KSAM or
IMSAM key.
MPE/iX 5.0 Documentation