Running Out of Cache [ HP LaserRX/MPE: A Journey of Discovery ] MPE/iX 5.0 Documentation
HP LaserRX/MPE: A Journey of Discovery
Running Out of Cache
On the Global System CPU Utilization graph, notice that a fairly high
amount of CPU, 10 to 15 percent, was spent on disc caching. It would be a
good idea to check disc performance. To do this:
1.  Close all open graphs.

    You can play back your macro script to close the graphs quickly.

2.  From the Draw Graphs dialog box, select the following:

    a.  Graph=Global Disc Summary.
        Remember to deselect Global Bottlenecks and Global System
        CPU Utilization.
    b.  X-Axis=Year.
    c.  Points Every...=Day.
    d.  Shift=All Day.
    e.  Starting Day=1 March 1988.

3.  Click OK.
Global Disc Summary Graph
What does this graph show you? What do the curves mean? Use Help to get
more information, if necessary.
Basically, the four curves show the following:
    Logical     Rate of disc transfers that would occur if disc caching
                were not enabled.
    Physical    Actual rate of disc transfers.
    Mem Mgr     Rate of disc transfers caused by memory management
                (swapping).
    Util        Percentage of available disc transfer time being used.
The difference between logical and physical disc transfers equals the
rate of transfers eliminated by disc caching. This difference is the
benefit of having disc caching. If disc caching is unavailable on a disc
drive, logical disc transfers equal physical transfers, and there is no
disc-caching benefit.
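The relationship between the Logical and Physical curves can be sketched
in a few lines of Python. This is only an illustration of the arithmetic
described above; the function name and the sample rates are hypothetical
and are not part of LaserRX.

```python
def cache_benefit(logical_rate, physical_rate):
    """Return (eliminated_rate, eliminated_fraction) for one interval.

    Both rates are disc transfers per second. The names are
    illustrative only; they are not LaserRX identifiers.
    """
    if logical_rate < 0 or physical_rate < 0:
        raise ValueError("rates must be non-negative")
    # Transfers that disc caching satisfied without touching the disc.
    eliminated = logical_rate - physical_rate
    # Share of logical transfers eliminated; zero when the disc is idle.
    fraction = eliminated / logical_rate if logical_rate > 0 else 0.0
    return eliminated, fraction


# Hypothetical sample: 20 logical and 10 physical transfers per second.
eliminated, fraction = cache_benefit(20.0, 10.0)
print(f"{eliminated:.1f} transfers/s eliminated ({fraction:.0%} of logical)")

# With caching unavailable, logical equals physical and the benefit is zero.
print(cache_benefit(12.0, 12.0))
```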
Disc utilization is the percentage of time during which disc transfers
are taking place or the system cannot initiate a new disc transfer for
some reason. Utilization ranges from zero, if no transfers are taking
place, to 100 percent, if every disc drive is transferring data as fast
as possible.
Factors that increase disc utilization include the following:
*  Increased physical I/O rate.
*  Increased transfer time. Longer-than-normal seek times or very
   large transfer sizes can increase the transfer time.
*  Channel or controller contention. If two discs share a
   controller, then while one is transferring data the other is also
   considered busy, because it cannot start a transfer until the
   controller is released.
*  Physically slower disc drives. They take longer than faster
   drives to transfer the same data.
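To see how the first two factors interact, you can model a single drive
with a simple approximation: busy time per second equals the physical
I/O rate multiplied by the average time per transfer. This model is an
assumption made for illustration; it is not how LaserRX measures
utilization, and all of the figures below are hypothetical.

```python
def disc_utilization(physical_io_per_sec, avg_transfer_sec):
    """Estimate the utilization (0 to 100 percent) of one drive.

    Assumption: busy time per second = I/O rate x average time per
    transfer (seek, latency, and data transfer), capped at 100 percent.
    Controller contention is not modeled here.
    """
    busy_fraction = physical_io_per_sec * avg_transfer_sec
    return min(busy_fraction, 1.0) * 100.0


# Hypothetical figures, not measurements from Trapper.
base = disc_utilization(10, 0.020)       # 10 I/Os per second, 20 ms each
more_io = disc_utilization(15, 0.020)    # higher physical I/O rate
slower = disc_utilization(10, 0.030)     # longer transfer time
print(f"base={base:.0f}%  more I/O={more_io:.0f}%  slower={slower:.0f}%")
```

Either a higher I/O rate or a longer transfer time raises the estimate,
which matches the first two factors in the list.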
What can you tell about Trapper? The logical disc rate is about twice
the physical disc rate, which means that disc caching eliminates about
50 percent of the disc I/Os. This isn't bad, but it isn't terrific
either; it is not uncommon to eliminate 60 to 70 percent of the disc
I/Os. As a rule of thumb, disc caching uses about 1 percent of the CPU
for each disc I/O per second that it eliminates.
If you apply this rule of thumb to Trapper, you will see that 8 to 10
I/Os per second (logical minus physical) are being eliminated, which
should cost about 8 to 10 percent of the CPU. Disc caching is actually
taking 10 to 15 percent of the CPU (from the Global System CPU
Utilization graph). This shows that although disc caching helps, it
might be using too much CPU. If Trapper is short on CPU or memory, you
might improve overall performance by turning disc caching off.
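The rule-of-thumb comparison can be worked through as a short
calculation. The sketch below uses the ranges quoted in the text for
Trapper; the function names and the constant are hypothetical and exist
only to make the arithmetic explicit.

```python
# Rule of thumb from the text: about 1 percent of the CPU for each
# disc I/O per second that caching eliminates.
CPU_PCT_PER_IO_ELIMINATED = 1.0


def expected_cache_cpu(eliminated_io_per_sec):
    """CPU percentage that caching should cost under the rule of thumb."""
    return eliminated_io_per_sec * CPU_PCT_PER_IO_ELIMINATED


def cache_cost_ratio(actual_cpu_pct, eliminated_io_per_sec):
    """Actual cost divided by expected cost.

    A ratio above 1 means caching uses more CPU than the rule of
    thumb predicts for the I/Os it eliminates.
    """
    return actual_cpu_pct / expected_cache_cpu(eliminated_io_per_sec)


# Trapper, best case: 10 I/Os per second eliminated for 10 percent CPU.
print(cache_cost_ratio(10, 10))   # 1.0
# Trapper, worst case: 8 I/Os per second eliminated for 15 percent CPU.
print(cache_cost_ratio(15, 8))    # 1.875
```

At best Trapper matches the rule of thumb; at worst caching costs nearly
twice what the rule predicts, which is why it might be using too much
CPU.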
Trapper was short on memory until 21 April, but it looks as though
enough memory was added to relieve the shortage. Turning caching off
would therefore help only if Trapper is short on CPU. Trapper might be
short on CPU, but you cannot determine that without investigating
overall CPU usage.
Examine the Utilization curve. It runs at about 20 percent until 19
April and then drops to less than 10 percent. What might have caused
that? Scan the factors affecting disc utilization listed previously, and
check what you already know:
*  Did the physical I/O rate change?
   Not really. Although the physical I/O rate does change, it
   increases. This would make utilization increase, not decrease.
*  Did the transfer times change?
   You don't know.
*  Was there channel or controller contention?
   You don't know.
*  Were disc drives physically faster or slower?
   You can't tell.
It is apparent that you cannot determine what caused the drop in
utilization. You do know at least one thing that did not cause it, and
learning what did not cause a problem can be an important step toward
learning what did cause it.
What next? You need more details.
At this point, you can close the Global Disc Summary graph or leave it on
your screen for reference. To obtain more details, do the following:
1.  From the Draw Graphs dialog box, select the following:

    a.  Graph=Global Disc Detail.
        Remember to deselect Global Disc Summary.
    b.  X-Axis=Day.
    c.  Points Every...=Hour.
    d.  Shift=All Day.
    e.  Starting Day=Any day before disc utilization dropped, such
        as 21 April.

2.  Click OK.
Global Disc Detail Graph
The graph shows each disc drive on the system (or the five most-used
drives, if there are more than five). The drives are sorted in
descending order from the most used to the least used. You can see that
the disc I/Os are not balanced. Ldev 2 is used much more frequently than
Ldev 1.
The Utilization value on the Global Disc Summary graph is an average
over all the disc drives, whereas the Global Disc Detail graph shows each
drive's utilization independently.
Scroll one day to the right (click the gray area to the right side of the
horizontal scroll bar). Keep scrolling one day at a time until you reach
29 April. What is different?
A new disc drive, Ldev 3, was added. Would adding this drive affect
overall utilization? Yes, overall utilization would be affected if the
drive had its own controller and other factors (such as physical I/O
rate) remained the same. Average utilization should drop. It did.
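The effect of an added drive on the averaged Utilization value can be
illustrated with a small calculation. The per-drive percentages below
are hypothetical, chosen only to show the direction of the change; they
are not Trapper's measured values, and the sketch assumes the same total
busy time is simply spread over one more drive.

```python
def average_utilization(per_drive_pct):
    """Average utilization over all drives, as the summary graph reports it."""
    if not per_drive_pct:
        raise ValueError("at least one drive is required")
    return sum(per_drive_pct) / len(per_drive_pct)


# Hypothetical utilization before the new drive was added.
before = {"Ldev 1": 4.0, "Ldev 2": 36.0}
# Assumption: Ldev 2's work is split evenly with the new Ldev 3,
# and nothing else changes.
after = {"Ldev 1": 4.0, "Ldev 2": 18.0, "Ldev 3": 18.0}

print(f"before: {average_utilization(list(before.values())):.1f}%")
print(f"after:  {average_utilization(list(after.values())):.1f}%")
```

The total busy time is unchanged, but it is averaged over three drives
instead of two, so the summary value falls.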
Trapper's load is still unbalanced with respect to Ldev 1, but it is
even between Ldev 2 and Ldev 3. You could try to shift files to
Ldev 1 to balance the load on this system. But because Ldev 1 is a 7925
disc, while Ldev 2 and Ldev 3 are 7935 discs, it might be difficult to
balance the load this way.
Extra Credit Exercise
How much good can you do by balancing the load on the disc drives? In
other words, what is your payback if you make an effort to fine-tune disc
utilization? To find out, display the Global System CPU Utilization
graph. Check how much time the CPU was idle while awaiting completion of
a disc transfer.
On a system-wide basis, the most additional throughput you can expect
is what you would gain by reducing the CPU's paused time to zero. On the
other hand, you can assess the effect on terminal response time only by
calculating how long specific processes had to wait for disc I/Os to
complete.
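One way to put a number on that ceiling is sketched below. It assumes
that throughput is proportional to CPU busy time and that all of the
time the CPU spends paused for disc could be turned into useful work.
Both are simplifying assumptions, and the function name and sample
percentages are hypothetical.

```python
def max_throughput_gain(cpu_busy_pct, cpu_paused_for_disc_pct):
    """Upper bound on the throughput multiplier from perfect disc tuning.

    Assumptions: throughput scales with CPU busy time, and every
    percent of paused-for-disc time could become busy time.
    """
    if cpu_busy_pct <= 0:
        raise ValueError("CPU busy percentage must be positive")
    if cpu_paused_for_disc_pct < 0:
        raise ValueError("paused percentage must be non-negative")
    return (cpu_busy_pct + cpu_paused_for_disc_pct) / cpu_busy_pct


# Hypothetical reading: CPU busy 60 percent, paused for disc 15 percent.
gain = max_throughput_gain(60.0, 15.0)
print(f"throughput could rise by at most a factor of {gain:.2f}")
```

If the paused time is already small, the bound is close to 1 and
balancing the drives offers little throughput payback.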
For now, determine only throughput. You will study transaction response
times in the next part of this journey.