Coming Home [ HP LaserRX/MPE: A Journey of Discovery ] MPE/iX 5.0 Documentation
HP LaserRX/MPE: A Journey of Discovery
Coming Home
Whether this system has a CPU bottleneck remains a question. Where else
would you look to find the answer? While still using the TRAPPER2.PRF
file, do the following:
1. Close all open graphs.
2. From the Draw Graphs dialog box, select the following:
   a. Graph=Global CPU Utilization and Global System CPU
      Utilization.
      Remember to deselect Application Transaction Response.
   b. X-Axis=Week.
   c. Points Every...=Hour.
   d. Shift=All Day.
   e. Starting Day=15 August 1988.
   f. Ignore Weekends=Enabled (checked).
      Because this file has no weekend data, HP LaserRX/MPE plots
      a 5-day week.
3. Click OK.
Global CPU Utilization and Global System CPU Utilization Graphs
The resulting graphs appear virtually identical. Both show how CPU time
was used throughout the week, but different information is detailed on
each.
Look at the Global CPU Utilization graph. CPU use (top of the violet
Other area) reaches 100 percent and then flattens out for 4 or 5 hours
during the middle of the day (as you saw on the Global Bottlenecks graph
at the beginning of this journey). This indicates that if more CPU time
were available, somebody would use it. The question is: who?
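The plateau described above can also be found numerically. The following
is a minimal Python sketch, not part of HP LaserRX/MPE: it assumes you
have the hourly total CPU percentages for one day as a plain list. The
function name, the 99 percent threshold, and the sample values are
hypothetical and are not taken from TRAPPER2.PRF.

```python
def longest_saturated_run(hourly_total_cpu, threshold=99.0):
    """Return (start_index, length) of the longest run of consecutive
    hours whose total CPU use is at or above the threshold.

    Returns (None, 0) if no hour reaches the threshold.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, pct in enumerate(hourly_total_cpu):
        if pct >= threshold:
            if run_len == 0:
                run_start = i          # a new saturated run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # run broken by an unsaturated hour
    return best_start, best_len


# Hypothetical hourly totals (percent busy), one value per hour.
sample_day = [40, 60, 85, 100, 100, 100, 100, 100, 90, 70]
start, length = longest_saturated_run(sample_day)
print(start, length)
```

A long run at the threshold suggests unmet demand for CPU; it does not
say which class of work is being held back, which is the question the
rest of this section addresses.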
Many batch jobs are using CPU time. Usually, batch jobs are either using
CPU time or are waiting on disc. Since the Paused-For-Disc area (light
blue) is not large, probably only batch jobs would benefit if more CPU
were available.
Are the interactive sessions getting enough CPU? Yes, the red area at the
bottom of the Global CPU Utilization graph rarely shows more than 10
percent busy. Most sessions run at higher priority than batch jobs.
Therefore, it is reasonable to say that the interactive sessions are
getting all the CPU they need.
Actually, this situation is quite unusual. On most systems, interactive
sessions use 30 to 50 percent of the CPU during the day. On Trapper,
getting more CPU (or using less) would not help the response times of
the interactive sessions very much.
It looks like the batch jobs are running all day, every day. Having
them complete more quickly might be a good idea, and more available CPU
might help them do so.
Axiom: You can relieve a performance bottleneck either by supplying more
of the critical resource or by using less of it.
If the CPU is the performance bottleneck, you can get more of the
resource by upgrading to a faster CPU (for example, upgrade a Series 58
system to Series 70 or 950). To use less of this resource, you must find
a way to avoid using some of the CPU you are currently using.
Re-examine the graphs. In the Global CPU Utilization graph, Sessions and
Jobs are doing necessary work. You cannot expect them to use less CPU
without reworking the application. System is the amount of CPU used by
processes that do not belong to a job or a session. These are usually
spoolers and data communication monitors. In this case, you might
suspect that most of this CPU is being used by the DS data communication
monitors, and you cannot do much about that.
Paused indicates time when the CPU is not being used because processes
are waiting for disc, so paused time is already available to anyone who
needs it. This leaves the Other category.
What makes up the Other category of CPU use? Look at the Global System
CPU Utilization graph. It breaks the Other category into its three
components: memory management, disc caching, and interrupt control
stack. Processes is the sum of the Sessions, Jobs, and System areas shown
on the previous graph. Paused has the same meaning on both graphs.
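The relationship between the two graphs can be written down as a small
consistency check. This is only an illustrative Python sketch: the class
names, field names, tolerance, and percentages are assumptions made for
the example and are not read from any HP LaserRX/MPE logfile.

```python
from dataclasses import dataclass


@dataclass
class GlobalCpu:
    """One point from the Global CPU Utilization graph (percent)."""
    sessions: float
    jobs: float
    system: float
    other: float
    paused: float


@dataclass
class GlobalSystemCpu:
    """One point from the Global System CPU Utilization graph (percent)."""
    processes: float
    memory_management: float
    disc_caching: float
    ics: float
    paused: float


def views_agree(a: GlobalCpu, b: GlobalSystemCpu, tol: float = 0.5) -> bool:
    """Check that both graphs describe the same hour.

    Processes should equal Sessions + Jobs + System, Other should equal
    memory management + disc caching + ICS, and Paused should match.
    """
    processes_ok = abs((a.sessions + a.jobs + a.system) - b.processes) <= tol
    other_ok = abs(
        a.other - (b.memory_management + b.disc_caching + b.ics)
    ) <= tol
    paused_ok = abs(a.paused - b.paused) <= tol
    return processes_ok and other_ok and paused_ok


# Hypothetical midday values, chosen only to illustrate the mapping.
cpu = GlobalCpu(sessions=8, jobs=55, system=7, other=25, paused=5)
system_cpu = GlobalSystemCpu(
    processes=70, memory_management=2, disc_caching=18, ics=5, paused=5
)
print(views_agree(cpu, system_cpu))
```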
If you look at the Global System CPU Utilization graph, you will see the
memory manager is not a problem now that extra memory has been added.
Disc caching is fairly high. You saw earlier that disc caching was not
helping that much, so you could disable disc caching if extra CPU were
needed.
This might be the time to do so, because the CPU is saturated during the
middle of the day. ICS time mainly reflects interrupt handling and does
not appear to be excessive, so leave it alone for now.
Almost a Three-Point Landing
Trapper was short on main memory until 21 April 1988 when additional
memory was installed. This additional memory eliminated the swapping
activity. On 29 April, the addition of a third disc drive lowered
overall disc utilization without affecting throughput significantly.
Later, a disc drive was removed. This increased disc utilization, but
not to the previous level. Disc caching can eliminate about 50 percent
of all disc I/Os, but requires a substantial amount of CPU to do this.
Disabling disc caching on all disc drives might benefit the batch jobs.
Monitor the system for 1 or 2 days after the change to see whether it is
beneficial or detrimental.
Without disc caching, the amount of CPU capacity released might be used
by batch jobs during peak times. You might expect total CPU used to stay
below 100 percent for most, if not all, of the day. With caching
eliminated, you also can expect to see more Paused-For-Disc time. The
question to be answered is: Does the increase in Paused time equal or
exceed the amount of CPU released by disabling disc caching? If the new
Paused time is less than the old Paused time plus the old Disc Caching
time, the extra CPU is probably being used by Jobs and Sessions to
finish faster. If it is equal or greater, you underestimated the
effectiveness of disc caching, and you should re-enable it immediately.
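The comparison above amounts to a simple decision rule. Here is a
minimal Python sketch of it, assuming you have read the Paused and Disc
Caching percentages off the graphs for comparable periods before and
after the change. The function name, return strings, and sample numbers
are hypothetical.

```python
def caching_verdict(old_paused, old_disc_caching, new_paused):
    """Apply the rule from the text to percentages read from the graphs.

    old_paused       -- Paused time with disc caching enabled
    old_disc_caching -- CPU spent on disc caching while it was enabled
    new_paused       -- Paused time after disc caching was disabled
    """
    released_plus_old_paused = old_paused + old_disc_caching
    if new_paused < released_plus_old_paused:
        # Some of the released CPU is doing useful work.
        return "keep caching disabled"
    # Paused time grew by at least as much as caching used to cost.
    return "re-enable caching"


# Hypothetical readings: 5% Paused and 18% Disc Caching before the change.
print(caching_verdict(5, 18, 12))
print(caching_verdict(5, 18, 25))
```

The rule only compares CPU accounting; you would still want to confirm
that the batch jobs actually finish sooner.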
There are many other possibilities. What if you raised the random fetch
quantum on disc caching to 64 sectors? Because database access uses
random I/Os, a larger quantum might increase the probability of
eliminating a successive I/O, at the cost of higher main memory usage.
Because no other memory problems are apparent, you might try this. There
is a chance that having fewer, larger disc cache domains will decrease
the CPU overhead for disc caching because there are fewer domains to
search.
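The "fewer, larger domains" argument can be illustrated with rough
arithmetic. The Python sketch below rests on several assumptions that
the text does not state: a fixed amount of memory devoted to cache
domains, every domain being exactly one fetch quantum in size, and
256-byte sectors. The 2 MB cache size and the 16-sector baseline quantum
are hypothetical values chosen for the example.

```python
SECTOR_BYTES = 256  # assumed sector size


def approx_domain_count(cache_bytes, quantum_sectors):
    """Estimate how many cache domains fit in a fixed amount of memory,
    assuming every domain is exactly one fetch quantum in size."""
    return cache_bytes // (quantum_sectors * SECTOR_BYTES)


cache_bytes = 2 * 1024 * 1024  # hypothetical 2 MB devoted to disc caching

small_quantum = approx_domain_count(cache_bytes, 16)  # hypothetical baseline
large_quantum = approx_domain_count(cache_bytes, 64)  # value from the text
print(small_quantum, large_quantum)
```

Under these assumptions, quadrupling the quantum cuts the number of
domains to a quarter. Whether that lowers CPU overhead on a real system
is something only a trial can show.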
There is no single correct answer. The best you can do is predict what
might help, try it, and evaluate the results. Then, if it is still
necessary, make a new prediction based on your results and try it again.
When you cannot think of anything more you can do to improve performance,
your system is probably as well tuned as you can make it.
Undoubtedly, there is more you can learn from analyzing the Trapper
system. Because this logfile is from a real system, there might be more
surprises awaiting discovery. But now you might prefer to take a brief
tour of some of the other available logfiles to study how other
environments look when you examine them using HP LaserRX/MPE.