The strdb tool can be used in conjunction with other standard
HP-UX kernel debugging tools to provide STREAMS/UX-specific
information and data formatting. If your system is running normally
except for STREAMS/UX, we recommend that you use strdb to debug the
problem. If your system panics or hangs, use strdb together with adb
on the resulting system core dump to diagnose the problem. strdb is
documented earlier in this chapter, and examples of using adb and
strdb together are given at the end of this chapter.

What Is a System Panic?

Unlike errors in user code, programming errors in kernel code can
cause system panics. A system panic prints a panic message on the
console and generates a system core dump, which is a copy of
physical memory at the time of the panic. The panic message and core
dump can be examined with adb and strdb to determine the cause of the
panic.

There are three main categories of panics. The first category occurs
when a kernel routine calls panic() because it has detected a system
inconsistency from which it cannot recover. In this case, the panic
message contains a string from the routine that called panic(),
explaining why panic() was called. In the example below, the panic
string is "ifree: freeing free inode." A hexadecimal stack trace is
also printed; interpreting the stack trace is described later.

    System Panic:

    @(#)9245XA HP-UX (A.10.00) #1: Wed Sep 28 15:47:13 PDT 1994
    panic: (display==0xb000, flags==0x0) ifree: freeing free inode
    PC-Offset Stack Trace (read across, most recent is 1st):
      0x0014766c  0x001480b0  0x000b3a38  0x000b411c  0x000b3b78
      0x000b765c  0x000b10d8  0x000aefd0  0x0001c500
    End Of Stack

The second category is a kernel-level trap or exception condition.
These usually involve virtual memory and are described under "Traps"
below. A hexadecimal stack trace is also printed for these panics.

The third category is a High Priority Machine Check (HPMC), which
usually indicates a hardware problem. An HPMC is characterized by a
total, sudden system halt and an HPMC "tombstone" printed on the
console, which records the contents of the system's registers. If
you encounter an HPMC, contact your HP service representative.

Note that an HPMC tombstone is also printed after a TOC (Transfer of
Control; see "Transfer of Control In Case of System Hang" for
details). There is no need to contact an HP representative for an
HPMC tombstone that results from a TOC.

Traps

Some very common panics originate in the trap-handling or
interrupt-handling routines. Whenever this low-level code detects a
trap that it believes cannot be corrected, it panics the machine.
The most common faults are described below.

A data segmentation fault usually occurs when a process running in
kernel mode attempts to dereference a null pointer. If you receive a
data segmentation fault, information similar to the following is
printed on the system console:

    trap type 15, pcsq.pcoq = 0.85b7c, isr.ior = 0.4
    @(#)9245XA HP-UX (A.10.00) #0: Sat Aug 13 23:17:54 PDT 1994
    panic: (display==0xbf00, flags==0x0) Data segmentation fault

pcsq.pcoq is the current instruction address, and isr.ior is the
current data address; each is printed as a hexadecimal space.offset
pair. This trap message means that the instruction at location
0x85b7c (in space 0) tried to reference address 4 in space 0. You can
use adb to see what that instruction was trying to do.

The instruction may have been attempting to load a value 4 bytes
past some pointer. If that pointer was null, perhaps because a logic
error left it uninitialized, the address referenced would be 0 + 4,
which matches the isr.ior value of 0.4.

An instruction page fault occurs when a process in kernel mode jumps
to an address that is not mapped and tries to execute it. Because
the page is not mapped, and the kernel is not paged, a fault is
generated. This would appear as follows:

    trap type 6 pcsq.pcoq = 0.0 isr.ior = 4.78
    @(#)9245XA HP-UX (A.10.00) #0: Sat Aug 13 23:17:54 PDT 1994
    panic: (display==0xbf00, flags==0x0) Instruction page fault

The pcsq.pcoq pair is important here: the code attempted to jump to
page zero and start executing. Because the fault was an instruction
page fault, the isr.ior pair is meaningless in this case. The page
fault may have occurred because of an indirect procedure call in
which the address of the routine to be called was not initialized.

A third common panic is the protection violation. This type of panic
occurs when the kernel tries to reference a data structure that does
not belong to the current process. It also occurs if the kernel
attempts to reference an object in a way that is not permitted by the
access rights assigned to the page where the object resides, for
example, an attempt to write to a read-only page. A frequently
overlooked source of protection faults is unaligned access: these
faults look like protection faults but are caused by performing an
operation on an unaligned address, for example, a load word on an
address that is not word aligned. In each of these cases, trap type
18 or 7 is generated. The pcsq.pcoq pair gives the offending
instruction, and the isr.ior pair gives the offending data address.