This section describes how to use adb on core dumps obtained following a system crash.
See "Generating and Retrieving System Core Dumps" for information
on how these dumps are obtained. adb can also be used to examine a system that is currently
running. See the adb(1) man page or ADB Tutorial for
more information.

Invoking adb |  |
When using adb on a system core dump, you must use the "-k" option. This
option tells adb to treat the core dump as a system core dump instead
of a user process core dump, which is organized differently. For example,
to call adb on the dump pair vmcore.1 and vmunix.1, perform the following:
adb -k vmunix.1 vmcore.1
|
When using adb on a running HP-UX system, you also use the "-k" option,
and use /stand/vmunix as the object file and /dev/mem as the core file:
adb -k /stand/vmunix /dev/mem
|
You will probably need to be superuser to access /dev/mem. Because you are examining a running (and continuously
changing) system, adb will not be able to set you up in any specific
process context, but you will be able to examine kernel global variables.

Context on Entry to adb |  |
adb maintains a set of registers corresponding to
the registers of the machine. The adb command $r will print out the values of these registers.
When adb is invoked on a system core with the -k option, it sets these registers to the values
of the machine registers at the time the system core dump was taken.
These register values are not the values the registers contained
at the point the panic or trap occurred. Instead, they are the values
the registers contained at the time the kernel started dumping a
copy of physical memory to the swap area. How to use these "dump
time" register values to determine the state of the registers
at the time the trap or panic occurred will be described later.
These "panic time" register values enable the
user to examine the context of the process that was running at the
time of the system crash.

Debugging Hung Systems |  |
If the system core dump is from a transfer of control (TOC)
of a hung system, adb will be unable to determine the "dump
time" or "panic time" register values.
In these cases, adb can still be used to determine the contents of
the kernel message buffer (see "Finding the Panic Message"), and
to examine kernel global variables (see "Obtaining Important
Kernel Global Variables"), but it will not be able to give
you a stack trace or context for the process that was running at
the time of the system crash.

It is especially important, when looking at a dump from a
system which appeared to be hung, to check the kernel globals freemem, freemem_cnt, and avenrun. These variables may indicate that your system
was out of memory or was overloaded. (See "Obtaining Important
Kernel Global Variables" for more information.)

It can also be helpful, before doing a TOC on a system which
appears hung, to determine how complete the system paralysis is.
The following table describes hang symptoms, from the least severe
to the most severe. This table may help you determine where your
system fits on this continuum.

Symptom | Explanation |
---|---|
Some processes, like your shell or your tests, do not run, but other processes are running. | Your system is not hung, but there is some other problem holding back your processes. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state. |
You cannot log in, either locally or remotely. | Your system may not be hung; its networking software state, terminal I/O, or getty processes may be deadlocked in some way. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state. |
You cannot ping your system. | Your system may not be hung; its networking software state may be deadlocked in some way. If you have a terminal session that is working, use strdb and adb to look at the kernel and the STREAMS/UX subsystem state. |
Carriage returns do not echo on the console or on other login sessions. | Your system is hung, but is probably TOC-able. TOC the system and examine the kernel globals in the dump. |
Your system has an LED activity display which is not being updated; it is showing no system activity at all. | Your system is hung, but is probably TOC-able. TOC the system and examine the kernel globals in the dump. |
Your system has an access port enabled, and typing CTRL-b on the console gives no response, or you attempt to TOC a system without an access port with no success. | Your system is ignoring very high-level interrupts, and it is so thoroughly hung that you will probably be unable to TOC it. Hangs as severe as this are extremely rare. Hit the system reset button, and try to debug the problem using other methods such as code reviews, panics, or printfs. |
Finding the Panic Message |  |
The kernel maintains a circular message buffer into which
text can be printed using the kernel printf, msg_printf, and cmn_err routines. At the time of a panic, a panic message
is printed to this buffer. A stack trace consisting of instruction
addresses in hexadecimal is also printed out, as well as the current
instruction and data addresses being accessed at the time of the
crash. Other interesting information may also be located in the
buffer, such as system boot-up messages and kernel error messages
that may help pin down the cause of the panic. To print out this
buffer, invoke adb on the system dump and type the following:
msgbuf+0xc/sD
|
Examples of msgbuf contents are included in the examples at the end
of this chapter.

Interpreting the Panic Stack Trace |  |
adb can be used to translate the hexadecimal stack
trace printed after the panic message into procedure addresses.
For each hexadecimal number in the stack trace, use the adb i command to determine where in the kernel the address
occurs. For example, the hex stack trace below can be deciphered
as follows:
PC-Offset Stack Trace (read across, most recent is 1st):
0x0016da70 0x000e5a68 0x000d34cc 0x0009ea14 0x00099714 0x00092fdc
0x0006e0c8 0x0006dbb8 0x0006d2a8 0x001954e8 0x00194fa4 0x000b7e24
0x001846d4 0x00181730 0x00156538 0x00156af8 0x001567b8 0x000e6d80
0x000d3aac
End Of Stack
|
In adb (text preceded by "#" is a comment):
0x0016da70/i                        # use of adb i command
panic+30:       addil -1000,dp      # adb's response
0x000e5a68/i
trap+0xADC:     b trap+1004
0x000d34cc/i
$call_trap+20:  rsm 1,r0
0x0009ea14/i
flushq+60:      ldbs 0xD(r21),r22
0x00099714/i
q_free+1C:      ldw -0xA4(sp),r31
|
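Typing one i command per address is repetitive, so the command list can be prepared ahead of time. The following Python sketch is only an illustration: the function name trace_to_adb_commands is hypothetical, and it assumes that every address in the trace is printed as 0x followed by eight hexadecimal digits, as in the trace shown in this section.

```python
import re

def trace_to_adb_commands(trace_text):
    """Return one adb 'address/i' command per address in a panic stack trace.

    Assumption: addresses are printed as 0x followed by 8 hex digits.
    """
    # Keep only the text between the header's closing "):" and "End Of Stack".
    body = trace_text.split("):", 1)[-1].split("End Of Stack", 1)[0]
    addresses = re.findall(r"0x[0-9a-fA-F]{8}", body)
    return [f"{address}/i" for address in addresses]

trace = """PC-Offset Stack Trace (read across, most recent is 1st):
0x0016da70 0x000e5a68 0x000d34cc 0x0009ea14 0x00099714 0x00092fdc
End Of Stack"""

for command in trace_to_adb_commands(trace):
    print(command)
```

The printed lines can then be entered at the adb prompt in order, most recent frame first.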
Manual Stack Back-Tracing |  |
You may need to use adb to manually back-trace your stack. This is necessary
when the hexadecimal stack trace printed by panic is incomplete. For example, panic may print a few hex addresses and then a message such as:
stktrc: cannot find descriptor
|
You may also need to do a manual stack back-trace if you
wish to find out what arguments the routines in your stack trace
were called with. You will need the value of the stack pointer for each
routine in the stack, and manual stack back-tracing will tell you
these values.

PA-RISC Procedure Calling Conventions Overview
The following is a very brief overview of the PA-RISC procedure
calling convention. More information can be obtained from the PA-RISC Procedure
Calling Conventions Reference Manual.

PA-RISC machines have 32 general use registers. These registers
are identical physically, but are assigned different roles by the
PA-RISC operating systems and compilers in order to enable procedure
calls to take place efficiently and consistently. The following
table lists these special roles:

Table 6-1 General Use Register Roles
Register | Role |
---|---|
r0 | Value is always zero. |
r1 | Scratch register. |
r2 | Return pointer, also known as rp. This is the instruction address the called procedure will return to when it is finished executing. |
r3 - r18 | Callee saves. If the called procedure wishes to modify any of these registers, it must save the original contents on its stack and restore the contents before returning to the caller. |
r19 - r22 | Caller saves. The called procedure is free to modify these registers without saving the original contents. If the calling procedure wants to retain the contents, it must save them before making the procedure call and restore them after the call returns. |
r23 - r26 | First four procedure arguments, also known as arg3, arg2, arg1, and arg0 (r26 is arg0, r23 is arg3). The calling procedure loads the first four procedure arguments into these registers before making the procedure call. |
r27 | Global data pointer, also known as dp. |
r28 - r29 | Procedure return values, also known as ret0 and ret1. The called procedure loads the return values into these registers before returning. |
r30 | Stack pointer, also known as sp. |
r31 | Millicode return pointer, or scratch register. |
The only registers you need to be concerned with for manual
stack back-tracing are r2 (rp) and r30 (sp), although the other registers become important
when trying to determine what arguments a procedure in the trace
was called with.

In order to implement these register roles, at the start of
each procedure a stack frame is allocated, and the callee save registers which the called procedure is planning
to modify are stored in the stack frame. The stack frame is allocated
simply by incrementing sp by the size of the stack frame needed, using either
the stwm or ldo instruction. For example, below are the instructions
which create the stack frame for ioctl. Numbers in brackets ([ ]) refer to the notes
below.
ioctl:          stw rp,-14(sp)      [1]
ioctl+4:        stwm r3,100(sp)     [2]
ioctl+8:        stw r4,-0xFC(sp)    [3]
ioctl+0xC:      stw r5,-0xF8(sp)    [4]
ioctl+10:       stw r6,-0xF4(sp)    [5]
|
[1] Store return instruction address at 0x14 above the caller's
stack pointer (at sp - 0x14). Note that the return address is stored in the caller's
stack frame, not the callee's stack frame.
[2] Store the contents of r3 at the current sp, then allocate the stack frame by adding 0x100
to sp. The stwm instruction stands for store word and modify.
[3] Store the contents of r4 at sp - 0xFC, just below where you stored r3.
[4] Store the contents of r5 at sp - 0xF8, just below where you stored r4.
[5] Store the contents of r6 at sp - 0xF4, just below where you stored r5.

The instruction ldo (load offset) can be used instead of stwm for allocating the stack. For example:
doadump:        stw rp,-14(sp)      [1]
doadump+4:      ldo 30(sp),sp       [2]
|
[1] Store return instruction address in caller's stack frame.
[2] Add 0x30 to the current value in register sp and store the result in sp, allocating the stack frame.

Basic Stack Back-Tracing |  |
Given the stack pointer, sp, and the current instruction address, pcoqh, it is possible to get the previous stack pointer
and instruction address. The starting values for sp and pcoqh are obtained from the adb $r command. As mentioned above, when adb is invoked on a system core with the -k option, it sets these registers to the values
of the machine registers at the time the system core dump was taken.
The $r command prints out these registers. Below are the
first few lines of the $r display.
pcsqh 0         pcoqh 24B34     doadump+0xEC
pcsqt 0         pcoqt 0         _fp_status
rp 0xDBF48      panic_boot+354
arg0 1          arg1 0xC57B     arg2 2000       arg3 9BD70152
sp 20F380       ret0 303847     ret1 797        dp 1F6000
|
There are four steps to back-tracing a stack:

1. Determine the size of the current stack frame.
The size of the current stack frame is simply the amount by which sp is incremented at the entry to the current procedure.
To find that number, use adb to print out the first few instructions of the
current procedure. To determine the initial current procedure, look
at the value of the register pcoqh, which appears at the end of the first line of
the $r output. In most cases, this initial procedure will be doadump.
doadump/3i
doadump+3:      stw rp,-14(sp)
                ldo 30(sp),sp
                mfctl iva,r22
|
doadump's second instruction is an ldo which increments the stack pointer by 0x30, so
doadump's stack frame size is 0x30.

2. Determine the previous stack pointer.
The previous stack pointer is the current stack pointer, minus
the current stack frame size. adb can be used to keep track of the sp register by calculating the previous stack pointer
using the following adb commands:
<sp-0x30>sp             [1]
.=X                     [2]
                20F350  [3]
|
[1] Take the current value of the sp register, decrement it by 0x30, and store the
result back into the sp register. See adb documentation for more information on adb registers
and the "<" and ">" operators.
[2] Print out the new value of sp. This information should be saved in case you
need to find out the contents of registers which have been pushed
onto the stack frame. See adb documentation for more information about the concept
of ".", the current location in the core file.
[3] adb output in response to the previous command, .=X

3. Find the current return pointer.
Your current procedure is doadump, and you have just set sp so that it has the same value it had when doadump
was first entered, before the ldo instruction was executed. Recall that doadump's
first instruction is:
stw rp,-14(sp)
|
Because you have just set sp to the same value it had when doadump's first
instruction was executed, you can find the rp by looking at what is in sp-0x14:
<sp-0x14/X                              [1]
crash_monarch_stack+1EC:        0xDBF48 [2]
|
[1] Print out the value of the location sp-0x14 in hexadecimal.
[2] adb's response. crash_monarch_stack+1EC can safely be ignored. 0xDBF48 is the instruction
address which was in rp.

4. Find out which procedure the return pointer points to.
The adb i command will tell you this:
0xDBF48/i                                               [1]
panic_boot+354: comibt,=,n 0,ret0,panic_boot+368        [2]
|
[1] use of the i command
[2] adb's response
Notice that the $r command has already indicated that rp corresponds to panic_boot+354.

To continue back-tracing the stack, iterate the four steps
shown above. Here is the adb sequence of commands and responses to trace the
next two levels back in this stack. Text preceded by "#" is a comment.
panic_boot/3i                           # look at beginning of panic_boot
panic_boot:                             # for stack frame size
panic_boot:     stw rp,-14(sp)
                stwm r3,80(sp)          # stack frame size is 0x80
                stw r4,-7C(sp)
<sp-0x80>sp                             # calculate new sp
.=X                                     # print out new sp
                20F2D0
<sp-0x14/X                              # find rp in caller's stack frame
crash_monarch_stack+16C:        0xDB938
0xDB938/i                               # what instruction address does rp correspond to?
boot+24:        addil 0,dp
boot/3i                                 # look at beginning of boot
boot:                                   # for stack frame size
boot:           stw rp,-14(sp)
                stwm r3,80(sp)          # stack frame size is 0x80
                stw r4,-7C(sp)
<sp-0x80>sp                             # calculate new sp
.=X                                     # print out new sp
                20F250
<sp-0x14/X                              # find rp in caller's stack frame
crash_monarch_stack+0xEC:       1518A4
1518A4/i                                # what instruction address does rp correspond to?
panic+0xF0:     ldw -94(sp),rp
panic/3i                                # look at beginning of panic
panic:                                  # for stack frame size
panic:          stw rp,-14(sp)
                stwm r3,80(sp)          # stack frame size is 0x80
                stw r4,-7C(sp)
|
 |
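The arithmetic behind steps 1 through 3 can be checked outside adb. The Python sketch below is an illustration under stated assumptions, not part of adb: the helper names frame_size and previous_frame are hypothetical, the prologue instructions are typed in by hand from the adb /3i output, and the frame is assumed to be allocated by a stwm or ldo instruction on sp as described earlier in this section.

```python
import re

RP_SLOT = 0x14  # rp is saved at (caller's sp - 0x14) by "stw rp,-14(sp)"

def frame_size(prologue):
    """Return the frame size found in a procedure's first instructions.

    Looks for 'stwm rN,SIZE(sp)' or 'ldo SIZE(sp),sp'; adb prints SIZE in hex.
    """
    for instruction in prologue:
        match = (re.search(r"stwm\s+\w+,([0-9A-Fa-f]+)\(sp\)", instruction)
                 or re.search(r"ldo\s+([0-9A-Fa-f]+)\(sp\),sp", instruction))
        if match:
            return int(match.group(1), 16)
    raise ValueError("no stack-allocating instruction found")

def previous_frame(sp, prologue):
    """Return (previous sp, address where the return pointer was saved)."""
    prev_sp = sp - frame_size(prologue)
    return prev_sp, prev_sp - RP_SLOT

sp = 0x20F380  # starting sp, taken from the $r display
for name, prologue in [
    ("doadump", ["stw rp,-14(sp)", "ldo 30(sp),sp", "mfctl iva,r22"]),
    ("panic_boot", ["stw rp,-14(sp)", "stwm r3,80(sp)", "stw r4,-7C(sp)"]),
    ("boot", ["stw rp,-14(sp)", "stwm r3,80(sp)", "stw r4,-7C(sp)"]),
]:
    sp, rp_address = previous_frame(sp, prologue)
    print(f"{name}: previous sp = {sp:X}, rp saved at {rp_address:X}")
```

The previous sp values it computes (20F350, 20F2D0, 20F250) agree with the adb responses to .=X in the session above. Step 4, mapping the saved rp to a procedure, still requires the adb i command, because it needs the kernel symbol table.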
If you are doing a manual stack back-trace in order to find
out values of registers which have been pushed onto the stack, it
is useful to save the results of the four steps at each iteration
for future reference. A table such as the following can be helpful:

sp | pcoqh | Procedure Address | Frame Size |
---|---|---|---|
 | 24B34 | doadump+0xEC | |
 | 0xDBF48 | panic_boot+354 | |
 | 0xDB938 | boot+24 | |
 | 1518A4 | panic+0xF0 | |
Exceptions to the Four Steps |  |
The four basic steps of stack back-tracing have some exceptions:

panic:
If your procedure address is in panic, you need to take special steps
to find out the true value of your current stack pointer. Instead of
being the previous sp minus the previous frame size, panic's sp can be found at location panic_save_state. Do the following to find the value using adb
and reset adb's copy of sp:
panic_save_state/X              [1]
panic_save_state:               [2]
panic_save_state:       7FFE6F48
7FFE6F48>sp                     [3]
|
[1] Ask adb to print out location panic_save_state in hex.
[2] These two lines are adb's response. panic's actual sp is 7FFE6F48.
[3] Reset sp to the correct address.

Now that you have panic's real stack pointer, the other steps
in the back-tracing process can be executed normally. Text preceded
by "#" is a comment.
<sp-0x80>sp                             # calculate new sp
.=X                                     # print out new sp
                7FFE6EC8
<sp-0x14/X                              # find rp in caller's stack frame
7FFE6EB4:       0xDF108
0xDF108/i                               # what instruction address does rp correspond to?
trap+0xA28:     b trap+0xF18
trap/3i                                 # look at beginning of trap
trap:                                   # for stack frame size
trap:           stw rp,-14(sp)
                stwm r3,100(sp)         # stack frame size is 0x100
                stw r4,-0xFC(sp)
<sp-0x100>sp                            # calculate new sp
.=X                                     # print out new sp
                7FFE6DC8
<sp-0x14/X                              # find rp in caller's stack frame
7FFE6DB4:       0xD0BD4
0xD0BD4/i                               # what instruction address does rp correspond to?
$call_trap+20:  rsm 1,r0
|
|
$call_trap, $call_int, $ihndlr_rtn,
$thndlr_rtn, $RDB_trap_patch, $RDB_int_patch:
These procedures
do not follow the ordinary procedure calling conventions. They are
written in assembly language, and are used to create a save state structure which saves the values of all registers
at the time of a trap or an interrupt. The save state is then passed to trap() or the appropriate interrupt routine. The save state starts at sp - 0x230, and you can retrieve the previous stack
pointer and current pcoqh from the save state, as shown below. The offsets into the save state are for the 10.0 release, and may change from
release to release.
<sp-0x230>sp                    [1]
<sp+0x84/X                      [2]
7FFE6C1C:       96B70           [3]
<sp+0x78/X                      [4]
7FFE6C10:       7FFE6B98        [5]
7FFE6B98>sp                     [6]
96B70/i                         [7]
qenable+10:     ldws 0(r20),r21
qenable/3i
qenable:
qenable:        stw rp,-14(sp)
                ldo 80(sp),sp
                stw arg0,-0xA4(sp)
|
[1] Reset sp to point to the top of the save state structure.
[2] Save state structure + 0x84 is the location of the pcoqh.
[3] adb's response: 96B70 is the return instruction address.
[4] Save state structure + 0x78 is the location of the sp.
[5] adb's response: 7FFE6B98 is the current stack pointer.
[6] Reset sp to the correct value.
[7] Continue to iterate the four basic stack back-tracing steps.
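The save state lookups reduce to fixed-offset arithmetic, sketched below in Python as an illustration. The function name save_state_commands is hypothetical, and the three constants are the 10.0 release offsets quoted in this section, so they are assumptions for any other release.

```python
SAVE_STATE_SIZE = 0x230  # save state starts at sp - 0x230 (10.0 release)
PCOQH_OFFSET = 0x84      # location of pcoqh within the save state (10.0 release)
SP_OFFSET = 0x78         # location of sp within the save state (10.0 release)

def save_state_commands(sp):
    """Return the save state base and the adb commands that read pcoqh and sp."""
    base = sp - SAVE_STATE_SIZE
    return {
        "base": base,
        "pcoqh_cmd": f"0x{base + PCOQH_OFFSET:X}/X",
        "sp_cmd": f"0x{base + SP_OFFSET:X}/X",
    }

# sp on entry to $call_trap, from the back-trace so far
for key, value in save_state_commands(0x7FFE6DC8).items():
    print(key, value if isinstance(value, str) else f"{value:X}")
```

For the sp value 7FFE6DC8 the two commands address 7FFE6C1C and 7FFE6C10, the same locations adb reports in the session above.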
The table of results from the back-tracing so far should look
like this:

sp | pcoqh | Procedure Address | Frame Size |
---|---|---|---|
20F380 | 24B34 | doadump+0xEC | 0x30 |
20F350 | 0xDBF48 | panic_boot+354 | 0x80 |
20F2D0 | 0xDB938 | boot+24 | 0x80 |
7FFE6F48 | 1518A4 | panic+0xF0 | 0x80 |
7FFE6EC8 | 0xDF108 | trap+0xA28 | 0x100 |
7FFE6DC8 | 0xD0BD4 | $call_trap+20 | |
7FFE6B98 | 96B70 | qenable+10 | 0x80 |
Mapping Assembly Language Locations to Source Code Lines |  |
Once you know the instruction address location where the system
panic or trap occurred, the next troubleshooting step is to find where
in the source code the panic or trap occurred. For panics, search
the source code for the panic call which uses the same string that was
printed out when the kernel panicked. This will tell you exactly
where the panic occurred in the source code.

The method for traps
is to use adb to print out, in assembly language, the procedure in which the trap occurred.
Then, work backwards from the instruction
address, looking for clues in the assembly instructions which will
help pinpoint the corresponding location in the source. The most
useful clue is a branch to another procedure. In PA-RISC, branches
are done with the branch and link instruction, bl, and in assembly
a branch will look like this:
bl copen,rp                             [1]
|
[1] a procedure call to copen()

or:
bl creat+34,rp (save_pn_info)           [1]
|
[1] a procedure call to save_pn_info()

By comparing the branches in the assembly code before and
after the instruction where the trap occurred with the procedure
calls in the source code, the corresponding source code line can
often be determined. See the examples at the end of this chapter
for more details.

Other useful assembly code landmarks are the use of the extru, extrs, zdep, and ldws instructions in checking and setting flag bits,
and the use of the compare and branch instructions, comb, combf, combt, comib, comibf, and comibt, to implement if statements. For example, the ioctl() source code:
if ((fp->f_flag & (FREAD|FWRITE)) == 0)
|
is implemented by the assembly code:
ioctl+60:       ldws 0(r8),r13                  [1]
ioctl+64:       extru r13,1F,2,r14              [2]
ioctl+68:       comibf,=,n 0,r14,ioctl+80       [3]
|
[1] Load from memory address pointed to by r8, into r13.
[2] Extract 2 bits from r13, starting at bit 1F, place bits in r14.
[3] If r14 is not zero, branch to ioctl+0x80.

In the example above, fp is in r8. If fp were null, a trap type 15 would occur at ioctl+60, when attempting to load off of a null pointer.

For more information about PA-RISC assembly language, see
the Assembly Language Reference Manual (part
number 92432-90001), the PA-RISC 1.1 Architecture and
Instruction Set Reference Manual (part number 09740-90039),
or the PA-RISC Procedure Calling Conventions Reference
Manual (part number 09740-90015).

Obtaining Procedure Argument Values |  |
It is often useful in debugging a problem to know what parameter
values a procedure in the stack trace was called with. For example,
in the following stack trace it would be useful to know the arguments flushq() was called with.
panic+30:       addil -1000,dp
trap+0xADC:     b trap+1004
$call_trap+20:  rsm 1,r0
flushq+60:      ldbs 0xD(r21),r22
q_free+1C:      ldw -0xA4(sp),r31
|
 |
Obtaining the First Four Arguments
Arguments 0 through 3 are passed from the calling procedure
to the called procedure by loading the values into registers 23
- 26. These registers are also known as arg3, arg2, arg1, and arg0 (r26 is arg0, r23 is arg3). For example, here is bmap() preparing to call realloccg() by moving realloccg()'s arguments from the registers they are in to
the argument registers by doing an or on the source registers with r0, which is always
zero:
bmap+16C:       or r10,r0,arg1
bmap+170:       or ret0,r0,arg2
bmap+174:       or r8,r0,arg3
bmap+178:       or r4,r0,arg0
bmap+17C:
|
Next, here is flushq() preparing to call rmvq() by loading arg0 and arg1 from its stack frame. Note that arg1 gets loaded in the delay slot of the branch instruction bl. See the Assembly Language Reference
Manual or the PA-RISC 1.1 Architecture and
Instruction Set Reference Manual for more information
on branch delay slots.
flushq+0xE0:    ldw -64(sp),arg0
flushq+0xE4:    bl rmvq,rp
flushq+0xE8:    ldw -34(sp),arg1
|
After allocating its stack frame and saving any callee save
registers, the called procedure will usually load the argument registers
into some of the callee save registers whose values it has just saved.
For example, here is realloccg() saving the contents of the callee save registers r3 - r10 and loading arg0 - arg3 into some callee save registers.
realloccg:      stw rp,-14(sp)
realloccg+4:    stwm r3,80(sp)
realloccg+8:    stw r4,-7C(sp)
realloccg+0xC:  stw r5,-78(sp)
realloccg+10:   stw r6,-74(sp)
realloccg+14:   stw r7,-70(sp)
realloccg+18:   stw r8,-6C(sp)
realloccg+1C:   stw r9,-68(sp)
realloccg+20:   stw r10,-64(sp)
realloccg+24:   or arg0,r0,r3
realloccg+28:   or arg1,r0,r6
realloccg+2C:   or arg2,r0,r7
realloccg+30:   or arg3,r0,r4
|
Here is rmvq() storing its arguments away in its stack frame:
rmvq:           stw rp,-14(sp)
rmvq+4:         ldo 80(sp),sp
rmvq+8:         stw arg0,-0xA4(sp)
rmvq+0xC:       stw arg1,-0xA8(sp)
|
 |
If the arguments were put into callee save registers, the next procedure up in the stack
trace will save these registers in its stack frame. You can retrieve
these values from the stack. If the arguments are stored on the
stack frame, you can also retrieve them from the stack. But first
you must make sure that the contents of the callee save registers or the stack frame locations you are
interested in were not modified between the time the arguments were
loaded at the beginning of the procedure and the time the next procedure
call on the stack trace took place. The easiest way to determine
this is to have adb print out the assembly code for the procedure
into a file and use an editor such as vi to find all references
to the register between the beginning of the procedure and the branch
to the next procedure in the stack trace. If none of these references
modify the register, the value which the next procedure has saved
in its stack frame is valid.

To print the assembly of a procedure to a file using adb:
$>filename              [1]
procedure,100/ia        [2]
$>                      [3]
|
[1] Tell adb to direct stdout to the file filename. There should be no space between $> and
the filename.
[2] Print the first 0x100 instructions (0x400 bytes) of procedure.
[3] Set stdout back to the terminal.

Now, edit filename, and search for all instances of the register
or stack frame location of interest. Any instruction which would
modify the contents of the register could potentially overwrite
the information you are trying to get. Below are some examples of
modifying instructions. Note that in all cases the register being
modified, also known as the target register, is the last register
in the instruction.
ldw 10(r3),r4           will overwrite r4
ldhs 4(r3),rp           will overwrite rp
ldo -1(r20),r22         will overwrite r22
ldwx r31(arg3),r21      will overwrite r21
or r3,r0,arg0           will overwrite arg0
extrs ret1,1F,10,r21    will overwrite r21
zdep r20,1A,1B,r31      will overwrite r31
sub r31,arg1,r31        will overwrite r31
sh3add arg1,r0,r31      will overwrite r31
stw r19,-38(sp)         will overwrite memory location sp - 0x38
|
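The editor search can be narrowed with a small filter over the file that adb wrote. The Python sketch below is a heuristic illustration only: the function names modifies and find_modifiers are hypothetical, it assumes each line has the "label: instruction" form shown in this chapter, it treats any mnemonic beginning with "st" as a store to memory, and it assumes every other instruction writes its last operand. It does not notice side effects such as stwm updating its base register, and it does not know register aliases (for example that rp is r2), so search for each spelling you expect.

```python
def modifies(instruction, target):
    """Heuristic: does this disassembled instruction overwrite `target`?

    `target` is a register name such as "r21" or a memory operand such as
    "-38(sp)". Mnemonics starting with "st" are assumed to be stores.
    """
    parts = instruction.split(None, 1)
    if len(parts) < 2:
        return False
    mnemonic, operands = parts
    # Last operand, ignoring a trailing annotation such as "(save_pn_info)".
    words = operands.split(",")[-1].split()
    last = words[0] if words else ""
    is_store = mnemonic.startswith("st")
    is_memory_target = target.endswith(")")
    if is_store != is_memory_target:
        return False
    return last == target

def find_modifiers(listing, target):
    """Return the labels of all lines in `listing` that overwrite `target`."""
    hits = []
    for line in listing.splitlines():
        label, separator, instruction = line.partition(":")
        if separator and modifies(instruction.strip(), target):
            hits.append(label.strip())
    return hits

listing = """ioctl+6C: ldw 68(r3),r19
ioctl+70: ldo 9(r0),r21
ioctl+74: sth r21,312(r19)
ioctl+78: b ioctl+7F0
ioctl+7C: ldw -1D4(sp),rp"""

print(find_modifiers(listing, "r21"))
```

Every hit still has to be checked by hand, because an instruction that appears in the listing may sit on a code path that was never executed.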
Sometimes an instruction which modifies the register of interest
can appear to occur between the beginning of the procedure and the
call to the next procedure in the stack because of how the assembly
code is laid out. However, the modifying instruction actually would
not have been executed because it was part of a conditional code
path that was not taken. For example, this C code from ioctl():
if ((fp->f_flag & (FREAD|FWRITE)) == 0) {
        u.u_error = EBADF;
        return;
}
|
compiles into this assembly:
ioctl+60:       ldws 0(r8),r13
ioctl+64:       extru r13,1F,2,r14
ioctl+68:       comibf,=,n 0,r14,ioctl+80
ioctl+6C:       ldw 68(r3),r19
ioctl+70:       ldo 9(r0),r21
ioctl+74:       sth r21,312(r19)
ioctl+78:       b ioctl+7F0
ioctl+7C:       ldw -1D4(sp),rp
ioctl+80:       ldws 4(r5),r7
|
If the condition of the if statement is false, the branch at ioctl+68 is taken, and instruction ioctl+6C is never executed because the ,n in ioctl+68 causes the instruction in the branch delay
slot to be nullified, or not executed. ioctl+70 through ioctl+7C
are never executed because the branch at ioctl+68 branches past
these instructions to ioctl+80. If ioctl+6C through ioctl+7C had
been executed, r19, r21, and rp would have been modified.

Suppose you have determined that the procedure whose arguments
you are interested in does not modify the registers it loaded the
arguments into before the next procedure call in your stack. You
can look at the appropriate location in the stack frame of the next
procedure call in the stack to get the value. For example, if a
routine whose registers you are interested in has called panic,
you look at the beginning of panic's assembly to see which callee save registers it saves in its stack.
panic:          stw rp,-14(sp)
panic+4:        stwm r3,40(sp)
panic+8:        stw r4,-3C(sp)
panic+0xC:      stw r5,-38(sp)
panic+10:       stw r6,-34(sp)
|
Obtain panic's sp by manual stack back-tracing, and then r3 is at sp - 0x40, r4 at sp - 0x3C, and so on.

Obtaining Arguments 5 through N
Only the first four arguments to a procedure are passed via
registers. Any remaining arguments are pushed onto the calling procedure's
stack frame, where the called procedure will retrieve them. If you
have the calling procedure's sp you can use adb to get the values of the arguments. For example, symlink() calls lookuppn(), which has six arguments. Here is the assembly
code which sets up the six arguments:
symlink+40:     stw r4,-34(sp)
symlink+44:     stw r3,-38(sp)
symlink+48:     ldo -3C(sp),arg2
symlink+4C:     ldo -9C(sp),arg0
symlink+50:     or r0,r0,arg1
symlink+54:     bl rename+34,rp (lookuppn)
symlink+58:     or r0,r0,arg3
|
If you want to get the fifth argument, you see that symlink() places it in its stack frame at sp - 0x34. Argument
5 is at -0x34 because the procedure calling convention specifies
that arguments get placed in the stack frame in reverse order, so
arg6 is at sp - 0x38, just above arg5, and if lookuppn() had seven arguments, arg7 would be placed at sp
- 0x3C. If you know symlink()'s sp from doing a manual stack back-trace, you can
use it to get the value of argument 5:
7FFE6B98-0x34/X
7FFE6B64:       2D7298          # adb's response
|
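The location of any argument beyond the fourth follows from the pattern just described: argument 5 at sp - 0x34, and each later argument one word (4 bytes) further from sp. A minimal Python sketch of that arithmetic follows. The function name stack_argument_address is hypothetical, and arguments are numbered from 1, as in the text above.

```python
def stack_argument_address(caller_sp, n):
    """Address of argument n (n >= 5, counting from 1) in the caller's frame."""
    if n < 5:
        raise ValueError("arguments 1 through 4 are passed in registers")
    # Argument 5 is at sp - 0x34; each later argument is one 4-byte word further.
    return caller_sp - (0x34 + 4 * (n - 5))

caller_sp = 0x7FFE6B98  # symlink()'s sp, from the manual back-trace
for n in (5, 6, 7):
    print(f"argument {n}: 0x{stack_argument_address(caller_sp, n):X}/X")
```

For argument 5 this yields 7FFE6B64, the address adb printed in the example above.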
Obtaining Register Contents from Trap save_state or panic_save_state Areas
If the system core dump was produced by a panic or a trap,
copies of all the registers at the time of the trap or panic were
saved in memory and are available in the core dump. For a trap,
the registers are saved on the stack, in the order specified in
the struct save_state, which is defined in /usr/include/machine/save_state.h. For a panic, the registers are saved in a statically
allocated memory location called panic_save_state, in the order specified in the struct rpb, which is defined in /usr/include/machine/rpb.h. See the examples at the end of this chapter for
details of how to access registers in the trap save_state area.

The mechanics of accessing panic_save_state fields are similar, though the offsets into the
save area are different. For example, if you want to get r3 out
of the panic_save_state area, look at /usr/include/machine/rpb.h and note that the field rp_gr3 is the sixth word in struct rpb. Therefore, it can be found at panic_save_state + 5 words == panic_save_state + 0x14.

Not all registers in these save areas are guaranteed to be
the same as at the time of the panic or trap, because some registers
must be used by the system to execute the panic or trap path and
save away the other registers. Registers which may not be preserved
are r1, r19 - r22, r31, arg0, arg1, arg2, and arg3. Use your judgment
with the contents of these registers in the save areas. If they
look odd, they may have been overwritten.

If your stack trace includes a call to trap(), it will also have a call to panic() higher up (later in time) than the trap. In this
case, it is safer to look in the trap save_state structure on the stack than the panic_save_state area for registers you are curious about, because
the trap saved the registers closer in time to when the problem
which caused the system crash occurred.

Obtaining Important Kernel Global Variables |  |
To print out the value of a kernel global variable, simply
use the symbol name with the appropriate formatting option (see
adb(1) and the ADB Tutorial for more information).
The following table lists some of the more interesting kernel globals,
with the appropriate adb format for printing them, and brief descriptions
of what they mean.

adb Command | Description |
---|---|
msgbuf+0xc/sD | Kernel's circular printf buffer. |
freemem/D | Amount of free memory, in pages. If zero or a small number, system is out of memory. |
physmem/D | Size of physical memory, in pages. |
maxfree/D | Number of free pages soon after system boot. |
desfree/D | Number of free pages the system tries to keep available. |
minfree/D | Minimum free pages before system starts swapping processes out. |
avefree/D | Average number of free pages over past 5 seconds. |
avefree30/D | Average number of free pages over past 30 seconds. |
freemem_cnt/D | Number of processes currently waiting for memory. If large number, many processes are stopped waiting for memory. |
avenrun/3F | System load average, for the last one minute, five minutes, and 10 minutes, in floating point notation. If large numbers, system may be too heavily loaded. |
lbolt/X | Seconds since boot. |
time/Y | Current time, printed out in ctime(3C) format. |
_release_version/s | HP-UX version string. |
utsname+0x9/s | System hostname. |
utsname+0x12/s | HP-UX release number. |
utsname+0x24/s | System hardware model number. |
Obtaining Values from the Process Table Entry and User Area |  |
It is possible to use adb to print out fields of interest from the process table
entry and user area of the process that was running when the system
crashed. The following subsection describes how to print certain important
fields and gives a very brief description of each field. For more information
on the meaning of these fields, see The Design of the
UNIX Operating System by Maurice Bach, pub. Prentice-Hall,
or The Design and Implementation of the 4.3 BSD UNIX
Operating System by Leffler, McKusick, Karels and Quarterman,
pub. Addison-Wesley.

adb, when called with the -k option, should print
out the address of the user area and process table entry of the
process that was running when the system crashed. adb will print this out when it is first entered,
so the first output you should see from adb is:
u 7FFE6000
u.u_procp 4D2F20
|
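
If adb output is being saved to a file, the two addresses on the entry line can be pulled out programmatically. This is a minimal sketch, assuming the entry line has exactly the form shown above (the letter u, a hex address, u.u_procp, a hex address); `parse_entry_line` is a hypothetical helper name, not an adb feature.

```python
import re

# Matches a line such as "u 7FFE6000 u.u_procp 4D2F20".
# The exact layout is assumed from the single example in this section.
ENTRY_LINE_RE = re.compile(
    r"\bu\s+([0-9A-Fa-f]+)\s+u\.u_procp\s+([0-9A-Fa-f]+)"
)


def parse_entry_line(text):
    """Return (u, u_procp) as integers, or None if the line is absent."""
    match = ENTRY_LINE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


if __name__ == "__main__":
    addresses = parse_entry_line("u 7FFE6000 u.u_procp 4D2F20")
    print([hex(value) for value in addresses])
```

A return value of None corresponds to the case where adb did not print the addresses on entry.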
u is the location of the user area, and should always
be at virtual address 7FFE6000. When the kernel switches to a new
process, it always maps the physical address of the process' user
area to virtual address 7FFE6000. u.u_procp is the location of this process' process table
entry. This address will vary from process to process.

If adb does not print the u and u.u_procp values on entry, it was unable to determine the
currently running process at crash time, probably because
your core dump was the result of a Transfer of Control (TOC).

If the panic occurred while the kernel was running on the Interrupt Control
Stack (ICS), the u and u.u_procp pointers will not contain valid information for
the code that panicked. When an interrupt occurs, the kernel executes the appropriate
interrupt-handling code without switching to a new
user context, so the u and u.u_procp addresses that adb prints belong to the process that was running
when the interrupt occurred, not to the interrupt handler. Look at the panic
message in msgbuf to tell whether the panic occurred while on the ICS.
If you see a message like the following after the hex stack trace,
the panic occurred on the ICS:

NOT sync'ing disks (on the ICS) (0 buffers to flush):
|
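
The three cases described above (no addresses printed, panic on the ICS, ordinary process context) can be summarized as a small decision function. This is only a sketch: it assumes you have captured adb's entry output and the msgbuf text as strings, and `describe_context` and `ICS_MARKER` are hypothetical names introduced for illustration.

```python
# Text that appears in the panic message when the panic happened on the ICS,
# taken from the example message in this section.
ICS_MARKER = "(on the ICS)"


def describe_context(entry_output, msgbuf_text):
    """Classify how far the u and u.u_procp values can be trusted."""
    if "u.u_procp" not in entry_output:
        # adb printed no addresses on entry.
        return "no process context; the dump may be from a TOC"
    if ICS_MARKER in msgbuf_text:
        # The addresses belong to the process that was interrupted.
        return "panic on the ICS; u and u.u_procp describe the interrupted process"
    return "u and u.u_procp describe the process running at crash time"


if __name__ == "__main__":
    print(describe_context(
        "u 7FFE6000 u.u_procp 4D2F20",
        "NOT sync'ing disks (on the ICS) (0 buffers to flush):",
    ))
```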
Important User Area Fields

The table below describes the adb command to use to print important user area fields. u means the value marked u printed on adb entry (see example above). When executing the adb commands in the table below, substitute the u value printed on adb entry for the letter u.

Field Name | Address | Description |
---|---|---|
u_procp | u+0x258/X | Pointer to process table entry. |
u_comm | u+0x260/s [Series 700] u+0x264/s [Series 800] | Name of command used to start this process. For STREAMS/UX, this is usually strsched. |
u_arg | u+0x270/10X [Series 700] u+0x274/10X [Series 800] | Arguments to current system call. For STREAMS/UX service routines being run by strsched, these should all be zero. |

For example, to print u_comm, given the adb entry printout u 7FFE6000 u.u_procp 4D2F20, substitute 7FFE6000 for u in the table entry, giving 7FFE6000+0x260/s on a Series 700 or 7FFE6000+0x264/s on a Series 800.

See /usr/include/sys/user.h for more information on fields in the user area.
These offset values are for HP-UX release 10.0, and may change from
release to release.

Important Process Table Fields

The table below describes the adb command to use to print important process table
fields. p means the value marked u.u_procp printed on adb entry (see example above). When executing the adb commands in the table below, substitute the u.u_procp value printed out on adb entry for the letter p. For example, to print out p_flag, given the adb entry printout at the beginning of this section,
substitute 4D2F20 for p, giving 4D2F20+0x20/X on a Series 700 or 4D2F20+0xc/X on a Series 800.

See /usr/include/sys/proc.h for more information on fields in the proc structure.
These offset values are for HP-UX release 10.0, and may change from
release to release.

Field Name | Address | Description |
---|---|---|
p_flag | p+0x20/X [Series 700] p+0xc/X [Series 800] | Per-process flags, see proc.h |
p_flag2 | p+0x24/X [Series 700] p+0x48/X [Series 800] | Per-process flags, see proc.h |
p_mpflag | p+0x10/X [Series 800 only] | Per-process flags, see proc.h |
p_stat | p+0xc/b [Series 700] p+0x32/b [Series 800] | Current process state, see proc.h |
p_uid | p+0x2c/D [Series 700] p+0x50/D [Series 800] | Real user id, used to direct tty signals |
p_suid | p+0x30/D [Series 700] p+0x54/D [Series 800] | Set effective uid |
p_pid | p+0x38/D [Series 700] p+0x5c/D [Series 800] | Process id |
p_ppid | p+0x3c/D [Series 700] p+0x60/D [Series 800] | Process id of parent |
p_pgrp | p+0x34/D [Series 700] p+0x58/D [Series 800] | Process id of process group leader |
p_wchan | p+0x40/X [Series 700] p+0x1c/X [Series 800] | Event process is sleeping on; should be zero if currently running |
p_sleeptime | p+0x24/X [Series 800 only] | Time of last sleep or wakeup (in seconds) |
p_cptickstotal | p+0x4c/X [Series 700] p+0x14/X [Series 800] | CPU ticks (total for life of process) |
p_cursig | p+0xe/b [Series 700] p+0x34/b [Series 800] | Number of current pending signal, if any |
p_sig | p+0x10/X [Series 700] p+0x38/X [Series 800] | Signals pending to this process |
p_sigmask | p+0x14/X [Series 700] p+0x3c/X [Series 800] | Current signal mask |
p_sigignore | p+0x18/X [Series 700] p+0x40/X [Series 800] | Signals being ignored |
p_sigcatch | p+0x1c/X [Series 700] p+0x44/X [Series 800] | Signals being caught by user |
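
The substitution rule used by both tables (replace u or p with the address printed on entry, then pick the offset for your hardware series) can be sketched as a lookup. The offsets below are copied from a few rows of the tables above and therefore apply to HP-UX release 10.0 only; the dictionary layout and the `adb_command` helper are hypothetical, introduced purely for illustration.

```python
# field -> {series: "offset/format"}, copied from the tables in this section.
USER_FIELDS = {
    "u_procp": {"700": "0x258/X", "800": "0x258/X"},
    "u_comm": {"700": "0x260/s", "800": "0x264/s"},
    "u_arg": {"700": "0x270/10X", "800": "0x274/10X"},
}

PROC_FIELDS = {
    "p_flag": {"700": "0x20/X", "800": "0xc/X"},
    "p_stat": {"700": "0xc/b", "800": "0x32/b"},
    "p_pid": {"700": "0x38/D", "800": "0x5c/D"},
    "p_ppid": {"700": "0x3c/D", "800": "0x60/D"},
    "p_wchan": {"700": "0x40/X", "800": "0x1c/X"},
    "p_mpflag": {"800": "0x10/X"},  # Series 800 only
}


def adb_command(base, field, series, table):
    """Build the adb command for a field by substituting the base address.

    base is the hex string printed by adb on entry (u or u.u_procp).
    """
    try:
        suffix = table[field][series]
    except KeyError:
        raise ValueError(
            "no offset known for %s on Series %s" % (field, series)
        )
    return "%s+%s" % (base, suffix)


if __name__ == "__main__":
    print(adb_command("7FFE6000", "u_comm", "700", USER_FIELDS))
    print(adb_command("4D2F20", "p_flag", "800", PROC_FIELDS))
```

Keeping the offsets in data, not scattered through commands, makes it easier to update them when moving to a release whose user.h or proc.h layout differs.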