Overview
Troubleshooting an A-Class server is performed to the "Field Replaceable Unit"
(FRU) level. Diagnostic testing can be performed on the A-Class server and most
components can be removed and replaced by the customer or customer representative.
NOTE: The information in this section is meant for users who have at
least a minimum level of hardware troubleshooting experience. Some System
Administrator-level knowledge of the HP-UX operating system is also required.
This section gives you the information needed to recognize repeatable hardware
failures that prevent completion of the server selftest, or hardware failures that
will not allow the HP-UX operating system to either initiate or complete the boot
procedure.
Use the troubleshooting data and procedures in this section to isolate hardware
failures within the A-Class server. Do not use this information to troubleshoot
external peripheral device problems. Refer to the suspect peripheral's
documentation for troubleshooting assistance.
Procedures for evaluating and describing messages, codes, and indicators are
contained in the subsections that follow.
A-Class Server Selftest Failures/Warnings
A power-on selftest is conducted each time power is applied to the server.
Failures that occur at this point will either prevent selftest from completing
or, upon initial completion of selftest, display warnings on the console.
NOTE: Warnings will include a brief description of the fault.
If the selftest fails before any output appears on the console, the three front
panel LEDs (Power, Heartbeat, and LAN connection, shown below) will blink in
patterns to identify which section of selftest failed. To troubleshoot selftest
failures using the front panel LEDs, proceed to the next section.
To troubleshoot selftest failures by chassis code analysis, refer to the
"Chassis Code Summary" section.
Troubleshooting with Light-Emitting Diode (LED) Interpretation
The LED icons shown above are located on the right-hand side of the server's
front panel, as you face the front of the server. The icon on the far right is a
green circle that, under normal conditions, emits a steady light when server power
is on. The middle icon is an amber heart shape that, under normal conditions,
blinks in a "heartbeat" pattern while the server is operating.
The left-most icon is also amber and represents the Local Area Network (LAN) signal.
Under normal conditions, the LAN icon blinks irregularly as dictated by LAN signal
activity. Additional selftest information is provided in the sub-sections that follow.
Second Level Cache/RAM Memory Module Faults
Second Level Cache Memory Module Fault.
This fault occurs when a Second Level Cache (SLC) failure prevents the system
from completing selftest. Chassis codes provided by the HSC Remote Management
card are useful in troubleshooting this type of error. For example, the chassis
code FLT 2120 indicates a "Second Level cache selftest" failure.
NOTE: SIMM is an acronym for Single Inline Memory Module. A SIMM has
components on one side of the card only. DIMM is an acronym for Dual Inline
Memory Module. A DIMM has components on both sides of the card. The acronym SIMM
will be used throughout this section to refer to either SIMM or DIMM.
Electrostatic Discharge Precautions.
The procedures in this section require opening the server and exposing the system
to electrostatic discharge. Always observe all electrostatic precautions when
working with components inside or out of the server. Failure to follow these
precautions may result in component damage or loss of system reliability.
Use a grounding mat and an anti-static wrist strap.
Wear the anti-static wrist strap to ensure that any accumulated
electrostatic charge is discharged from your body to ground.
Before You Do Anything...
Remove the top of the server by unscrewing the knurled captive screws
on each side of the rear of the server. Slide the top back, lift it off,
and set it aside.
Single SLC SIMM Troubleshooting.
To troubleshoot SLC faults to a single SIMM, a known-good SLC SIMM is required.
1. Install the known-good SLC SIMM in slot A and one of the original SLC SIMMs
in slot B of the pair that failed. Power up the server and observe the front
panel LEDs.
2. If the fault does not recur, the problem was caused by the original SIMM that
is not presently installed. Boot the system and resume normal operations. If
the fault recurs, proceed to step 3.
3. Power down the server and replace the SLC SIMM in slot B with the other
original SLC SIMM. Power up the server and observe the front panel LEDs.
4. If the fault does not recur, the problem is with the SLC SIMM previously
installed in slot B. Boot the system and resume normal operations.
5. If the fault recurs, the problem is with the system board. To change the
system board, you must replace the A-Class Exchange Base Unit (EBU). Refer
to the "Replacing an A-Class Server Exchange Base Unit (EBU)" section.
Random Access Memory (RAM) Module Fault.
This fault occurs when a RAM failure prevents the system from completing selftest.
Chassis codes provided by the HSC Remote Management card are useful in
troubleshooting this type of error. For example, FLT 7xxx indicates a failure in
the memory selftest.
Electrostatic Discharge Precautions.
The procedures in this section require opening the server and exposing the system
to electrostatic discharge. Always observe all electrostatic precautions when
working with components inside or out of the server. Failure to follow these
precautions may result in component damage or loss of system reliability.
Use a grounding mat and an anti-static wrist strap.
Wear the anti-static wrist strap to ensure that any accumulated
electrostatic charge is discharged from your body to ground.
Before You Do Anything...
Remove the top of the server by unscrewing the knurled captive
screws on each side of the rear of the server. Slide the top back, lift it
off, and set it aside.
General RAM Module Troubleshooting.
1. List which size SIMMs are installed in which slots.
2. Remove all RAM SIMMs except those in slots 0a and 0b (0a/b). Plug in and
power up the server, and observe the front panel LEDs. If the fault does not
recur, the SIMMs installed in slots 0a and 0b are not the cause of the RAM SIMM fault.
3. Power down the server, refer to the memory configuration list (step 1), and
install the next pair of SIMMs. Power up the server and observe the front panel LEDs.
4. Repeat step 3 until the RAM SIMM fault recurs. Note which pair of SIMMs caused
the RAM SIMM failure.
5. Either replace both memory SIMMs or continue to troubleshoot to a single SIMM.
For example, if the RAM SIMM fault returned when RAM was reinstalled in slots 2a/b,
the problem is with one of the two RAM SIMMs installed in slots 2a/b.
Single RAM Module Troubleshooting.
To troubleshoot RAM faults to a single SIMM, a known-good RAM SIMM is required.
1. Install the known-good RAM SIMM in slot A and one of the original RAM SIMMs
in slot B. Power up the server and observe the front panel LEDs.
2. If the fault does not recur, the problem was caused by the original SIMM that
is not presently installed. Boot the system and resume normal operations. If the
fault recurs, proceed to step 3.
3. Power down the server and replace the RAM SIMM in slot B with the other
original RAM SIMM. Power up the server and observe the front panel LEDs.
4. If the fault does not recur, the problem is with the RAM SIMM that was
previously installed in slot B. Boot the system and resume normal operations.
5. If the fault recurs, the problem is with the system board. To change the
system board, you must replace the A-Class Exchange Base Unit (EBU). Refer to
the "Replacing an A-Class Server Exchange Base Unit (EBU)" section.
I/O Subsystem or I/O Board Fault.
This fault occurs when either an HSC or PCI I/O board prevents the system from
completing selftest. Chassis codes provided by the HSC Remote Management card are
useful to troubleshoot this type of error. For example, FLT 8xxx indicates an I/O
card failure. To troubleshoot I/O Subsystem or I/O Board Faults using the front
panel LEDs, follow the procedure listed below.
Electrostatic Discharge Precautions.
The procedures in this section require opening the server and exposing the system
to electrostatic discharge. Always observe all electrostatic precautions when
working with components inside or out of the server. Failure to follow these
precautions may result in component damage or loss of system reliability.
Use a grounding mat and an anti-static wrist strap.
Wear the anti-static wrist strap to ensure that any accumulated
electrostatic charge is discharged from your body to ground.
Before You Do Anything...
Remove the top of the server by unscrewing the knurled captive screws
on each side of the rear of the server. Slide the top back, lift it off,
and set it aside.
1. Document the I/O configuration (write down which type of I/O board is
installed in which of the I/O slots).
2. Remove all I/O boards.
3. Power up the server and observe the front panel LEDs.
4. If the fault does not recur and only a single I/O board was removed, that I/O
board is the cause of the I/O Subsystem or I/O Board Fault. Replace that I/O board,
power on the server, and observe the front panel LEDs. If the problem recurs,
proceed to step 8.
5. If the fault does not recur and two I/O boards were removed, install the bottom
I/O board, power up the server, and observe the front panel LEDs. If the fault
recurs, the I/O board in the bottom slot is the cause of the I/O Subsystem or I/O
Board Fault. Replace that I/O board.
6. If the fault does not recur, install the second I/O board, power up the server,
and observe the front panel LEDs.
7. If the problem recurs, replace the top I/O board.
8. If replacing the I/O board does not clear the I/O Subsystem or I/O Board Fault,
the problem is with the system board. To change the system board, you must replace
the A-Class Exchange Base Unit (EBU). Refer to the "Replacing an A-Class Server
Exchange Base Unit (EBU)" section.
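The I/O board procedure above can be sketched the same way. This Python sketch assumes one or two boards listed bottom slot first and a hypothetical `fault_recurs` callback standing in for each power-up; `isolate_io_board` and the board names in the tests are placeholders. The procedure does not say what to do if the fault recurs with every I/O board removed; the sketch assumes the system board is then suspect, in line with step 8.

```python
from typing import Callable, List, Sequence

EBU = "system board: replace the EBU"


def isolate_io_board(boards: Sequence[str],
                     fault_recurs: Callable[[List[str]], bool]) -> str:
    """Return the I/O board to replace, or a system board / not-reproduced verdict.

    boards: boards documented in step 1, bottom slot first (one or two).
    fault_recurs: hypothetical stand-in for "power up and observe the front
        panel LEDs" with the listed boards installed.
    """
    if not 1 <= len(boards) <= 2:
        raise ValueError("the procedure covers one or two I/O boards")
    # Steps 2-3: power up with every I/O board removed.
    if fault_recurs([]):
        # Assumption: not covered by the procedure; treated as in step 8.
        return EBU
    if len(boards) == 1:
        # Step 4: the single board that was removed is the cause.
        return boards[0]
    bottom, top = boards
    # Step 5: reinstall only the bottom board.
    if fault_recurs([bottom]):
        return bottom
    # Steps 6-7: reinstall the top board as well.
    if fault_recurs([bottom, top]):
        return top
    return "fault not reproduced"
```

After the identified board is replaced, a fault that persists still leads to step 8 and the EBU.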
System Board Fault, High Priority Machine Check (HPMC), or Unknown Fault.
This fault occurs when the system board has an irrecoverable fault or an HPMC
prevents the system from completing selftest. Chassis codes provided by the HSC
Remote Management card are useful for troubleshooting this type of error, because
HPMCs generate many chassis codes and some chassis codes indicate a specific fault.
To troubleshoot a System Board Fault, HPMC, or Unknown Fault, power on the server
and observe both the front panel LEDs and the console.
If the server does not boot to the point where output displays on the console,
the system board is the problem. To change the system board, you must replace the
A-Class Exchange Base Unit (EBU). Refer to the "Replacing an A-Class Server
Exchange Base Unit (EBU)" section.
If power cycling the server clears the fault, continue troubleshooting by entering
the "ser pim" command at the firmware main menu screen. Check the timestamp on the
PIM data to see if the time recorded corresponds to the time of the failure. For
assistance with decoding an HPMC, contact Hewlett-Packard.
I/O HPMC Fault.
I/O Subsystem or I/O Board faults occur when an HPMC, in response to an I/O
failure, prevents the system from completing selftest. Chassis codes provided by
the HSC Remote Management card are useful in troubleshooting this type of error,
because HPMCs generate many chassis codes and some chassis codes indicate a
specific fault. To troubleshoot an I/O HPMC fault, refer to the "I/O Subsystem or
I/O Board Fault" section.
If power cycling the server clears the fault, continue troubleshooting by using
the "ser pim" command at the firmware main menu screen. Check the timestamp on the
PIM data to see if the time recorded corresponds to the time of the failure. For
assistance with decoding an HPMC, contact Hewlett-Packard.
RAM HPMC Fault.
RAM faults occur when an HPMC, in response to a RAM failure, prevents the system
from completing selftest. Chassis codes provided by the HSC Remote Management card
are useful in troubleshooting this type of error. To troubleshoot a RAM HPMC fault,
refer to the "Random Access Memory (RAM) Module Fault." section.
If power cycling the server clears the fault, continue troubleshooting by using
the "ser pim" command at the firmware main menu screen. Check the timestamp on the
PIM data to see if the time recorded corresponds to the time of the failure. For
assistance with decoding an HPMC, contact Hewlett-Packard.
Second Level Cache Memory HPMC Fault.
SLC faults occur when an HPMC, in response to an SLC failure, prevents the system
from completing selftest. Chassis codes provided by the HSC Remote Management card
are useful in troubleshooting this type of error.
If power cycling the server clears the fault, continue troubleshooting by using
the "ser pim" command at the firmware main menu screen. Check the timestamp on the
PIM data to see if the time recorded corresponds to the time of the failure. For
assistance with decoding an HPMC, contact Hewlett-Packard.
Firmware Warning Messages
Firmware is the name given to the system boot instructions and selftest software
that are embedded in a computer chip instead of being stored on disk. Firmware also
includes the warning and error messages that are displayed on the console when
selftest finds an error. These messages, with a brief description or required
action for each, are shown in the following table:
Firmware Warning Message | Description/Action Required |
---|---|
WARNING: Stop boot flag set. System cannot boot. | This message accompanies those warnings that prevent the system from booting. Look up the accompanying message and take appropriate action. |
WARNING: Not enough memory to boot the OS. | Ensure that the minimum amount of memory is installed. |
WARNING: Setting DEFAULTS has failed. | The MFIOC chip failed. This failure requires replacing the EBU. |
WARNING: Memory has been initialized but not tested as a result of FASTBOOT being enabled. To test memory, use the 'FASTBOOT' command in the CONFIGURATION menu and reboot the system. | Disable FASTBOOT with the 'Fastboot OFF' command in the CONFIGURATION menu and reboot. |
WARNING: The processor has failed selftest due to a co-processor failure. | This condition requires replacing the EBU. |
WARNING: The processor has failed selftest. | A processor selftest failure requires replacing the EBU. |
WARNING: One or more memory banks were not configured due to a SIMM size mismatch or a SIMM failure. For more details, use the MEMORY command in the INFORMATION menu. | Use the MEMORY command in the INFORMATION menu to determine the correct configuration. |
WARNING: The Bus Converter has failed. | A Bus Converter failure requires replacing the EBU. |
WARNING: FAN FAILURE HAS BEEN DETECTED. THE SYSTEM WILL BE POWERING DOWN. PLEASE CALL YOUR SERVICE REPRESENTATIVE | A chassis fan failure requires replacing the EBU. |
ERROR: A3342A AP Card must be in bottom slot only. | Move the A3342A AP card to the bottom slot. |
ERROR: HSC card in wrong slot. Move to bottom slot. | A mix of HSC and PCI cards has been detected. The HSC card must be installed in the bottom slot. |
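When console output is captured to a log, the table above can be applied mechanically. The Python sketch below, with hypothetical names `FIRMWARE_ACTIONS` and `triage_console`, scans captured lines for the opening words of each message in the table and returns the table's action in paraphrased form. It assumes each message begins on its own console line.

```python
from typing import Dict, List, Tuple

# Opening words of each message in the Firmware Warning Messages table,
# mapped to the table's recommended action (paraphrased). Prefix matching is
# an assumption: long messages may wrap, so only the opening is compared.
FIRMWARE_ACTIONS: Dict[str, str] = {
    "WARNING: Stop boot flag set.": "Look up the accompanying message and act on it.",
    "WARNING: Not enough memory to boot the OS.": "Install at least the minimum amount of memory.",
    "WARNING: Setting DEFAULTS has failed.": "Replace the EBU (MFIOC chip failure).",
    "WARNING: Memory has been initialized": "Run 'Fastboot OFF' in the CONFIGURATION menu and reboot.",
    "WARNING: The processor has failed selftest due": "Replace the EBU (co-processor failure).",
    "WARNING: The processor has failed selftest.": "Replace the EBU (processor selftest failure).",
    "WARNING: One or more memory banks were not": "Use MEMORY in the INFORMATION menu to check the configuration.",
    "WARNING: The Bus Converter has failed.": "Replace the EBU (Bus Converter failure).",
    "WARNING: FAN FAILURE HAS BEEN DETECTED.": "Replace the EBU (chassis fan failure).",
    "ERROR: A3342A AP Card must be in bottom slot": "Move the A3342A AP card to the bottom slot.",
    "ERROR: HSC card in wrong slot.": "Install the HSC card in the bottom slot.",
}


def triage_console(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (message, action) for each line that starts a known message."""
    hits: List[Tuple[str, str]] = []
    for line in lines:
        text = line.strip()
        for prefix, action in FIRMWARE_ACTIONS.items():
            if text.startswith(prefix):
                hits.append((text, action))
                break  # one action per console line
    return hits
```

The two processor-selftest prefixes cannot both match the same line: one continues with " due" and the other with ".".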
Chassis Code Summary
Operating Status (OSTAT) and chassis codes are generated by A-Class server
firmware. OSTAT and chassis codes can be read from stored locations or viewed in
real time via the HSC Remote Management card. Chassis codes displayed during
selftest appear only at the console, which can be either the ASCII console or the
Web console. Below is an example of how chassis codes appear in response to a
'control b' key sequence:
The HSC Remote Management card shows status information on the console just above
the function key display (shown above), followed by the CM> prompt. The information
displayed is shown below:
OSTAT | Chassis Code | REMOTE: | status | activity | password | ACCESS FAULT: |
HSC Remote Management status information is defined as follows:
Status Code | Definition |
---|---|
OSTAT | Operating STATus. Values can be OFF, FLT, TEST, INIT, SHUT, WARN, RUN, and ALL. |
Chassis | A four-digit field used in conjunction with OSTAT to identify system status. The first digit of a chassis code is the Major Code Category value. |
REMOTE: | REMOTE modem port. Three fields describe the remote modem port: enabled/disabled (the remote modem port is connected or not connected); active/inactive (the remote modem port is working or idle); single/multiple (the number of attempts to enter the password). |
ACCESS FAULT: | Number of failed attempts to access the HSC Remote Management card. |
For more information on the HSC Remote Management card, refer to the HSC Remote
Management/Access Port Card information in the Reference section.
During selftest, OSTAT and chassis codes are generated and stored by the server
firmware. These chassis codes can be viewed even if an HSC Remote Management card
is not installed. To view power-on chassis codes, either type ser cc at the
firmware main menu, or change to the Service Menu and type cc. A display of OSTAT
and chassis codes is shown in the following example:
Service Menu: Enter command > cc
CHASSIS CODES INFORMATION
Chassis Code
INIT C4CC
INIT C4CD
INIT 3002
TEST 30BC
INIT 30BC
INIT C300
TEST 1030
<Press any key to continue (q to quit)> q
Service Menu: Enter command >
Chassis codes used in conjunction with the OSTAT value represent the status of the
system. OSTAT values of TEST and INIT are common during selftest. OSTAT values of
RUN and SHUT are common when the HP-UX operating system is running. OSTAT values of
FLT, OFF, and WARN indicate that the server firmware has failed a test or has
detected a problem that does not keep selftest from finishing.
If a fault prevents the server from completing selftest, the OSTAT value FLT is
used and the chassis code displays the number of the test that failed. If the
server stops responding while performing a test, the OSTAT value will be INIT or
TEST, and the chassis code represents the test that was running when the server
failed. Refer to the table below to determine which tests were active when the
system either faulted or stopped responding during selftest, and the corrective
action to take in response to the problem.
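The OSTAT and chassis code rules above can be applied to a saved copy of the cc display. The Python sketch below is illustrative only: it assumes each entry is printed as an OSTAT value followed by a four-digit hexadecimal code on its own line, and that the most recent entry is listed last; neither detail is confirmed here. The names `parse_cc`, `fault_hint`, and `summarize` are hypothetical, and `fault_hint` only knows the example codes given in this section (FLT 2120, FLT 7xxx, FLT 8xxx).

```python
import re
from typing import List, NamedTuple, Optional

# OSTAT values listed in the status-code table.
OSTAT_VALUES = ("OFF", "FLT", "TEST", "INIT", "SHUT", "WARN", "RUN", "ALL")
# Assumption: each entry is printed as "<OSTAT> <four hex digits>".
ENTRY_RE = re.compile(r"^(%s)\s+([0-9A-Fa-f]{4})$" % "|".join(OSTAT_VALUES))


class ChassisEntry(NamedTuple):
    ostat: str
    code: str

    @property
    def major_category(self) -> str:
        # The first digit of a chassis code is the Major Code Category value.
        return self.code[0]


def parse_cc(output: str) -> List[ChassisEntry]:
    """Extract (OSTAT, chassis code) pairs from captured cc output."""
    entries = []
    for line in output.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            entries.append(ChassisEntry(match.group(1), match.group(2).upper()))
    return entries


def fault_hint(entry: ChassisEntry) -> Optional[str]:
    """Map FLT codes to the examples given in this section; None otherwise."""
    if entry.ostat != "FLT":
        return None
    if entry.code == "2120":
        return "Second Level Cache selftest"
    if entry.major_category == "7":
        return "memory selftest"
    if entry.major_category == "8":
        return "I/O card"
    return None


def summarize(entries: List[ChassisEntry]) -> str:
    """Describe the most recent entry (assumed to be listed last)."""
    if not entries:
        return "no chassis codes found"
    last = entries[-1]
    if last.ostat == "FLT":
        hint = fault_hint(last)
        suffix = f" ({hint})" if hint else ""
        return f"FLT {last.code}: selftest failed in this test{suffix}"
    if last.ostat in ("INIT", "TEST"):
        return f"{last.ostat} {last.code}: if the server stopped responding, this test was running"
    return f"{last.ostat} {last.code}"


SAMPLE = """Service Menu: Enter command > cc
CHASSIS CODES INFORMATION
Chassis Code
INIT C4CC
INIT C4CD
INIT 3002
TEST 30BC
INIT 30BC
INIT C300
TEST 1030
"""

print(summarize(parse_cc(SAMPLE)))
```

Lines that are not OSTAT/code pairs, such as the menu prompt and the column title, are skipped rather than treated as errors.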
Troubleshooting the ASCII Console
The ASCII Console is typically a "dumb" monochrome terminal that serves as the
communication link between the A-Class server and the system operator. At power
on, the server's selftest software tests server internal components and external
peripherals to determine the operational status of each. The ASCII Console displays
the operational status of all components and peripherals, including its own, on the
console screen. The screen displays selftest output in one of two ways:
If an HSC Remote Management card is installed, the output appears at the bottom of
the screen and is updated until selftest is complete and the firmware Main Menu
screen appears.
If an HSC Remote Management card is not installed, there is no output to the
console screen until selftest completes.
If the ASCII console does not respond to input or does not display any output:
1. Make sure that the LAN Web Console is NOT connected. If it is, output is
redirected to the LAN Web Console instead of going to the ASCII console.
2. Make sure the keyboard is correctly connected to the ASCII console.
3. Cycle power to the ASCII console.
4. Return the configuration settings to their defaults by pressing the appropriate
soft keys on the keyboard. Consult the Operator's Manual for the specific soft keys
to press and the correct sequence. When the correct soft key sequence is entered,
the DATACOM settings will be:
DATACOM Setting | Value |
---|---|
Baud | 9600 |
Parity/DataBits | None/8 |
EnqAck | Yes |
Asterisk | OFF |
Chk Parity | No |
SR(CH) | Lo |
RecvPace | Xon/Xoff |
XmitPace | None |
CS(CB)Xmit | No |
5. Make sure REMOTE MODE is ON (an asterisk appears in the REMOTE MODE block on the
screen) and AUTO Line Feed is OFF (an asterisk does NOT appear in the AUTO LF block
on the screen).
6. Replace the ASCII console with a known good console.
7. Try the LAN Web console. If the LAN Web console works but the ASCII console does
not, or if neither console works, the system board is the problem. To change the
system board, you must replace the A-Class Exchange Base Unit (EBU). Refer to the
"Replacing an A-Class Server Exchange Base Unit (EBU)" section.
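Checking the DATACOM screen against the defaults in step 4 is a field-by-field comparison. The Python sketch below assumes you have transcribed the terminal's DATACOM fields into a dictionary keyed by the labels used in the table; `EXPECTED_DATACOM` and `datacom_mismatches` are hypothetical names, and the comparison deliberately ignores letter case and surrounding spaces.

```python
from typing import Dict, Optional, Tuple

# Default DATACOM settings from step 4 of the ASCII console procedure.
EXPECTED_DATACOM: Dict[str, str] = {
    "Baud": "9600",
    "Parity/DataBits": "None/8",
    "EnqAck": "Yes",
    "Asterisk": "OFF",
    "Chk Parity": "No",
    "SR(CH)": "Lo",
    "RecvPace": "Xon/Xoff",
    "XmitPace": "None",
    "CS(CB)Xmit": "No",
}


def datacom_mismatches(observed: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {setting: (observed, expected)} for each setting that differs.

    A setting missing from `observed` is reported with observed value None.
    The comparison ignores letter case and surrounding whitespace.
    """
    diffs: Dict[str, Tuple[Optional[str], str]] = {}
    for name, expected in EXPECTED_DATACOM.items():
        actual = observed.get(name)
        if actual is None or actual.strip().lower() != expected.lower():
            diffs[name] = (actual, expected)
    return diffs


# Example: a terminal left at 19200 baud.
settings = dict(EXPECTED_DATACOM, Baud="19200")
print(datacom_mismatches(settings))
```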
If you have an HSC Remote Management card installed and the ASCII console works but
the control B function does not, check the position of the SERVICE/NORMAL switch on
the HSC Remote Management card. This switch must be in the SERVICE position to
enable the control B function. If the control B function still does not work,
replace the HSC Remote Management card. Refer to the "A-Class Server I/O Card
Removal and Replacement" section.
NOTE: The control B function is unique to the HSC Remote Management card. If you do
not have an HSC Remote Management card installed, the control B function is not
available.
Troubleshooting the Secure Web Console
The Secure Web Console is accessed from a web browser over the LAN and can serve as
the communication link between the A-Class server and the system operator, as a
substitute for the ASCII Console. At power on, the server's selftest software tests
server internal components and external peripherals to determine the operational
status of each. The Secure Web Console displays the operational status of all
components and peripherals, including its own, on the console screen. The screen
displays selftest output in one of two ways:
If an HSC Remote Management card is installed, the output appears at the bottom of
the screen and is updated until selftest is complete and the firmware Main Menu
screen appears.
If an HSC Remote Management card is not installed, there is no output to the
console screen until selftest completes.
To troubleshoot the Web console:
1. Make sure that a LAN cable is inserted into the LAN Web Console connector on the
rear of the server. Refer to Chapter 2 "A-Class Server Installation".
2. Log in to the Web console as the Administrator and check the DATACOM
configuration. It should be:
DATACOM Setting | Value |
---|---|
Baud Rate | 9600 |
Parity/Data Bits | None/8 |
Parity Checking | Enable should be off |
Recvpace/Xmitpace | Xon/Xoff should be off |
NOTE: Remember to use SAVE to retain any changes that you make.
3. Reset the Web console. Type the IP address of the Web console in your web
browser's location window and press Enter. When the Web console page appears, log
in, then click the Reset Web Console option.
NOTE: Resetting the Web console will log any other users off of the system.
4. Ping the IP address of the Web console to make sure it responds. If it does not
respond, contact your network administrator.
5. Reset the Web console to its default configuration by pressing the Web Console
Reset button while toggling the server's power switch to OFF, then ON. Do not
release the Web Console Reset button until selftest completes (the only front panel
LED lit is the green POWER LED).
6. Connect an ASCII console to the RS232 port on the rear of the server and remove
the LAN cable from the LAN Web console port on the rear of the server. Refer to the
"Troubleshooting the ASCII Console" section for ASCII Console operation tips. If the
ASCII console works but the Web console does not, the problem is with the system
board. To change the system board, you must replace the A-Class Exchange Base Unit
(EBU). Refer to the "Replacing an A-Class Server Exchange Base Unit (EBU)" section.
If neither the ASCII console nor the Web console works, the system board is the
problem in this case also. Refer to the "Replacing an A-Class Server Exchange Base
Unit (EBU)" section.
NOTE: If you can access the console but do not get a prompt, you may not have write
access to the console. Multiple users can have read access to the console at the
same time, but only one user can write to the console at a time. To see which user
has write access, look at the Web console main screen: the user whose name is
underlined has write access. Write access can only be taken, not given. To acquire
write access, press ctrl F12. Your username is then underlined and you can enter
commands into the Web console system. Everyone else is excluded from writing to the
console, but everyone on the system keeps read access.
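The write-access rules in the note above amount to a single-writer, many-reader policy in which write access is taken rather than handed over. The Python class below is a conceptual model of those rules only, not of the console firmware; the class name, method names, and user names are hypothetical.

```python
from typing import Optional, Set


class WebConsoleAccess:
    """Conceptual model: many readers, one writer; write access is taken."""

    def __init__(self) -> None:
        self.users: Set[str] = set()
        self.writer: Optional[str] = None

    def log_in(self, user: str) -> None:
        self.users.add(user)  # every logged-in user has read access

    def take_write(self, user: str) -> None:
        # Models pressing ctrl F12: the caller becomes the only writer and the
        # previous writer drops back to read-only. There is deliberately no
        # method for handing write access to someone else.
        if user not in self.users:
            raise PermissionError(f"{user} is not logged in")
        self.writer = user

    def can_read(self, user: str) -> bool:
        return user in self.users

    def can_write(self, user: str) -> bool:
        return user == self.writer


console = WebConsoleAccess()
console.log_in("operator1")
console.log_in("operator2")
console.take_write("operator1")
console.take_write("operator2")  # operator2 takes write access away
print(console.can_write("operator1"), console.can_write("operator2"))
```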
Troubleshooting Embedded Disks
The A-Class server supports a maximum of two internally installed and connected
disk drives, commonly referred to as embedded disks. Embedded disks contain both the
server's startup (bootstrap) software and the operating system software. Although
embedded disks are installed internally, the server's front panel LEDs do not reflect
disk status and therefore will not blink in a code pattern to indicate an embedded
disk selftest failure. Embedded disks also do not have an individual fault LED to
show operational status.
If the system cannot boot from an embedded disk, an Input/Output Dependent Code
(IODC) error occurs. The example below starts with entry of the boot command and
shows the messages that are displayed on the console:
Main Menu: enter command > bo pri
Interact with IPL (Y, N, or Cancel)? > y
Booting... Cannot find ENTRY_TEST
Failed to initialize.
ENTRY_INIT
Status = -10
FFFFFFF6 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000012 00000000 00000000 454E4841 00000000 00000000 454E4841
00000000 00000000 00000000 00000000 00000100 00000000 00000000 00000000
Failed to Initialize
To troubleshoot a recurring embedded disk fault:
1. Use a mapping tool to verify that the embedded disk is recognized by the server's
bootstrap software. To verify server recognition, type sea at the firmware Main Menu
prompt. The server displays a list of bootable devices on the console. Check this
list for the paths of the embedded disks: 8/16/5.6 and 8/16/5.5.
2. If the embedded disk is not recognized by the server, make sure that power is
applied to the disk. To locate the power cable, refer to either the "A-Class Server
Disk Drive Removal and Replacement" section or the label under the top of the
server cover.
3. Make sure that all the disks on the same bus have unique addresses. Refer to the
label on top of the disk housing for an address jumper diagram.
4. Check the disk configuration jumper settings. Make sure the TERMINATION ENABLED
jumper is removed.
5. If steps 1-4 do not correct the problem, replace the embedded disk.
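Steps 1 and 3 can be checked against captured text and a written-down address list. The Python sketch below assumes the sea listing has been saved as text in which each hardware path appears as a separate word, possibly followed by a suffix such as `.0`; that format, the function names `missing_embedded_disks` and `duplicate_addresses`, and the disk names in the tests are assumptions. If only one embedded disk is installed, pass just its path as `expected`.

```python
from typing import Dict, Iterable, List

# Hardware paths of the embedded disks, as given in step 1.
EMBEDDED_DISK_PATHS = ("8/16/5.6", "8/16/5.5")


def _mentions(path: str, tokens: Iterable[str]) -> bool:
    # Assumption: the listing may append a suffix such as ".0" to the path.
    return any(tok == path or tok.startswith(path + ".") for tok in tokens)


def missing_embedded_disks(search_output: str,
                           expected: Iterable[str] = EMBEDDED_DISK_PATHS) -> List[str]:
    """Return the expected paths that do not appear in the captured listing."""
    tokens = search_output.split()
    return [path for path in expected if not _mentions(path, tokens)]


def duplicate_addresses(disks: Dict[str, int]) -> Dict[int, List[str]]:
    """Given {disk name: bus address} for one bus, return shared addresses."""
    by_address: Dict[int, List[str]] = {}
    for name, address in disks.items():
        by_address.setdefault(address, []).append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}
```

A non-empty result from either function points to step 2 (power) or step 3 (addressing) respectively; it does not replace the jumper check in step 4.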
Off-line Diagnostic Environment (ODE) and Support Tools Manager (STM) software can
also be used to troubleshoot embedded disk faults. Use the mapping utilities (Mapper)
to verify that the disks are recognized by the server. Password-protected disk
expert tools are available through ODE and STM.
NOTE: Only licensed self-maintenance technicians and HP service personnel have access
to the diagnostic passwords required to use ODE and STM software and the disk expert
tools.
Troubleshooting LAN
When connected to a hub, the 10 Base-T LAN on A-Class servers
should automatically negotiate the proper speed. If this auto-negotiation
fails, the server will not connect to the hub. Should this symptom
occur, replace the EBU. Be sure to use an EBU with part number A5182-69101
or later.