HP 3000 Manuals

COMMUNICATOR 3000 MPE MPE/iX RELEASE 4.0

Software Resiliency Changes 

by Donna Gracyk 
Commercial Systems Division 

In a continuing effort to improve the availability of the MPE/iX
operating system and make it more resilient to software and hardware
failures, changes have been made in this release of MPE/iX.  This
article discusses the two areas of the operating system where these
software resiliency changes have been made.

KSAM XL 

KSAM XL has been changed to handle fatal errors better when it detects
them (for example, a bad pointer in a corrupt key index structure).
Instead of aborting the system when an unexpected error is encountered
while opening a KSAM XL file or during an FGETKEYINFO call, KSAM XL now
sets a field in the KSAM control block to indicate that an internal
error was detected.  The KSAM control block is an internal data
structure stored in the KSAM XL file itself.

For recovery purposes, KSAM XL still lets you read from a KSAM XL file
when the internal error field is set, although a read may not succeed.
To prevent corruption (or further corruption, if the file is already
corrupt), KSAM XL does not allow writes to a file whose internal error
field is set.  Any attempt to write to such a file fails with the
following file system error:

     KSAM INTERNAL ERROR  (FSERR 175)
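The read/write rules above can be summarized as a small state model.
The Python sketch below is purely illustrative: KsamFile,
mark_internal_error and KsamFileError are hypothetical names, not
KSAM XL interfaces, and the boolean flag only stands in for the internal
error field kept in the KSAM control block.

```python
from __future__ import annotations

# Hypothetical model of the KSAM XL behavior described above.
# None of these names are real MPE/iX APIs; they only illustrate the rules.

FSERR_KSAM_INTERNAL_ERROR = 175  # "KSAM INTERNAL ERROR" from the article


class KsamFileError(Exception):
    """Raised when an operation is refused; carries the file system error number."""

    def __init__(self, fserr: int, message: str) -> None:
        super().__init__(f"{message} (FSERR {fserr})")
        self.fserr = fserr


class KsamFile:
    def __init__(self, records: dict[str, str]) -> None:
        self.records = dict(records)
        # Stands in for the internal error field in the KSAM control block.
        # Because that block is part of the file, the flag persists with it.
        self.internal_error = False

    def mark_internal_error(self) -> None:
        """Called when open or FGETKEYINFO finds, e.g., a bad index pointer.
        Instead of aborting the system, the problem is recorded in the file."""
        self.internal_error = True

    def read(self, key: str) -> str | None:
        # Reads stay allowed for recovery; against a really corrupt index
        # they might fail or find nothing.
        return self.records.get(key)

    def write(self, key: str, value: str) -> None:
        # Writes are refused so a damaged file cannot be corrupted further.
        if self.internal_error:
            raise KsamFileError(FSERR_KSAM_INTERNAL_ERROR, "KSAM INTERNAL ERROR")
        self.records[key] = value


if __name__ == "__main__":
    f = KsamFile({"A001": "widget"})
    f.mark_internal_error()
    print(f.read("A001"))      # widget
    try:
        f.write("A002", "gadget")
    except KsamFileError as e:
        print(e)               # KSAM INTERNAL ERROR (FSERR 175)
```

The point of the model is that the error is recorded in the file rather
than raised at the system level, so later opens see it too.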

Because file system errors are now returned in some additional cases,
such as a write to a file whose internal error field is set, application
programs should always check the condition code returned by each
intrinsic call.  For more information on recovering a corrupt KSAM XL
file, refer to the "Recovering from Index Corruption" section in the
Using KSAM XL Reference Manual (32650-90168).
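The condition-code check can be sketched as follows.  On MPE/iX a
program would call the FWRITE intrinsic and, on failure, FCHECK to get
the file system error number; in this Python sketch fwrite_stub and
fcheck_stub are hypothetical stand-ins for that contract, and treating
every code other than CCE as a failure is a simplifying assumption.

```python
from __future__ import annotations

from enum import Enum


class CC(Enum):
    """Condition codes, modeled on MPE's CCE / CCG / CCL."""
    CCE = "="   # operation succeeded
    CCG = ">"   # exceptional condition (meaning depends on the intrinsic)
    CCL = "<"   # error


# Hypothetical per-file "last error" table used by the stubs below.
_last_error: dict[int, int] = {}


def fwrite_stub(filenum: int, internal_error_set: bool) -> CC:
    """Pretend FWRITE: fails with FSERR 175 if the KSAM internal error is set."""
    if internal_error_set:
        _last_error[filenum] = 175
        return CC.CCL
    _last_error[filenum] = 0
    return CC.CCE


def fcheck_stub(filenum: int) -> int:
    """Pretend FCHECK: return the last file system error number for the file."""
    return _last_error.get(filenum, 0)


def write_record(filenum: int, internal_error_set: bool) -> int:
    """Write a record; return 0 on success or the FSERR number on failure.
    The condition code is checked after every call, never assumed."""
    cc = fwrite_stub(filenum, internal_error_set)
    if cc is not CC.CCE:
        return fcheck_stub(filenum)
    return 0
```

An application that ignored the condition code here would carry on as
if the record had been written.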

VOLUME MANAGEMENT 

Changes have also been made in volume management to make the system more
resilient to fatal errors detected while mounting a user volume.  MPE/iX
now detects more of the error conditions that previously brought the
system down during a user volume mount.  Instead of halting the system,
these conditions now cause the following error messages to be displayed
on the console:

      *** ERROR MOUNTING VOLUME ***
         COULD NOT MOUNT VOLUME ON LDEV x.   INFO  xxxx;   SUBSYS xxxx.
         VOLUME WILL BE MOUNTED AS ERROR VOLUME.

One example of an error that previously caused a system abort is a disk
hardware problem that corrupted the disk free space map, with the
corruption detected during the volume mount.
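The change in the mount path amounts to "report and continue" instead
of "abort".  The Python sketch below models that flow only: Volume,
MountError, check_volume and mount_volume are hypothetical names, the
free space map check is reduced to a boolean, and the INFO and SUBSYS
values it prints are placeholders, not real MPE/iX error codes.

```python
from __future__ import annotations

from dataclasses import dataclass


class MountError(Exception):
    """Hypothetical: a fatal problem found while mounting, with INFO/SUBSYS codes."""

    def __init__(self, info: int, subsys: int) -> None:
        super().__init__(f"mount failed: info={info} subsys={subsys}")
        self.info = info
        self.subsys = subsys


@dataclass
class Volume:
    ldev: int
    name: str
    free_space_map_ok: bool = True
    master: bool = False
    status: str = "UNMOUNTED"


def check_volume(vol: Volume) -> None:
    # Stand-in for mount-time consistency checks such as validating the
    # disk free space map. -1 is a placeholder, not a real INFO/SUBSYS code.
    if not vol.free_space_map_ok:
        raise MountError(info=-1, subsys=-1)


def mount_volume(vol: Volume, console: list[str]) -> None:
    """Mount a user volume; on a detected error, log to the console and
    mark the volume as an error volume instead of halting the system."""
    try:
        check_volume(vol)
    except MountError as err:
        console.append("*** ERROR MOUNTING VOLUME ***")
        console.append(
            f"COULD NOT MOUNT VOLUME ON LDEV {vol.ldev}.   "
            f"INFO  {err.info};   SUBSYS {err.subsys}."
        )
        console.append("VOLUME WILL BE MOUNTED AS ERROR VOLUME.")
        vol.status = "ERROR-MOUNT"
        return
    vol.status = "MASTER" if vol.master else "MEMBER"
```

Catching the error at the mount boundary keeps the failure confined to
the one volume, which is then visible to operators through DSTAT.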

The output of the DSTAT command shows whether an error was detected
while mounting a user volume.  Such a volume has the status ERROR-MOUNT
in the STATUS column, as shown for LDEV 51 below:

     SYSA (PUB.SYS): DSTAT ALL
       LDEV-TYPE    STATUS      VOLUME (VOLUME SET - GEN)
      -----------  ---------   ---------------------------
       1- 079371     MASTER      MEMBER1         (MPEXL_SYSTEM_VOLUME_SET-0)
       2- 079371     MEMBER      MEMBER2         (MPEXL_SYSTEM_VOLUME_SET-0)
       3- 079371     MEMBER      MEMBER3         (MPEXL_SYSTEM_VOLUME_SET-0)
       4- 079371     MEMBER      MEMBER4         (MPEXL_SYSTEM_VOLUME_SET-0)
       5- 079371     MEMBER      MEMBER5         (MPEXL_SYSTEM_VOLUME_SET-0)
      35- 079370     MEMBER      MEMBER35        (MPEXL_SYSTEM_VOLUME_SET-0)
      50- 022040     MASTER      MEMBER1         (DEPTB_VS-0)
      51- 022040     ERROR-MOUNT MEMBER2         (DEPTB_VS-0)
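If you monitor systems with scripts, the ERROR-MOUNT status is easy to
pick out of captured DSTAT output.  The Python sketch below assumes the
output has already been captured as text and follows the column layout
shown above; error_mounts is a hypothetical helper, not an MPE/iX tool,
and other releases may format DSTAT differently.

```python
from __future__ import annotations

import re

# One DSTAT row: "LDEV- TYPE  STATUS  VOLUME  (VOLUME SET-GEN)".
# Assumes the column layout shown above.
_ROW = re.compile(
    r"^\s*(?P<ldev>\d+)-\s+(?P<type>\S+)\s+(?P<status>\S+)\s+"
    r"(?P<volume>\S+)\s+\((?P<volset>.+)-(?P<gen>\d+)\)\s*$"
)


def error_mounts(dstat_output: str) -> list[tuple[int, str, str]]:
    """Return (ldev, volume, volume set) for every row whose STATUS is ERROR-MOUNT.
    Header, separator and prompt lines do not match the row pattern."""
    found = []
    for line in dstat_output.splitlines():
        m = _ROW.match(line)
        if m and m.group("status") == "ERROR-MOUNT":
            found.append((int(m.group("ldev")), m.group("volume"), m.group("volset")))
    return found
```

Run against the listing above, this would report LDEV 51, volume
MEMBER2 of volume set DEPTB_VS.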
