HP 3000 Manuals

Structure of Each File Organization [ Micro Focus COBOL System Reference, Volume 2 ] MPE/iX 5.0 Documentation


Micro Focus COBOL System Reference, Volume 2

Structure of Each File Organization 

The following sections describe the physical structure of the four data
file organizations.

Sequential Organization 

Sequential files are intended to cater for binary data.  These files
consist of a series of either fixed or variable length records.  The
order of records in these files is set by the order of WRITE statements
when the file is created.  The record order does not change once it has
been set.  New records are added to the end of the file.  Each record in
a record sequential file (except the first record) has a unique record
which precedes it, while each record (except the last record) also has a
unique record that follows it.

Sequential files that are fixed length and are not destined for the
printer have no record delimiter; the end of one record is immediately
followed by the beginning of the next.

Print files cannot be read easily by a COBOL program if you want to
recover print lines.

You can request special processing to take place on the output of print
files.  To do this, specify the LINE ADVANCING clause in the SELECT
statement.  This is performed automatically if the ASSIGN TO PRINTER
clause is used.  This causes:

   *   Trailing spaces to be discarded in output records

   *   Each print record to be terminated by a delimiter which is
       operating system dependent.  See the sections detailing operating
       environment specific information below.

   *   The OPEN statement to add a delimiter which is operating system
       dependent.  See the sections detailing operating environment
       specific information below.

   *   Each WRITE statement without any BEFORE or AFTER clause to behave
       as if you had specified the AFTER 1 clause.

We recommend that you use either the LINE ADVANCING option for all files
which you intend to print or, alternatively, specify either the BEFORE or
AFTER clause in every WRITE statement for that file.


NOTE   *   You should never use the BEFORE or AFTER clauses for data
           files which you do not intend to print.

       *   You should not open files destined for the printer for
           either INPUT or I/O.

Sequential Organization on DOS, Windows and OS/2 Systems.  If you specify
the ASSIGN TO PRINTER clause:

   *   each print record is terminated by the two-byte delimiter
       x"0D0A".

   *   the OPEN statement adds the two-byte delimiter x"0D0A" to the
       file.

Each print record has one or more of a line-feed character (x"0D0A"), a
form-feed character (x"0C") or a vertical tab character (x"0B") added
before or after the print record depending on whether you specified the
BEFORE or AFTER clause in the WRITE statement.

Sequential Organization on UNIX Systems.  If you specify the ASSIGN TO
PRINTER clause:

   *   each print record is terminated by the single-byte delimiter
       x"0D".

   *   the OPEN statement adds the single-byte delimiter x"0D" to the
       file.

Each print record has one or more of a line-feed character (x"0A"), a
form-feed character (x"0C") or a vertical tab character (x"0B") added
before or after the print record depending on whether you specified the
BEFORE or AFTER clause in the WRITE statement.

Fixed Format Sequential Structure.  In a fixed format sequential file,
each record immediately follows the previous record in the file.  Each
record is the same length as the maximum length record.  No additional
control characters are written to the file unless you use WRITE BEFORE or
WRITE AFTER ADVANCING statements.  In this case, carriage return and line
feed characters are added as required to ensure correct positioning when
the data file is printed.  If a file containing such characters is used
as input, you may find that records are not read as expected.

+--------------------------------------------+
|            Fixed length record             |
+--------------------------------------------+
|            Fixed length record             |
+--------------------------------------------+
     .                                  .
     .                                  .
+--------------------------------------------+
|            Fixed length record             |
+--------------------------------------------+

Variable Format Sequential Structure.
A variable format sequential file is the simplest form of the variable
structure defined above.  Each record written is preceded by a record
header containing the length of the record; the record is written at the
length defined in the program; the file contains a standard variable
structure file header record.  Up to three padding characters can follow
a record to ensure that the next record starts on a four-byte boundary.

+-----------------------------------------------+
| File Header record - 128 bytes                |
|                                               |
+--------+-----------------------------+---+----+
| Header |   Variable length record    |   |
+--------+-----------------------------+---+--------+---+
| Header |   Variable length record                 |   |
+--------+------------------------------------------+---+
     .                                         .
     .                                         .
+--------+--------------------------------------+---+
| Header |   Variable length record             |   |
+--------+--------------------------------------+---+

Line Sequential Organization

Line sequential files are ASCII text files such as those produced by
editors and other similar utilities.  Because all trailing spaces are
removed from the program's record area when the record is written, the
records are of variable length.  Hence, the length of the record actually
written to file is not determined by the length of the record definition
used in the WRITE statement.

A record delimiter is written after every record.  The delimiter
character(s) vary depending on your operating environment.  See the
environment specific sections for line sequential files below for further
information.  On input, the delimiter is removed and the record area is
padded out with spaces as necessary.  Inclusion of ADVANCING phrases
other than BEFORE 1 causes control characters in addition to the
delimiter to be output.

If a record in the file is longer than the maximum length defined in the
program reading the file, then each access returns "maximum-length"
characters until the end of the record.
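
The write and read behavior described above can be sketched in Python.
This is only an illustration under stated assumptions: the function
names are hypothetical, the delimiter defaults to x"0D0A", and the
sketch ignores null-character insertion, tab expansion and ADVANCING
control characters.

```python
def write_line_sequential(record: str, delimiter: bytes = b"\r\n") -> bytes:
    """Encode one record: drop trailing spaces, then append the delimiter."""
    return record.rstrip(" ").encode("ascii") + delimiter


def read_line_sequential(data: bytes, max_len: int,
                         delimiter: bytes = b"\r\n"):
    """Yield record areas of exactly max_len characters (hypothetical helper)."""
    records = data.split(delimiter)
    if records and records[-1] == b"":
        records.pop()  # nothing follows the final delimiter
    for raw in records:
        text = raw.decode("ascii")
        # A record longer than max_len is returned max_len characters at a
        # time; an empty record still yields one all-space record area.
        for start in range(0, max(len(text), 1), max_len):
            # Pad the record area out with spaces, as on a COBOL READ.
            yield text[start:start + max_len].ljust(max_len)
```

With a five-character record area, a seven-character record in the file
is returned by two successive reads, the second padded with spaces.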

Any file that is to be output to a printer can have LINE SEQUENTIAL or
LINE ADVANCING organization.  If you specify sequential organization, use
the optional BEFORE or AFTER phrase for every WRITE statement to print
the file successfully.

Characters with a value less than x"20" (space) are written with a
preceding null character (x"00") to show that they are text characters
rather than formatting characters.  On input, the preceding null
characters (x"00") are stripped away.  You can prevent this insertion of
null characters when writing the file by using the -N run-time switch on
the command line at run time.  You can also switch off insertion of null
characters for individual line sequential files by using a call to
Function 47 of routine x"91", the interprogram function call.  If you use
the -N run-time switch to read a file written with the +N run-time
switch, these null characters are not stripped away.  We recommend that
if you store non-ASCII characters in line sequential files you ensure
that the N run-time switch is on.

Any tab characters in a line sequential file (x"09") are expanded to
every eighth character position during a READ (that is, the character
following a tab will lie in one of the columns 9, 17, 25, 33, and so on).
You can compress space characters to tabs during output by using the +T
run-time switch on the command line or by using a call to Function 49 of
routine x"91".

See the chapter Library Routines (Call-by-Number) for more information
about the x"91" routine.  See the chapter Running and the appendix
Descriptions of Run-Time Switches for more information about the N and T
run-time switches.

The following sections give environment specific information for line
sequential files.

Line Sequential Files on DOS, Windows and OS/2 Systems.  The record
delimiter x"0D0A" is used for DOS, Windows and OS/2 systems.
Any single byte x"1A" (user terminate run code) is used as an
unconditional file terminator (except when preceded by a null character,
as described above).  If no x"1A" character is encountered, the physical
end of the file serves as the file terminator.  When the file is closed,
a terminating x"1A" character is NOT written.  Instead, the length of the
file is used to determine where it ends.

If the next two characters after such an access are x"0D0A", a blank line
is not returned on the next access.  Only the x"0D" acts as a record
delimiter.  Additional device control characters (such as x"0A", x"0B",
x"0C") are discarded.  x"1A" acts as a record delimiter and also denotes
the end of the file.

If you turn the N run-time switch off, you must make sure that any COMP
data does not contain bytes with a value of x"1A" (end-of-file character)
or x"0D" (record delimiter).

Tab characters can be correctly interpreted by your personal computer's
graphics printer.

Line Sequential Files on UNIX Systems.  The record delimiter on UNIX
systems is a single byte x"0A" (the default).  However, for line
sequential and relative files only, this default record delimiter can be
changed to that used by OS/2.

If you turn the N run-time switch off, you must make sure that any COMP
data does not contain bytes with a value of x"0A" (record delimiter).

Line Sequential Structure.

+----------------------------------------------+---------+
|   Variable length record                     |delimiter|
+---------------------------------+---------+--+---------+
|   Variable length record        |delimiter|
+---------------------------------+---------+--+---------+
|   Variable length record                     |delimiter|
+----------------------------------------------+---------+
     .                                              .
     .                                              .
+----------------------------------------------+---------+
|   Variable length record                     |delimiter|
+----------------------------------------------+---------+

Relative Organization

Relative file organization allows you to access any record randomly by
specifying its ordinal position within the file.  Data held in relative
files can consist of fixed or variable format records which are held in
slots of fixed length, the length being the length of the longest record
defined for the file.  This is necessary so that the COBOL file handling
routines can quickly calculate the physical location of any record given
its record number within the file.

Each record is uniquely identified by a record number.  The first record
in the file is record number one, the second record is number two, and so
on.  Each record is followed by a record marker which indicates the
current state of the record.  In a variable format file, the marker
follows the fixed length slot.  The marker varies depending on your
environment.  See the environment specific information sections for
relative files below for further information.

When you delete a record from a relative file, the only action is to
change that record's marker.  However, the contents of a deleted record
physically remain in the file until a new record is written.  If, for
security reasons, you want to make sure that the data does not exist in
the file, then you must overwrite the record using the REWRITE statement
before you delete it.

A fixed format relative file can be processed as a fixed format
sequential organization file by defining the maximum record length to be
larger than that for the relative file (see the sections on operating
environment specific information for details).  A variable format
relative file cannot be processed as a sequential organization file.

The length of a relative file is determined by the largest record number
used when actually writing a record to the file.
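
The record-number arithmetic described above can be sketched in Python.
This is an illustration only: the function names are hypothetical, and
the offset calculation is inferred from the slot layout rather than
stated in this manual.  The size formulas are those given in the
environment specific sections for relative files.  The marker length is
passed in (2 on DOS, Windows and OS/2 systems; 1 for fixed format files
on UNIX systems).

```python
FILE_HEADER_LEN = 128  # file header size used in the variable format formula


def fixed_record_offset(record_number: int, max_rec_len: int,
                        marker_len: int) -> int:
    """Byte offset of a record slot in a fixed format relative file.

    Assumption: slots are laid out back to back from offset 0, each slot
    being the record followed by its marker.
    """
    if record_number < 1:
        raise ValueError("record numbers start at 1")
    return (record_number - 1) * (max_rec_len + marker_len)


def fixed_file_size(largest_record_number: int, max_rec_len: int,
                    marker_len: int) -> int:
    """(max-rec-len + marker) * largest-record-number."""
    return (max_rec_len + marker_len) * largest_record_number


def variable_file_size(largest_record_number: int, max_rec_len: int) -> int:
    """128 + (max-rec-len + 2 + header) * largest-record-number."""
    header = 2 if max_rec_len < 4096 else 4
    return FILE_HEADER_LEN + (max_rec_len + 2 + header) * largest_record_number
```

For example, a fixed format file of 80-byte records whose largest record
number is 9 takes (80 + 2) * 9 bytes with a two-byte marker, whether or
not records 3 to 8 were ever written.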

Relative File Organization on DOS, Windows and OS/2 Systems.  On DOS,
Windows and OS/2 systems, the current state of the record is indicated by
a two-byte marker as follows:

Marker (hex)      Description
-------------------------------------------------------
0D0A              Record present
0D00              Record deleted or never written

A fixed format relative file can be processed as a fixed format
sequential file by defining the maximum record length to be two
characters larger than that for the relative file.

The size of a relative file on DOS, Windows and OS/2 systems is
calculated as follows.

   Fixed format:      (max-rec-len + 2) * largest-record-number

   Variable format:   128 + (max-rec-len + 2 + header) *
                      largest-record-number

where header is 2 if max-rec-len is less than 4096, otherwise header is
4.

Relative File Organization on UNIX Systems.  On UNIX systems, the current
state of a record for fixed length relative records is indicated by a
one-byte marker as follows:

Marker (hex)      Description
-------------------------------------------------------
0A                Record present
00                Record deleted or never written

The current state of a record for variable length relative records is
indicated by a two-byte marker as follows:

Marker (hex)      Description
-------------------------------------------------------
0D0A              Record present
0D00              Record deleted or never written

A fixed format relative file can be processed as a fixed format
sequential file by defining the maximum record length to be one character
larger than that for the relative file.

The size of a relative file on UNIX systems is calculated as follows.

   Fixed format:      (max-rec-len + 1) * largest-record-number

   Variable format:   128 + (max-rec-len + 2 + header) *
                      largest-record-number

where header is 2 if max-rec-len is less than 4096, otherwise header is
4.

Fixed Format Relative Structure.  A fixed format relative file is the
same as a fixed format sequential file, except each record is followed by
a record marker.

+-------------------------------------------+------+
| Fixed length record - Record 1            |marker|
+-------------------------------------------+------+
| Fixed length record - Record 2            |marker|
+-------------------------------------------+------+
     .                                  .      .
     .                                  .      .
+-------------------------------------------+------+
| Fixed length record - Record i deleted    |marker|
+-------------------------------------------+------+
     .                                  .      .
     .                                  .      .
+-------------------------------------------+------+
| Fixed length record - Record j - unused   |marker|
+-------------------------------------------+------+
     .                                  .      .
     .                                  .      .
+-------------------------------------------+------+
| Fixed length record - Record n            |marker|
+-------------------------------------------+------+

For relative files in random access, writing records 1, 2 and 9 will
occupy the same disk space as creating a file containing records 1, 2 and
3 on UNIX systems.

Variable Format Relative Structure.  A variable format relative file
follows the basic variable structure defined earlier in this appendix.
However, each record is placed into a fixed length slot, the length of
the slot being the length of the longest record defined, together with
the header and terminator characters.  The record header for each record
contains the length of the logical record written, not the length of the
physical fixed length slot.  Each slot is followed by a two-byte record
marker.

+----------------------------------------------------+
| File Header record - 128 bytes                     |
|                                                    |
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 1 | pad  |0D0A|
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 2        |0D0A|
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 3 | pad  |0D0A|
+-------+--------------------------------+------+----+
    .                   .                         .
    .                   .                         .
+-------+---------------------------------------+----+
|Header |Variable length record-Record i deleted|0D00|
+-------+---------------------------------------+----+
    .                   .                         .
    .                   .                         .
+-------+---------------------------------------+----+
|Header |Variable length record-Record j unused |0D00|
+-------+---------------------------------------+----+
    .                   .                         .
    .                   .                         .
+-------+-------------------------------+-------+----+
|Header |Variable length record-Record n|  pad  |0D0A|
+-------+-------------------------------+-------+----+

Indexed Organization

Indexed files consist of a series of fixed or variable length records.
An indexed file is implemented as two separate files: the data file and
the key file.  Variable length records are handled by the variable length
file handler supplied with this COBOL system.

For all file formats other than C-ISAM, both the data and index files are
of the variable structure defined in the section Variable Format
Sequential Structure earlier in this chapter.  See the section Indexed
Organization on UNIX Systems for details of C-ISAM and fixed length
records.

When you name the file, the name is given to the data file; the name of
the associated index file is produced by adding a .idx extension to the
data file name.  For example:

Data file         Index file
-------------------------------------------------
myfile            myfile.idx
clock.fle         clock.fle.idx   (UNIX)
clock.fle         clock.idx       (DOS, Windows and OS/2)

You should avoid using the .idx extension in other contexts.

The index is built up as an inverted tree structure that grows in height
as records are added.  The number of key file accesses required to locate
a randomly selected record depends primarily on the number of records in
the file and the key-length.  File I/O is faster when reading the file
sequentially, but only if other indexed sequential operations do not
intervene.

We strongly recommend that you take regular backups of all file types.
There are, however, situations with indexed files (for example, media
corruption) that can lead to only one of the two files becoming unusable.
If the index file is lost in this way, you can recover data records from
just the data file (although not in key sequence) and, therefore, reduce
the time lost due to a failure.

You can recover a corrupt indexed file using a utility which rebuilds the
index of the indexed file.  The utility you use is operating environment
dependent and is referred to in each of the sections covering the
different operating systems below.

Indexed Organization on DOS, Windows and OS/2 Systems.  You can recover a
corrupt indexed file using the Rebuild utility.  See the chapter Rebuild
for details of this utility.

Indexed Organization on UNIX Systems.  If you are using C-ISAM, the
C-ISAM file handler handles all fixed length indexed records.  The data
files are in the relative format described earlier in this chapter.  If
the C-ISAM file handler is not the default one supplied with this system,
or you have substituted your own file handler for the default as
described in an add-on product, the format of the data file is dependent
on the file handler you are using.  See your Release Notes for details of
the default file handlers supplied with this COBOL system.

It is possible to use an environment variable to specify that the index
and data files should appear in separate directories.  See the section
The "&" Character in Environment Variables in the chapter External
File-name Mapping (DD_) for further information.

To allow an index to be recovered from the data file when the indexed
file has become corrupt, all unused data records are marked as deleted by
adding x"00" to each record that contains LOW-VALUES.  Existing records
are marked with the character x"0A".  The recovery operation can,
therefore, be performed with a simple COBOL program by defining the data
file as ORGANIZATION RELATIVE ACCESS SEQUENTIAL.
The records are then read sequentially, the data moved from the relative
file record area into the indexed record area and written to a new
version of the indexed file.  Those records with LOW-VALUES in the last
(extra) byte are discarded.  Note that this byte (containing a line feed
x"0A" in a required record) is not written to the indexed file on
recovery, because of the record length discrepancy of one byte in the
record definitions.

You can also rebuild a corrupt index file using the fhrebuild utility.
See the chapter File Handler Utilities for details of how to do this.

Index File Structure.  On all operating systems, an index file can have
several keys.  For each key defined, the index file contains an
independent index, structured as a B-Tree.  A leaf node in an index
contains a list of key-values in ascending order, each of which points to
the data record (in the data file) to which it belongs.  A non-leaf node
contains a list of key-values in ascending order, each of which points to
a subordinate node in which that key-value is the largest.

This index structure provides the fastest possible random access to a
data record using any key, as well as efficient processing of data
records in sequential key order.

The records in an index file are always the same length, whether they are
nodes, header records or key information records.  The size of the
records is determined at the time the file is created and cannot
subsequently be changed.  The size used is configurable at the time the
file is created.  See the section Index Node Record later in this chapter
for further details on how to do this.

The index file starts with the index File Header Record, which contains
information about the file.  It points to the Free Space record, which is
used to maintain a list of free records in the index file.
The index File Header record also points to the Key Information record,
which contains details of every key defined for the file, and, for each
key, points to the root Index Node record of the associated index.  Each
of these records is described in the following sections.

Index File Header Record.  The File Header record is located at offset 0
within the index file.  The first 128 bytes are the same as a standard
variable structure file header record, except for the fields below.

Index File Header record description:

Offset   Size   Description of the field
-------------------------------------------------
   0       4    Length of the file header.
  39       1    Organization of the file.  Always contains value 2
                (for Indexed organization).
  62      14    Always contains zeros.
  76       1    Reserved.  Set to 4.
 124       4    Offset of logical end of the index file.

The remainder of the index file header record contains the following
fields:

Offset   Size   Description of the field
-------------------------------------------------
 132       4    Offset of logical end of data file.
 136       1    Value 2.
 137       1    Value 2.
 138       1    Value 4.
 139       1    Value 4.
 140       2    Contains the number of keys defined for the file.
 142       1    Value 0, or 1 for IDXFORMAT4 files.
 143       1    Value 2 or 4.  Number of bytes used for occurrence
                numbers in indices where duplicates are permitted.
 144       4    Value zeros.
 148       4    Offset of first Key Information record.
 152       4    Value zeros.
 156       4    Offset of the Free Space record for the data file.
                For fixed format files, this is a record in the index
                file of the same format as the index Free Space
                record, but the addresses point to free records in the
                data file.  For variable format files, this is the
                address in the data file of the data Free Space
                record.  This record has a different structure to the
                index Free Space record.
 160       4    Value zeros.
 164       4    Offset of first Free Space record in index file.
 168       4    Value zeros.
 172       2    Value zeros.
 174       2    Index file record length (node size).
 176       8    Value zeros.
 184     328    Reserved.  Value zeros.  For node size 512.
         840    For node size 1024.
        3912    For node size 4096.

Free Space Record.  The Free Space record is a record equal to the node
size of your file and contains the location of free records in the index
file.  Continuation records of the same size and structure are created as
needed, each pointing to the next.  The first Free Space record is
pointed to by the File Header record.

Free Space record description:

Size               Description of the field
--------------------------------------------
 2    Bit 15       Leading security flag.  Value should match value
                   of trailing security flag.
      Bits 14-0    Pointer to end of last free record address entry,
                   relative to start of this record.
 4                 Offset of Free Space continuation record.  Zero if
                   no further continuation records.
 4                 Offset of a free record in index file
 .                 .....
 .                 .....
 4                 Offset of a free record in index file
 2    Bit 15       Security Flag.  Value should match value of
                   leading security flag.
      Bits 14-0    Reserved.  Value x"7F".

Key Information Record.  The Key Information record is a record equal in
size to the index node size for your file.  It describes the physical
characteristics of all the keys used in the indexed file, including the
length of each key; where the key is defined within the data record;
whether duplicates are permitted, and so on.  The File Header record
points to the Key Information record.

Within the Key Information record structure is a sub-structure, the Key
Block.  A Key Block is created for each key defined.  The first Key Block
always describes the prime key.  Subsequent Key Blocks define the
alternate keys in the order specified when the file was created.

If the Key Information record is not big enough to hold Key Blocks for
all the keys defined, equal sized continuation records are created, each
pointing to the next, until all the keys have been defined.

Key Information record description:

Size               Description of the field
--------------------------------------------
 2    Bit 15       Security Flag.  Value 0.
      Bits 14-0    Pointer to end of last Key Block entry in this
                   record relative to start of this record.
 4                 Address of Key Information continuation record.
                   Zero if no further continuation records.
 n                 Key Block for prime key
(.                 .          )
(.                 .          )   One for each alternate key in file
(n                 Key Block  )
 1                 Reserved.  Value x"FF".
 1                 Reserved.  Value x"7E".

Key Block description:

Size               Description of the field
-------------------------------------------
 2                 Length of this entry in bytes.
 4                 Address of the root Index Node record for this key
 1                 Key compression.
      Bit 2        Compression of trailing spaces
      Bit 1        Compression of leading characters
      Bit 0        Compression of duplicates
 5                 Key-Component Block
(.                 .                    )   If key is split, one block
(.                 .                    )   per component
(5                 Key-Component Block  )

Key-Component Block description:

Size               Description of the field
--------------------------------------------
 2    Bit 15       Duplicates permitted flag.  If set, duplicates are
                   permitted.
      Bits 14-0    Length of component in bytes.
 2                 Offset of component within data record, starting
                   at 0.
 1                 Component type.  Value zeros.

Index Node Record.  For each key defined, a complete and independent
index is constructed.  It consists of a tree of Index Node records, each
record being the size of your index node and containing actual key-values
associated with data records written to the indexed file.  Every
key-value in a node will point either to a subordinate Index Node record
or, if it is a leaf node, to the data record associated with the key.
The top level node is called the root.

The default node size is 1024 bytes, but can change depending on the
largest key size defined for the file.  If the largest key is greater
than 238 bytes, the node size will be 4096 bytes.  It is possible to
change the node size by setting the XFHNODE environment variable on DOS,
Windows and OS/2 systems or the isam_block_size run-time tunable on UNIX
systems to one of 512, 1024 (the default) or 4096 bytes.

NOTE   If the largest Key-value is greater than 120 bytes, the value of
       XFHNODE or isam_block_size will be overwritten by 1024.  Also, if
       the largest Key-value is greater than 248, the node size will
       automatically be 4096.

Index Node record description:

Size               Description of the field
--------------------------------------------
 2    Bit 15       Security Flag.  Value should match value of the
                   trailing security flag.
      Bits 14-0    Pointer to end of last Key-Value Block in this
                   record, relative to the start of this record.
 n                 Key-Value Block
 .                 .
 .                 .
 n                 Key-Value Block
 1                 Index number.  The value is the same for all nodes
                   belonging to the same index tree.  Contains zero
                   if prime key.
 1    Bit 7        Security flag.  Value should match value of the
                   leading security flag.
      Bits 6-0     Level of this node.  Leaf nodes are level 0.

Key-Value Block description:

Size               Description of the field
--------------------------------------------
1/2                Optional.  Compression character count.  This
                   field is present only if compression is enabled
                   for this key.  It contains a count of the number
                   of characters (leading and/or trailing) that have
                   been suppressed.  If both leading and trailing
                   suppression is enabled, this field is two bytes in
                   length.
 n                 Key-value.
 2                 Optional.  Duplicate occurrence number.  This
                   field is present only if duplicates are allowed
                   for this key.  It contains the duplicate
                   occurrence count.  The first key stored that is a
                   duplicate has this field set to 1.  Second
                   duplicate has this field set to 2, and so on.
 4    Bit 31       Reserved.  Set if the next Key-value block is a
                   duplicate of this one and duplicate compression is
                   enabled.
      Bits 30-0    Address of the data record in the data file if
                   this is a leaf node; otherwise, the address of the
                   subordinate Index Node record in the index file.

Data File Structure.  The data file of an indexed file is a variable
format sequential file.  The structure of such a file is described
earlier in this chapter.  This file contains all the data records.  It
can be processed as a sequential file by defining the file as
ORGANIZATION SEQUENTIAL and adding a RECORDING MODE IS V clause to the
otherwise unchanged FD.  The file can then be opened and read
sequentially.
Since the data file is not ordered in any particular way, the records
read should not be expected in a consistent order.

Information about free records in the data file is maintained so that
space created by deleting records can be re-used, preventing the file
from growing too quickly.  In a fixed format indexed file, this
information is held in a Free Space record in the index file.  This
record has the same structure as the index Free Space record, except the
addresses point to data file records.  In a variable format file the
information is held in a system record in the data file.

In a variable structure data file, all record slots are a multiple of 4
bytes.  For each slot length in a variable format file, a chain is
maintained for all slots of that length that are free.  The start of the
chain for all lengths is maintained in the Data Free Space record in the
data file.  This system record is always the same length as the maximum
slot length defined for the file.  Each free slot pointed to contains the
address of the next free slot of the same length in the first four bytes
after the header record.  The last slot in the chain contains an address
of zero.

Data Free Space record description:

Offset   Size   Description of the field
-------------------------------------------
 0       2/4    Header of the record.
 2/4      4     Offset of the first free data slot of length 8 bytes.
 6/8      4     Offset of the first free data slot of length 12 bytes.
 .        .     ...
 .        .     ...
 n        4     Offset of the first free data slot of maximum length.
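
As a rough illustration of the Data Free Space record layout above, the
following Python sketch lists the head of each free-slot chain.  The
function name and parameters are hypothetical.  The sketch assumes that
the four-byte offsets are stored most significant byte first, which this
section does not state, and that a zero entry means no free slot of that
length exists.

```python
import struct


def free_chain_heads(record: bytes, header_len: int,
                     max_slot_len: int) -> dict:
    """Map each slot length to the offset of its first free slot.

    record       -- raw bytes of the Data Free Space record
    header_len   -- size of the record header, 2 or 4 bytes
    max_slot_len -- maximum slot length defined for the file
    """
    if header_len not in (2, 4):
        raise ValueError("record header is 2 or 4 bytes")
    heads = {}
    pos = header_len
    # Slot lengths are multiples of 4 bytes; the table starts at 8.
    for slot_len in range(8, max_slot_len + 1, 4):
        # Assumption: big-endian unsigned 32-bit offset.
        (offset,) = struct.unpack_from(">I", record, pos)
        heads[slot_len] = offset
        pos += 4
    return heads
```

Following a chain would then mean reading the four bytes after the header
of each free slot until an address of zero is reached.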

