Structure of Each File Organization [ Micro Focus COBOL System Reference, Volume 2 ] MPE/iX 5.0 Documentation
Structure of Each File Organization
The following sections describe the physical structure of the four data
file organizations.
Sequential Organization
Sequential files are intended to cater for binary data. These files
consist of a series of either fixed or variable length records. The
order of records in these files is set by the order of WRITE statements
when the file is created. The record order does not change once it has
been set. New records are added to the end of the file. Each record in
a record sequential file (except the first record) has a unique record
which precedes it, while each record (except the last record) also has a
unique record that follows it.
Sequential files that are fixed length and are not destined for the
printer have no record delimiter; the end of one record is immediately
followed by the beginning of the next.
A COBOL program cannot easily read a print file back to recover the
original print lines.
You can request special processing to take place on the output of print
files. To do this, specify the LINE ADVANCING clause in the SELECT
statement. This is performed automatically if the ASSIGN TO PRINTER
clause is used. This causes:
* Trailing spaces to be discarded in output records
* Each print record to be terminated by a delimiter which is
operating system dependent. See the sections detailing operating
environment specific information below.
* The OPEN statement to add a delimiter which is operating system
dependent. See the sections detailing operating environment
specific information below.
* Each WRITE statement without any BEFORE or AFTER clause to behave
as if you had specified the AFTER 1 clause.
We recommend that you use either the LINE ADVANCING option for all files
which you intend to print or, alternatively, specify either the BEFORE or
AFTER clause in every WRITE statement for that file.
NOTE
* You should never use the BEFORE or AFTER clauses for data
files which you do not intend to print.
* You should not open files destined for the printer for
either INPUT or I/O.
Sequential Organization on DOS, Windows and OS/2 Systems.
If you specify the ASSIGN TO PRINTER clause:
* each print record is terminated by the two-byte delimiter x"0D0A".
* the OPEN statement adds the two-byte delimiter x"0D0A" to the
file.
Each print record has one or more of a carriage-return/line-feed pair
(x"0D0A"), a form-feed character (x"0C") or a vertical tab character
(x"0B") added before or after the print record, depending on whether you
specified the BEFORE or AFTER clause in the WRITE statement.
Sequential Organization on UNIX Systems.
If you specify the ASSIGN TO PRINTER clause:
* each print record is terminated by the single-byte delimiter
x"0D".
* the OPEN statement adds the single-byte delimiter x"0D" to the
file.
Each print record has one or more of a line-feed character (x"0A"), a
form-feed character (x"0C") or a vertical tab character (x"0B") added
before or after the print record depending on whether you specified the
BEFORE or AFTER clause in the WRITE statement.
Fixed Format Sequential Structure.
In a fixed format sequential file, each record immediately follows the
previous record in the file. Each record is the same length as the
maximum length record.
No additional control characters are written to the file unless you use
WRITE BEFORE or WRITE AFTER ADVANCING statements. In this case, carriage
return and line feed characters are added as required to ensure correct
positioning when the data file is printed. If a file containing such
characters is used as input, you may find that records are not read as
expected.
+--------------------------------------------+
| Fixed length record |
+--------------------------------------------+
| Fixed length record |
+--------------------------------------------+
. .
. .
+--------------------------------------------+
| Fixed length record |
+--------------------------------------------+
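The layout above can be illustrated with a minimal Python sketch. It assumes the file contains no ADVANCING control characters and that its size is an exact multiple of the record length; the function name split_fixed_records is hypothetical and is not part of the COBOL system.

```python
def split_fixed_records(data: bytes, record_length: int) -> list:
    """Split the raw bytes of a fixed format sequential file into records.

    Assumes there are no delimiters: each record starts immediately
    after the previous one ends.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if len(data) % record_length != 0:
        raise ValueError("file size is not a multiple of the record length")
    return [data[i:i + record_length]
            for i in range(0, len(data), record_length)]


# Three 4-byte records stored back to back.
for record in split_fixed_records(b"AAAABBBBCCCC", 4):
    print(record)
```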
Variable Format Sequential Structure.
A variable format sequential file is the simplest form of the variable
structure defined above. Each record written is preceded by a record
header containing the length of the record; the record is written at the
length defined in the program; the file contains a standard variable
structure file header record.
Up to three padding characters can follow a record to ensure that the
next record starts on a four-byte boundary.
+-----------------------------------------------+
| File Header record - 128 bytes |
| |
+--------+-----------------------------+---+----+
| Header | Variable length record | |
+--------+-----------------------------+---+--------+---+
| Header | Variable length record | |
+--------+------------------------------------------+---+
. . . .
. . . .
+--------+--------------------------------------+---+
| Header | Variable length record | |
+--------+--------------------------------------+---+
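The padding rule can be sketched in Python as follows. This is an illustration only: it assumes that the padding is calculated over the record header plus the record data, and that the header is 2 or 4 bytes long (the header sizes quoted for relative files later in this section). The function name padded_slot_length is hypothetical.

```python
def padded_slot_length(record_length: int, header_length: int = 2) -> int:
    """Return the space taken by one record: header, data and padding.

    Assumption: padding (0 to 3 bytes) rounds header + data up to the
    next multiple of four, so the following record header starts on a
    four-byte boundary (the 128-byte file header is already aligned).
    """
    used = header_length + record_length
    padding = (-used) % 4          # 0, 1, 2 or 3 bytes
    return used + padding


for length in (5, 6, 7, 10):
    print(length, padded_slot_length(length))
```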
Line Sequential Organization
Line sequential files are ASCII text files such as those produced by
editors and other similar utilities. Because all trailing spaces are
removed from the program's record area when the record is written, the
records are of variable length. Hence, the length of the record actually
written to file is not determined by the length of the record definition
used in the WRITE statement.
A record delimiter is written after every record. The delimiter
character(s) vary depending on your operating environment. See the
environment specific sections for line sequential files below for further
information.
On input, the delimiter is removed and the record area is padded out with
spaces as necessary.
Inclusion of ADVANCING phrases other than BEFORE 1 causes control
characters in addition to the delimiter to be output.
If a record in the file is longer than the maximum length defined in the
program reading the file, then each access returns "maximum-length"
characters until the end of the record.
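The input behavior described above (delimiter removed, record area padded with spaces, over-length records returned in pieces) can be sketched in Python. This is a simplified model, not the file handler's implementation: it ignores null-character and tab handling, and the function name and the single-byte default delimiter are assumptions made for the example.

```python
def read_line_sequential(data: bytes, max_length: int,
                         delimiter: bytes = b"\n") -> list:
    """Return the record areas a program would see, one per access."""
    lines = data.split(delimiter)
    if lines and lines[-1] == b"":
        lines.pop()                    # nothing follows the final delimiter
    areas = []
    for line in lines:
        if not line:
            areas.append(b" " * max_length)      # empty record
            continue
        # A long record is returned max_length characters at a time.
        for start in range(0, len(line), max_length):
            piece = line[start:start + max_length]
            areas.append(piece.ljust(max_length, b" "))
    return areas


for area in read_line_sequential(b"AB\nABCDEFG\n", 5):
    print(area)
```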
Any file that is to be output to a printer can have LINE SEQUENTIAL or
LINE ADVANCING organization. If you specify sequential organization, use
the optional BEFORE or AFTER phrase for every WRITE statement to print
the file successfully.
Characters with a value less than x"20" (space) are written with a
preceding null character (x"00") to show that they are text characters
rather than formatting characters. On input, the preceding null
characters (x"00") are stripped away. You can prevent this insertion of
null characters when writing the file by using the -N run-time switch on
the command line at run time. You can also switch off insertion of null
characters for individual line sequential files by using a call to
Function 47 of routine x"91", the interprogram function call.
If you use the -N run-time switch to read a file written with the +N
run-time switch, these null characters are not stripped away. We
recommend that if you store non-ASCII characters in line sequential files
you ensure that the N run-time switch is on.
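The null-insertion scheme can be modeled with a short Python sketch. It only illustrates the encoding described above (a x"00" byte written before every character below x"20", and removed again on input); the function names are hypothetical and the sketch does not deal with record delimiters.

```python
def insert_nulls(record: bytes) -> bytes:
    """Precede every byte below x"20" with a x"00" byte (output side)."""
    out = bytearray()
    for byte in record:
        if byte < 0x20:
            out.append(0x00)
        out.append(byte)
    return bytes(out)


def strip_nulls(stored: bytes) -> bytes:
    """Remove the preceding x"00" bytes again (input side)."""
    out = bytearray()
    i = 0
    while i < len(stored):
        if stored[i] == 0x00 and i + 1 < len(stored):
            i += 1                 # skip the marker, keep the byte after it
        out.append(stored[i])
        i += 1
    return bytes(out)


encoded = insert_nulls(b"A\x01B")
print(encoded, strip_nulls(encoded))
```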
Any tab characters in a line sequential file (x"09") are expanded to
every eighth character position during a READ (that is, the character
following a tab will lie in one of the columns 9, 17, 25, 33, and so on).
You can compress space characters to tabs during output by using the +T
run-time switch on the command line or by using a call to Function 49 of
routine x"91".
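The tab expansion performed during a READ corresponds to tab stops every eight columns. As an illustration only, Python's built-in str.expandtabs produces the same column positions; the wrapper name expand_tabs is hypothetical.

```python
def expand_tabs(line: str) -> str:
    """Expand tabs so the next character lands in column 9, 17, 25, ..."""
    return line.expandtabs(8)


expanded = expand_tabs("A\tB")
# Report the 1-based column in which "B" ends up.
print(expanded.index("B") + 1)
```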
See the chapter Library Routines (Call-by-Number) for more information
about the x"91" routine. See the chapter Running and the appendix
Descriptions of Run-Time Switches for more information about the N and T
run-time switches.
The following sections give environment specific information for line
sequential files.
Line Sequential Files on DOS, Windows and OS/2 Systems.
The record delimiter x"0D0A" is used for DOS, Windows and OS/2 systems.
Any single byte x"1A" (user terminate run code) is used as an
unconditional file terminator (except when preceded by a null character,
as described below). If no x"1A" character is encountered, the physical
end of the file serves as the file terminator.
When the file is closed, a terminating x"1A" character is NOT written.
Instead, the length of the file is used to determine where it ends.
If a record is longer than the maximum length defined in the program, so
that an access returns only part of it, and the next two characters after
such an access are x"0D0A", a blank line is not returned on the next
access. Only the x"0D" acts as a record delimiter. Additional device
control characters (such as x"0A", x"0B", x"0C") are discarded. x"1A"
acts as a record delimiter and also denotes the end of the file.
If you turn the N run-time switch off, you must make sure that any COMP
data does not contain bytes with a value of x"1A" (end-of-file character)
or x"0D" (record delimiter).
Tab characters can be correctly interpreted by your personal computer's
graphics printer.
Line Sequential Files on UNIX Systems.
The record delimiter on UNIX systems is a single byte x"0A" (the
default). However, for line sequential and relative files only, this
default record delimiter can be changed to that used by OS/2.
If you turn the N run-time switch off, you must make sure that any COMP
data does not contain bytes with a value of x"0A" (record delimiter).
Line Sequential Structure.
+----------------------------------------------+---------+
| Variable length record |delimiter|
+---------------------------------+---------+--+---------+
| Variable length record |delimiter|
+---------------------------------+---------+--+---------+
| Variable length record |delimiter|
+----------------------------------------------+---------+
. . .
. . .
. . .
+----------------------------------------------+---------+
| Variable length record |delimiter|
+----------------------------------------------+---------+
Relative Organization
Relative file organization allows you to access any record randomly by
specifying its ordinal position within the file. Data held in relative
files can consist of fixed or variable format records which are of fixed
length, the length being the length of the longest record defined for the
file. This is necessary so that the COBOL file handling routines can
quickly calculate the physical location of any record given its record
number within the file.
Each record is uniquely identified by a record number. The first record
in the file is record number one, the second record is number two, and so
on.
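The location calculation can be sketched in Python for a fixed format relative file. This is an assumption-based illustration rather than the file handler's own code: it takes each slot to be the maximum record length plus the record marker (two bytes on DOS, Windows and OS/2, one byte on UNIX, as described later in this section) and assumes no file header. The function name is hypothetical.

```python
def fixed_relative_offset(record_number: int, max_rec_len: int,
                          marker_len: int) -> int:
    """Return the byte offset of a record in a fixed format relative file."""
    if record_number < 1:
        raise ValueError("record numbers start at one")
    slot_length = max_rec_len + marker_len
    return (record_number - 1) * slot_length


# Record 3 of a file with 80-byte records and a two-byte marker.
print(fixed_relative_offset(3, 80, 2))
```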
Each record is followed by a record marker, which indicates the current
state of the record. In a variable format file, the marker follows the
fixed length slot rather than the record itself. The marker varies
depending on your environment. See the environment specific information
sections for relative files below for further information.
When you delete a record from a relative file, the only action is to
change that record's marker. However, the contents of a deleted record
physically remain in the file until a new record is written. If, for
security reasons, you want to make sure that the data does not exist in
the file, then you must overwrite the record using the REWRITE statement
before you delete it.
A fixed format relative file can be processed as a fixed format
sequential organization file by defining the maximum record length to be
larger than that for the relative file (see the sections on operating
environment specific information for details). A variable format
relative file cannot be processed as a sequential organization file.
The length of a relative file is determined by the largest record number
used when actually writing a record to the file.
Relative File Organization on DOS, Windows and OS/2 Systems.
On DOS, Windows and OS/2 systems, the current state of the record is
indicated by a two-byte marker as follows:
Marker (hex) Description
-------------------------------------------------------
0D0A Record present
0D00 Record deleted or never written.
A fixed format relative file can be processed as a fixed format
sequential file by defining the maximum record length to be two
characters larger than that for the relative file.
The size of a relative file on DOS, Windows and OS/2 systems is
calculated as follows.
Fixed format:
(max-rec-len + 2) * largest-record-number
Variable format:
128 + (max-rec-len + 2 + header) * largest-record-number
where header is 2 if max-rec-len is less than 4096, otherwise header is
4.
Relative File Organization on UNIX Systems.
On UNIX systems, the current state of a record for fixed length relative
records is indicated by a one-byte marker as follows:
Marker (hex) Description
-------------------------------------------------------
0A Record present
00 Record deleted or never written
The current state of a record for variable length relative records is
indicated by a two-byte marker as follows:
Marker (hex) Description
-------------------------------------------------------
0D0A Record present
0D00 Record deleted or never written
A fixed format relative file can be processed as a fixed format
sequential file by defining the maximum record length to be one character
larger than that for the relative file.
The size of a relative file on UNIX systems is calculated as follows.
Fixed format:
(max-rec-len + 1) * largest-record-number
Variable format:
128 + (max-rec-len + 2 + header) * largest-record-number
where header is 2 if max-rec-len is less than 4096, otherwise header is
4.
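The size formulas given for both environments can be collected into one Python sketch. The function and parameter names are hypothetical; the arithmetic is taken directly from the formulas above.

```python
# Marker length in bytes for fixed format relative files.
FIXED_MARKER_LEN = {"dos": 2, "unix": 1}


def relative_file_size(max_rec_len: int, largest_record_number: int,
                       variable: bool = False, platform: str = "unix") -> int:
    """Return the size in bytes of a relative file."""
    if variable:
        # Same formula on both platforms: 128-byte file header, then
        # one slot per record (data + two-byte marker + record header).
        header = 2 if max_rec_len < 4096 else 4
        return 128 + (max_rec_len + 2 + header) * largest_record_number
    return (max_rec_len + FIXED_MARKER_LEN[platform]) * largest_record_number


print(relative_file_size(80, 10, platform="dos"))
print(relative_file_size(80, 10, platform="unix"))
print(relative_file_size(80, 10, variable=True))
```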
Fixed Format Relative Structure.
A fixed format relative file is the same as a fixed format sequential
file, except each record is followed by a record marker.
+-------------------------------------------+------+
| Fixed length record - Record 1 |marker|
+-------------------------------------------+------+
| Fixed length record - Record 2 |marker|
+-------------------------------------------+------+
. . .
. . .
+-------------------------------------------+------+
| Fixed length record - Record i deleted |marker|
+-------------------------------------------+------+
. . .
. . .
+-------------------------------------------+------+
| Fixed length record - Record j - unused |marker|
+-------------------------------------------+------+
. . .
. . .
+-------------------------------------------+------+
| Fixed length record - Record n |marker|
+-------------------------------------------+------+
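The role of the record marker can be illustrated with a Python sketch that reports the state of each slot. It assumes the UNIX fixed format layout (a one-byte marker, x"0A" for a record that is present and x"00" for one that is deleted or never written); the function name is hypothetical.

```python
def record_states(data: bytes, max_rec_len: int) -> dict:
    """Map each record number to "present" or "absent"."""
    slot_length = max_rec_len + 1          # record plus one-byte marker
    states = {}
    starts = range(0, len(data) - slot_length + 1, slot_length)
    for number, start in enumerate(starts, start=1):
        marker = data[start + max_rec_len]
        if marker == 0x0A:
            states[number] = "present"
        elif marker == 0x00:
            states[number] = "absent"      # deleted or never written
        else:
            raise ValueError(f"unexpected marker in record {number}")
    return states


sample = b"AAAA\n" + b"BBBB\x00" + b"CCCC\n"
print(record_states(sample, 4))
```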
On UNIX systems, for relative files in random access, writing records 1,
2 and 9 occupies the same disk space as creating a file containing
records 1, 2 and 3.
Variable Format Relative Structure.
A variable format relative file follows the basic variable structure
defined earlier in this appendix. However, each record is placed into a
fixed length slot, the length of the slot being the length of the longest
record defined, together with the header and terminator characters. The
record header for each record contains the length of the logical record
written, not the length of the physical fixed length slot. Each slot is
followed by a two-byte record marker.
+----------------------------------------------------+
| File Header record - 128 bytes |
| |
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 1 | pad |0D0A|
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 2 |0D0A|
+-------+--------------------------------+------+----+
|Header |Variable length record-Record 3 | pad |0D0A|
+-------+--------------------------------+------+----+
. . . .
. . . .
+-------+---------------------------------------+----+
|Header |Variable length record-Record i delete|0D00|
+-------+---------------------------------------+----+
. . . .
. . . .
+-------+---------------------------------------+----+
|Header |Variable length record-Record j unused |0D00|
+-------+---------------------------------------+----+
. . . .
. . . .
+-------+-------------------------------+-------+----+
|Header |Variable length record-Record n| pad |0D0A|
+-------+-------------------------------+-------+----+
Indexed Organization
Indexed files consist of a series of fixed or variable length records.
An indexed file is implemented as two separate files: the data file and
the index (key) file. Variable length records are handled by the variable length
file handler supplied with this COBOL system.
For all file formats other than C-ISAM, both the data and index files are
of the variable structure defined in the section Variable Format
Sequential Structure earlier in this chapter. See the section Indexed
Organization on UNIX Systems for details of C-ISAM and fixed length
records.
When you name the file, the name is given to the data file; the name of
the associated index file is produced by adding a .idx extension to the
data file name.
For example:
Data file Index file
-------------------------------------------------
myfile myfile.idx
clock.fle clock.fle.idx (UNIX)
clock.fle clock.idx (DOS, Windows and OS/2)
You should avoid using the .idx extension in other contexts.
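The naming rule shown in the table can be sketched in Python. The sketch assumes plain file names without directory components, and it infers from the table that an existing extension is replaced on DOS, Windows and OS/2 but kept on UNIX. The function name and the platform strings are hypothetical.

```python
def index_file_name(data_file_name: str, platform: str) -> str:
    """Derive the index file name from the data file name."""
    if platform == "unix":
        # The .idx extension is appended to the complete name.
        return data_file_name + ".idx"
    # DOS, Windows and OS/2: any existing extension is replaced.
    stem, dot, _extension = data_file_name.rpartition(".")
    base = stem if dot else data_file_name
    return base + ".idx"


for name in ("myfile", "clock.fle"):
    print(name, index_file_name(name, "unix"), index_file_name(name, "dos"))
```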
The index is built up as an inverted tree structure that grows in height
as records are added. The number of key file accesses required to locate
a randomly selected record depends primarily on the number of records in
the file and the key-length.
File I/O is faster when reading the file sequentially, but only if other
indexed sequential operations do not intervene.
We strongly recommend that you take regular backups of all file types.
There are, however, situations with indexed files (for example, media
corruption) that can lead to only one of the two files becoming unusable.
If the index file is lost in this way, you can recover data records from
just the data file (although not in key sequence) and, therefore, reduce
the time lost due to a failure.
You can recover a corrupt indexed file using a utility which rebuilds the
index of the indexed file. The utility you use is operating environment
dependent and is referred to in each of the sections covering the
different operating systems below.
Indexed Organization on DOS, Windows and OS/2 Systems.
You can recover a corrupt indexed file using the Rebuild utility. See
the chapter Rebuild for details of this utility.
Indexed Organization on UNIX Systems.
If you are using C-ISAM, the C-ISAM file handler handles all fixed length
indexed records. The data files are in the relative format described
earlier in this chapter. If the C-ISAM file handler is not the default
one supplied with this system, or you have substituted your own file
handler for the default as described in an add-on product, the format of
the data file is dependent on the file handler you are using. See your
Release Notes for details of the default file handlers supplied with this
COBOL system.
It is possible to use an environment variable to specify that the index
and data files should appear in separate directories. See the section
The "&" Character in Environment Variables in the chapter External
File-name Mapping (DD_) for further information.
To allow an index to be recovered from the data file when the indexed
file has become corrupt, each data record carries an extra trailing byte.
Unused data records, which contain LOW-VALUES, are marked as deleted with
x"00". Existing records are marked with the character x"0A".
The recovery operation can, therefore, be performed with a simple COBOL
program by defining the data file as ORGANIZATION RELATIVE ACCESS
SEQUENTIAL. The records are then read sequentially, the data moved from
the relative file record area into the indexed record area and written to
a new version of the indexed file.
Those records with LOW-VALUES in the last (extra) byte are discarded.
Note that this byte (containing a line feed x"0A" in a required record)
is not written to the indexed file on recovery, because of the record
length discrepancy of one byte in the record definitions.
You can also rebuild a corrupt index file using the fhrebuild utility.
See the chapter File Handler Utilities for details of how to do this.
Index File Structure.
On all operating systems, an index file can have several keys. For each
key defined, the index file contains an independent index, structured as
a B-Tree. A leaf node in an index contains a list of key-values in
ascending order, each of which points to the data record (in the data
file) to which it belongs. A non-leaf node contains a list of key-values
in ascending order, each of which points to a subordinate node in which
that key-value is the largest.
This index structure provides fast random access to a data record using
any key, as well as efficient processing of data records in sequential
key order.
The records in an index file are always the same length, whether they are
nodes, header records or key information records. The record size is
configurable at the time the file is created and cannot subsequently be
changed. See the section Index Node Record later in this chapter for
further details on how to do this.
The index file starts with the index File Header Record, which contains
information about the file. It points to the Free Space record, which is
used to maintain a list of free records in the index file. The index
File Header record also points to the Key Information record, which
contains details of every key defined for the file, and, for each key,
points to the root Index Node record of the associated index. Each of
these records is described in the following sections.
Index File Header Record.
The File Header record is located at offset 0 within the index file. The
first 128 bytes are the same as a standard variable structure file header
record, except for the fields below.
Index File Header record description:
Offset Size Description of the field
-------------------------------------------------
0 4 Length of the file header.
39 1 Organization of the file. Always
contains value 2 (for Indexed
organization).
62 14 Always contains zeros.
76 1 Reserved. Set to 4.
124 4 Offset of logical end of the index
file.
The remainder of the index file header record contains the following
fields:
Offset Size Description of the field
-------------------------------------------------
132 4 Offset of logical end of data file.
136 1 Value 2.
137 1 Value 2.
138 1 Value 4.
139 1 Value 4.
140 2 Contains the number of keys defined
for the file.
142 1 Value 0, or 1 for IDXFORMAT4 files.
143 1 Value 2 or 4. Number of bytes used
for occurrence numbers in indices
where duplicates are permitted.
144 4 Value zeros.
148 4 Offset of first Key Information
record.
152 4 Value zeros.
156 4 Offset of the Free Space record for
the data file.
For fixed format files, this is a
record in the index file of the same
format as the index Free Space
record, but the addresses point to
free records in the data file. For
variable format files, this is the
address in the data file of the data
Free Space record. This record has
a different structure to the index
Free Space record.
160 4 Value zeros.
164 4 Offset of first Free Space record in
index file.
168 4 Value zeros.
172 2 Value zeros.
174 2 Index file record length (node
size).
176 8 Value zeros.
184 328/840/3912 Reserved. Value zeros. The size is
328 for node size 512, 840 for node
size 1024 and 3912 for node size 4096.
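A few of the header fields listed above can be extracted with a short Python sketch. The offsets and sizes come from the tables; the byte order is NOT stated in this section, so big-endian is an assumption made purely for illustration, and the function name parse_index_header is hypothetical.

```python
import struct


def parse_index_header(header: bytes) -> dict:
    """Extract selected fields from an index File Header record.

    Assumption: multi-byte fields are stored big-endian.
    """
    (key_count,) = struct.unpack_from(">H", header, 140)
    (key_info_offset,) = struct.unpack_from(">I", header, 148)
    (free_space_offset,) = struct.unpack_from(">I", header, 164)
    (node_size,) = struct.unpack_from(">H", header, 174)
    return {
        "organization": header[39],            # 2 for Indexed
        "key_count": key_count,
        "key_info_offset": key_info_offset,
        "free_space_offset": free_space_offset,
        "node_size": node_size,
        # The reserved area fills the rest of the record after offset 184.
        "reserved_size": node_size - 184,
    }


# Build a synthetic 1024-byte header to exercise the parser.
buf = bytearray(1024)
buf[39] = 2
struct.pack_into(">H", buf, 140, 3)
struct.pack_into(">I", buf, 148, 1024)
struct.pack_into(">I", buf, 164, 2048)
struct.pack_into(">H", buf, 174, 1024)
print(parse_index_header(bytes(buf)))
```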
Free Space Record.
The Free Space record is a record equal to the node size of your file and
contains the location of free records in the index file. Continuation
records of the same size and structure are created as needed, each
pointing to the next.
The first Free Space record is pointed to by the File Header record.
Free Space record description:
Size Description of the field
--------------------------------------------
2 Bit 15 Leading security flag. Value
should match value of trailing
security flag.
Bits Pointer to end of last free
14-0 record address entry, relative
to start of this record.
4 Offset of Free Space continuation
record. Zero if no further
continuation records.
4 Offset of a free record in index file
.....
.....
4 Offset of a free record in index file
2 Bit 15 Security Flag. Value should
match value of leading
security flag.
Bits Reserved. Value x"7F".
14-0
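The Free Space record layout can be decoded with a Python sketch along the following lines. Two assumptions are made that this section does not confirm: multi-byte fields are big-endian, and the 15-bit pointer holds the offset just past the last free record address entry. The function name is hypothetical.

```python
import struct


def parse_free_space_record(record: bytes) -> tuple:
    """Return (security_flag, continuation_offset, free_record_offsets)."""
    (leading,) = struct.unpack_from(">H", record, 0)
    security_flag = leading >> 15          # bit 15
    end_of_entries = leading & 0x7FFF      # bits 14-0
    (continuation,) = struct.unpack_from(">I", record, 2)
    # Entries are four bytes each and start after the first six bytes.
    free_offsets = [struct.unpack_from(">I", record, position)[0]
                    for position in range(6, end_of_entries, 4)]
    return security_flag, continuation, free_offsets


# Synthetic record holding two free record addresses.
rec = bytearray(512)
struct.pack_into(">H", rec, 0, (1 << 15) | 14)
struct.pack_into(">I", rec, 2, 0)
struct.pack_into(">I", rec, 6, 1024)
struct.pack_into(">I", rec, 10, 2048)
print(parse_free_space_record(bytes(rec)))
```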
Key Information Record.
The Key Information record is a record equal in size to the index node
size for your file. It describes the physical characteristics of all the
keys used in the indexed file, including the length of each key; where
the key is defined within the data record; whether duplicates are
permitted, and so on. The File Header record points to the Key
Information record.
Within the Key Information record structure is a sub-structure, the Key
Block. A Key Block is created for each key defined. The first Key Block
always describes the prime key. Subsequent Key Blocks define the
alternate keys in the order specified when the file was created.
If the Key Information record is not big enough to hold Key Blocks for
all the keys defined, equal sized continuation records are created, each
pointing to the next, until all the keys have been defined.
Key Information record description:
Size Description of the field
--------------------------------------------
2 Bit 15 Security Flag. Value 0.
Bits Pointer to end of last Key
14-0 Block entry in this record
relative to start of this
record.
4 Address of Key Information
continuation record. Zero if no
further continuation records.
n Key Block for prime key
( . . )
( . . ) One Key Block for each
( n Key Block ) alternate key in file
1 Reserved. Value x"FF".
1 Reserved. Value x"7E".
Key Block description:
Size Description of the field
-------------------------------------------
2 Length of this entry in bytes.
4 Address of the root Index Node
record for this key
1 Key compression.
Bit 2 Compression of trailing spaces
Bit 1 Compression of leading
characters
Bit 0 Compression of duplicates
5 Key-Component Block
(. . . ) If key is split, one block
(. . . ) per component
( 5 Key-Component Block )
Key-Component Block description:
Size Description of the field
--------------------------------------------
2 Bit 15 Duplicates permitted flag. If
set, duplicates are permitted.
Bits Length of component in bytes.
14-0
2 Offset of component within data
record, starting at 0.
1 Component type. Value zeros.
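A five-byte Key-Component Block can be unpacked as in the following Python sketch. Big-endian byte order is an assumption (it is not stated in this section), and the function name is hypothetical.

```python
import struct


def parse_key_component(block: bytes) -> dict:
    """Decode one five-byte Key-Component Block."""
    if len(block) != 5:
        raise ValueError("a Key-Component Block is five bytes long")
    flag_and_length, offset, component_type = struct.unpack(">HHB", block)
    return {
        "duplicates": bool(flag_and_length & 0x8000),   # bit 15
        "length": flag_and_length & 0x7FFF,             # bits 14-0
        "offset": offset,            # position within the data record
        "type": component_type,
    }


# A 10-byte key component at offset 4 that permits duplicates.
print(parse_key_component(struct.pack(">HHB", 0x8000 | 10, 4, 0)))
```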
Index Node Record.
For each key defined, a complete and independent index is constructed.
It consists of a tree of Index Node records, each record being the size
of your index node and containing actual key-values associated with data
records written to the indexed file. Every key-value in a node will
point either to a subordinate Index Node record or, if it is a leaf node,
to the data record associated with the key. The top level node is called
the root.
The default node size is 1024 bytes, but can change depending on the
largest key size defined for the file. If the largest key is greater
than 238 bytes, the node size will be 4096 bytes. It is possible to
change the node size by setting the XFHNODE environment variable on DOS,
Windows and OS/2 systems or the isam_block_size run-time tunable on UNIX
systems to one of 512, 1024 (the default) or 4096 bytes.
NOTE If the largest Key-value is greater than 120 bytes, the value of
XFHNODE or isam_block_size will be overwritten by 1024. Also, if
the largest Key-value is greater than 248, the node size will
automatically be 4096.
Index Node record description:
Size Description of the field
--------------------------------------------
2 Bit 15 Security Flag. Value should
match value of the trailing
security flag.
Bits Pointer to end of last
14-0 Key-Value Block in this
record, relative to the start
of this record.
n Key-Value Block
. .
. .
n Key-Value Block
1 Index number.
The value is the same for all nodes
belonging to the same index tree.
Contains zero if prime key.
1 Bit 7 Security flag. Value should
match value of the leading
security flag.
Bits Level of this node. Leaf
6-0 nodes are level 0.
Key-Value Block description:
Size Description of the field
--------------------------------------------
1/2 Optional. Compression character
count.
This field is present only if
compression is enabled for this key.
It contains a count of the number of
characters (leading and/or trailing)
that have been suppressed. If both
leading and trailing suppression is
enabled, this field is two bytes in
length.
n Key-value.
2 Optional. Duplicate occurrence
number.
This field is present only if
duplicates are allowed for this key.
It contains the duplicate occurrence
count. The first key stored that is
a duplicate has this field set to 1.
Second duplicate has this field set
to 2, and so on.
4 Bit 31 Reserved. Set if the next
Key-value block is a duplicate
of this one and duplicate
compression is enabled.
Bits Address of the data record in
30-0 the data file if this is a
leaf node; otherwise, the
address of the subordinate
Index Node record in the index
file.
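The final four-byte field of a Key-Value Block packs a flag and an address into one value. A minimal Python sketch of separating them follows; it assumes big-endian storage, which this section does not state, and the function name is hypothetical.

```python
import struct


def split_key_value_pointer(field: bytes) -> tuple:
    """Return (next_is_duplicate, address) from the four-byte field."""
    (value,) = struct.unpack(">I", field)
    next_is_duplicate = bool(value >> 31)      # bit 31
    address = value & 0x7FFFFFFF               # bits 30-0
    return next_is_duplicate, address


print(split_key_value_pointer(struct.pack(">I", 0x80000200)))
print(split_key_value_pointer(struct.pack(">I", 0x00000200)))
```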
Data File Structure.
The data file of an indexed file is a variable format sequential file.
The structure of such a file is described earlier in this chapter. This
file contains all the data records. It can be processed as a sequential
file by defining the file as ORGANIZATION SEQUENTIAL and adding a
RECORDING MODE IS V clause to the otherwise unchanged FD. The file can
then be opened and read sequentially. Since the data file is not ordered
in any particular way, the records read should not be expected in a
consistent order.
Information about free records in the data file is maintained so that
space created by deleting records can be re-used, preventing the file
from growing too quickly. In a fixed format indexed file, this
information is held in a Free Space record in the index file. This
record has the same structure as the index Free Space record, except the
addresses point to data file records. In a variable format file the
information is held in a system record in the data file.
In a variable structure data file, all record slots are a multiple of 4
bytes. For each slot length in a variable format file, a chain is
maintained for all slots of that length that are free. The start of the
chain for all lengths is maintained in the Data Free Space record in the
data file. This system record is always the same length as the maximum
slot length defined for the file.
Each free slot pointed to contains the address of the next free slot of
the same length in the first four bytes after the header record. The
last slot in the chain contains an address of zero.
Data Free Space record description:
Offset Size Description of the field
-------------------------------------------
0 2/4 Header of the record.
2/4 4 Offset of the first free data
slot of length 8 bytes.
6/8 4 Offset of the first free
data slot of length 12 bytes.
. . ...
. . ...
n 4 Offset of the first free data
slot of maximum length.
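The position of each chain head within the Data Free Space record follows a regular pattern, which the Python sketch below reproduces from the table: slot lengths start at 8 bytes and increase in steps of 4, and each chain head takes 4 bytes after a 2-byte or 4-byte record header. The function name is hypothetical.

```python
def free_chain_offset(slot_length: int, header_length: int = 2) -> int:
    """Return the offset of the chain head for a given slot length."""
    if slot_length < 8 or slot_length % 4 != 0:
        raise ValueError("slot lengths are multiples of 4, starting at 8")
    # One four-byte entry per slot length: 8, 12, 16, ...
    entry_index = (slot_length - 8) // 4
    return header_length + 4 * entry_index


for length in (8, 12, 16):
    print(length, free_chain_offset(length, 2), free_chain_offset(length, 4))
```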