COMMUNICATOR 3000 MPE/iX Release 5.0 (Core Software Release X.50.20)
Byte-Stream Emulation
by Steve Elmer
Commercial Systems Division
To facilitate transparent file sharing, the file system supports several
emulators. The traditional MPE fixed and variable record formats can be
emulated as the byte-stream record format. In addition, the byte-stream
record format can be emulated as the variable record format. File
sharing on MPE/iX is transparent since the file system automatically
binds in the appropriate emulator when necessary.
What is Emulation?
According to Webster, to emulate is "to try to equal or surpass." In this
context, we are attempting to make native file formats equally accessible
to non-native accessors. In other words, we would like the applications
which only understand the byte-stream format to be able to operate on MPE
record format files and vice-versa.
To understand the emulators, it is essential to understand the underlying
internal layout for each record format. To do so, one must view each
file as a simple array of bytes. The fixed record format imposes a
simple structure on its array of bytes--every n bytes comprises a single
record and no extra bytes are evident. The byte-stream record format
imposes no structure whatsoever on its array of bytes; however,
convention dictates that the linefeed character delimits records and that
records can be of any length. The variable record format is the most
complex because each record contains from 2 to 4 bytes of descriptive
information and/or dead space. Furthermore, the variable record format
imposes a second level structure on blocks which causes more bytes to be
used for description and/or dead space.
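
As a rough illustration of the two simpler layouts, the sketch below splits the same two records out of a fixed format byte array and a byte-stream byte array. The record size of 8, the blank fill character, and the function names are assumptions made for this example only; the variable format is left out because its descriptor layout is not spelled out here.

```python
# Illustrative sketch only: fixed and byte-stream layouts as plain byte arrays.
RECORD_SIZE = 8   # hypothetical fixed record size, in bytes
FILL = b" "       # assumed fill character (blank)


def fixed_records(data, record_size=RECORD_SIZE):
    """Fixed format: every record_size bytes is one record, nothing else is stored."""
    return [data[i:i + record_size] for i in range(0, len(data), record_size)]


def stream_records(data):
    """Byte-stream format: by convention a linefeed ends each record."""
    records, start = [], 0
    while start < len(data):
        end = data.find(b"\n", start)
        end = len(data) if end == -1 else end + 1   # keep the linefeed with the record
        records.append(data[start:end])
        start = end
    return records


# The same two records stored in each format.
fixed_file = b"abc".ljust(RECORD_SIZE, FILL) + b"defghi".ljust(RECORD_SIZE, FILL)
stream_file = b"abc\ndefghi\n"
```

The fixed file spends 16 bytes on data that the byte-stream file holds in 11, because the record boundary is implied by position rather than by a delimiter.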
When emulating one record format to another, the idiosyncrasies of the
native format must be transformed into the idiosyncrasies of the emulated
format. To understand this better, the concept of a virtual file is
useful. The virtual file is the file that would result if the source
file were actually transformed into the target format. For example, to
translate from byte-stream to variable, all the bytes between linefeed
characters must be fit into records inside blocks in the variable record
file. The resulting file would have record length descriptors, pad
bytes, and block terminators appropriate to variable record files. If
there were an editor which could handle both record formats, the files
would look identical when viewed through that editor.
Unfortunately, we can't simply translate the source file into a target
file and claim to have done our job. To do so would imply that there
were actually multiple copies of the file on the system, each with a
different record format. Clearly, this is too unwieldy and doesn't
really solve the problem. For this reason, the virtual file is simply a
view imposed upon the source file, without changing its internal layout
in any way. Thus, when emulating a fixed record format file to
byte-stream, a read() request returns a record delimited with a linefeed
even though the source file contains no linefeed characters.
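
A minimal sketch of such a view follows, assuming a blank fill character and a hypothetical class name. It is a model of the idea, not the file system's implementation: the source bytes are only ever read, and the linefeed is manufactured at read time.

```python
class FixedAsByteStream:
    """Hypothetical read-only byte-stream view over fixed format data."""

    def __init__(self, source, record_size, fill=b" "):
        self.source = source            # native bytes, never modified
        self.record_size = record_size
        self.fill = fill                # assumed fill character
        self.next_record = 0            # index of the next record to present

    def read_record(self):
        """Return the next record with a linefeed appended, or b'' at EOF."""
        start = self.next_record * self.record_size
        if start >= len(self.source):
            return b""
        raw = self.source[start:start + self.record_size]
        self.next_record += 1
        # The linefeed exists only in the view, not in the source file.
        return raw.rstrip(self.fill) + b"\n"
```

Reading two 8-byte records holding "abc" and "defghi" through this view yields "abc\n" and then "defghi\n", while the 16 source bytes stay exactly as they were.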
The concept of the virtual file encompasses more than just the data
layouts within files. Also included are such attributes as record size,
blocking factor, EOF offset, and file limit. Thus, a fixed record file
with an EOF of 3 records could appear to have 213 bytes when emulated as
byte-stream. FFILEINFO must report the EOF value as 213 for emulated
byte-stream accessors, just as it would for a byte-stream file with 3
records and 213 characters. In a similar vein, the FPOINT and FSPACE
intrinsics must position the data pointer within the virtual file by
finding the corresponding position in the source file.
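
The arithmetic behind a figure like 213 can be sketched as follows. The record size of 80 and the 70 data bytes per record are assumptions chosen so that the numbers match; the function names are hypothetical and are not MPE intrinsics.

```python
def virtual_record_lengths(source, record_size, fill=b" "):
    """Length of each record as seen by the byte-stream view: data plus one linefeed."""
    return [len(source[i:i + record_size].rstrip(fill)) + 1
            for i in range(0, len(source), record_size)]


def virtual_eof(source, record_size, fill=b" "):
    """EOF offset that the emulated view would have to report."""
    return sum(virtual_record_lengths(source, record_size, fill))


def virtual_to_source(offset, source, record_size, fill=b" "):
    """Map a virtual byte offset to (record index, offset within that record)."""
    lengths = virtual_record_lengths(source, record_size, fill)
    for index, length in enumerate(lengths):
        if offset < length:
            return index, offset
        offset -= length
    raise ValueError("offset is beyond the virtual EOF")


# Three 80-byte records, each holding 70 data bytes and 10 fill bytes.
source = (b"x" * 70).ljust(80) * 3
```

Here the native file occupies 240 bytes, yet `virtual_eof(source, 80)` is 3 * (70 + 1) = 213, and virtual offset 71 maps to the first byte of the second record.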
Important Details (Please Read)
Did you infer that FLABELINFO only returns information from the view of
the native record format of the file? The reason FLABELINFO can't pretend
to see the file in non-native formats is that it doesn't have any way to
know that the named file is being emulated somewhere. Emulation is bound
in at open time for a particular file-descriptor.
Caveats
At this point, one might be thinking "this sounds great, any file can be
perfectly viewed as any record format!" Alas, it is not so. Both the
fixed and the variable record formats have distinctive features which
prevent perfect emulation:
* The maximum record size limit interferes with attempts to write
large byte-stream records into files with fixed and variable
record formats.
* Appending data to the file alternately from native and emulated
views causes spurious linefeeds from the emulator's point of view.
* For reasons detailed below, writing to the middle of the
virtual file can cause unpredictable results (look under the
"Idiosyncrasies" heading).
When the byte-stream view is used to write records larger than the
maximum, the record is broken down into multiple smaller records. Later,
when the file is viewed again, spurious linefeed characters will have
appeared at the sub-record boundaries.
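
The splitting can be modeled in a few lines. This is a sketch under simple assumptions: native records are held in a Python list, the maximum record size is 4 for readability, and both function names are hypothetical.

```python
def write_stream_record(records, data, max_record_size):
    """Store one byte-stream record (linefeed already removed) as native records.

    Data longer than max_record_size is broken into several smaller records.
    """
    for i in range(0, max(len(data), 1), max_record_size):
        records.append(data[i:i + max_record_size])


def read_back(records):
    """Byte-stream view of the native records: one linefeed per record."""
    return b"".join(record + b"\n" for record in records)


records = []
write_stream_record(records, b"abcdefghij", max_record_size=4)
view = read_back(records)
```

One ten-byte record went in, but the view comes back as "abcd\nefgh\nij\n": the two extra linefeeds mark the sub-record boundaries.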
One feature of the byte-stream view is that records aren't terminated
until the linefeed character is explicitly written to the file. Thus,
the following sequence of writes results in the file shown:
write('abc'); write('def'); write('ghi\n');
"abcdefghi\n"
When the underlying file is either fixed record or variable record, the
behavior is not so simple. If a native write is inserted between the
second and third writes shown above, the file contents are handled as
though the linefeed occurred after the "f":
write('abc'); write('def');
FWRITE('MPE TEXT');
write('ghi\n');
"abcdef\nMPE TEXT\nghi\n"
This handling is consistent from the MPE view since MPE applications only
deal in complete records. The emulators implicitly cause the preceding
partial record to be considered a complete record.
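
The interleaving described above can be reproduced with a small model. It is a simplification made for illustration: the class and method names are hypothetical, the pending partial record is held in memory, and record size limits are ignored.

```python
class EmulatedFile:
    """Toy model of a record file written through both views."""

    def __init__(self):
        self.records = []    # complete native records
        self.partial = b""   # byte-stream data not yet ended by a linefeed

    def write(self, data):
        """Byte-stream write: a record is complete only when a linefeed arrives."""
        self.partial += data
        while b"\n" in self.partial:
            record, self.partial = self.partial.split(b"\n", 1)
            self.records.append(record)

    def fwrite(self, record):
        """Native write: a pending partial record is first treated as complete."""
        if self.partial:
            self.records.append(self.partial)
            self.partial = b""
        self.records.append(record)

    def stream_view(self):
        """The file as the byte-stream accessor would see it."""
        return b"".join(r + b"\n" for r in self.records) + self.partial


f = EmulatedFile()
f.write(b"abc")
f.write(b"def")
f.fwrite(b"MPE TEXT")
f.write(b"ghi\n")
```

After these calls `f.stream_view()` is "abcdef\nMPE TEXT\nghi\n", matching the example above; leaving out the `fwrite` call gives "abcdefghi\n" instead.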
Transparent Binding
The file system has some built-in rules about when to use an emulator
view versus when to use the native view:
* The byte-stream view is the default for POSIX applications since
the POSIX C library open() function always requests this view on
the application's behalf.
* The variable record view for native byte-stream files is the
default for all applications other than POSIX applications. This
is the first instance of MPE/iX binding an emulator by default
rather than the native view. This was done to allow maximum
co-existence with traditional MPE applications.
* The byte-stream view may be specifically requested using option 77
of HPFOPEN.
* All other access methods use the native view by default.
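
The rules above can be read as a small decision function. This is only a sketch: the function name, argument names, and format strings are hypothetical, and the order in which the rules are checked is an assumption, since the list does not say which rule wins when several apply.

```python
def choose_view(native_format, posix_open=False, option_77=False):
    """Pick the view bound at open time.

    native_format is "fixed", "variable", or "byte-stream".
    posix_open means the file was opened through the POSIX C library open().
    option_77 means the byte-stream view was requested through HPFOPEN option 77.
    """
    if posix_open or option_77:
        return "byte-stream"       # explicit or POSIX request for the byte-stream view
    if native_format == "byte-stream":
        return "variable"          # default for other applications on byte-stream files
    return native_format           # everything else sees the native view
```

For example, `choose_view("fixed", posix_open=True)` gives "byte-stream", while `choose_view("byte-stream")` gives "variable".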
Idiosyncrasies
Fixed to Byte-Stream.
The native MPE access methods add fill characters to each record written
to round them out to the record size. These characters are required to
reside in the file's data since a byte's file offset determines which
record it is in.
The fill characters in the fixed format file are not a part of the
virtual byte-stream file. Therefore, the fill characters must be
stripped from each record and a linefeed added before presenting the data
back to the byte-stream view. Unfortunately, for records preceding the
EOF record, we cannot tell the difference between fill characters added
by the native view and the same character written by the emulated view.
Therefore, when the emulator writes a record with trailing fill
characters, they do not appear in the resulting virtual file.
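
A short round trip shows the loss. The blank fill character, the record size of 8, and the function names are assumptions for this sketch, which also assumes the data fits within one record.

```python
def emulated_write(record_data, record_size, fill=b" "):
    """Store one byte-stream record (linefeed removed) as a padded fixed record."""
    return record_data.ljust(record_size, fill)


def emulated_read(stored, fill=b" "):
    """Present a stored fixed record to the byte-stream view."""
    # Fill added by padding and fill written by the caller look identical here.
    return stored.rstrip(fill) + b"\n"


written = b"abc  "                        # data that ends in two blanks
stored = emulated_write(written, 8)       # b"abc     "
returned = emulated_read(stored)          # b"abc\n"
```

The two trailing blanks the caller wrote are indistinguishable from padding once stored, so they are stripped on the way back out.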
Consider also that record boundaries in the native file determine the
placement of linefeed characters in the virtual file. Therefore,
emulated writes must also insert fill characters into the file data so
that the virtual file has linefeed characters in the appropriate places.
Now for the zinger! Fixed record files allow writes to the middle of the
file without changing the EOF. This can cause a record to have a
different number of fill characters after the write than it had before.
The net result is that all subsequent characters have different offsets
in the virtual file! This is a rather nasty consequence of the emulation
which is impossible to predict from within the byte-stream view. Sorry,
we haven't found any way to nullify this effect.
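
The offset shift is easy to reproduce in a model. The sketch below assumes a record size of 8 and a blank fill character; `stream_view` is a hypothetical helper, not a system routine.

```python
def stream_view(records, fill=b" "):
    """Byte-stream view of a list of fixed records."""
    return b"".join(record.rstrip(fill) + b"\n" for record in records)


records = [b"abc".ljust(8), b"defghi".ljust(8), b"jk".ljust(8)]
before = stream_view(records)
offset_before = before.index(b"jk")   # where the third record starts in the view

# A write to the middle of the file replaces the second record.
# The native file is still three records of 8 bytes each.
records[1] = b"de".ljust(8)
after = stream_view(records)
offset_after = after.index(b"jk")
```

The third record was never touched and the native file is still 24 bytes, yet its virtual offset moves from 11 to 7 because the second record now carries four more fill characters.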
Variable to Byte-Stream.
Writing to the middle of a variable record file causes the EOF to be cut
back to the end of that record. This is a feature imposed by the
variable record format because preserving data down-stream from this
write requires prohibitive overhead. Overlaying the data in the middle
of the file can have down-stream effects all the way to the end of the
file.
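
In a toy model where a variable record file is just a list of records, the effect looks like this. The function name is hypothetical and the real on-disk layout is not modeled.

```python
def write_variable_record(records, index, data):
    """Overwrite record `index`; the EOF is cut back to the end of that record."""
    return records[:index] + [data]


original = [b"first", b"second", b"third"]
result = write_variable_record(original, 1, b"a much longer second record")
```

The result holds only two records: everything after the rewritten record is gone, which is what a byte-stream accessor sees when it writes into the middle of such a file.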
Byte-Stream to Variable.
Since a true byte-stream file is just an array of bytes, each record can
be as large as desired. The entire file can be just one record! When
emulating such a file back as a variable record file, a maximum record
size must be chosen. The problem with this is that the maximum record
size is not known. Our solution was to return a maximum record size of
8192, which we hope is larger than most files' largest records. Our
objective was to optimize access while causing the greatest number of
traditional MPE applications to perform correctly with no modifications.
A file with records larger than approximately 8192 appears to be
truncated in this view.
Parting Words
Although the emulators have some difficult corner cases to deal with, in
practice none of the drawbacks occur frequently. Most applications
either don't write huge records, or the effects of added linefeeds are
negligible. Many applications have no need to write to the middle of a
file; typically, the entire file is rewritten.
In fact, it was quite a thrill to bring CATALOG.PUB.SYS up into the
shell's vi editor and be able to do paging, searches and gotos without
any mis-steps. The existence of the emulators enables traditional MPE
applications to smoothly interoperate with POSIX applications with little
or no recoding required. You will see many benefits of our effort to
integrate POSIX applications smoothly into the MPE world.