Data Types Conversion Programmer's Guide: HP 3000 MPE/iX Computer Systems
Chapter 2: Formatting Data Types
Recognizing Primitive Data Types
Data is an abstraction of information. Data must be structured in a form that the computer is designed to process; data conversion is the translation of information to a form acceptable to the computer. The 900 Series HP 3000 Computer Systems instruction set is designed to operate on certain fundamental data types. The following data types are recognized by MPE XL and its subsystems:

  - character (ASCII and EBCDIC)
  - integer
  - real (floating-point)
  - decimal (packed, unpacked, and floating-point decimal)
Each data type requires a specific bit format. In this manual, bit fields are described as (bit:length), where bit is the first bit in the field and length is the number of consecutive bits in the field. For example, "bits (13:3)" refers to bits 13, 14, and 15. Bit 0 is the most significant bit.

Character code formats are primitive data types. Characters are the letters, numbers, and symbols on your keyboard. The computer relates each alphanumeric character to an 8-bit (one byte) binary number, according to a correspondence code. Some of the characters are easily displayable, like +, ?, 8, and z; some are not, like a blank space or the carriage return. MPE supports the two common American English character codes: ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code). Several natural language types are also supported. See Appendix A for ASCII and EBCDIC codes and equivalents. Character data types are useful for storing strings of symbols like names, addresses, or identification numbers, and for reading the keyboard or writing to the screen. Remember, variables saved as data type character are recognized by the computer as symbols, not as numeric values.

MPE and its subsystems use the ASCII data type to represent character data. ASCII is the format adopted by ANSI, the American National Standards Institute. Most MPE interfaces use ASCII to accept or return character data. Appendix A shows the ASCII and EBCDIC character code values, along with their decimal, octal, and hexadecimal equivalents. ASCII is used in this guide as the name of a data type. The ASCII data type corresponds to the ASCII character code format. The codes for byte values in the range 0 to 127 conform to the ASCII standard format. Byte values in the range 128 to 255 are interpreted using Hewlett-Packard's extended ROMAN8 character set. MPE XL and its subsystems use values in this range to support extended (8-bit) character sets. Figure 2-1 “Bit Format: ASCII Character” shows the ASCII data type bit format.

EBCDIC is another coding format widely used in the computer industry for character data. Like ASCII, it is based on the byte. EBCDIC is used in this guide as the name of a data type. The EBCDIC data type corresponds to the EBCDIC character code format for byte values in the range 0 to 255. Figure 2-2 “Bit Format: EBCDIC Character” shows the bit format for the EBCDIC data type.

MPE XL subsystems support three primitive data types for numbers:

  - integer
  - real
  - decimal
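The (bit:length) notation described at the start of this section maps directly onto shift-and-mask arithmetic. The following C sketch is an illustration only (the helper field16 is hypothetical, not an MPE interface); it extracts a (bit:length) field from a 16-bit word, remembering that bit 0 is the most significant bit:

    #include <stdio.h>
    #include <stdint.h>

    /* Extract the field (bit:length) from a 16-bit word, where bit 0
       is the MOST significant bit, as in this manual's notation. */
    static unsigned field16(uint16_t word, unsigned bit, unsigned length)
    {
        unsigned shift = 16u - bit - length;        /* distance from the LSB */
        return (word >> shift) & ((1u << length) - 1u);
    }

    int main(void)
    {
        uint16_t w = 0x00FF;                  /* bits 8 through 15 are ones */
        /* Bits (13:3) are bits 13, 14, and 15: all ones here, so value 7. */
        printf("bits (13:3) = %u\n", field16(w, 13, 3));
        return 0;
    }

Because bit 0 is numbered from the most significant end, the shift distance is measured from the opposite (least significant) end of the word.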
An integer is any positive or negative whole number, including zero. Integers are useful for counting and for incrementing in loops. Signed integers are a useful form for exchanging numeric data between languages. MPE XL integers can be 8, 16, 32, or 64 bits long. They can be unsigned or signed (+ or -). Signed integers are represented in twos complement form.

Table 2-1 MPE XL Integer Types
The chart below shows the representation of the whole number (base-ten) 73 as an unsigned integer, a signed positive number, and a signed negative number (all shown in 8-bit form):

  Unsigned 73:                    01001001
  Signed +73:                     01001001
  Signed -73 (twos complement):   10110111
Unsigned integers are stored in the computer in their base-two form. If you are reading or writing unsigned integers in a language, the compiler converts for you, according to the formatting conventions of the individual language. An unsigned n-bit number can represent any value from 0 to 2^n - 1.

Reading an Unsigned Integer: One method of reading an unsigned integer as a base-ten value is to consider the bits as columns whose values are powers of two. The rightmost (least significant) bit is the units column and has a weight of 2^0, or 1. Going toward the left (the most significant bit), the columns have progressively greater weight: 2^0, 2^1, 2^2, ..., 2^(n-1). The decimal-based value of an unsigned binary number is computed by multiplying the value in each column by the weight of the column, and then adding all the results. An unsigned integer represented with ones in the 2^0, 2^3, and 2^6 columns and zeros in all the other columns would be computed as follows: 1*(2^0) + 1*(2^3) + 1*(2^6) = 1 + 8 + 64 = 73.

Writing an Unsigned Integer: One method of manually determining the unsigned integer representation of a base-ten value is to use successive subtraction. For example, the largest power of 2 that is less than or equal to the value of decimal-base 73 is 2^6, or 64. Subtracting 64 from 73 leaves a remainder of 9. The largest power of 2 that is less than or equal to 9 is 2^3, or 8. Subtracting 8 from 9 leaves a remainder of 1. The only power of 2 that is less than or equal to 1 is 2^0, or 1. This leaves a remainder of 0, so the computation is finished. Thus, 73 is represented in binary with a 1 in the 2^0, the 2^3, and the 2^6 columns and a zero in all the others.

Signed integers are stored in the computer in twos complement form. If you are reading or writing signed integers in a language, the compiler converts for you, according to the formatting conventions of the individual language. A signed n-bit integer in twos complement form can represent any value from -(2^(n-1)) to +2^(n-1) - 1. When the n-bit positive integer i is added to its n-bit integer negative (complement), -i, and both are in twos complement form, the result is always an n-bit zero.

Reading a Signed Integer: The computer represents both positive and negative numbers in twos complement form much the same way that it would represent an unsigned integer: beginning at the rightmost (least significant) bit and going toward the left, the columns have progressively greater weight: 2^0, 2^1, 2^2, ..., 2^(n-1). The only difference is that the most significant bit of a twos complement number is negative. That is, it has a weight of -(2^(n-1)). To manually convert a signed integer in twos complement form to a base-ten integer, you can use the column method explained in Unsigned Integers, above. However, you give the leftmost column of a twos complement number a weight of -(2^(n-1)). In the example below, this method is used to interpret the signed binary integers 01010101 and 10101010, written in twos complement form, as decimal-based integers:

  01010101 = 2^6 + 2^4 + 2^2 + 2^0    =  64 + 16 + 4 + 1   = +85
  10101010 = -(2^7) + 2^5 + 2^3 + 2^1 = -128 + 32 + 8 + 2  = -86
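Both the column method and successive subtraction are easy to verify in code. The following C sketch is an illustration only, not an MPE library routine; it reads the bit pattern 01001001 by the column method, then writes 73 back out by successive subtraction:

    #include <stdio.h>

    int main(void)
    {
        /* Reading: 01001001 has ones in the 2^0, 2^3, and 2^6 columns. */
        int value = (1 << 0) + (1 << 3) + (1 << 6);
        printf("01001001 = %d\n", value);               /* prints 73 */

        /* Writing: subtract the largest power of two at each column. */
        int n = 73;
        for (int bit = 7; bit >= 0; bit--) {
            int weight = 1 << bit;
            if (n >= weight) { putchar('1'); n -= weight; }
            else             { putchar('0'); }
        }
        putchar('\n');                                  /* prints 01001001 */
        return 0;
    }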
Writing a Signed Integer: Converting a signed base-ten number to twos complement form is not difficult. You can represent the positive signed integers just as explained in Unsigned Integers, above. You can represent a negative integer quickly and easily using the following technique, which takes advantage of the properties of binary numbers: First, ignoring the sign, represent the value as an unsigned binary integer. Next, reverse all the 0s and 1s. Finally, add 1 to the result. Thus, the twos complement of 10101010 is (01010101 + 1), or 01010110.

You can check your conversion by adding the positive and negative numbers (in twos complement form) to see if they total zero. From the example above, notice that adding the 8-bit integer 10101010 to its twos complement, 01010110, yields a 9-bit result, 100000000. However, the system defines the result type to be 8-bit integer and recognizes only the 8 zeros, so the result is zero.

Figure 2-3 “Bit Format: 32-Bit Integer” shows bit formats for the 32-bit integer type.

A real number is a value in the set of zero and the positive or negative rational numbers. Signed integers and fractions are included, although fractions may be approximated. Imaginary and complex numbers are not included in the set of real numbers, although high-level languages may have constructs for storing and working with them. The real data type is a useful form for representing very large or very small values. Special formats are reserved to represent zero, infinity, and NaN (not a number).

The real data type represents real numbers by using a type of floating-point, or scientific, notation. In this notation, you generally express a very large or very small number as a fraction multiplied by a power of the number base. For example, the base-ten number .000025 could be expressed as +.25 * 10^-4. The general floating-point, or scientific notation, form is:

  SfF * (B ** SeE)

where:

  Sf  is the sign of the fraction
  F   is the fraction
  B   is the number base
  Se  is the sign of the exponent
  E   is the exponent
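The invert-and-add-one technique, the wraparound to an n-bit zero, and the negative weight of the leftmost column can all be demonstrated in a few lines of C. This is an illustration only; C's uint8_t and int8_t types stand in for 8-bit integers:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t x = 0xAA;                  /* 10101010 */
        uint8_t neg = (uint8_t)(~x + 1u);  /* reverse the bits, then add 1 */
        printf("complement of 10101010 = 0x%02X\n", neg); /* 0x56 = 01010110 */

        /* Adding a number to its twos complement wraps to zero in 8 bits:
           the ninth (carry) bit falls outside the 8-bit result type. */
        printf("10101010 + 01010110 = 0x%02X\n", (uint8_t)(x + neg)); /* 0x00 */

        /* Reading 10101010 with a -(2^7) weight on the leftmost column: */
        printf("10101010 as signed = %d\n", (int8_t)x);   /* -86 */
        return 0;
    }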
You can represent real numbers in four ways: you can choose either IEEE or HP3000 format, and use either single-precision or double-precision size. MPE XL recognizes two formats for storing floating-point real numbers: IEEE and HP3000. Programs compiled in Native Mode (NM) use IEEE as the default. Programs compiled in Compatibility Mode (CM) use HP3000, the MPE XL emulation of the MPE V/E system floating-point format. NM programs accessing HP3000 data must either specify a special compiler option or convert CM data to NM format before performing operations. You can represent single-precision (32-bit) or double-precision (64-bit) real numbers in both IEEE and HP3000 notation. Table 2-2 “Ranges and Accuracies for Floating-Point Real Numbers” shows a summary of the range and accuracy of each.

Table 2-2 Ranges and Accuracies for Floating-Point Real Numbers
In MPE XL format, real numbers have three fields:

  - a sign field
  - a biased exponent field
  - a normalized mantissa field
Different representations of real numbers have the three fields aligned on different boundaries. In all formats, the sign field is the first bit, the mantissa is in normalized form, and the exponent is biased. The sign field, bit (0:1), is 0 if the number is positive, 1 if negative.

Mantissas are represented in normalized form. That is, the leading one is stripped and the binary point is not explicitly expressed. Each expressed mantissa, then, has an implied leading one and binary point. For example, a mantissa represented by 10101010101010101010101 is interpreted as the value 1.10101010101010101010101.

The exponents of real numbers are biased. This means that both positive and negative true exponents are represented using only unsigned binary integers. The bias amount, or excess, is the difference between the true exponent and the represented exponent. The negative true exponents correspond to the lower range of the represented exponents. The positive true exponents correspond to the upper range of the represented exponents. The true exponent zero corresponds to the midpoint in the range of the represented exponents. For example, consider an exponent field where the true exponent is T, the represented exponent is E, and the bias is b. Then T = E - b, and E = T + b.

Exponent fields of all zeros or all ones are reserved. If the exponent of a floating-point number is all zeros and the mantissa is zero, the number is regarded as zero. If the exponent of a floating-point number is all zeros and the mantissa is not all zero, the number is regarded as denormalized. If the exponent of a floating-point number is all ones and the mantissa is zero, the number is regarded as a signed infinity. If the exponent is all ones and the mantissa is not zero, the interpretation is NaN (Not-a-Number, undefined).

If any process attempts to operate on an infinity or a NaN, a system trap may occur and data may be corrupted. Invalid operation is signaled when the source is a signaling or a quiet NaN; in that case the result is the destination format's largest finite number, with the sign of the source. Any operation that involves a signaling NaN or an invalid operation returns a quiet NaN as the result when no trap occurs and a floating-point result is to be delivered. If an operation uses one or two quiet NaNs as input, it signals no exception; however, if a floating-point result is to be delivered, a quiet NaN is returned that is the same as one of the input NaNs.

IEEE numbers conform to the format set up by the Institute of Electrical and Electronics Engineers and the American National Standards Institute (Std 754-1985). Single-precision numbers are one NM word, aligned on 32-bit boundaries. Double-precision numbers are two NM words, aligned on 64-bit boundaries.
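The reserved exponent patterns can be recognized by inspecting the fields directly. The following C sketch is an illustration; it assumes the host compiler stores float in IEEE single-precision format, as NM compilers do:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    /* Classify an IEEE single by its exponent and mantissa fields. */
    static const char *classify(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);         /* reinterpret, do not convert */
        uint32_t exponent = (bits >> 23) & 0xFFu;   /* bits (1:8)  */
        uint32_t mantissa = bits & 0x7FFFFFu;       /* bits (9:23) */

        if (exponent == 0)     return mantissa == 0 ? "zero" : "denormalized";
        if (exponent == 0xFFu) return mantissa == 0 ? "infinity" : "NaN";
        return "normalized";
    }

    int main(void)
    {
        printf("%s\n", classify(0.0f));      /* zero       */
        printf("%s\n", classify(100.0f));    /* normalized */
        printf("%s\n", classify(INFINITY));  /* infinity   */
        printf("%s\n", classify(NAN));       /* NaN        */
        return 0;
    }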
IEEE numbers in MPE floating-point notation contain three fields:

  - a sign field: bit (0:1)
  - a biased exponent field: bits (1:8) in single precision, bits (1:11) in double precision
  - a normalized mantissa field: bits (9:23) in single precision, bits (12:52) in double precision
A previous section, “Fields of a Real Number”, explains biased exponent and normalized mantissa. Consider converting an IEEE single-precision floating-point number into a base-ten number using this formula:

  (-1)^Sign * 2^(Exponent - 127) * (1.0 + Mantissa * 2^-23)

where:

  Sign      is the value of the sign bit, bit (0:1)
  Exponent  is the biased exponent field, bits (1:8), read as an unsigned integer
  Mantissa  is the mantissa field, bits (9:23), read as an unsigned integer
The (base-ten) floating-point number 100.00 (hexadecimal $42C80000) is represented as 0 10000101 10010000000000000000000. Using the formula, we obtain the correct result as follows:

Table 2-3 Determining the Base-Ten Equivalent of an IEEE Real Number

  Sign      0                                  (-1)^0 = 1
  Exponent  10000101 = 133                     2^(133 - 127) = 2^6 = 64
  Mantissa  10010000000000000000000 = 4718592  1.0 + (4718592 * 2^-23) = 1.5625
  Result    1 * 64 * 1.5625 = 100.0
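The same computation can be written out in C. The sketch below is an illustration only; it decodes $42C80000 field by field and applies the formula (the shift trick for the power of two assumes, as in this example, that the unbiased exponent is not negative):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t bits = 0x42C80000u;     /* 0 10000101 10010000000000000000000 */
        unsigned sign     = bits >> 31;              /* 0       */
        unsigned exponent = (bits >> 23) & 0xFFu;    /* 133     */
        uint32_t mantissa = bits & 0x7FFFFFu;        /* 4718592 */

        /* (-1)^Sign * 2^(Exponent - 127) * (1.0 + Mantissa * 2^-23) */
        double value = (sign ? -1.0 : 1.0)
                     * (double)(1u << (exponent - 127u))
                     * (1.0 + mantissa / 8388608.0); /* 8388608 = 2^23 */
        printf("%f\n", value);                       /* 100.000000 */
        return 0;
    }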
Figure 2-4 “Bit Format: Single-Precision Real in IEEE Floating-Point Notation” shows the bit format for floating-point real numbers in IEEE single-precision format. Figure 2-5 “Bit Format: Double-Precision Real in IEEE Floating-Point Notation” shows the IEEE real number double-precision bit format.

Single-precision HP3000 real numbers are 32 bits (2 CM words), and double-precision HP3000 real numbers are 64 bits (4 CM words). When stored in memory, HP3000 reals are aligned on CM word boundaries.
Real numbers in HP3000 floating-point notation contain three fields:

  - a sign field
  - a biased exponent field
  - a normalized mantissa field
A previous section, “Fields of a Real Number”, explains biased exponent and normalized mantissa.
Figure 2-6 “Bit Format: Single-Precision Real in HP3000 Floating-point Notation” shows the HP3000 real number single-precision bit format. Figure 2-7 “Bit Format: Double-Precision Real in HP3000 Floating-point Notation” shows the HP3000 real number double-precision bit format.

MPE V has system microcode instructions to handle packed decimals. For compatibility, MPE XL has compiler library procedures that run in NM and emulate the MPE V instruction set. In MPE XL, three languages use decimal types. COBOL and RPG use packed or unpacked decimals. BASIC has its own type, the floating-point decimal.

In the decimal types, numbers are represented decimal digit by decimal digit. The individual digits of the decimal number are each represented in a BCD (Binary Coded Decimal) nibble. Each nibble is four bits long. Figure 2-8 “Bit Format: BCD Nibble” shows the bit format for each BCD nibble portion of a decimal.

Packed decimals represent numbers with BCD nibbles: each decimal digit of the number is individually represented by a 4-bit BCD nibble. Packed decimals are always an even number of nibbles long. The rightmost (least significant) nibble is for the sign. There are three defined nibble combinations for the sign nibble:

  - 1100 (hexadecimal C) indicates a positive number
  - 1101 (hexadecimal D) indicates a negative number
  - 1111 (hexadecimal F) indicates a positive (unsigned) number
Since each of the other nibbles represents the decimal digits 0 through 9, the valid nibble combinations are 0000 through 1001 for all but the sign nibble. For example, to represent -52,194 as a packed decimal type, you would use one nibble for each of the five digits and one (the last) for the sign (a C illustration of this packing appears after Table 2-4, below):

  0101   0010   0001   1001   0100   1101
  5      2      1      9      4      D = negative

In COBOL, the PICTURE (PIC) clause specifies the position of the decimal point. For example, the PIC clause 999V99 specifies that three digits will be followed by an implied decimal point and two more digits. If you pass the digits 12345 to a variable defined with this PIC clause, its value would be 123.45.

In COBOL and RPG, using packed decimal will probably make your program more efficient than using unpacked. If you do use unpacked decimal, the compiler usually converts to packed for calculations. Figure 2-9 “Bit Format: Packed Decimal” shows the bit format for the packed decimal.

COBOL and RPG represent numbers with packed and unpacked decimal types. For an unpacked decimal, each decimal digit is one byte long. Unpacked decimals are ASCII characters, interpreted by a correspondence code. The bit format is the ASCII character format in Figure 2-1 “Bit Format: ASCII Character”. For more information, see the notes on COBOL and RPG in “Formatting Data in Programs”, later in this chapter.

HP Business BASIC represents decimal and short decimal types in floating-point decimal notation. The floating-point decimal form is similar to the E notation used to represent very small or very large numbers, as when 3.2E-27 is used to represent the value 3.2 x 10^-27. The BASIC number is normalized (see below). A decimal in HP Business BASIC/XL is 64 bits long; a short decimal is 32 bits long. Table 2-4 “Range and Precision for Floating-Point Decimals”, below, shows a summary of the range and accuracy of each.

Table 2-4 Range and Precision for Floating-Point Decimals
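The nibble layout in the -52,194 example above can be reproduced with a few shifts. The following C sketch is an illustration only; it packs the five digits and the sign nibble into three bytes:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        const char *digits = "52194";        /* magnitude of -52,194 */
        uint8_t nibbles[6];
        uint8_t packed[3];                   /* six nibbles = three bytes */

        for (int i = 0; i < 5; i++)
            nibbles[i] = (uint8_t)(digits[i] - '0'); /* one BCD nibble per digit */
        nibbles[5] = 0xD;                    /* sign nibble: 1101 = negative */

        for (int i = 0; i < 3; i++)
            packed[i] = (uint8_t)((nibbles[2*i] << 4) | nibbles[2*i + 1]);

        printf("%02X %02X %02X\n", packed[0], packed[1], packed[2]); /* 52 19 4D */
        return 0;
    }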
The representation of the value zero is a special case. To represent the value zero, set all the bits to zero. Since the number is normalized, it is assumed that the mantissa never begins with a zero unless the value of zero is intended.

Fields of BASIC decimals: Floating-point decimals have three fields:

  - an exponent field
  - a mantissa field
  - a sign field
The exponent field contains a signed integer, represented in twos complement form. The decimal exponent field is the first 10 bits, bits (0:10), and ranges from -511 to +511. The short decimal exponent field is the first seven bits, bits (0:7), and ranges from -63 to +63.
In the mantissa field, each decimal digit of the number is individually represented by a BCD (Binary Coded Decimal) nibble. Each nibble is four bits long. (See Figure 2-8 “Bit Format: BCD Nibble”.) Since each nibble in this field represents the decimal digits 0 through 9, the valid mantissa nibble combinations are 0000 through 1001. The number is normalized. That is, the first (most significant) mantissa digit is never zero unless the value of the number itself is zero.
The mantissa field of a 64-bit decimal is bits (12:48). It has the capacity for 12 digits, each represented in a 4-bit nibble. The mantissa field of a 32-bit short decimal is bits (8:24). It has the capacity for 6 digits, each represented in a 4-bit nibble.

The sign field of a 64-bit decimal is bits (60:4), which are the four least significant bits, or the least significant BCD nibble. The hexadecimal value C (1100) in the sign nibble indicates the number is positive, and D (1101) indicates the number is negative. The sign field of a 32-bit short decimal is bit (7:1). A value of 0 in the sign bit indicates the number is positive, and a value of 1 indicates the number is negative.

Figure 2-10 “Bit Format: Floating-Point Decimal” shows the bit format for the floating-point decimal. Figure 2-11 “Bit Format: Short Floating-Point Decimal” shows the bit format for the short floating-point decimal.
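The field boundaries above can be exercised with a short decoder. The following C sketch is an illustration only: the test value is hypothetical, hand-built to match the documented layout, and bit 0 is treated as the most significant bit of the 64-bit word:

    #include <stdio.h>
    #include <stdint.h>

    /* Decode the documented fields of a 64-bit BASIC decimal:
       exponent in bits (0:10), twos complement; twelve BCD mantissa
       digits in bits (12:48); sign nibble in bits (60:4). */
    static void decode(uint64_t d)
    {
        int exponent = (int)((d >> 54) & 0x3FFu);   /* bits (0:10) */
        if (exponent & 0x200) exponent -= 0x400;    /* sign-extend 10 bits */

        unsigned sign = (unsigned)(d & 0xFu);       /* bits (60:4) */
        printf("exponent = %d, sign = %s, digits =", exponent,
               sign == 0xDu ? "negative" : "positive");

        for (int i = 0; i < 12; i++)                /* bits (12:48) */
            printf(" %u", (unsigned)((d >> (48 - 4 * i)) & 0xFu));
        printf("\n");
    }

    int main(void)
    {
        /* Hypothetical encoding: mantissa 500000000000, exponent 0, sign D. */
        decode(0x000500000000000DULL);
        return 0;
    }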