Within NLS, each supported language is associated with an
8-bit character set. (One character set may support many languages.)
Before the introduction of NLS, the only widely supported character
set was USASCII, a 128-character set designed to support American
English text. USASCII uses only seven bits of an 8-bit byte to encode
a character; the eighth, or high-order, bit is always zero.
By using the eighth bit, it is possible to build supersets of USASCII
that permit encoding and manipulation of the characters required by
languages other than American English.
These supersets are referred to as 8-bit or extended character sets.
New characters are added with code values in the range 161-254.
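
The split between the 7-bit USASCII range and the extended range can be sketched as follows. This is a minimal illustration in Python, not part of NLS; the function and constant names are hypothetical, and only the numeric ranges (0-127 for USASCII, 161-254 for new characters) come from the text above.

```python
# Hypothetical helpers; only the numeric ranges come from the text.
USASCII_LAST = 0x7F     # 127, the highest 7-bit code value
EXTENDED_FIRST = 161    # first code value used for new characters
EXTENDED_LAST = 254     # last code value used for new characters


def is_usascii(code: int) -> bool:
    """True if the code value fits in seven bits (eighth bit is zero)."""
    return 0 <= code <= USASCII_LAST


def is_extended(code: int) -> bool:
    """True if the code value lies in the range used for new characters."""
    return EXTENDED_FIRST <= code <= EXTENDED_LAST


def high_bit_set(code: int) -> bool:
    """True if the eighth (high-order) bit of the byte is one."""
    return bool(code & 0x80)
```

Every code value in the extended range has the eighth bit set, which is why software that assumes 7-bit data cannot represent these characters.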

NOTE: All character sets are supersets of USASCII, and are occasionally
referred to as ASCII character sets.

Another method of providing foreign characters, one not supported by
NLS, involves replacing 12 existing characters in USASCII with
substitution characters. The 7-bit substitution set eliminates some
characters in favor of others needed by a particular local language. A
different substitution set is necessary for each language. The NLS 8-bit
character sets support all USASCII characters (except for \ in KANA8)
in addition to the characters needed to support several Western
European-based languages, Middle Eastern languages, and KATAKANA.
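
The cost of the 7-bit substitution approach can be illustrated with a short Python sketch. The text does not name a particular substitution set, so the table below is an assumption: it uses the German national variant of ISO 646 (DIN 66003), which reassigns eight code positions. The function name is hypothetical.

```python
# Assumption: German ISO 646 variant (DIN 66003), used only as an example.
# Each USASCII character on the left is displayed as the one on the right.
GERMAN_SUBSTITUTIONS = {
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
}


def render_with_substitution(data: bytes, table: dict) -> str:
    """Show how 7-bit data appears on a device using a substitution set."""
    text = data.decode("ascii")  # substitution sets stay within seven bits
    return "".join(table.get(ch, ch) for ch in text)


# The same bytes mean different things under different substitution sets:
# brackets and braces are no longer available once they are replaced.
print(render_with_substitution(b"a[i] = {x}", GERMAN_SUBSTITUTIONS))
# prints "aÄiÜ = äxü"
```

Because the substituted characters displace existing ones, text written for one substitution set is misread under another. An 8-bit character set avoids this by adding characters rather than replacing them.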

NOTE: Because 8-bit character sets are used in NLS, all bits of every
byte have significance. Application software must take care to
preserve the eighth bit (high-order), not allowing it to be modified
or reused for any special purpose. No differentiation should be
made between characters that have the eighth bit turned off or on,
as all are characters of equal status in the extended character
set.

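
The requirement to preserve the eighth bit can be illustrated with a minimal Python sketch. Both function names are hypothetical and no specific character set is assumed. The example only shows that masking a byte to seven bits silently turns an extended code value into an unrelated USASCII one.

```python
def copy_preserving(data: bytes) -> bytes:
    """8-bit clean: every byte is passed through unchanged."""
    return bytes(data)


def copy_stripping(data: bytes) -> bytes:
    """Anti-pattern: clears the high-order bit, as 7-bit software might."""
    return bytes(b & 0x7F for b in data)


# 0x41 and 0x42 are USASCII 'A' and 'B'; 0xE9 is an extended code value.
sample = bytes([0x41, 0xE9, 0x42])

print(copy_preserving(sample) == sample)  # prints "True"
print(copy_stripping(sample))             # prints "b'AiB'"
```

Clearing the bit maps 0xE9 to 0x69, the USASCII letter `i`, so the original character is lost without any error being reported.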