8-Bit Character Sets

Within NLS, each supported language is associated with an 8-bit character set. (One character set may support many languages.) Before the introduction of NLS, the only widely supported character set was USASCII, a 128-character set designed to support American English text. USASCII uses only seven bits of an 8-bit byte to encode a character; the eighth, or high-order, bit is always zero.

By using the eighth bit, it is possible to build supersets of USASCII that permit the encoding and manipulation of characters required by languages other than American English. These supersets are referred to as 8-bit or extended character sets. New characters are added with code values in the range 161-254.
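The split between the 7-bit USASCII range and the extended range can be sketched in a few lines of Python. This is only an illustration of the numeric ranges given above; the function names `classify_byte` and `is_seven_bit` are hypothetical and not part of NLS, and the "other" label for codes 128-160 and 255 simply marks values the text does not describe.

```python
def classify_byte(value: int) -> str:
    """Classify one byte value (0-255) by the ranges described in the text."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte value must be in the range 0-255")
    if value & 0x80 == 0:
        return "usascii"    # eighth bit is zero: codes 0-127
    if 161 <= value <= 254:
        return "extended"   # range the text gives for added characters
    return "other"          # 128-160 and 255: not described in the text


def is_seven_bit(data: bytes) -> bool:
    """Return True when every byte has its high-order bit clear."""
    return all(b & 0x80 == 0 for b in data)
```

For example, `classify_byte(0x41)` falls in the USASCII range, while `classify_byte(161)` falls in the extended range; `is_seven_bit` returns False as soon as any byte uses the eighth bit.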




	NOTE: All character sets are supersets of USASCII, and are occasionally referred to as ASCII character sets.

Another method of providing foreign characters, which NLS does not support, involves replacing 12 existing characters in USASCII with substitution characters. Each such 7-bit substitution set eliminates some characters in favor of others needed by a particular local language, so a different substitution set is necessary for each language. By contrast, the NLS 8-bit character sets support all USASCII characters (except for \ in KANA8) in addition to the characters needed to support several Western European-based languages, Middle Eastern languages, and KATAKANA.




	NOTE: Because 8-bit character sets are used in NLS, all bits of every byte are significant. Application software must take care to preserve the eighth (high-order) bit and must not allow it to be modified or reused for any special purpose. No distinction should be made between characters that have the eighth bit turned off and those that have it turned on, as all are characters of equal status in the extended character set.
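The effect of failing to preserve the eighth bit can be shown with a small Python sketch. The helper names `strip_high_bit` and `copy_preserving` are hypothetical, and the value 0xC5 is an arbitrary code in the 161-254 range chosen for illustration; no claim is made about which character it represents in any particular NLS character set.

```python
def strip_high_bit(data: bytes) -> bytes:
    """What NOT to do: clear the eighth bit of every byte."""
    return bytes(b & 0x7F for b in data)


def copy_preserving(data: bytes) -> bytes:
    """Pass every byte through unchanged, with all eight bits intact."""
    return bytes(data)


# 0x41 and 0x7A are USASCII codes; 0xC5 is an illustrative extended code.
sample = bytes([0x41, 0xC5, 0x7A])

damaged = strip_high_bit(sample)   # 0xC5 becomes 0x45, a different character
intact = copy_preserving(sample)   # identical to the input
```

Masking with 0x7F silently turns the extended code 0xC5 into the unrelated USASCII code 0x45, which is the kind of corruption the note warns against. Code that treats each byte as an opaque 8-bit value avoids the problem.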
