Collating is defined as arranging character strings into order
(usually alphabetic). To do this, a mechanism must be
available that, given two character strings, decides which one
comes first. In Native Language Support (NLS) this mechanism
is the NLCOLLATE intrinsic.
NOTE:
This appendix deals with collating, or lexical ordering, and does
not include matching. For matching purposes, there is
generally a difference between A and a.
Consider the full ROMAN8 character set: any of its characters can
appear in text written in any European language. Even if a
character does not exist in a language, it can still show up
in names and addresses. It is useful to be able to address a
letter to Spain correctly, even if it originates in Germany.
Therefore, the full ROMAN8 character set is considered to be
used in all languages, and a collating sequence has been defined
for all characters in the ROMAN8 character set for each language
it supports. Table B-2 “Collating Sequence” lists the collating sequence for
American-English, Canadian-French, Danish, Dutch, English, Finnish,
French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish.
All characters in the same alphabetic or numeric group (for example,
A, a, Á, and á) collate the same. The characters within a group
usually differ only in uppercase versus lowercase priority, or in
accent priority. (Refer to Table B-2 “Collating Sequence” for collating
sequences.) In sorting, they are initially considered the same.
If this first comparison of the two strings does not determine which
string comes first, then the priorities of the characters are used to
determine the order. Refer to Table B-1 “Collating Sequence Priority” for examples of collating sequence
priority.
Table B-1 Collating Sequence Priority
Example Sorted Strings | Priority Explanation |
---|---|
aEb, aEc | The third character in each string is different. The "b" precedes the "c". |
aéb, aEb | The characters in the two strings are identical, so accent priority determines the order. The "é" precedes the "E". |
abc, Abd | The last characters in the strings are different. The "c" precedes the "d". |
aBc, abc | The characters in the two strings are the same, so the uppercase priority determines the order. The "B" precedes the "b". |
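The two-pass comparison illustrated in Table B-1 can be sketched in Python. This is only an illustration, not the NLCOLLATE implementation: the `COLLATE` table and the `collate_key` and `collate` names are hypothetical, the table covers only a few unaccented letters, and it assumes that priority within a group follows the order of listing in Table B-2 (uppercase before lowercase).

```python
# Hypothetical two-level collation table: character -> (group, priority).
# Only a few unaccented letters are included; this is NOT the NLS data
# that NLCOLLATE uses. Assumption: within a group, uppercase has the
# higher priority (sorts first), as in the listing order of Table B-2.
LETTERS = "AaBbCcDdEe"
COLLATE = {ch: (i // 2, i % 2) for i, ch in enumerate(LETTERS)}


def collate_key(s):
    """Build a sort key: groups are compared first, priorities second."""
    groups = tuple(COLLATE[ch][0] for ch in s)
    priorities = tuple(COLLATE[ch][1] for ch in s)
    return (groups, priorities)


def collate(a, b):
    """Return -1 if a collates before b, 1 if after, 0 if equal."""
    ka, kb = collate_key(a), collate_key(b)
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    words = ["abc", "aBc", "Abd", "aEc", "aEb"]
    print(sorted(words, key=collate_key))
```

Because the groups are compared across the whole string before any priority is consulted, "abc" collates before "Abd" even though "A" has a higher priority than "a".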
Table B-2 “Collating Sequence” displays the collating sequence in three columns:
the graphic representation of the character, the decimal equivalent
of the character's binary value, and a description of the character.
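As a quick way to check a decimal equivalent against its graphic representation, the following Python sketch decodes a single byte value using the `hp_roman8` codec from the Python standard library. The helper name `roman8_char` is hypothetical, and the sketch assumes that the Python codec agrees with the ROMAN8 set described here for the values shown.

```python
# A few decimal equivalents copied from Table B-2.
SAMPLE = {
    224: "Uppercase A acute",
    197: "Lowercase e acute",
    182: "Uppercase N tilde",
}


def roman8_char(decimal):
    """Return the character for a ROMAN8 decimal value (hypothetical helper)."""
    return bytes([decimal]).decode("hp_roman8")


if __name__ == "__main__":
    for decimal, description in SAMPLE.items():
        print(decimal, roman8_char(decimal), description)
```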
Table B-2 Collating Sequence
Character | Decimal Equivalent | Description |
---|---|---|
| 32 | Space |
| 160 | Do not use |
0 | 48 | Zero |
1 | 49 | One |
2 | 50 | Two |
3 | 51 | Three |
4 | 52 | Four |
5 | 53 | Five |
6 | 54 | Six |
7 | 55 | Seven |
8 | 56 | Eight |
9 | 57 | Nine |
A | 65 | Uppercase A |
a | 97 | Lowercase a |
Á | 224 | Uppercase A acute |
á | 196 | Lowercase a acute |
À | 161 | Uppercase A grave |
à | 200 | Lowercase a grave |
 | 162 | Uppercase A circumflex |
â | 192 | Lowercase a circumflex |
Ä | 216 | Uppercase A umlaut/diaeresis |
ä | 204 | Lowercase a umlaut/diaeresis |
Å | 208 | Uppercase A degree |
å | 212 | Lowercase a degree |
à | 225 | Uppercase A tilde |
ã | 226 | Lowercase a tilde |
B | 66 | Uppercase B |
b | 98 | Lowercase b |
C | 67 | Uppercase C |
c | 99 | Lowercase c |
Ç | 180 | Uppercase C cedilla |
ç | 181 | Lowercase c cedilla |
D | 68 | Uppercase D |
d | 100 | Lowercase d |
Đ | 227 | Uppercase D stroke |
đ | 228 | Lowercase d stroke |
E | 69 | Uppercase E |
e | 101 | Lowercase e |
É | 220 | Uppercase E acute |
é | 197 | Lowercase e acute |
È | 163 | Uppercase E grave |
è | 201 | Lowercase e grave |
Ê | 164 | Uppercase E circumflex |
ê | 193 | Lowercase e circumflex |
Ë | 165 | Uppercase E umlaut/diaeresis |
ë | 205 | Lowercase e umlaut/diaeresis |
F | 70 | Uppercase F |
f | 102 | Lowercase f |
G | 71 | Uppercase G |
g | 103 | Lowercase g |
H | 72 | Uppercase H |
h | 104 | Lowercase h |
I | 73 | Uppercase I |
i | 105 | Lowercase i |
Í | 229 | Uppercase I acute |
í | 213 | Lowercase i acute |
Ì | 230 | Uppercase I grave |
ì | 217 | Lowercase i grave |
Î | 166 | Uppercase I circumflex |
î | 209 | Lowercase i circumflex |
Ï | 167 | Uppercase I umlaut/diaeresis |
ï | 221 | Lowercase i umlaut/diaeresis |
J | 74 | Uppercase J |
j | 106 | Lowercase j |
K | 75 | Uppercase K |
k | 107 | Lowercase k |
L | 76 | Uppercase L |
l | 108 | Lowercase l |
M | 77 | Uppercase M |
m | 109 | Lowercase m |
N | 78 | Uppercase N |
n | 110 | Lowercase n |
Ñ | 182 | Uppercase N tilde |
ñ | 183 | Lowercase n tilde |
O | 79 | Uppercase O |
o | 111 | Lowercase o |
Ó | 231 | Uppercase O acute |
ó | 198 | Lowercase o acute |
Ò | 232 | Uppercase O grave |
ò | 202 | Lowercase o grave |
Ô | 223 | Uppercase O circumflex |
ô | 194 | Lowercase o circumflex |
Ö | 218 | Uppercase O umlaut/diaeresis |
ö | 206 | Lowercase o umlaut/diaeresis |
Õ | 233 | Uppercase O tilde |
õ | 234 | Lowercase o tilde |
Ø | 210 | Uppercase O crossbar |
ø | 214 | Lowercase o crossbar |
P | 80 | Uppercase P |
p | 112 | Lowercase p |
Q | 81 | Uppercase Q |
q | 113 | Lowercase q |
R | 82 | Uppercase R |
r | 114 | Lowercase r |
S | 83 | Uppercase S |
s | 115 | Lowercase s |
Š | 235 | Uppercase S caron |
š | 236 | Lowercase s caron |
T | 84 | Uppercase T |
t | 116 | Lowercase t |
U | 85 | Uppercase U |
u | 117 | Lowercase u |
Ú | 237 | Uppercase U acute |
ú | 199 | Lowercase u acute |
Ù | 173 | Uppercase U grave |
ù | 203 | Lowercase u grave |
Û | 174 | Uppercase U circumflex |
û | 195 | Lowercase u circumflex |
Ü | 219 | Uppercase U umlaut/diaeresis |
ü | 207 | Lowercase u umlaut/diaeresis |
V | 86 | Uppercase V |
v | 118 | Lowercase v |
W | 87 | Uppercase W |
w | 119 | Lowercase w |
X | 88 | Uppercase X |
x | 120 | Lowercase x |
Y | 89 | Uppercase Y |
y | 121 | Lowercase y |
Ÿ | 238 | Uppercase Y umlaut/diaeresis |
ÿ | 239 | Lowercase y umlaut/diaeresis |
Z | 90 | Uppercase Z |
z | 122 | Lowercase z |
Þ | 240 | Uppercase thorn |
þ | 241 | Lowercase thorn |
| 177-178 | Currently undefined |
| 242-245 | Currently undefined |
( | 40 | Left parenthesis |
) | 41 | Right parenthesis |
[ | 91 | Left bracket |
] | 93 | Right bracket |
{ | 123 | Left brace |
} | 125 | Right brace |
« | 251 | Left guillemets |
» | 253 | Right guillemets |
< | 60 | Less than sign |
> | 62 | Greater than sign |
= | 61 | Equal sign |
+ | 43 | Plus |
- | 45 | Minus |
± | 254 | Plus/Minus |
¼ | 247 | One quarter |
½ | 248 | One half |
° | 179 | Degree (ring) |
% | 37 | Percent sign |
* | 42 | Asterisk |
. | 46 | Period (point) |
, | 44 | Comma |
; | 59 | Semicolon |
: | 58 | Colon |
¿ | 185 | Inverse question mark |
? | 63 | Question mark |
¡ | 184 | Inverse exclamation point |
! | 33 | Exclamation point |
/ | 47 | Slant |
\ | 92 | Reverse slant |
| | 124 | Vertical bar |
@ | 64 | Commercial at |
& | 38 | Ampersand |
# | 35 | Number sign (hash) |
§ | 189 | Section |
$ | 36 | U.S. dollar sign |
¢ | 191 | U.S. cent sign |
£ | 187 | British pound sign |
₤ | 175 | Italian lira sign |
¥ | 188 | Japanese yen sign |
ƒ | 190 | Dutch guilder sign |
¤ | 186 | General currency sign |
" | 34 | Double quote |
` | 96 | Opening single quote |
' | 39 | Closing single quote |
^ | 94 | Caret |
~ | 126 | Tilde |
´ | 168 | Accent acute |
` | 169 | Accent grave |
^ | 170 | Accent circumflex |
¨ | 171 | Umlaut/Diaeresis |
~ | 172 | Tilde accent |
_ | 95 | Underscore |
— | 246 | Long dash |
¯ | 176 | Overline |
ª | 249 | Feminine ordinal sign |
º | 250 | Masculine ordinal sign |
■ | 252 | Solid |
| 0-31 | Control codes |
| 127 | DEL |
| 128-159 | Undefined control codes |
| 255 | Do not use |
NOTE:
The Æ (uppercase AE ligature) and æ (lowercase ae ligature) are
expanded for collating purposes to AE or ae.
The ß (sharp s) is expanded for collating purposes to ss and
collates according to the German standard.
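The expansion described in the note can be sketched in Python as a preprocessing step applied before strings are compared. The `EXPANSIONS` table and the `expand_for_collating` name are hypothetical. The sketch covers only the three expansions stated in the note and does not attempt to show how an expanded string is ordered relative to one that already contains the plain letters.

```python
# Expansions stated in the note: ligatures and sharp s become two letters.
# All other characters pass through unchanged.
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def expand_for_collating(s):
    """Replace each expandable character with its two-letter form."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in s)


if __name__ == "__main__":
    for word in ["Straße", "Ærø", "abc"]:
        print(word, "->", expand_for_collating(word))
```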
Table B-3 “Spanish Language-Dependent Variations” through Table B-6 “Finnish Language-Dependent Variations” show the language-dependent
variations to the collating sequence.