Appendix B Collating Sequences

Table of Contents

Spanish
Danish/Norwegian
Swedish
Finnish

Collating is defined as arranging character strings into order (usually alphabetic). To do this, a mechanism must be available that, given two character strings, decides which one comes first. In Native Language Support (NLS) this mechanism is the NLCOLLATE intrinsic.




	NOTE: This appendix deals with collating or lexical ordering and does not include matching. For matching purposes, there is generally a difference between A and a.

Look at the full ROMAN8 character set and consider that any of these characters can appear in any European language. Even if a character does not exist in a language, it can still show up in names and addresses; a letter sent from Germany to Spain should still be addressed correctly. Therefore, the full ROMAN8 character set is treated as being in use in every language, and a collating sequence has been defined for all ROMAN8 characters in each supported language. Table B-2 “Collating Sequence” lists the collating sequence for American-English, Canadian-French, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish.

All characters in an alphabetic or numeric group collate the same. The characters within a group usually differ only in uppercase versus lowercase priority, or in accent priority. (Refer to Table B-2 “Collating Sequence” for collating sequences.) In sorting, the characters of a group are initially considered the same. If this first comparison of the two strings does not determine which string comes first, then the priorities of the characters are used to determine the order. Refer to Table B-1 “Collating Sequence Priority” for examples of collating sequence priority.

Table B-1 Collating Sequence Priority

Example Sorted Strings	Priority Explanation
aEb, aEc	The third character in each string is different. The "b" precedes the "c".
aéb, aEb	The characters in the two strings are identical, so accent priority determines the order. The "é" precedes the "E".
abc, Abd	The last characters in the strings are different. The "c" precedes the "d".
aBc, abc	The characters in the two strings are the same, so the uppercase priority determines the order. The "B" precedes the "b".
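
The two-step comparison described above can be sketched in Python. This is only an illustration, not the NLCOLLATE intrinsic: the GROUPS excerpt, the RANK lookup, and the collation_key function are hypothetical names, and the sketch assumes that a letter's group is its base letter and that priority follows the order of the rows in Table B-2 (uppercase before lowercase). Accented characters are left out to keep the excerpt short.

```python
# Hypothetical excerpt of the collating sequence: one string per group,
# members listed in table order (uppercase first, then lowercase).
GROUPS = ["Aa", "Bb", "Cc", "Dd", "Ee"]

# Map each character to (group index, priority within the group).
RANK = {}
for group_index, members in enumerate(GROUPS):
    for priority, ch in enumerate(members):
        RANK[ch] = (group_index, priority)


def collation_key(s):
    """Build a sort key: groups are compared first, priorities only on a tie."""
    primary = tuple(RANK[ch][0] for ch in s)    # characters of a group look the same
    secondary = tuple(RANK[ch][1] for ch in s)  # case priority breaks ties
    return (primary, secondary)


strings = ["abc", "aBc", "Abd", "aEc", "aEb"]
print(sorted(strings, key=collation_key))
# ['aBc', 'abc', 'Abd', 'aEb', 'aEc']
```

Because the key is a pair, the priorities are consulted only when the group sequences of two strings are equal, which is why "abc" sorts ahead of "Abd" even though "A" has priority over "a".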

Table B-2 “Collating Sequence” displays the collating sequence in three ways:

The graphic representation of the character.
The decimal equivalent of the character's binary value.
A description of the character.

Table B-2 Collating Sequence

Character	Decimal Equivalent	Description
	32	Space
	160	Do not use
0	48	Zero
1	49	One
2	50	Two
3	51	Three
4	52	Four
5	53	Five
6	54	Six
7	55	Seven
8	56	Eight
9	57	Nine
A	65	Uppercase A
a	97	Lowercase a
Á	224	Uppercase A acute
á	196	Lowercase a acute
À	161	Uppercase A grave
à	200	Lowercase a grave
Â	162	Uppercase A circumflex
â	192	Lowercase a circumflex
Ä	216	Uppercase A umlaut/diaeresis
ä	204	Lowercase a umlaut/diaeresis
Å	208	Uppercase A degree
å	212	Lowercase a degree
Ã	225	Uppercase A tilde
ã	226	Lowercase a tilde
B	66	Uppercase B
b	98	Lowercase b
C	67	Uppercase C
c	99	Lowercase c
Ç	180	Uppercase C cedilla
ç	181	Lowercase c cedilla
D	68	Uppercase D
d	100	Lowercase d
Đ	227	Uppercase D stroke
đ	228	Lowercase d stroke
E	69	Uppercase E
e	101	Lowercase e
É	220	Uppercase E acute
é	197	Lowercase e acute
È	163	Uppercase E grave
è	201	Lowercase e grave
Ê	164	Uppercase E circumflex
ê	193	Lowercase e circumflex
Ë	165	Uppercase E umlaut/diaeresis
ë	205	Lowercase e umlaut/diaeresis
F	70	Uppercase F
f	102	Lowercase f
G	71	Uppercase G
g	103	Lowercase g
H	72	Uppercase H
h	104	Lowercase h
I	73	Uppercase I
i	105	Lowercase i
Í	229	Uppercase I acute
í	213	Lowercase i acute
Ì	230	Uppercase I grave
ì	217	Lowercase i grave
Î	166	Uppercase I circumflex
î	209	Lowercase i circumflex
Ï	167	Uppercase I umlaut/diaeresis
ï	221	Lowercase i umlaut/diaeresis
J	74	Uppercase J
j	106	Lowercase j
K	75	Uppercase K
k	107	Lowercase k
L	76	Uppercase L
l	108	Lowercase l
M	77	Uppercase M
m	109	Lowercase m
N	78	Uppercase N
n	110	Lowercase n
Ñ	182	Uppercase N tilde
ñ	183	Lowercase n tilde
O	79	Uppercase O
o	111	Lowercase o
Ó	231	Uppercase O acute
ó	198	Lowercase o acute
Ò	232	Uppercase O grave
ò	202	Lowercase o grave
Ô	223	Uppercase O circumflex
ô	194	Lowercase o circumflex
Ö	218	Uppercase O umlaut/diaeresis
ö	206	Lowercase o umlaut/diaeresis
Õ	233	Uppercase O tilde
õ	234	Lowercase o tilde
Ø	210	Uppercase O crossbar
ø	214	Lowercase o crossbar
P	80	Uppercase P
p	112	Lowercase p
Q	81	Uppercase Q
q	113	Lowercase q
R	82	Uppercase R
r	114	Lowercase r
S	83	Uppercase S
s	115	Lowercase s
Š	235	Uppercase S caron
š	236	Lowercase s caron
T	84	Uppercase T
t	116	Lowercase t
U	85	Uppercase U
u	117	Lowercase u
Ú	237	Uppercase U acute
ú	199	Lowercase u acute
Ù	173	Uppercase U grave
ù	203	Lowercase u grave
Û	174	Uppercase U circumflex
û	195	Lowercase u circumflex
Ü	219	Uppercase U umlaut/diaeresis
ü	207	Lowercase u umlaut/diaeresis
V	86	Uppercase V
v	118	Lowercase v
W	87	Uppercase W
w	119	Lowercase w
X	88	Uppercase X
x	120	Lowercase x
Y	89	Uppercase Y
y	121	Lowercase y
Ÿ	238	Uppercase Y umlaut/diaeresis
ÿ	239	Lowercase y umlaut/diaeresis
Z	90	Uppercase Z
z	122	Lowercase z
Þ	240	Uppercase thorn
þ	241	Lowercase thorn
	177-178	Currently undefined
	242-245	Currently undefined
(	40	Left parenthesis
)	41	Right parenthesis
[	91	Left bracket
]	93	Right bracket
{	123	Left brace
}	125	Right brace
«	251	Left guillemets
»	253	Right guillemets
<	60	Less than sign
>	62	Greater than sign
=	61	Equal sign
+	43	Plus
-	45	Minus
±	254	Plus/Minus
¼	247	One quarter
½	248	One half
°	179	Degree (ring)
%	37	Percent sign
*	42	Asterisk
.	46	Period (point)
,	44	Comma
;	59	Semicolon
:	58	Colon
¿	185	Inverse question mark
?	63	Question mark
¡	184	Inverse exclamation point
!	33	Exclamation point
/	47	Slant
\	92	Reverse slant
|	124	Vertical bar
@	64	Commercial at
&	38	Ampersand
#	35	Number sign (hash)
§	189	Section
$	36	U. S. dollar sign
¢	191	U.S. cent sign
£	187	British pound sign
₤	175	Italian lira sign
¥	188	Japanese yen sign
ƒ	190	Dutch guilder sign
¤	186	General currency sign
"	34	Double quote
`	96	Opening single quote
'	39	Closing single quote
^	94	Caret
~	126	Tilde
´	168	Accent acute
`	169	Accent grave
^	170	Accent circumflex
¨	171	Umlaut/Diaeresis
~	172	Tilde accent
_	95	Underscore
—	246	Long dash
¯	176	Overline
ª	249	Feminine ordinal sign
º	250	Masculine ordinal sign
■	252	Solid
	0-31	Control codes
	127	DEL
	128-159	Undefined control codes
	255	Do not use
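
The decimal equivalents in Table B-2 can be spot-checked outside of NLS. The sketch below assumes a Python interpreter whose standard library provides the hp_roman8 codec; the samples dictionary and the roman8_value helper are hypothetical names used only for this illustration.

```python
# A few rows copied from Table B-2: character -> decimal equivalent.
samples = {"A": 65, "é": 197, "Ñ": 182, "ü": 207}


def roman8_value(ch):
    """Return the decimal value of a single character in the ROMAN8 encoding."""
    return ch.encode("hp_roman8")[0]


for ch, expected in samples.items():
    # Print the character, its encoded value, and whether it matches the table.
    print(ch, roman8_value(ch), roman8_value(ch) == expected)
```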

NOTE: The Æ (uppercase AE ligature) and æ (lowercase ae ligature) are expanded for collating purposes to AE or ae and collate as:

ad AE Ae aE ae AF

The ß (sharp s) is expanded for collating purposes to ss and collates according to the German standard as:

sr ss st
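
The expansion step can be sketched as a substitution applied before the strings are compared. This is a minimal illustration under stated assumptions: EXPANSIONS and expand are hypothetical names, and Python's built-in string comparison stands in for the full collating comparison, which is adequate here only because the sample strings are all lowercase ASCII after expansion.

```python
# Characters that are expanded to two letters for collating purposes.
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def expand(s):
    """Replace each ligature or sharp s with its two-letter expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in s)


words = ["st", "ß", "sr"]
print(sorted(words, key=expand))
# ['sr', 'ß', 'st']
```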

Table B-3 “Spanish Language-Dependent Variations” through Table B-6 “Finnish Language-Dependent Variations” show the language-dependent variations to the collating sequence.
