sort - sort/merge utility
sort [-cmu] [-o outfile] [-t char] [-y [n]] [-z n] [-bdfiMnr] [-k startpos[,endpos]] ... [file ...]
sort [-cmu] [-o outfile] [-t char] [-y [n]] [-z n] [-bdfiMnr] [+startposition [-endposition]] ... [file ...]
The sort command implements a full sort and merge facility. sort operates on input files containing records that are separated by the newline character.
If you do not specify either the -c or the -m option, sort sorts the concatenation of all input files and produces the output on standard output.
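As a minimal sketch of this default behavior, the following commands sort the concatenation of two small files. The file names and contents are made up for illustration, mktemp -d is assumed to be available, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical sample files, created in a scratch directory.
dir=$(mktemp -d)
printf 'pear\napple\n' > "$dir/a.txt"
printf 'fig\nbanana\n' > "$dir/b.txt"

# With neither -c nor -m, sort treats both files as one input stream
# and writes the sorted records to standard output.
sorted=$(LC_ALL=C sort "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$sorted"

rm -r "$dir"
```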
-b
skips, for comparison purposes, any leading white space (blank or tab) in any field (or key specification).
-c
checks input files to ensure that they are correctly ordered according to the key position and sort ordering options specified, but does not modify or output the files. This option only affects the exit code.
-d
uses dictionary ordering. sort examines only blanks, upper and lowercase letters, and numbers when making comparisons.
-f
converts lowercase letters to uppercase for comparison purposes.
-i
ignores, for comparison purposes, non-printable characters.
-k startpos[,endpos]
specifies a sorting key. See the Sorting Keys section of this reference page for more information.
-M
assumes that the field contains a month name for comparison purposes. Any leading white space is ignored. If the field starts with the first three letters of a month name in uppercase or lowercase, the comparisons are in month-in-year order. Anything that is not a recognizable month name compares less than JAN.
-m
merges files into one sorted output stream. This option assumes that each input file is correctly ordered according to the other options specified on the command line; you can check this with the -c option.
-n
assumes that the field contains an initial numeric value. sort sorts first by numeric value, then by the remaining text in the field, according to options. This option treats a field that contains no digits as if it had a value of zero. If more than one line contains no digits, the lines are sorted alphanumerically.
-o outfile
writes output to the file outfile. By default, sort writes output onto the standard output. The output file can be one of the input files. In this case, sort makes a copy of the data to allow the (potential) overwriting of the input file.
-r
reverses the order of all comparisons so that sort writes output from largest to smallest rather than smallest to largest.
-t char
indicates that the character char separates input fields. When you do not specify the -t option, sort assumes that any number of white space characters (blank or tab) separate fields.
-u
ensures that output records are unique. If two or more input records have equal sort keys, sort writes only the first record to the output. When you use -u with -c, sort prints a diagnostic message if the input records have any duplicates.
-y [n]
restricts the amount of memory available for sorting to n K of memory (where a K of memory is 1024 bytes). If n is missing, sort chooses a reasonable maximum amount of memory for sorting, dependent upon system configuration. sort needs at least enough memory to hold five records simultaneously. If you try to request less, sort automatically takes enough. When the input files overflow available memory, sort automatically uses a polyphase merge (external sorting) algorithm, which is necessarily much slower than internal sorting. n must be at least 2, has a maximum value of 1024, and has a default value of 250.
-z n
indicates that the longest input record (including the newline character) is n characters in length. By default, record length is limited to 400 characters.
+startposition [-endposition]
is an obsolete method of specifying a sorting key. See the Sorting Keys section of this reference page for more information.
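To illustrate how ordering options change comparisons, here is a small sketch that sorts the same three made-up records with no options, with -n, and with -n and -r together. LC_ALL=C is an assumption added so that the plain comparison is predictable.

```shell
# Three sample records (illustrative data only).
nums=$(printf '10\n9\n100\n')

# Default ordering compares the records as text.
plain=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n compares by initial numeric value.
numeric=$(printf '%s\n' "$nums" | LC_ALL=C sort -n)

# -r reverses the comparison, so -nr runs from largest to smallest.
reversed=$(printf '%s\n' "$nums" | LC_ALL=C sort -nr)

printf 'plain:\n%s\nnumeric:\n%s\nreversed:\n%s\n' "$plain" "$numeric" "$reversed"
```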
The -b, -d, -f, -i, -M, -n, -r, and -t options control how sort compares records to determine the order in which the records are written to the output. These ordering options apply globally to all sorting keys except those keys for which you individually specify ordering options. For more on sorting keys, see the next section.
sort examines entire input records to determine ordering. By specifying sorting keys on the command line, you can tell sort to restrict its attention to one or more parts of each record.
You can indicate the start of a sorting key with
-k m[.n][options]
where m and the optional n are positive integers. You can choose options from the set bdfiMnr (described previously) to specify how sort does comparisons for that sorting key. (The b option behaves differently from the other options; see the next paragraph.) When you set one or more ordering options for a key, sort uses those options instead of the global ordering options for that key. If you do not specify any options for the key, the global ordering options are used.
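The interaction between global and per-key options can be sketched as follows. The data is invented, and LC_ALL=C is an assumption used for predictable ordering. The first key carries no options and therefore uses the global -r; the second key specifies n, so it is compared numerically without the global reversal.

```shell
# Key 1 (field 1) has no options of its own, so the global -r applies to it.
# Key 2 (field 2) has its own option n, so it ignores the global -r.
mixed=$(printf 'a 2\na 10\nb 1\n' | LC_ALL=C sort -r -k 1,1 -k 2,2n)
printf '%s\n' "$mixed"
```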
The number m specifies which field in the input record contains the start of the sorting key. The character given with the -t option separates input fields; if this option is not given, spaces or tabs separate the fields. The number n specifies which character in the mth field marks the start of the sorting key; if you do not specify n, the sorting key starts at the first character of the mth field.
When you do not specify the -t option, a field is considered to begin with the white space that separates it from the preceding field. When -t is specified, a field begins with the character following the separator.
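As a small sketch of field selection with -t, the following sorts colon-separated records by the numeric value of their third field. The records are made up, and LC_ALL=C is an assumption.

```shell
# With -t :, fields are separated by colons and field 3 starts right after the
# second colon. -k 3n starts the key at field 3 and compares it numerically.
by_third=$(printf 'carol:x:30\nalice:x:5\nbob:x:200\n' | LC_ALL=C sort -t : -k 3n)
printf '%s\n' "$by_third"
```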
You can indicate the end of a sorting key with
-k m[.n][options],p[.q][options]
where p and q are positive integers, indicating that the sort key ends with the qth character of the pth field. If you do not specify q or you specify a value of 0 for q, the sorting key ends at the last character of the pth field. For example,
-k 2.3,4.6
defines a sorting key that extends from the third character of the second field to the sixth character of the fourth field. The b option applies only to the key start or key end that it is specified for.
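The -k 2.3,4.6 key can be tried on a few invented records. This sketch assumes colon-separated fields (-t :) so that field boundaries are unambiguous, and sets LC_ALL=C for predictable ordering. The key for each record runs from the third character of field 2 through the sixth character of field 4, including the separators in between.

```shell
# Sample records: the keys are "AAA:m:nnnnnn", "BBB:m:nnnnnn", "CCC:m:nnnnnn".
data=$(printf 'x:zzBBB:m:nnnnnn1\nx:aaCCC:m:nnnnnn5\nx:zzAAA:m:nnnnnn9\n')

# Whole-record comparison puts the "x:aa..." record first.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort)

# Restricting the comparison to the key orders the records by AAA, BBB, CCC.
keyed=$(printf '%s\n' "$data" | LC_ALL=C sort -t : -k 2.3,4.6)

printf 'plain:\n%s\nkeyed:\n%s\n' "$plain" "$keyed"
```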
sort also supports a historical method of defining the sorting key. Using this method, you indicate the start of the sorting key with
+m[.n][options]
which is equivalent to
-k m+1[.n+1][options]
You can also indicate the end of a sorting key with
-p[.q][options]
which, when preceded by +m[.n], is equivalent to
-k m+1[.n+1],p.0[options]
if q is specified and is zero. Otherwise, it is equivalent to
-k m+1[.n+1],p+1[.q][options]
For example,
+1.2 -3.5
defines a sorting key with a starting position that sort finds by skipping the first field and the first two characters of the next field; its end position is found by skipping the first three fields and then the first five characters of the next field. By the equivalences above, this is the same key as -k 2.3,4.5: the sorting key extends from the third character of the second field to the fifth character of the fourth field.
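The arithmetic of the equivalences above can be sketched as a small shell function. The name old_to_k is hypothetical and not part of sort; the helper assumes both positions are written in full as +m.n and -p.q, and it does not handle ordering option letters. Applying the equivalences as stated, +1.2 -3.5 yields an end position of 4.5.

```shell
# old_to_k +m.n -p.q
# Hypothetical helper: prints the -k form of a fully specified historical key.
old_to_k() {
    start=${1#+}            # strip the leading "+"
    end=${2#-}              # strip the leading "-"
    m=${start%.*}; n=${start#*.}
    p=${end%.*};   q=${end#*.}
    if [ "$q" -eq 0 ]; then
        # q is zero: the key ends at the last character of field p.
        printf '%s\n' "-k $((m + 1)).$((n + 1)),$p.0"
    else
        # q is nonzero: the key ends at character q of field p+1.
        printf '%s\n' "-k $((m + 1)).$((n + 1)),$((p + 1)).$q"
    fi
}

old_to_k +1.2 -3.5
old_to_k +0.0 -1.0
```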
With either syntax, if the end of a sorting key is not a valid position or no
end was specified, the sorting key extends to the end of the input record.
You can specify multiple sort key positions by using several -k options, or several + and - options. In this case, sort uses the second sorting key only for records whose first sorting keys are equal, the third sorting key only when the first two are equal, and so on. If all key positions compare equal, sort determines ordering by using the entire record.
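A minimal sketch of two sorting keys, using invented records and LC_ALL=C as an assumption: the second key (a numeric comparison of field 2) only decides the order of the two records whose first key is equal.

```shell
# Primary key: field 1 as text. Secondary key: field 2 as a number.
two_keys=$(printf 'smith 30\njones 25\nsmith 4\n' | LC_ALL=C sort -k 1,1 -k 2,2n)
printf '%s\n' "$two_keys"
```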
When you specify the -u option to determine the uniqueness of output records, sort looks only at the sorting keys, not the whole record. (Of course, if you specify no sorting keys, sort considers the whole record to be the sorting key.)
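The effect of -u with a sorting key can be sketched like this, with made-up data and LC_ALL=C as an assumption. Two of the three records share the key "a", so only one of them reaches the output even though the records differ as a whole.

```shell
# Uniqueness is judged on field 1 only, so one of the two "a" records is dropped.
unique_keys=$(printf 'b 1\na 2\na 1\n' | LC_ALL=C sort -u -k 1,1)
printf '%s\n' "$unique_keys"

# Count the surviving records and list their keys.
count=$(printf '%s\n' "$unique_keys" | wc -l | tr -d ' ')
keys=$(printf '%s\n' "$unique_keys" | cut -d ' ' -f 1)
```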
To sort input lines that consist of a day of the month followed by a month name, such as
30 December
23 MAY
25 June
10 June
use the command:
sort -k 2M -k 1n
To merge two dictionaries, with one word per line:
sort -m -dfi dict1 dict2 >newdict
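Both examples can be tried end to end with the sketch below. The dictionary contents are invented, mktemp -d is assumed to be available, LC_ALL=C is an assumption for predictable results, and the merged output is captured in a variable rather than written to newdict.

```shell
# Month example: order by month name first, then by day number.
dates=$(printf '30 December\n23 MAY\n25 June\n10 June\n' | LC_ALL=C sort -k 2M -k 1n)
printf '%s\n' "$dates"

# Merge example: each hypothetical dictionary is already ordered under -dfi.
dir=$(mktemp -d)
printf 'Apple\ncherry\n' > "$dir/dict1"
printf 'banana\nDate\n' > "$dir/dict2"
merged=$(LC_ALL=C sort -m -dfi "$dir/dict1" "$dir/dict2")
printf '%s\n' "$merged"
rm -r "$dir"
```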
TMPDIR
contains the path name of the directory to be used for temporary files.
/tmp/stm*
temporary files used for merging and the -o option.
You can specify a different directory for temporary files using the TMPDIR environment variable. For further information, see envvar.
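A minimal sketch of setting TMPDIR for a single sort invocation follows. The scratch directory is hypothetical and created with mktemp -d, which is assumed to be available; an input this small is unlikely to need temporary files at all, so the example only shows where the variable goes.

```shell
# Hypothetical scratch directory for sort's temporary files.
scratch=$(mktemp -d)

# The assignment applies only to this one sort command.
out=$(printf 'b\na\n' | TMPDIR="$scratch" LC_ALL=C sort)
printf '%s\n' "$out"

rm -r "$scratch"
```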
0
Successful completion. Also returned if -c is specified and the file is already in correctly sorted order.
1
Returned if you specified -c and the file is not correctly sorted. Also returned to indicate a non-unique record if you specified -cu.
2
Failure due to an error, including a problem with the -k, -o, -t, -y, or -z option.
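The exit codes for -c can be observed with a sketch like the following. The input data is made up, LC_ALL=C is an assumption, and diagnostics are discarded so that only the exit status is examined.

```shell
# Correctly ordered input: exit status 0.
ok=0
printf 'a\nb\nc\n' | LC_ALL=C sort -c 2>/dev/null || ok=$?

# Incorrectly ordered input: exit status 1.
bad=0
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null || bad=$?

# Ordered input with a duplicate record, checked with -c and -u: exit status 1.
dup=0
printf 'a\na\n' | LC_ALL=C sort -c -u 2>/dev/null || dup=$?

printf 'ok=%s bad=%s dup=%s\n' "$ok" "$bad" "$dup"
```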
The key position was not specified correctly. Check the format and try again.
sort has determined that filename is binary because it found a NUL ('\0') character in a line.
This error normally occurs when you specify very large numbers for -y or -z and there is not enough memory available for sort to satisfy the request.
Any input lines that are longer than nn, which is the default number of characters (400) or the number specified with the -z option, are truncated.
You specified -k, but did not specify a key definition after the -k.
With the -c and -u options, a non-unique record was found.
With the -c option, an incorrect ordering was discovered.
Any file not ending in a newline character has one added.
The named temporary (intermediate) file could not be created. Make sure that you have a directory named /tmp and that this directory has space to create files. The directory for temporary files can be changed using the TMPDIR environment variable; see envvar.
sort could not generate a name for a temporary working file. This should almost never happen.
Insufficient space was available for a temporary file. Make sure that you have a directory named /tmp and that this directory has space to create files. The directory for temporary files can be changed using the TMPDIR environment variable; see envvar.
This implementation of sort has a limit of 64 key field positions.
Some error occurred in writing the standard output. This normally occurs when there is insufficient disk space to hold all of the intermediate data, or a diskette is write protected.
The -M and -y options are extensions to the POSIX and XPG standards. The -z option is an XPG extension to the POSIX standard. The POSIX.2 standard regards the historical syntax for defining sorting keys as obsolete. Therefore, you should use only the -k option in the future.
The sortgen AWK script is a useful way to handle complex sorting tasks. It is described in the AWK Tutorial in the User's Guide. It originally appeared in The AWK Programming Language, by Aho, Weinberger, and Kernighan.