Sort-Of-Numeric Sorting
The standard sort command available in Unix-like systems offers a “human numeric” sorting option, described as follows in the flag description for --human-numeric-sort in Coreutils 9.1:
-h, --human-numeric-sort
compare human readable numbers (e.g., 2K 1G)
In computing, there’s some controversy about whether SI suffixes should be interpreted with their usual base-ten meanings, so 1K is 10³ = 1,000, or with the often more convenient base-two meanings, so 1K is 2¹⁰ = 1,024. With the sort command, the answer is “neither.” For example, no matter which interpretation of SI suffixes you use, 2,000 is clearly larger than 1K. But let’s see what sort thinks:
$ sort -h <<EOF
1K
2000
EOF
2000
1K
No matter what you were expecting, that wasn’t what you expected. The man page explains what’s happening:
-h
--human-numeric-sort
--sort=human-numeric
Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or k or K, or one of MGTPEZY…); and finally by numeric value. For example, 1023M sorts before 1G because M (mega) precedes G (giga) as an SI suffix. This option sorts values that are consistently scaled to the nearest suffix, regardless of whether suffixes denote powers of 1000 or 1024…
So human numeric sorting is actually a lexicographic sort, first on the sign, then on the SI suffix, and only then on the number.
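To make that three-step comparison concrete, here is a minimal Python sketch that approximates the ordering described in the manual. It is not the Coreutils implementation: the helper names parse, human_numeric_key and true_value are hypothetical, the parser ignores locale details such as thousands separators, and the way it treats suffixes on negative numbers (a bigger suffix sorts earlier) is an assumption. For contrast, true_value scales each suffix by 1000 or by 1024, the two interpretations mentioned earlier.

```python
import re

# Suffix order quoted in the Coreutils 9.1 manual: empty, then k or K, then M G T P E Z Y.
SUFFIX_RANK = {"": 0, "k": 1, "K": 1, "M": 2, "G": 3, "T": 4,
               "P": 5, "E": 6, "Z": 7, "Y": 8}

# Optional sign, digits with an optional fractional part, optional one-letter suffix.
# Assumption: no thousands separators and "." as the decimal point.
NUMBER = re.compile(r"\s*([+-]?)(\d+(?:\.\d*)?)([kKMGTPEZY]?)\s*")


def parse(text):
    """Split a human-readable number into (signed numeric part, suffix)."""
    m = NUMBER.fullmatch(text)
    if m is None:
        raise ValueError(f"not a human-readable number: {text!r}")
    sign, digits, suffix = m.groups()
    value = float(digits)
    return (-value if sign == "-" else value), suffix


def human_numeric_key(text):
    """Approximate sort -h: compare the sign, then the suffix, then the bare number."""
    value, suffix = parse(text)
    sign = (value > 0) - (value < 0)  # -1, 0 or 1
    # Assumption: for negative numbers the suffix rank is flipped, so -1M sorts
    # before -1K; for zero (sign == 0) the suffix is ignored entirely.
    return (sign, sign * SUFFIX_RANK[suffix], value)


def true_value(text, base):
    """Treat the suffix as a real multiplier: value * base ** rank."""
    value, suffix = parse(text)
    return value * base ** SUFFIX_RANK[suffix]


if __name__ == "__main__":
    data = ["1K", "2000", "1G", "1023M"]
    print(sorted(data, key=human_numeric_key))                 # ['2000', '1K', '1023M', '1G']
    print(sorted(data, key=lambda s: true_value(s, 1000)))    # ['1K', '2000', '1G', '1023M']
    print(sorted(data, key=lambda s: true_value(s, 1024)))    # ['1K', '2000', '1023M', '1G']
```

The first line reproduces the surprising 2000-before-1K order from the shell session. The other two show that both genuine interpretations put 1K first, and that they even disagree with each other about 1023M versus 1G: under the base-ten reading 1023M is 1.023 × 10⁹, which is larger than 1G.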