Unsound & Incomplete

Sort-Of-Numeric Sorting

The standard sort command available in Unix-like systems offers a “human numeric” sorting option, described as follows in the flag description for --human-numeric-sort in Coreutils 9.1:

-h, --human-numeric-sort

compare human readable numbers (e.g., 2K 1G)

In computing, there’s some controversy about whether SI suffixes should be interpreted with their usual base-ten meanings, so 1K is 10³ = 1,000, or interpreted with often-more-convenient base-two meanings, so 1K is 2¹⁰ = 1,024. With the sort command, the answer is “neither.” For example, no matter which interpretation of SI suffixes you use, 2,000 is clearly larger than 1K. But let’s see what sort thinks:

$ sort -h <<EOF
1K
2000
EOF

2000
1K

No matter what you were expecting, that wasn’t what you expected. The man page explains what’s happening:

-h
--human-numeric-sort
--sort=human-numeric

Sort numerically, first by numeric sign (negative, zero, or positive); then by SI suffix (either empty, or k or K, or one of MGTPEZY…); and finally by numeric value. For example, 1023M sorts before 1G because M (mega) precedes G (giga) as an SI suffix. This option sorts values that are consistently scaled to the nearest suffix, regardless of whether suffixes denote powers of 1000 or 1024…

So human numeric sorting is actually a lexicographic sort, first on the sign, then on the SI suffix, and only then on the number. The suffix is never multiplied into the value, which is why 2000 (no suffix) sorts before 1K.
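To make that ordering concrete, here is a minimal Python sketch of a sort key that follows the man page's description. It is an illustration only, not GNU sort's actual implementation: the names human_key and SUFFIX_RANK are hypothetical, the input pattern is simplified to an optional minus sign, digits, an optional decimal part, and at most one suffix letter, and the reversal of suffix order for negative numbers is an assumption made so that larger magnitudes sort first among negatives.

```python
import re

# Suffix order from the man page excerpt: empty, then k or K, then M G T P E Z Y.
SUFFIX_RANK = {"": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5, "E": 6, "Z": 7, "Y": 8}

# Simplified input pattern (an assumption): optional minus, digits,
# optional decimal part, at most one suffix letter.
_PATTERN = re.compile(r"\s*(-?\d+(?:\.\d+)?)([kKMGTPEZY]?)\s*")


def human_key(line):
    """Hypothetical key: compare by sign, then suffix, then numeric value."""
    match = _PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"not a human-readable number: {line!r}")
    value = float(match.group(1))
    sign = (value > 0) - (value < 0)  # -1, 0, or 1
    rank = SUFFIX_RANK[match.group(2).upper()]
    # Assumption: for negative numbers the suffix order is reversed.
    # The suffix is only ranked; it never scales the value.
    return (sign, sign * rank, value)


print(sorted(["1K", "2000"], key=human_key))   # ['2000', '1K']
print(sorted(["1G", "1023M"], key=human_key))  # ['1023M', '1G']
```

Because the key is a tuple, Python compares it element by element, which is exactly the lexicographic behavior described above: 2000 has an empty suffix (rank 0) and so sorts ahead of 1K (rank 1) before the numeric values are ever compared.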