Binary Sorting Multi-Column Data

Databases and other systems often have to sort multi-column data. The way you do that is first you sort the first column, then for rows that match in the first column you sort the second column, and so forth. For variable-length data, such as character strings, any prefix of a value sorts smaller than the value. For example B sorts smaller than BB. So (B,BB) sorts smaller than (BA,A).

Binary sorting takes the byte representations of two values and compares them byte by byte. The first byte that differs determines which value sorts smaller. Again, if the values are variable length, a value always sorts bigger than any of its prefixes.

The difficulty is that for multi-column data with variable width column values, it is not enough to concatenate the columns and then binary sort. For example the concatenations of (B,BB) and (BA,A) are BBB and BAA. The column separators need to be preserved, and they sort lower than any possible byte values. All 256 possible byte values might be occupied by the actual values, so the column separator has to be some 257th value.

A solution (patent US6778985B1 covers this, expired 2019) is to encode variable-length column values to make room for a column separator before concatenating the column values into a byte-sortable key. This encoded value is always a few bytes longer than the original value. Fixed-length columns do not need encoding.

A simple encoding is to find all bytes of value 0 or 1, and replace them with two bytes, (1,1) or (1,2). This removes all zeros from the byte stream. A zero can then be appended to the end to represent the column separator. For example, 012345 would become 111223450.

This is reversible: if you see 0, it is the column separator. If you see 1, look at the next byte as well, it is either 1 or 2, if it is 1 then emit 0 and if it is 2 emit 1. Otherwise, if you see any byte value x bigger than 1, emit x.

It doesn't have to be 0 and 1 that get encoded. If X and X+1 are two consecutive byte values that don't get used much, any V less than X can encode as V+1, X can encode as (X+1,X+1), X+1 can encode as (X+1,X+2), and anything bigger than X+1 encodes as itself.

That can double the size of data. Here is another schemes that can at worst make the data 1.5x bigger. Look at the first byte, x. If you see a 0 or a 1, look at the next byte as well, y. (0,0) becomes (1,1), (0,1) becomes (1,2), otherwise (0,y) becomes (1,3,y), (1,0) becomes (1,4), (1,1) becomes (1,5), (1,y in 2..6) becomes (1,6,y), otherwise (1,y where y > 6) becomes (1,y), otherwise x becomes x. There are lots of encodings that maintain sort order. They all increase the byte length some.

Binary sort in reverse order is then easily done too. Just complement the byte-sortable encoded multi-column key.

Radix sort requires binary sorting. Radix sorting is around 4x faster than quicksort, but it isn't feasible for multi-column sort keys without this technique. B-trees sometimes want binary sorting too, especially to represent descending indexes.

Even for quicksort or mergesort, it may be more efficient to encode to a binary sortable value then do binary sorting. Column-aware sorting requires using different code for different column types, which is relatively expensive. Encoding all the values is O(n), vs O(nlogn) comparisons during a sort. So the gain of not being column-aware during comparisons may be greater than the loss of having to do the initial encoding.