Error Correction Codes


Hamming Codes

There's an error correction code that separates the bits holding the original value (data bits) from the error correction bits (check bits), and the difference between the calculated and actual error correction bits is the position of the bit that's wrong. Very nice, eh? It's called a Hamming Code.

Error correction codes are a way to represent a set of symbols so that if any 1 bit of the representation is accidentally flipped, you can still tell which symbol it was. For example, you can represent two symbols x and y in 3 bits with the values x=111 and y=000. If you flip any one of the bits of these values, you can still tell which symbol was intended.

If more than 1 bit changes, you can't tell, and you probably get the wrong answer. So it goes; 1-bit error correction codes can only correct 1-bit changes.

If b bits are used to represent the symbols, then each symbol will own 1+b values: the value representing the symbol, and the values differing from it by 1 bit. In the 3-bit example above, y owned 1+3 values: 000, 001, 010, and 100. Representing n symbols in b bits will consume n*(1+b) values.

If there is a 1-bit error correction code of b bits for n symbols, then n*(1+b) <= 2^b. An x-bit error correction code requires that n*( (b choose 0) + (b choose 1) + ... + (b choose x) ) <= 2^b. See the Tables of Lexicodes for x-bit codes.

Suppose you want a 1-bit error correction code for 2^11 symbols. Since (14+1)*2^11 > 2^14 but (15+1)*2^11 = 2^15, the code must have at least 15 bits. Such a code would be optimal: every 15-bit value would be owned by one of the 2^11 symbols.
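That arithmetic is mechanical, so here's a minimal C sketch of it; the name min_bits is made up for illustration, and it assumes n is small enough that the multiply doesn't overflow.

  /* Smallest codeword length b satisfying the 1-bit bound
     n*(1+b) <= 2^b.  min_bits is a made-up name, not a library call. */
  unsigned min_bits(unsigned long n)
  {
      unsigned b = 1;
      while (n * (1ul + b) > (1ul << b))
          ++b;
      return b;
  }

min_bits(2048) returns 15, matching the count above.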

OK. Now what exactly is the code? Can we name the symbols 0..2^11-1 and confine the error correction to just four bits? Yes. Can we calculate four error correction bits easily? Yes. Can we recover easily when an error occurs? Yes. Here's how.

 11 10  9  8  7  6  5     4  3  2     1
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

In the diagram above, we have our 15 bits. Bits 8, 4, 2, 1 (the powers of 2) are the error correction bits (check bits). The other 11 bits are data bits, and store the name of the symbol (an 11-bit value).
 1  0 0 0 1
 2  0 0 1 0
 3  0 0 1 1
 4  0 1 0 0
 5  0 1 0 1
 6  0 1 1 0
 7  0 1 1 1
 8  1 0 0 0
 9  1 0 0 1
10  1 0 1 0
11  1 0 1 1
12  1 1 0 0
13  1 1 0 1
14  1 1 1 0
15  1 1 1 1

Above is a table of the values 1..15 and their binary representations. To represent the symbol 10100010101, that is

 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
  1  0  1  0  0  0  1     0  1  0     1
we need to figure out the check bits. Simply take the binary representation of the position of each set data bit (15, 13, 9, 6, 3) and XOR them together:
    1 1 1 1  15
    1 1 0 1  13
    1 0 0 1   9
    0 1 1 0   6
 ^  0 0 1 1   3
 -----------
    1 1 1 0
to get the check bits (8=1, 4=1, 2=1, 1=0). Then fill them in to complete the codeword.
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1
  1  0  1  0  0  0  1  1  0  1  0  1  1  1  0
The finished codeword, 101000110101110, is sent to the remote side, which recomputes the check bits from the data bits, and compares that to the check bits actually received.

That's the definition of the code.
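For concreteness, here's a minimal C sketch of the encoder just described; it assumes one convention (bit i-1 of an unsigned int holds codeword position i), and check() and encode() are names made up for this page, not a standard API.

  /* XOR together the position of every set bit.  With the data bits
     in place and the check positions (8,4,2,1) still zero, this is
     exactly the check-bit calculation above. */
  unsigned check(unsigned code)
  {
      unsigned pos, x = 0;
      for (pos = 1; pos <= 15; ++pos)
          if (code & (1u << (pos-1)))
              x ^= pos;
      return x;                      /* 4 bits, with values 8,4,2,1 */
  }

  /* Spread an 11-bit symbol across the non-power-of-2 positions,
     then drop the check bits into positions 8, 4, 2, 1. */
  unsigned encode(unsigned symbol)
  {
      unsigned pos, chk, code = 0;
      for (pos = 3; pos <= 15; ++pos) {
          if (pos == 4 || pos == 8)
              continue;              /* skip the check positions */
          if (symbol & 1)
              code |= 1u << (pos-1);
          symbol >>= 1;
      }
      chk = check(code);
      code |= (chk & 8u) << 4;       /* value 8 -> position 8 */
      code |= (chk & 4u) << 1;       /* value 4 -> position 4 */
      code |= (chk & 2u);            /* value 2 -> position 2 */
      code |= (chk & 1u);            /* value 1 -> position 1 */
      return code;
  }

encode(0x515) returns 0x51AE, the worked example above (0x515 is 10100010101, and 0x51AE is 101000110101110).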

Now what if some bit gets flipped in transit? Suppose a data bit (13) gets flipped, so you receive 100000110101110 instead of 101000110101110. The remote side recomputes the check bits from the set data bits it received

    1 1 1 1  15   
    1 0 0 1   9
    0 1 1 0   6
 ^  0 0 1 1   3  (13 was received 0, not 1)
 -----------
    0 0 1 1
and XORs that with the check bits it actually received
    1 1 1 0  received
  ^ 0 0 1 1  calculated
 -----------
    1 1 0 1
and finds it's not 0. In fact, it's the binary representation of 13, the bit that was flipped.

Let's try again, flipping one of the check bits (bit 4) instead. So the value received is 101000110100110 instead of 101000110101110. The remote side calculates the check bits from the set data bits received

    1 1 1 1  15
    1 1 0 1  13
    1 0 0 1   9
    0 1 1 0   6
 ^  0 0 1 1   3
 -----------
    1 1 1 0
and compares that to the check bits actually received
    8 4 2 1
    1 0 1 0  received
  ^ 1 1 1 0  calculated
 -----------
    0 1 0 0
and it's not zero; it's the binary representation of 4, the bit that was flipped. The XOR of the calculated and received check bits is always either 0000 (no bit was flipped) or the position of the bit that was flipped.
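Here's that correction step as a C sketch, reusing check() from the encoding sketch above. One pleasant shortcut: because each check position's value equals the position itself, running check() over the entire received word (data and check bits together) computes the calculated check bits XORed with the received ones in a single pass.

  /* Return the corrected codeword.  check(received) over the whole
     word is 0 for a clean word; otherwise it names the flipped bit. */
  unsigned correct(unsigned received)
  {
      unsigned syndrome = check(received);
      if (syndrome)
          received ^= 1u << (syndrome-1);   /* flip it back */
      return received;
  }

For the two examples above, correct(0x41AE) and correct(0x51A6) both return the original codeword 0x51AE.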

This example was for 15 bits representing 11-bit symbols. The same thing works for any 2^n-1 bits; the powers of 2 are the check bits. (Notice that the first 7 bits of the 15-bit code are an error correction code for 4-bit symbols. One error code is always the prefix of another like that.) The highest data bits can be lopped off if you don't have that many symbols.

See, isn't that neat?


Choosing an error correction code

Don't ask me. There are lots and lots of codes out there. I know about Hamming codes, I'm vaguely aware of Gallager and Turbo and Reed-Solomon and Golay codes, I've heard of Viterbi and Convolutional and Trellis codes but don't know what they are, I'm pretty sure Gray codes can't be used for error correction or detection at all, and beyond that it's an unknown field to me. So I can't give useful advice. Don't ask me.

Turbo codes are built out of Hamming codes. Using the Hamming code from this page, you'd have a 15x15 matrix of bits. That's 11 data rows and 4 check rows, and 11 data columns and 4 check columns. Fill in the (data,data) positions with data. Fill in (data, check) positions in data rows by filling in the check bits for a 15-bit Hamming code for that row, and fill in the (check, data) bits in data columns the same way. Now the data bits are filled for the check rows and check columns. Fill in the (check, check) positions by filling in the check bits for the 15-bit Hamming code for their row. You'd get the same results for (check, check) bits if you did the Hamming code for their column, because the value is really the XOR of a bunch of (data, data) bits, and it's the same set of (data, data) bits either way. You could even go hog-wild and build N-dimensional Turbo codes, or Turbo codes with different Hamming codes in different dimensions. Decoding Turbo codes well requires more tricks.
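Here's a sketch of that fill-in procedure, reusing check() from the Hamming sketch above; the 0/1 matrix layout and the names turbo_fill, put_checks, gather, and is_check are all made up for illustration. Rows and columns 0..14 stand for positions 1..15, the (data,data) cells are assumed to hold data already, and the (check,check) cells are filled by column, which the paragraph above argues gives the same bits as filling them by row.

  static int is_check(int pos)       /* positions 1,2,4,8 hold checks */
  {
      return pos==1 || pos==2 || pos==4 || pos==8;
  }

  /* Write the 4 check bits chk into one row or column; bit points at
     cell 0 and stride steps to the next cell. */
  static void put_checks(unsigned char *bit, int stride, unsigned chk)
  {
      bit[7*stride] = (chk >> 3) & 1;    /* position 8 */
      bit[3*stride] = (chk >> 2) & 1;    /* position 4 */
      bit[1*stride] = (chk >> 1) & 1;    /* position 2 */
      bit[0]        =  chk       & 1;    /* position 1 */
  }

  /* Gather the data positions of one row or column into a 15-bit word. */
  static unsigned gather(unsigned char *bit, int stride)
  {
      unsigned pos, word = 0;
      for (pos = 1; pos <= 15; ++pos)
          if (!is_check(pos) && bit[(pos-1)*stride])
              word |= 1u << (pos-1);
      return word;
  }

  void turbo_fill(unsigned char m[15][15])
  {
      int r, c;
      for (r = 0; r < 15; ++r)           /* (data,check) cells */
          if (!is_check(r+1))
              put_checks(&m[r][0], 1, check(gather(&m[r][0], 1)));
      for (c = 0; c < 15; ++c)           /* (check,data) and (check,check) */
          put_checks(&m[0][c], 15, check(gather(&m[0][c], 15)));
  }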

Gallager codes (LDPC, Low Density Parity Checks) use an MxN matrix of bits, M < N, with a few bits set in each row and column. You multiply that matrix by an array of M bits of data to get an N-bit codeword that you send. Um. Similar but simpler would be multiplying by numbers rather than matrices. Suppose you want to send a digit 0..9, say 8, so you multiply by 10 and send 8x10=80. You'll expect to receive one of 0, 10, 20, 30, 40, 50, 60, 70, 80, 90. If you receive 82, the "nearest" codeword is 80, so you'd assume that the original message was 80/10=8. Finding the "nearest" codeword to a received N-bit message produced by an LDPC requires more tricks.
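Here's what that multiply looks like as a C sketch, over GF(2), where multiplying bits is AND and adding them is XOR; M, N, and gf2_encode are made-up names, and a real LDPC would use a large sparse matrix rather than these toy sizes.

  enum { M = 4, N = 7 };

  /* Row vector of M data bits times an MxN 0/1 matrix, mod 2. */
  void gf2_encode(const unsigned char data[M],
                  unsigned char G[M][N],
                  unsigned char code[N])
  {
      int r, c;
      for (c = 0; c < N; ++c) {
          unsigned char bit = 0;
          for (r = 0; r < M; ++r)
              bit ^= data[r] & G[r][c];   /* AND = multiply, XOR = add */
          code[c] = bit;
      }
  }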

A Gray code is what you get if you take a counting sequence i of machine-integers and compute (i^(i>>1)). It still goes through all possible values once, just as counting does, but adjacent values always differ in exactly 1 bit. I don't think this can be used for error correction at all.
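Here's that map as a C sketch, with a quick check of the one-bit-apart property; gray() is a made-up name.

  #include <stdio.h>

  unsigned gray(unsigned i) { return i ^ (i >> 1); }

  int main(void)
  {
      unsigned i;
      for (i = 0; i < 15; ++i) {
          unsigned diff = gray(i) ^ gray(i+1);
          /* diff has exactly one bit set iff diff is nonzero and
             clearing its lowest set bit leaves zero */
          printf("%2u -> %2u  one-bit step: %s\n", i, gray(i),
                 (diff && !(diff & (diff-1))) ? "yes" : "no");
      }
      return 0;
  }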

