This is a small fast pseudorandom number generator, suitable for
large statistical calculations, but not of cryptographic quality.
Although there is no guaranteed minimum cycle length, the average
cycle length is expected to be about 2^{126} results.

typedef unsigned long int u4;
typedef struct ranctx { u4 a; u4 b; u4 c; u4 d; } ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))
u4 ranval( ranctx *x ) {
    u4 e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

void raninit( ranctx *x, u4 seed ) {
    u4 i;
    x->a = 0xf1ea5eed, x->b = x->c = x->d = seed;
    for (i=0; i<20; ++i) {
        (void)ranval(x);
    }
}

I have not found any test in any configuration that it does not pass. It passes DIEHARD. Sampling just the bottom byte, it passes the run test (both up and down), and the gap test (for gaps up to length 32) for 2 trillion values (chi.c). The frequency test (again on the bottom byte) looked suspicious at 1 trillion values (chi-square=5), so I ran it for 4 trillion values (chi-square=0.42), showing the earlier result was a fluke (freq.c). A test that counts bits per value for five consecutive values (countx.c) passes for at least 16 trillion values, both for normal and for graycoded values. It passes Geronimo Jones' nda test for at least 1 trillion values (4 trillion bytes).

The strongest test for small-state generators I've run across is Bob Blorp2's bitcounting test (countx.c). Near as I can tell, what it is doing is indirectly measuring how much avalanche happens by the time entropy is first recycled. For this generator that's 5 results. I can measure that directly too: for every bit of the internal state, I can start with two copies of a random state that differ in just that bit, run the generator for 5 results, then XOR their 5th results. Complete avalanche would give an average of 16 bits differing. The program rngav.c measures this: for the PRNG above, at least 8.8 bits are affected on average. I used the shifts (27, 17). Others that achieve 8.8 bits of avalanche include (9,16), (9,24), (10,16), (10,24), (11,16), (11,24), (25,8), (25,16), (26,8), (26,16), (26,17), and (27,16). Avalanche is much better in reverse, but the code is also slower because of more read-after-write dependencies. It is possible to not fully achieve avalanche yet still pass all practical tests for randomness, because nearly-identical states are very rare among the set of all possible states.
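That measurement can be sketched as follows. This is a reconstruction of the idea behind rngav.c, not the original program; avalanche() and popcount32() are names invented here. For one starting state, flip each of the 128 state bits in turn, advance both copies 5 results, and count the differing bits in the 5th outputs.

```c
typedef unsigned int u4;              /* assumes a 32-bit unsigned int */
typedef struct ranctx { u4 a, b, c, d; } ranctx;
#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

static u4 ranval(ranctx *x) {
    u4 e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

static int popcount32(u4 v) {         /* count set bits */
    int n = 0;
    while (v) { v &= v - 1; ++n; }
    return n;
}

/* Average number of bits of the 5th result that change when a single
 * bit of the 128-bit starting state is flipped. */
double avalanche(ranctx start) {
    int total = 0;
    for (int bit = 0; bit < 128; ++bit) {
        ranctx x = start, y = start;
        u4 *w = (u4 *)&y;             /* view a,b,c,d as 4 words */
        w[bit / 32] ^= (u4)1 << (bit % 32);
        u4 rx = 0, ry = 0;
        for (int i = 0; i < 5; ++i) { rx = ranval(&x); ry = ranval(&y); }
        total += popcount32(rx ^ ry);
    }
    return total / 128.0;
}
```

Complete avalanche would make avalanche() return 16; the 8.8 figure above is the worst average observed over many starting states.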

If raninit(seed) is used (one of those 2^{32} seeds), the
shortest cycle is expected to be about 2^{94} (because
126-32 = 94), and it is expected that about one seed will run into
another seed within 2^{64} values (because 128-64-2*32=0).
That's probabilistic; the actual shortest cycle or collision could be better
or worse than that. I've tested each seed from raninit() to certain
lengths to guarantee it does not cycle and does not hit any other
seed. (You do that by starting with a=0xf1ea5eed and b=c=d=seed, then
checking the state after every value for b==c and b==d and
a==0xf1ea5eed.) How far I tested is limited by the computing resources
on hand. I tested seed 0
(the first 2^{0} seeds) for 2^{50} values, 0..31
(the first 2^{5} seeds) for 2^{46} values, the first
2^{10} seeds for 2^{40} values, the first
2^{21} seeds for 2^{32} values, and all
2^{32} seeds for 2^{20} values. No cycles or overlaps
were found.
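That check can be sketched like this (a reconstruction, not the author's test program; hits_seed_state is a name invented here):

```c
typedef unsigned int u4;              /* assumes a 32-bit unsigned int */
typedef struct ranctx { u4 a, b, c, d; } ranctx;
#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

static u4 ranval(ranctx *x) {
    u4 e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

/* Returns 1 if, within len results, the stream started from `seed`
 * lands on a state that some raninit() seed begins in (b==c==d with
 * a==0xf1ea5eed); returns 0 otherwise. */
int hits_seed_state(u4 seed, unsigned long long len) {
    ranctx x = { 0xf1ea5eed, seed, seed, seed };
    for (unsigned long long i = 0; i < len; ++i) {
        (void)ranval(&x);
        if (x.b == x.c && x.b == x.d && x.a == 0xf1ea5eed)
            return 1;
    }
    return 0;
}
```

Running this for every seed to the lengths listed above is what rules out short cycles and seed collisions within those bounds.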

I wrote this PRNG. I place it in the public domain. Same goes for at least the implementation of all those tests linked to above.

I suspect that all you need for a statistically good pseudorandom number generator is

- A reversible, nonlinear function where all internal state bits affect one another
- A big enough state to give you a long cycle length
- Sufficiently fast mixing, when run either forward or backward

(April 2009) Elias Yarrkov used a black-box solver to find some fixed points of this generator (which produce cycles of length 1):

{ 0x00000000, 0x00000000, 0x00000000, 0x00000000 },
{ 0x77777777, 0x55555555, 0x11111111, 0x44444444 },
{ 0x5591F2E3, 0x69EBA6CD, 0x2A171E3D, 0x3FD48890 },
{ 0x47CB8D56, 0xAE9B35A7, 0x5C78F4A8, 0x522240FF },
{ 0x71AAC8F9, 0x66B4F5D3, 0x1E950B8F, 0x481FEA44 },
{ 0xAB23E5C6, 0xD3D74D9A, 0x542E3C7A, 0x7FA91120 }

A third rotate could be introduced into ranval() without increasing the maximum parallel path through the code. This came out significantly slower with the Microsoft Visual Studio compiler, though. (MSVC implements the additions with pointer arithmetic, i.e. Intel's LEA instruction, which allows three-operand additions instead of two-operand ones; the extra rotate thwarted that, hence the speed hit.) A third rotate can raise minimum avalanche to 13 bits on average, like so:

u4 ranval( ranctx *x ) {
    u4 e = x->a - rot(x->b, 23);
    x->a = x->b ^ rot(x->c, 16);
    x->b = x->c + rot(x->d, 11);
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

Shift amounts achieving 13 bits of avalanche include (3,14,24), (3,25,15), (4,15,24), (6,16,28), (7,16,27), (8,14,3), (11,16,23), (12,16,22), (12,17,23), (13,16,22), (15,25,3), (16,9,3), (17,9,3), (17,27,7), (19,7,3), (23,15,11), (23,16,11), (23,17,11), (24,3,16), (24,4,16), (25,14,3), (27,16,6), (27,16,7). I imagine these are significantly more pseudorandom than the PRNG at the top. However, since they run slower and no test I know of can detect nonrandomness in either, I'm recommending the 2-rotate version.

My original timings had one file containing both the generator and a tight loop testing the generator. The compiler inlined the generators into the tight loop, keeping the state in registers. On the register-starved Intel platform, that would never happen for real uses of a generator: whatever the random numbers are used for would kick the generator's state out of registers.

Another approach is to use the generator to fill a large array, then to read values out of that array one by one, like with this code in a .h file:

static inline u4 ranval(random_context *r) {
    if (r->count >= SIZE) {
        fill(r);
        r->count = 0;
    }
    return r->result[r->count++];
}
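A self-contained sketch of that scheme follows. The names random_context, SIZE, and fill() match the fragment above but are otherwise illustrative; the single-step function is called ranstep here so the buffered reader can keep the name ranval.

```c
typedef unsigned int u4;              /* assumes a 32-bit unsigned int */
#define SIZE 256
#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

typedef struct ranctx { u4 a, b, c, d; } ranctx;
typedef struct random_context {
    ranctx state;                     /* the 16-byte generator state */
    u4 result[SIZE];                  /* buffered results */
    u4 count;                         /* index of the next unread result */
} random_context;

static u4 ranstep(ranctx *x) {        /* one step of the small PRNG */
    u4 e = x->a - rot(x->b, 27);
    x->a = x->b ^ rot(x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

static void fill(random_context *r) { /* refill the buffer in one pass */
    for (u4 i = 0; i < SIZE; ++i) r->result[i] = ranstep(&r->state);
}

static u4 ranval(random_context *r) { /* the buffered reader from above */
    if (r->count >= SIZE) { fill(r); r->count = 0; }
    return r->result[r->count++];
}
```

Setting count to SIZE before the first call forces an initial fill(), so the buffer never has to be primed separately.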

Here are some new timings for 20000*65536 results from various generators, variously inlined, used with a 256-item array, or in a routine that calculates a single value. Timed with cygwin gcc 4.3.2 -O3 on a 1.86GHz Intel 6300.

- 2.52 seconds: Knuth's lagged Fibonacci, 256-item array, inlined in the tight loop.
- 3.64 seconds: Knuth's lagged Fibonacci, separate routine, fill and exhaust a 256-item array.
- 4.31 seconds: My 2-rotate small prng, inlined in the tight loop.
- 4.69 seconds: My 3-rotate small prng, inlined in the tight loop.
- 5.75 seconds: My 2-rotate small prng, separate routine, fill a 256-item array then exhaust the values.
- 6.58 seconds: Knuth's lagged Fibonacci, separate routine, called directly (it calculates a new value per call, 1024 bytes total state).
- 6.72 seconds: My 3-rotate small prng, separate routine, fill a 256-item array then exhaust the values.
- 8.22 seconds: ISAAC, my cryptographic random number generator, which uses a 256-item array.
- 9.55 seconds: My 2-rotate small prng, separate routine, called directly (it calculates a new value per call, 16 bytes total state).
- 10.23 seconds: My 3-rotate small prng, separate routine, called directly (it calculates a new value per call, 16 bytes total state).
- 11.90 seconds: the Mersenne Twister, used as directed.

If you want to use 8-byte terms instead of 4-byte terms, the proper rotate amounts are (39,11) for the two-rotate version (yielding at least 13.3 bits of avalanche after 5 rounds) or (7,13,37) for the three-rotate version (yielding 18.4 bits of avalanche after 5 rounds). I think I'd go with the three-rotate version, because the ideal is 32 bits of avalanche, and 13.3 isn't even half of that.

typedef unsigned long long u8;
typedef struct ranctx { u8 a; u8 b; u8 c; u8 d; } ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(64-(k))))
u8 ranval( ranctx *x ) {
    u8 e = x->a - rot(x->b, 7);
    x->a = x->b ^ rot(x->c, 13);
    x->b = x->c + rot(x->d, 37);
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

void raninit( ranctx *x, u8 seed ) {
    u8 i;
    x->a = 0xf1ea5eed, x->b = x->c = x->d = seed;
    for (i=0; i<20; ++i) {
        (void)ranval(x);
    }
}

I don't think that there's an 8-byte rotate instruction on any 64-bit platform. And you only need 2 terms to get to 128 bits of internal state if you have 64-bit terms. Quite likely 64-bit deserves a whole different approach, not just different constants.

Dale Weiler implemented this PRNG as a compile-time constant.

The small-state version isn't suitable for encryption. However, if x->b is replaced with an array and e is used for lookup in the array, it becomes something that is possibly of cryptographic strength:

typedef unsigned long int u4;
typedef struct ranctx { u4 a; u4 b[256]; u4 c; u4 d; } ranctx;

#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))
void ranxor( ranctx *x, u4 m[256] ) {
    u4 i;
    for (i=0; i<256; ++i) {
        u4 e = x->a - rot(x->b[i], 27);
        x->a = x->b[e%256] ^ rot(x->c, 17);
        x->b[e%256] = x->c + x->d;
        x->c = x->d + e;
        x->d = e + x->a;
        m[i] ^= x->d;
    }
}

void raninit( ranctx *x, u4 seed[128] ) {
    u4 i;
    u4 junk[256];
    x->a = x->c = x->d = 0xf1ea5eed;
    for (i=0; i<128; ++i) {
        x->b[i] = x->b[i+128] = seed[i];
    }
    for (i=0; i<10; ++i) {
        ranxor(x, junk);
    }
}

The routine ranxor() takes a message m[] consisting of 256 4-byte values and XORs 256 pseudorandom 4-byte values with it.
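As a usage sketch: because ranxor() XORs a keystream into m[], running it again from an identically seeded state restores the message. The context struct is called cryptctx here to keep it distinct from the small-state ranctx, junk[] is zeroed to avoid reading uninitialized memory (its contents are discarded either way), and encrypt_decrypt_roundtrip is a name invented for this demonstration.

```c
#include <string.h>

typedef unsigned int u4;              /* assumes a 32-bit unsigned int */
typedef struct cryptctx { u4 a; u4 b[256]; u4 c; u4 d; } cryptctx;
#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

void ranxor(cryptctx *x, u4 m[256]) {
    for (u4 i = 0; i < 256; ++i) {
        u4 e = x->a - rot(x->b[i], 27);
        x->a = x->b[e % 256] ^ rot(x->c, 17);
        x->b[e % 256] = x->c + x->d;
        x->c = x->d + e;
        x->d = e + x->a;
        m[i] ^= x->d;                 /* XOR keystream into the message */
    }
}

void raninit(cryptctx *x, const u4 seed[128]) {
    u4 junk[256] = { 0 };             /* discarded; state does not depend on it */
    x->a = x->c = x->d = 0xf1ea5eed;
    for (u4 i = 0; i < 128; ++i) x->b[i] = x->b[i + 128] = seed[i];
    for (u4 i = 0; i < 10; ++i) ranxor(x, junk);
}

/* Returns 1 if XORing twice with the same seed restores the message. */
int encrypt_decrypt_roundtrip(void) {
    u4 seed[128], msg[256], orig[256];
    for (u4 i = 0; i < 128; ++i) seed[i] = i * 0x9e3779b9u;
    for (u4 i = 0; i < 256; ++i) orig[i] = msg[i] = i;
    cryptctx x;
    raninit(&x, seed);
    ranxor(&x, msg);                  /* encrypt */
    int changed = memcmp(msg, orig, sizeof msg) != 0;
    raninit(&x, seed);
    ranxor(&x, msg);                  /* decrypt with the same keystream */
    int restored = memcmp(msg, orig, sizeof msg) == 0;
    return changed && restored;
}
```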

I strongly recommend against using this for any cryptographic purpose. If it is secure, it is just barely so. I don't think it's even particularly fast. The only value I see in it is as a target for cryptanalysis and possible inspiration for future stream ciphers.

Although the lack of bias comes from the same good mixing as the original generator, the security comes more from the pseudorandom lookups than from the mixing. The most blatant weakness is that there is one pseudorandom lookup per result returned, leaving a system with n+256 variables and n equations after seeing n results (counting pseudorandom lookups as variables separate from lookups at known positions). If you can make progress guessing or linking the variables, the whole thing should be solvable. Another weakness (and the most curious part of this generator) is that the positions to be updated in x->b[] are selected pseudorandomly rather than in a regular order. That means each pass leaves about 1/e of the positions in x->b[] unchanged, which means you can reduce the number of independent variables just by guessing which positions don't get changed between rounds.
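That 1/e figure is easy to check empirically. The sketch below (untouched_after_one_pass is a name invented here) replicates one pass of the update loop, records which indices of x->b[] get written, and counts the leftovers; with 256 pseudorandom writes into 256 slots, roughly 256*(1-1/256)^256, about 94 positions, should stay untouched.

```c
typedef unsigned int u4;              /* assumes a 32-bit unsigned int */
typedef struct cryptctx { u4 a; u4 b[256]; u4 c; u4 d; } cryptctx;
#define rot(x,k) (((x)<<(k))|((x)>>(32-(k))))

/* One pass of the ranxor() update loop, instrumented to record which
 * positions of x->b[] are written; returns how many were never hit. */
int untouched_after_one_pass(cryptctx *x) {
    int touched[256] = { 0 };
    for (u4 i = 0; i < 256; ++i) {
        u4 e = x->a - rot(x->b[i], 27);
        x->a = x->b[e % 256] ^ rot(x->c, 17);
        x->b[e % 256] = x->c + x->d;
        touched[e % 256] = 1;         /* this position got overwritten */
        x->c = x->d + e;
        x->d = e + x->a;
    }
    int untouched = 0;
    for (int i = 0; i < 256; ++i) untouched += !touched[i];
    return untouched;
}
```

With a reasonably mixed starting state the count comes out near 256/e, matching the estimate above.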

I still don't know how to break it, but I haven't tried very hard.

Text for a talk I gave on designing small-state PRNGs

Hash functions for 32-bit integers

A candidate for a secure hash (Maraca)
