ENT

Here is 300MB of random data from the device, checked by ENT. Notice I am not using -b as I was before; without it, ENT checks entropy at the BYTE scale, which is tougher:
$ ./ent dump
Entropy = 7.999999 bits per byte.

Optimum compression would reduce the size
of this 306380800 byte file by 0 percent.

Chi square distribution for 306380800 samples is 253.74, and randomly
would exceed this value 51.06 percent of the times.

Arithmetic mean value of data bytes is 127.5022 (127.5 = random).
Monte Carlo value for Pi is 3.141608288 (error 0.00 percent).
Serial correlation coefficient is 0.000074 (totally uncorrelated = 0.0).
ENT gives better results for Whirlygig in line with how much you feed it. With a 40MB test file, it reported entropy of 7.999996. That makes sense when you consider that the data is really random: it shows its true colours only in the longer term, since sample by sample it can be doing anything at all.
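One way to see why the figure creeps towards 8 as the file grows: an entropy estimate built from observed byte frequencies is biased slightly low for a finite sample, by roughly (symbols - 1) / (2 N ln 2) bits (the Miller-Madow approximation). The sketch below assumes ENT's figure behaves like such a frequency-based estimate, and assumes "40MB" means 40 * 2^20 bytes; the function name is made up for illustration.

```python
import math

def expected_plugin_entropy(n_bytes, symbols=256):
    """Approximate mean of the measured entropy (bits per byte) for ideal uniform data.

    Uses the Miller-Madow bias approximation; this is an assumption about how
    a frequency-based estimate behaves, not something taken from ENT's source.
    """
    bias_bits = (symbols - 1) / (2 * n_bytes * math.log(2))
    return math.log2(symbols) - bias_bits

# 306380800 is the file size ENT printed; 40 * 2**20 is an assumed size for "40MB".
for size in (306380800, 40 * 2**20):
    print(size, f"{expected_plugin_entropy(size):.6f}")
```

Under those assumptions the two sizes print 7.999999 and 7.999996, in line with the figures ENT reported, which suggests the shortfall from 8 is mostly a sample-size effect rather than a property of the device.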

rngtest

Rngtest had always puzzled me, so most of this post is devoted to picking apart the meaning of these results from 1.27Tbits of Whirlygig randomness (1271Gbits).
rngtest: bits received from input: 1271367467008
rngtest: FIPS 140-2 successes: 63517865
rngtest: FIPS 140-2 failures: 50508
rngtest: FIPS 140-2(2001-10-10) Monobit: 6560
rngtest: FIPS 140-2(2001-10-10) Poker: 6444
rngtest: FIPS 140-2(2001-10-10) Runs: 18865
rngtest: FIPS 140-2(2001-10-10) Long run: 18947
rngtest: FIPS 140-2(2001-10-10) Continuous run: 12
rngtest: input channel speed: (min=39.329; avg=8626.930; max=19531250.000)Kibits/s
rngtest: FIPS tests speed: (min=332.192; avg=105801.561; max=114217.836)Kibits/s
rngtest: Program run time: 155670833366 microseconds
Considering it calls itself "rngtest", at first sight there are a shocking number of "failures". Over 63,568,373 "tests", 50508 "failed". Is something wrong with Whirlygig? I went and studied the rngtest sources to figure out what it was actually doing.

FIPS 140-2

rngtest is based on a document from NIST which goes into detail about assessing random output. It works on 2500-byte (20000-bit) blocks of random data which have various tests applied to them. But since the source is meant to be truly random, what does it mean to "test" the packet? Any bit pattern can come in there, and each is as likely as any other, including a whole packet of 0s or 1s. How can some be considered "bad"? Actually, a packet cannot be considered "bad" in isolation. Instead you have to compare the rate at which packets "fail" the test criteria against the theoretical probability of such packets occurring, to see if your random source has one kind of bias or another. An individual packet can't be said to be bad unless the history of failures suggests a bias towards generating such packets. Unfortunately, I could not find any documentation about rngtest that explained the expected rate of failures from a genuinely random source. I managed to calculate two of the five.

Monobit

Monobit just looks for a 50% distribution of 1s in each 20000-bit packet. If the count of 1s in a packet is 275 or more away from the expected 10,000, in either direction, then it's a fail. Obviously a packet that is off by 1 or 10 bits is highly probable. I found out that these counts should follow a "normal distribution", but I was unable to calculate where on the curve "275 more or fewer 1s" should fall; it's a 0.0275 skew on the expected figure of 10,000. If anyone can help me it would be most welcome. In our case, the observed probability of a monobit failure from my Whirlygig was 0.000103, or 1:9690.
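For what it's worth, here is a rough sketch of how the monobit figure could be estimated. It assumes the FIPS 140-2 rule that a packet passes when its count of 1s lies strictly between 9,725 and 10,275 (taken from the standard rather than from the rngtest sources), and that every bit is an independent fair coin flip. The function names are invented for the example. It computes the exact binomial tail and, for comparison, the normal approximation with a continuity correction.

```python
from math import erfc, sqrt

N_BITS = 20000
LOW = 9725  # assumed rule: fail if ones <= 9725 or ones >= 10275

def monobit_fail_exact(n=N_BITS, low=LOW):
    """Exact P(ones <= low) + P(ones >= n - low) for n fair coin flips."""
    coeff, tail = 1, 0  # coeff walks through C(n, 0), C(n, 1), ...
    for k in range(low + 1):
        tail += coeff
        coeff = coeff * (n - k) // (k + 1)  # C(n, k+1), always an exact division
    return 2 * tail / 2**n  # both tails are equal by symmetry

def monobit_fail_normal(n=N_BITS, low=LOW):
    """Same probability from the normal curve (sigma = sqrt(n) / 2)."""
    sigma = sqrt(n) / 2
    z = (n / 2 - low - 0.5) / sigma  # continuity-corrected distance in sigmas
    return erfc(z / sqrt(2))         # two-sided tail area

print(monobit_fail_exact(), monobit_fail_normal())
```

Under these assumptions both figures come out at a little over 1 in 10,000 per packet, so an observed rate of 0.000103 would be about what an unbiased source should produce.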

Poker

Poker just looks at the distribution of nybbles. It takes each byte as two 4-bit nybbles, and across the 5000 nybbles in the test packet it maintains a count of occurrences of each value 0 - 0xf. These 16 counts are squared and summed, and the sum is compared to two constants: greater than 1576928 or less than 1563176 gets you a fail. Again I have no idea how to calculate the theoretical probability of a "failure" here, but our observed probability is 0.000101 or 1:9864.
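A possible way to estimate the poker figure, offered as a sketch rather than a definitive answer: if S is the sum of the 16 squared counts, then X = (16/5000) * S - 5000 is the usual chi-square statistic for 5000 nybbles spread over 16 equally likely values, and for a fair source it should be approximately chi-square distributed with 15 degrees of freedom. That approximation is the key assumption here. The helper below uses the closed-form tail for an odd number of degrees of freedom so that only the standard library is needed; its name is made up for the example.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_odd(x, dof):
    """Upper tail P(X > x) of a chi-square variable; valid for odd dof only."""
    term, series = sqrt(x), 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        series += term           # adds x**(r - 0.5) / (1 * 3 * ... * (2r - 1))
        term *= x / (2 * r + 1)
    return erfc(sqrt(x / 2)) + sqrt(2 / pi) * exp(-x / 2) * series

NYBBLES = 5000

def poker_statistic(sum_of_squares):
    """Chi-square statistic for 16 equally likely nybble values."""
    return 16 / NYBBLES * sum_of_squares - NYBBLES

x_low = poker_statistic(1563176)   # about 2.16
x_high = poker_statistic(1576928)  # about 46.17
# Fail if the statistic lands below x_low or above x_high.
p_fail = (1 - chi2_sf_odd(x_low, 15)) + chi2_sf_odd(x_high, 15)
print(x_low, x_high, p_fail)
```

With the two constants quoted above, this lands very close to 1 in 10,000 per packet, again in the same region as the observed rate.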

Run

A run is a series of "all 1s" or "all 0s". rngtest counts how many times it sees a run of each length from 1 through 6 (any run longer than 6 bits is counted as being six bits). The count for each run length is then compared against a magic table:
1-bit: 2315 < run < 2685
2-bit: 1114 < run < 1386
3-bit: 527 < run < 723
4-bit: 240 < run < 384
5-bit: 103 < run < 209
6-bit: 103 < run < 209 (sic)
Once again I couldn't find any estimate of the probability of failing this test with a true random source. Our observed probability of failing it was 0.000296 or 1:3369.
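To make the counting concrete, here is a minimal sketch of the runs check as described above. It is not the rngtest code. Two things are assumptions: that runs of 0s and runs of 1s are tallied separately (as FIPS 140-2 describes), and that the limits are strict inequalities, following the notation in the table. The function names are invented for the example.

```python
# Limits from the table above, keyed by run length (6 means "6 or more").
LIMITS = {1: (2315, 2685), 2: (1114, 1386), 3: (527, 723),
          4: (240, 384), 5: (103, 209), 6: (103, 209)}

def count_runs(bits):
    """Return {bit_value: [unused, n1, n2, ..., n6]} run-length tallies."""
    counts = {0: [0] * 7, 1: [0] * 7}
    run_value, run_len = bits[0], 1
    for b in bits[1:]:
        if b == run_value:
            run_len += 1
        else:
            counts[run_value][min(run_len, 6)] += 1  # long runs are capped at 6
            run_value, run_len = b, 1
    counts[run_value][min(run_len, 6)] += 1          # close the final run
    return counts

def runs_test_passes(bits):
    """True if every tally, for both 0s and 1s, is inside its limits."""
    counts = count_runs(bits)
    return all(lo < counts[value][length] < hi
               for value in (0, 1)
               for length, (lo, hi) in LIMITS.items())

print(count_runs([1, 1, 0, 1, 0, 0, 0]))
print(runs_test_passes([0, 1] * 10000))  # far too many 1-bit runs
```

Because twelve correlated tallies all have to stay inside their windows, the failure probability has no simple closed form. Estimating it would mean simulating a large number of packets from a trusted generator and counting the failures.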

Long run

For rngtest a "long run" is seeing 26 or more bits at the same level in a row. For any 26 bits, the chance of them all being the same is 2 in 2^26, or once every 32Mbits (there are two chances because it can be 0x3ffffff or 0x0000000). However, to start the run it's also a requirement that the previous bit is the opposite level, so it's a 2 in 2^27 chance, or 1 in 2^26 overall, 1.49 x 10^-8 per bit position. For a 20000-bit test packet, that's 0.000298 or a 1:3355 chance per packet. We observed 18947 of these out of 63,568,373 test packets, which matches the theoretical chance of 0.000298 or 1:3355 almost exactly.
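The arithmetic above can be replayed in a few lines. This is only the same back-of-envelope estimate (it ignores edge effects at the start of each packet), using the success and failure counts from the rngtest output; the variable names are arbitrary.

```python
from math import sqrt

packets = 63517865 + 50508       # successes + failures reported by rngtest
observed = 18947                 # "Long run" failures reported by rngtest

p_start = 2 / 2**27              # a long run of either level starts at a given bit
p_packet = 20000 * p_start       # rough chance of at least one per 20000-bit packet
expected = packets * p_packet

# How far the observed count sits from the estimate, in standard deviations,
# treating the count as roughly Poisson (sd = sqrt(expected)).
z = (observed - expected) / sqrt(expected)

print(f"per packet {p_packet:.6f}, expected {expected:.0f}, observed {observed}, z = {z:.2f}")
```

The expected count comes out within a handful of the observed 18947, a tiny fraction of a standard deviation.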

Continuous Run

A "continuous run" is just seeing the same 32-bit pattern twice in a row, considering 32-bit boundaries. For every 32 bits generated, there's a 1 : 2^32 chance that it matches the previous one (without having to know what that was). So the expected number of these "failures" is <number of bits> / 32 / 2^32, which for the 1.27Tbits in our sample comes to about 9.25. We observed 12, so this doesn't seem unreasonable.

So overall, after studying each test, it's clear that a genuinely random source must fail rngtest with a specific probability for each test. In no way does a "failure" on the rngtest tests in itself indicate a problem with the random source. But if your source does not produce the right number of failures over time, that does indicate a problem with it. It seems wrongheaded, then, that rngd will reject individual packets that "fail" the rngtest / FIPS 140-2 tests.
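As a check on the continuous run estimate, the sketch below redoes the division with the exact bit count from the rngtest output. It then asks how surprising 12 or more matches would be if the count is treated as Poisson distributed. The Poisson step is an extra assumption, not part of the original working.

```python
from math import exp, factorial

bits = 1271367467008             # "bits received from input" in the rngtest output
observed = 12                    # "Continuous run" failures reported by rngtest

words = bits // 32               # number of 32-bit words compared
expected = words / 2**32         # each word matches its predecessor with chance 2**-32

# P(count >= observed) for a Poisson variable with mean `expected`.
p_at_least = 1 - sum(exp(-expected) * expected**k / factorial(k)
                     for k in range(observed))

print(f"expected {expected:.2f}, observed {observed}, P(>= {observed}) = {p_at_least:.2f}")
```

A genuinely random source would show 12 or more matches in a sample this size about one time in five, so the observed count is unremarkable.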

Dieharder with a vengeance

Next I ran the current dieharder suite again, using the latest RPMs from Robert G. Brown's site http://www.phy.duke.edu/~rgb/General/dieharder.php. I started running it directly hooked up to the RNG device /dev/hwrng, but then I realized that since a lot of the tests are looking for lagged correlation, I needed to give it a file that it could meaningfully rewind into. So I generated a 12GByte random file and fed it to dieharder -a (run all the tests). This got us the following summary (grepped just for the decision):
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for Diehard Birthdays Test
Assessment: PASSED at > 5% for Diehard 32x32 Binary Rank Test
Assessment: PASSED at > 5% for Diehard 6x8 Binary Rank Test
Assessment: PASSED at > 5% for Diehard Bitstream Test
Assessment: PASSED at > 5% for Diehard OPSO
Assessment: PASSED at > 5% for Diehard OQSO Test
Assessment: PASSED at > 5% for Diehard DNA Test
Assessment: PASSED at > 5% for Diehard Count the 1s (stream) Test
Assessment: PASSED at > 5% for Diehard Count the 1s Test (byte)
Assessment: PASSED at > 5% for Diehard Parking Lot Test
Assessment: PASSED at > 5% for Diehard Minimum Distance (2d Circle) Test
Assessment: PASSED at > 5% for Diehard 3d Sphere (Minimum Distance) Test
Assessment: PASSED at > 5% for Diehard Squeeze Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: POSSIBLY WEAK at < 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for STS Monobit Test
Assessment: PASSED at > 5% for STS Runs Test
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: POSSIBLY WEAK at < 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: POOR at < 1% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for Lagged Sum Test
No way! Two "possibly weak" and one "poor". I read the manpage for dieharder and got the advice from there to run the tests more times, because if the data is bad, feeding it more skewed badness will make the failing distribution of p-values "unambiguous". Dieharder has a default of 10,000 tests, I cranked it up to 20,000 and ran them all again on the same 12GByte sample.
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: POSSIBLY WEAK at < 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for Diehard Birthdays Test
Assessment: PASSED at > 5% for Diehard 32x32 Binary Rank Test
Assessment: PASSED at > 5% for Diehard 6x8 Binary Rank Test
Assessment: PASSED at > 5% for Diehard Bitstream Test
Assessment: PASSED at > 5% for Diehard OPSO
Assessment: PASSED at > 5% for Diehard OQSO Test
Assessment: PASSED at > 5% for Diehard DNA Test
Assessment: PASSED at > 5% for Diehard Count the 1s (stream) Test
Assessment: PASSED at > 5% for Diehard Count the 1s Test (byte)
Assessment: PASSED at > 5% for Diehard Parking Lot Test
Assessment: PASSED at > 5% for Diehard Minimum Distance (2d Circle) Test
Assessment: PASSED at > 5% for Diehard 3d Sphere (Minimum Distance) Test
Assessment: PASSED at > 5% for Diehard Squeeze Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for STS Monobit Test
Assessment: PASSED at > 5% for STS Runs Test
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for Lagged Sum Test
So the "poor" and "possibly weak" guys became happy when we doubled the number of tests, and there's a new "possibly weak" guy. But when I looked up the new guy's p-value, it was only 0.02888045, which is a 1 in 34 chance. That doesn't seem that improbable: real dieharder failures tend to look like 0.00000001 or 0.99999998, and should look more like that the more tests you run.
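A quick sanity check on how many flags to expect by chance alone. Counting the Assessment lines in each listing above gives 107 results per run. The sketch assumes that each result's p-value is uniformly distributed for a good generator, that "POSSIBLY WEAK" means p < 0.05 and "POOR" means p < 0.01 as the labels suggest, and that the results are independent of each other, which is only roughly true.

```python
N_RESULTS = 107      # Assessment lines per dieharder run, counted from the listings
ALPHA_WEAK = 0.05    # assumed threshold behind "POSSIBLY WEAK at < 5%"
ALPHA_POOR = 0.01    # assumed threshold behind "POOR at < 1%"

expected_weak_or_worse = N_RESULTS * ALPHA_WEAK   # includes the "poor" ones
expected_poor = N_RESULTS * ALPHA_POOR
p_at_least_one_flag = 1 - (1 - ALPHA_WEAK) ** N_RESULTS

print(f"expected flags below 5%: {expected_weak_or_worse:.2f}")
print(f"expected flags below 1%: {expected_poor:.2f}")
print(f"chance of at least one flag in a run: {p_at_least_one_flag:.3f}")
```

On those assumptions a perfect generator would pick up around five flags per run, with a "poor" among them about once per run, so three flags in the first run and one in the second are, if anything, on the low side.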

Conclusion

So far as I can tell these results are good. If anyone has enough math power to calculate the theoretical failure rates for the rngtest monobit, poker and runs tests I would be very grateful, so I can compare all the numbers. On the two I was able to calculate, we seem to be very close. Dieharder seemed happy with double the tests, and the one test it flagged then only had a probability of 1:34, which is not unreasonable.