<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Way of the exploding head &#187; Hardware design</title>
	<atom:link href="http://warmcat.com/_wp/category/hardware-design/feed/" rel="self" type="application/rss+xml" />
	<link>http://warmcat.com/_wp</link>
	<description>Embedded and desktop Linux</description>
	<lastBuildDate>Sun, 06 Mar 2011 14:59:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.4</generator>
		<item>
		<title>Don&#8217;t let Production Test Be Special</title>
		<link>http://warmcat.com/_wp/2010/02/12/dont-let-production-test-be-special/</link>
		<comments>http://warmcat.com/_wp/2010/02/12/dont-let-production-test-be-special/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 23:49:41 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Embedded Linux]]></category>
		<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Openmoko Lessons]]></category>
		<category><![CDATA[Software design]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/?p=75</guid>
		<description><![CDATA[Lesson 3: Test is not special Commonly in embedded work test is the &#8220;red-haired stepchild&#8221;, nobody wants to take care of it and by common, silent consent it is always left until last.  Eventually the need for a test plan becomes overwhelming as the date to go to the factory nears, and the task is [...]]]></description>
			<content:encoded><![CDATA[<h2>Lesson 3: Test is not special</h2>
<p>Commonly in embedded work test is the &#8220;red-haired stepchild&#8221;, nobody wants to take care of it and by common, silent consent it is always left until last.  Eventually the need for a test plan becomes overwhelming as the date to go to the factory nears, and the task is assigned to the most junior engineers available, since everybody knows that test is the death knell of your career.</p>
<p>Coming cold to an already-existing project, and excluded from its inner workings, the engineers try to create some kind of test coverage the best way they can.  At Openmoko two giant test suites were created, DM1 and DM2, written by people who were learning C for the first time.  I got the job of modernizing this code, so I know from experience that it was already truly terrible and bitrotted at an alarming rate.  However, I had to admire the guys who wrote it: with everything against them and little experience, they did manage to create something that provided test coverage at the factory, however much it was on life-support.</p>
<h2>Totentanz</h2>
<p>Similarly, Openmoko used production test jigs, special additional PCBs that formed a kind of custom test environment for the PCB under test.  At one version of GTA03 there were so many test points added it was a serious concern that the board would break down under the overall pressure needed to mate the spring-loaded test probes to the test points.</p>
<p>Jigs and test points have an obvious advantage in terms of test throughput, but there are some big disadvantages.</p>
<p>First, you have to design and build the jig, and track changes to the actual device with it.  This effort is completely disconnected from moving your actual product on, except that it&#8217;s meant to help in production.</p>
<p>Second, test points don&#8217;t test your connectors; the test point may be connected OK but not the connector pin the user actually accesses.</p>
<p>Third, you need something else outside the device to assess what is happening on the test points, and the code for that also has to be written and maintained against changes in the actual product.  It also means that the tests can&#8217;t be casually performed outside the factory, except maybe by the original engineers if they have access to the ATE gear themselves.</p>
<h2>Pain into torture</h2>
<p>Additionally the bringup of GTA02 required special versions of U-Boot and kernel which had added &#8220;test magic&#8221; created by the test guys and unknown to anyone else.  These versions were seldom uplevelled.</p>
<p>Since GTA02 had raw NAND, it needed filling up at the factory with the rootfs.  The way to do this was via a very fragile OpenOCD using a custom USB &#8211; serial based device that was bitbanged.  It only worked with certain versions of the usb library needed to talk to it.</p>
<p>All of these quirks and requirements at the factory made production runs difficult and expensive to get right.</p>
<h2>I only hurt you because I love you</h2>
<p>I spent a lot of time thinking about how to avoid this end result the next time I design something.  I concluded that the mistakes started with having anything special for test.  The jig: special, and so evil.  Test kernels or bootloader: special -&gt; evil.  Test rootfs -&gt; evil.  Test software, like Openmoko&#8217;s DM1 and DM2: evil.  The device should naturally be able to test itself with the arrangements that already exist inside it to operate at all.</p>
<p>The answer to the problem of &#8220;production test&#8221; is to completely subsume it into the rest of the design.  So it is the responsibility of Linux drivers to provide enough functionality, through probe errors or sysfs features, that one can perform test and diagnosis.  The &#8220;test suite&#8221; should boil down to a bash script that uses features exposed in a normal shipping rootfs and kernel.  Bash is ideal because most of the test action will be calling existing commandline tools like ifconfig, ping and l2ping, then grepping their output or looking at their return code, which is what bash is best at.  It&#8217;s also easily understood and edited by anyone who has worked with Linux for a while.</p>
<p>The bootloader is required for test in only one capacity: it is the only part of the system capable of running the SDRAM tests, since once you enter Linux you can&#8217;t perform a full SDRAM test any more.  But even that should be done by the one shipping bootloader image.</p>
<p>In many cases, device interfaces can be tested by external loopback connectors, this proves connectivity through the connectors and it leaves open the possibility of end-users being able to run the same tests on the shipping rootfs.</p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2010/02/12/dont-let-production-test-be-special/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Whirlygig GPL&#8217;d HWRNG</title>
		<link>http://warmcat.com/_wp/2007/11/24/whirlygig-gpld-hwrng/</link>
		<comments>http://warmcat.com/_wp/2007/11/24/whirlygig-gpld-hwrng/#comments</comments>
		<pubDate>Sat, 24 Nov 2007 10:45:44 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Linux peripherals]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/11/24/whirlygig-gpld-hwrng/</guid>
		<description><![CDATA[Hardware random for the masses I made available the result of the ring oscillator random generator as a GPL project called Whirlygig. It&#8217;s a 2.75cm x 4cm PCB with a mini USB connector, it provides a sustained 5.5Mbps (~620KBytes/sec) of apparently very high quality random bits using the Linux hw_random API. The large amount of [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/whirlygig-logo.png" align=left hspace=5></p>
<h3>Hardware random for the masses</h3>
<p>I made available the result of the ring oscillator random generator as a GPL project <a href="http://warmcat.com/_wp/whirlygig-rng/">called Whirlygig</a>.  It&#8217;s a 2.75cm x 4cm PCB with a mini USB connector, it provides a sustained 5.5Mbps (~620KBytes/sec) of apparently very high quality random bits using the Linux hw_random API.  The large amount of randomness should make it useful for statistical tests as well as hard crypto.</p>
<p>I prototyped it using a couple of boards I had lying around, so I know it works fine, but I am waiting for the PCBs to come back from fabrication to actually build a final one.  I placed the CPLD VHDL, the board hardware design, the driver software and the firmware for the USB controller into <a href="http://git.warmcat.com">http://git.warmcat.com</a>.</p>
<h3>Dieharder</h3>
<p>I spent some time worrying about how to test the quality of the result &#8212; I found that &#8220;diehard&#8221;, mentioned in an earlier post, has been superseded by <a href="http://www.phy.duke.edu/~rgb/General/dieharder.php">&#8220;dieharder&#8221;</a>.  This has a much tougher general testing regime, even though many of its tests are reproductions of the diehard ones &#8212; it runs each test many times and forms histograms of the p-value results from the many runs, and gives an assessment of fail, poor, possibly weak or pass on the spread of results rather than a single result.</p>
<p>At first the RNG failed three of the 18 tests, but on looking closer one of the tests (#2) currently fails for all RNG input and is marked up as not for use with assessing RNG quality, and the two others required by default more than the 400MBytes of randomness I had prepared.  Unfortunately in that case they simply rewind the randomness file and re-use the same data to make up the balance!  Of course this is no longer quite &#8220;random&#8221;.  When I adjusted those two tests to use a smaller sample that fitted into the 400MBytes without repetition, the output of the RNG got a &#8220;pass&#8221; on all 17 of the relevant dieharder suite tests.</p>
<h3>Max Entropy</h3>
<p>During the validation phase I changed the RNG algorithm in the CPLD significantly.  The scheme is described on the project page, but basically I moved away from a bit-centric to a byte-centric design with 8 identical sets of 3 oscillators.  To stop any characteristic of a particular oscillator&#8217;s routing from being associated with a particular bit of the result byte and creating a bias, I introduced a &#8220;mixer&#8221; that first generates 8 random bits by combining six oscillator outputs each with XOR, then rotates these oscillator sets between the result bits sequentially at 24MHz.  I also removed the toggling action and used the random bit directly.</p>
<p>I also found the Linux rng-tools suite, which repeatedly runs FIPS-140-2 tests on the bits.  Over 20 billion bits of testing, this fails 1 in 1200 or so packets; I believe it is normal for a real random generator to produce, with low probability, sequences that don&#8217;t look very random in the short term.</p>
<p>Aside from passing dieharder and FIPS-140-2, the changes also got me a reported 8.000000 bits of entropy per byte from the ENT test, so there are reasons to imagine the quality of the output is very good.</p>
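<p>For readers who want to see what that entropy figure measures, here is a minimal Python sketch of the Shannon entropy of the byte-value distribution.  It is an illustration of the formula only, not ENT&#8217;s own code, and the function name entropy_bits_per_byte is made up for this example.</p>

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum of -p * log2(p) over every byte value that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Every byte value equally often: the maximum of 8 bits per byte.
uniform = bytes(range(256)) * 16
# A single repeated value carries no information at all.
constant = bytes(4096)

print(entropy_bits_per_byte(uniform))
print(entropy_bits_per_byte(constant))
```

<p>A figure this close to 8 only says the byte values are evenly used; it says nothing about ordering, which is why the other suites are still needed.</p>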
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/11/24/whirlygig-gpld-hwrng/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FIPS-140-2 and ENT validation vs ring RNG</title>
		<link>http://warmcat.com/_wp/2007/11/15/fips-140-2-and-ent-validation-vs-ring-rng/</link>
		<comments>http://warmcat.com/_wp/2007/11/15/fips-140-2-and-ent-validation-vs-ring-rng/#comments</comments>
		<pubDate>Thu, 15 Nov 2007 09:20:13 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Embedded Linux]]></category>
		<category><![CDATA[Hardware design]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/11/15/fips-140-2-and-ent-validation-vs-ring-rng/</guid>
		<description><![CDATA[NIST lists some more test suites. NIST also have their own suite, but it is now Windows-only, and lacks a necessary DLL to run there. The last UNIX version segfaulted here before giving any results&#8230; sigh. I ran the last 10MByte sample against ENT and TestU01&#8230; to cut a long story short $ ./ent ../die.c/dump3 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://csrc.nist.gov/groups/ST/toolkit/rng/batteries_stats_test.html">NIST</a> lists some more test suites.  NIST also have their own suite, but it is now Windows-only, and lacks a necessary DLL to run there.  The last UNIX version segfaulted here before giving any results&#8230; sigh.  </p>
<p>I ran the last 10MByte sample against <a href="http://www.fourmilab.ch/random/">ENT</a> and <a href="http://www.iro.umontreal.ca/~simardr/testu01/TestU01.zip">TestU01</a>&#8230; to cut a long story short</p>
<blockquote><p><font size=-2>$ ./ent ../die.c/dump3<br />
Entropy = 7.999980 bits per byte.</p>
<p>Optimum compression would reduce the size<br />
of this 10002432 byte file by 0 percent.</p>
<p>Chi square distribution for 10002432 samples is 281.26, and randomly<br />
would exceed this value 25.00 percent of the times.</p>
<p>Arithmetic mean value of data bytes is 127.4958 (127.5 = random).<br />
Monte Carlo value for Pi is 3.140111525 (error 0.05 percent).<br />
Serial correlation coefficient is -0.000212 (totally uncorrelated = 0.0).</font></p></blockquote>
<p>7.9999 bits of entropy per byte!  TestU01 is less turnkey than the other suites &#8212; it&#8217;s literally a test library with some example code.  I amended an example to call the FIPS-140-2 tests:</p>
<blockquote><pre><font size=-2>============== Summary results of FIPS-140-2 ==============

 File:             dump3
 Number of bits:   20000

       Test          s-value        p-value    FIPS Decision
 --------------------------------------------------------
 Monobit               9933           0.83       Pass
 Poker                11.88           0.69       Pass

 0 Runs, length 1:     2482                      Pass
 0 Runs, length 2:     1227                      Pass
 0 Runs, length 3:      630                      Pass
 0 Runs, length 4:      319                      Pass
 0 Runs, length 5:      161                      Pass
 0 Runs, length 6+:     166                      Pass

 1 Runs, length 1:     2466                      Pass
 1 Runs, length 2:     1302                      Pass
 1 Runs, length 3:      620                      Pass
 1 Runs, length 4:      311                      Pass
 1 Runs, length 5:      140                      Pass
 1 Runs, length 6+:     146                      Pass

 Longest run of 0:       16           0.14       Pass
 Longest run of 1:       14           0.46       Pass
 ----------------------------------------------------------
 All values are within the required intervals of FIPS-140-2</font></pre>
</blockquote>
<p>So the design&#8217;s output is compliant with FIPS-140-2, a requirement for many uses.</p>
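<p>As an illustration of how simple these checks are, the Monobit line in the summary above can be reproduced with a few lines of Python.  This is a sketch and not the TestU01 code: the function names are invented here, and the pass interval (more than 9725 and fewer than 10275 ones in a 20000-bit block) is the one I recall from the FIPS 140-2 statistical tests, so check it against the standard before relying on it.</p>

```python
FIPS_BLOCK_BYTES = 2500  # 20000 bits, the block size shown in the summary above


def monobit_ones(block: bytes) -> int:
    """Count the 1 bits in a single 20000-bit block."""
    if len(block) != FIPS_BLOCK_BYTES:
        raise ValueError("monobit test expects exactly 2500 bytes")
    return sum(bin(b).count("1") for b in block)


def monobit_pass(block: bytes) -> bool:
    """Assumed FIPS 140-2 interval: 9725 < ones < 10275."""
    ones = monobit_ones(block)
    return 9725 < ones < 10275


# 0x55 has four 1 bits, so this block is perfectly balanced and passes.
print(monobit_pass(bytes([0x55]) * FIPS_BLOCK_BYTES))
# An all-zero block has no 1 bits and fails.
print(monobit_pass(bytes(FIPS_BLOCK_BYTES)))
```

<p>The count of 9933 reported in the table sits comfortably inside that interval.</p>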
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/11/15/fips-140-2-and-ent-validation-vs-ring-rng/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Diehard validation vs ring RNG</title>
		<link>http://warmcat.com/_wp/2007/11/14/diehard-validation-vs-ring-rng/</link>
		<comments>http://warmcat.com/_wp/2007/11/14/diehard-validation-vs-ring-rng/#comments</comments>
		<pubDate>Wed, 14 Nov 2007 15:56:03 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Linux peripherals]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/11/14/diehard-validation-vs-ring-rng/</guid>
		<description><![CDATA[RNG Quality assessment A timely article flew by on Reddit about the RANDU pseudo-random generator algorithm widely used in the 1960s, which it turns out was very flawed indeed. It was explained to one student that &#8220;We guarantee that each number is random individually, but we don&#8217;t guarantee that more than one of them is [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/catbowl1.png" align=left hspace=5></p>
<h3>RNG Quality assessment</h3>
<p>A timely article flew by on Reddit about the <a href="http://en.wikipedia.org/wiki/RANDU">RANDU</a> pseudo-random generator algorithm widely used in the 1960s, which it turns out was very flawed indeed.  It was explained to one student that &#8220;We guarantee that each number is random individually, but we don&#8217;t guarantee that more than one of them is random&#8221;.  Basically, plotted in three dimensions, consecutive triples of its output fall on just 15 planes, with nothing in the gaps between the planes.  It isn&#8217;t just a minor annoyance, because many statistical studies in the 60s and 70s used it, and it can easily have contaminated their results.  That&#8217;s definitely not what I am trying to reproduce with the ring oscillator device &#8212; so how can I figure out how &#8220;good&#8221; the randomness is in an objective way?</p>
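<p>The RANDU flaw is easy to demonstrate.  RANDU is the recurrence x(n+1) = 65539 * x(n) mod 2^31, and because 65539 is 2^16 + 3, every output is tied to the previous two by x(n+2) = 6 * x(n+1) - 9 * x(n) mod 2^31, which is what confines triples to a handful of planes.  The Python below is a small sketch that checks this relation; the helper name randu is just for this example.</p>

```python
M = 2 ** 31


def randu(seed: int, n: int) -> list:
    """Generate n RANDU outputs starting from an odd seed."""
    x = seed
    out = []
    for _ in range(n):
        x = (65539 * x) % M
        out.append(x)
    return out


xs = randu(1, 1000)

# Count triples that break x[n+2] = 6*x[n+1] - 9*x[n] (mod 2^31).
violations = sum(
    1
    for a, b, c in zip(xs, xs[1:], xs[2:])
    if (c - 6 * b + 9 * a) % M != 0
)
print(violations)  # 0: every triple obeys the linear relation
```

<p>Each number on its own looks fine; it is only when you take three in a row that the structure shows, which is why a per-value distribution check is not enough.</p>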
<h3>RNG quality test suites</h3>
<p>It turns out that empirically testing RNG outputs has been the subject of a lot of work for decades, and there are some established testing suites available online.  A major one seems to be the &#8220;<a href="http://stat.fsu.edu/pub/diehard/">diehard</a>&#8221; suite &#8212; I guess it is a pun on die as the plural of dice.</p>
<p>It needs you to fetch 10M bytes of random numbers or more and let it run a bunch of tests on them.  The output was a little hard to assess initially: most tests issue a &#8220;p&#8221; number which only suggests something is bad if it is 0.000&#8230; OR 0.999&#8230;.  All other numbers in between are to be taken as a good result, as I understood it.  Except there is a warning that even good RNGs can produce the occasional test fail.</p>
<blockquote><p> Thus you should not be surprised with  occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p`s of 0 or 1 to six or more places.  By all means, do not, as a Statistician might, think that a p &lt; .025 or p &gt; .975 means that the RNG has &#8220;failed the test at the .05 level&#8221;.  Such p`s happen among the hundreds that DIEHARD produces, even with good RNGs.  So keep in mind that &#8220;p happens&#8221;</p></blockquote>
<p>I duly fetched 10M bytes of 115kbps randomness from the device and fed it to diehard.  It seemed to give fine results except on &#8220;Count the 1s stream&#8221; and &#8220;Squeeze&#8221; (devastating p=0.000000), &#8220;Count the 1s specific&#8221; for bits 1-11 (p=0.000030) and 9-16 (p=0.000064), and QQSO 2-6 (p=0.000005).  It passed the dozens of other tests but it was disappointing, looks like a big fat &#8216;failed&#8217;.</p>
<h3>Triple Scoop</h3>
<p>Well, since my test CPLD was an XC95288XL with 288 Macrocells to burn, I naturally wondered if I could improve matters by tripling the number of ring oscillators getting XORed &#8212; that is, to implement the three varying sized oscillators 3 times each, totaling nine, and sum them with a big XOR.  They&#8217;ll all be drifting around individually as much as together, so it should be a mighty noise-fest.</p>
<p>I edited the VHDL and blew it into the CPLD&#8230; visually the summed RNG output &#8220;bit&#8221; was an awful lot more noisy than before.   I pulled another 10M bytes from that setup: but just looking at the byte distribution as I did before told me something is still up.</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-2.png"></p>
<p>That sawtooth type distribution is &#8220;not random&#8221; to coin a phrase.  If you look at the large jump at 0x80 (128) it is telling us that we are more likely to get 10000000 binary than we are to get 01111111; in other words, since this is over 10M bytes, there is a distribution problem favouring &#8216;0&#8217;.  When I analyze the distributions of 1s and 0s I find</p>
<table>
<tr>
<td>
<pre>0: 40436204, 1: 39563804... delta=872400, skew=1.090500%</pre>
</td>
</tr>
</table>
<p>You can see the same thing even better looking at 0x00 (42,000 hits) vs 0xFF (36,000 hits); they are about 8% off the median of 39,000.  Clearly that distribution of 1s and 0s has to have a very small skew to stop these kinds of effects showing up, and equally clearly this is telling us something deep about the RNG hardware.</p>
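<p>The skew figures quoted in this post are just the difference between the 0 and 1 counts as a percentage of all bits.  A Python sketch of that calculation follows; the function names are invented for this example, and it is not the tool actually used to produce the numbers.</p>

```python
def skew_percent(zeros: int, ones: int) -> float:
    """Difference between the counts as a percentage of all bits."""
    total = zeros + ones
    if total == 0:
        raise ValueError("no bits to analyze")
    return 100.0 * abs(zeros - ones) / total


def bit_counts(data: bytes) -> tuple:
    """Return (zeros, ones) over every bit of the sample."""
    ones = sum(bin(b).count("1") for b in data)
    return 8 * len(data) - ones, ones


# The counts reported above for the nine-oscillator asynchronous version.
zeros, ones = 40436204, 39563804
print("delta=%d, skew=%f%%" % (abs(zeros - ones), skew_percent(zeros, ones)))

# The same calculation from raw bytes: 0x0F is balanced, 0x01 is not.
print(bit_counts(bytes([0x0F, 0x01])))  # (11, 5)
```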
<h3>Spiky</h3>
<p>Although the individual oscillators are quite slow thanks to the number of inverter stages, at 4 &#8211; 6MHz, the way they are being summed makes for trouble from bandwidth limitations inside the CPLD.  At the moment it just uses a dumb asynchronous XOR action, that means that potentially very fast spikes can be seen when one &#8220;slow&#8221; oscillator changes state very shortly after another &#8220;slow&#8221; oscillator.  For example:</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0002tek.jpg"></p>
<p>You can see on the left (note this is 5ns/div) a runt pulse where this happened: the XOR was convinced to rise by one oscillator changing and then countermanded when another oscillator changed state less than 5ns later, resulting in a doubtful pulse that was probably not visible as a &#8216;1&#8217;.  This also happens when going from &#8216;1&#8217; to &#8216;0&#8217;, but maybe the threshold for the transistors in the CPLD is not at exactly 50% of the 3.3V supply.  So we suddenly have it seeing more &#8216;0&#8217;s than &#8216;1&#8217;s on average when spikes are involved.</p>
<p>This whole high bandwidth summing step is completely needless, it&#8217;s only there because it is a literal interpretation of the diagram in the original RFC.  I changed it instead to have nine latches sample the nine oscillators every 125ns (there is an 8MHz clock on the prototype board) and sum those results with XORs into a single bit.  In turn this output is sampled by another latch at 8MHz to hide any metastability.</p>
<h3>Latched up</h3>
<p>The latched summing version performs much better and has gotten rid of most of the bit skew, and the sawtooth behaviour:</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-3.png"></p>
<p>&#8230;but there is still a problem with 0x00&#8230;. the bit skew looks like this</p>
<table>
<tr>
<td>
<pre>0: 39960076, 1: 40039932... delta=79856, skew=0.099820%</pre>
</td>
</tr>
</table>
<p>so the skew is now on the side of &#8216;1&#8217;s but only by 0.1%.  You can see the byte count spread is much tighter than before too &#8212; 1800 counts instead of 6000.</p>
<h3>Balancing out the skew</h3>
<p>Well if the remaining skew is something to do with the ratio of rise to fall times, or the non-squareness of the oscillator outputs for some other reason by something as low as 0.1%, that is hard to do much about, especially as it may vary on the specific silicon die.</p>
<p>But it shouldn&#8217;t matter &#8212; now the bandwidth situation at the XOR summer is sane, if we invert the summed output 50% of the time it should spread any excess of &#8216;1&#8217;s or &#8216;0&#8217;s to the opposite as well, cancelling any bias.  I added a couple of terms to the summer to XOR against the UART bit index LSB and a bit which toggles after every byte sent by the UART.  It&#8217;s the equivalent of XOR with 0x55 for the first byte and then 0xAA for the second byte, over and over.</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-4.png"></p>
<p>That glitch in the middle is actually at 134 (0x86), maybe it is random but I guess we will see&#8230;. the skew is further reduced as anticipated</p>
<table>
<tr>
<td>
<pre>0: 39974218, 1: 40025790... delta=51572, skew=0.064465%</pre>
</td>
</tr>
</table>
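<p>The alternating inversion is simple to model in software.  The Python below is a byte-level sketch, not the CPLD logic: the name alternate_invert is made up, and whether 0x55 or 0xAA lands on the first byte depends on the UART bit ordering, which is assumed here.</p>

```python
def alternate_invert(raw: bytes) -> bytes:
    """XOR even-numbered bytes with 0x55 and odd-numbered bytes with 0xAA."""
    return bytes(
        b ^ (0x55 if i % 2 == 0 else 0xAA)
        for i, b in enumerate(raw)
    )


def count_ones(data: bytes) -> int:
    return sum(bin(b).count("1") for b in data)


# Worst case input: a source stuck at 0 for 1000 bytes (8000 bits).
stuck = bytes(1000)
mixed = alternate_invert(stuck)
print(count_ones(stuck), count_ones(mixed))  # 0 4000
```

<p>Exactly half of the bit positions get inverted, so a constant bias in the raw bits cancels in the overall count.  It does not make a bad source random, and a bias that correlates with bit position would survive, which may be why some skew remains in the measurement above.</p>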
<h3>Diehard sequel</h3>
<p>I ran 10M bytes from this version through Diehard again&#8230; the really bad p-value results are gone.  For example Squeeze was a deadly 0.000000 before and is now 0.255260.</p>
<p>I made one last adjustment: I added the current state of the latched random value to the XOR term.  That means it decides whether to keep or invert the latched value; it no longer directly accepts the value from the RNG.  This got me to the promised land: 0.0005% skew between &#8216;1&#8217; and &#8216;0&#8217;.</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-5.png"></p>
<table>
<tr>
<td>
<pre>0: 40000206, 1: 39999802... delta=404, skew=0.000505%</pre>
</td>
</tr>
</table>
<p>This also gets me the apparently good diehard results with no obvious failures on any tests, you can see the actual results <a href="/diehard.txt">here</a>.  So it seems the current version can tentatively be called a &#8220;real RNG&#8221;. </p>
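<p>The last adjustment, where the raw bit decides whether to keep or invert the latched value, amounts to a running XOR.  Here is a Python sketch of just that feedback term, leaving out the 0x55/0xAA terms and the latches; keep_or_invert is a hypothetical name for illustration.</p>

```python
def keep_or_invert(raw_bits, initial=0):
    """Each raw bit keeps (0) or inverts (1) the previously latched output."""
    latched = initial
    out = []
    for bit in raw_bits:
        latched ^= bit
        out.append(latched)
    return out


# A source stuck at 1 inverts every time, giving a balanced output.
print(keep_or_invert([1, 1, 1, 1, 1, 1]))  # [1, 0, 1, 0, 1, 0]

# A source biased 3:1 towards 1 still yields as many 0s as 1s here.
biased = [1, 1, 1, 0] * 4
out = keep_or_invert(biased)
print(sum(biased), sum(out))  # 12 8
```

<p>The output level no longer depends on which raw level is favoured, only on how many inversions have happened so far, so a small constant bias in the raw bit is pushed out of the 1/0 ratio.  A source stuck at 0 would leave the output stuck too, so this relies on the raw bit still changing often.</p>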
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/11/14/diehard-validation-vs-ring-rng/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ring oscillator RNG performance</title>
		<link>http://warmcat.com/_wp/2007/11/12/ring-oscillator-rng-performance/</link>
		<comments>http://warmcat.com/_wp/2007/11/12/ring-oscillator-rng-performance/#comments</comments>
		<pubDate>Mon, 12 Nov 2007 01:33:12 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Linux peripherals]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/11/12/ring-oscillator-rng-performance/</guid>
		<description><![CDATA[Pretty random After some scrabbling around porting my Jtag SVF interpreter to Octotux and creating a kernel module for the PIO end of it &#8212; and moving to a different board with a XC95288XL CPLD to prototype it, the triple ring oscillator RNG is working. It issues a 9600 baud result, but after some initial [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/dawg.png" align=left hspace=5></p>
<h3>Pretty random</h3>
<p>After some scrabbling around porting my JTAG SVF interpreter to Octotux and creating a kernel module for the PIO end of it &#8212; and moving to a different board with an XC95288XL CPLD to prototype it &#8212; the triple ring oscillator RNG is working.  It issues its result at 9600 baud, but after some initial confusion I modified it to sit out a sample time 1/8th of the time, leaving &#8220;break&#8221; on the serial line.  This should make sure that the receiving UART does not mistake a data bit for a start bit.  The true data rate is something like 800 random bytes per second at 9600 baud.</p>
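<p>That data rate can be sanity-checked with a little arithmetic.  The Python sketch below assumes ten bit-times per byte on the wire (one start bit, eight data bits, one stop bit) and reads the 1/8th pause as the line carrying data 7/8 of the time; both are assumptions for illustration rather than details taken from the VHDL.</p>

```python
def bytes_per_second(baud, bits_per_frame=10, active_fraction=7 / 8):
    """Estimated payload rate for a UART that pauses part of the time."""
    return baud / bits_per_frame * active_fraction


print(bytes_per_second(9600))    # 840.0
print(bytes_per_second(115200))  # 10080.0
```

<p>Under those assumptions the estimate for 9600 baud lands near the &#8220;something like 800&#8221; quoted above.</p>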
<p>Here are the three chains of inverters (19, 23 and 29 long) oscillating at the different fundamentals</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0016tek.jpg" height=263></p>
<p></p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0017tek.jpg" align=center height=263></p>
<p></p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0018tek.jpg" align=center height=263></p>
<p>&#8230; and here is what the xor summing looks like, first over 1s then sampled once.</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0019tek.jpg" align=center height=263></p>
<p></p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0020tek.jpg" align=center height=263></p>
<p>Although the single shot sample doesn&#8217;t look very random, the oscillators are drifting around all the time.  If you wait a little while between samples (currently it is 104us, a 9600 baud bit-period) it&#8217;s pretty hard to guess what phase all the oscillators have drifted to &#8212; at least, that&#8217;s the plan.</p>
<h3>Distribution of binary levels</h3>
<p>The first test I did was to see what the distribution of &#8216;1&#8217; and &#8216;0&#8217; in the results was&#8230; clearly if the device is really random it should on average be 50% each.  I fetched 1M random bytes, or 8Mbits:</p>
<table align=center>
<tr>
<td>0: 4008913, 1: 3991095&#8230; delta=17818, skew=0.222725%</td>
</tr>
</table>
<p>It&#8217;s okay for a really random source to deviate from 50:50 at any given time, although on average it should be 50:50.</p>
<h3>Octet distribution</h3>
<p>Next I looked at the distribution of the results from 0x00 through 0xFF as the result &#8220;random byte&#8221;.  This would show up if the RNG fails to ever issue some result or favours certain results over others &#8212; every result should on average have an equal chance of showing up and so an equal count.  I ran it for 1M random bytes&#8230;</p>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/rng-dist-1.png" align=center></p>
<p>This is pretty decent, every possible result is seen with a frequency within +/-200 counts of the 3,900 average after 1M bytes.</p>
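<p>The histogram behind a plot like this takes only a few lines.  Below is a Python sketch with invented function names; it is not the program that produced the graph above.</p>

```python
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Count how often each value 0x00..0xFF appears in the sample."""
    counts = Counter(data)
    return [counts[value] for value in range(256)]


def spread(hist: list) -> tuple:
    """Return (lowest count, highest count, mean count per value)."""
    return min(hist), max(hist), sum(hist) / len(hist)


# A perfectly even sample: every value appears exactly four times.
sample = bytes(range(256)) * 4
print(spread(byte_histogram(sample)))  # (4, 4, 4.0)
```

<p>For 1M bytes the mean count per value is about 3,906, which is the 3,900 average mentioned above.</p>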
<h3>115200 baud results</h3>
<p>Encouraged by this I cranked the baud rate up to 115200, or 8.68us between samples and around 10K random bytes per second.  The skew is increased somewhat and the spread of result counts is increased a little.</p>
<table align=center>
<tr>
<td>0: 4028746, 1: 3971262&#8230; delta=57484, skew=0.718549%</td>
</tr>
</table>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/rng-dist-2.png" align=center></p>
<p>So far so good!</p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/11/12/ring-oscillator-rng-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adding entropy to /dev/random</title>
		<link>http://warmcat.com/_wp/2007/11/07/adding-entropy-to-devrandom/</link>
		<comments>http://warmcat.com/_wp/2007/11/07/adding-entropy-to-devrandom/#comments</comments>
		<pubDate>Wed, 07 Nov 2007 11:13:54 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Linux peripherals]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/11/07/adding-entropy-to-devrandom/</guid>
		<description><![CDATA[A hard RNG is good to find The recent statistical analysis for drumbeat reminded me I could do with a proper source of random numbers, not generated by a pseudorandom feedback action. Back in the early 1990s I was looking at statistical profiling of execution on microcontrollers, I was surprised then to discover that only [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/buffalo.png" align=left hspace=5></p>
<h3>A hard RNG is good to find</h3>
<p>The recent statistical analysis for drumbeat reminded me I could do with a proper source of random numbers, not generated by a pseudorandom feedback action.  Back in the early 1990s I was looking at statistical profiling of execution on microcontrollers, I was surprised then to discover that only by making the sampling period random could I get a true picture of execution distribution.  If the address bus was sampled at a fixed rate, say 100kHz, instead of a true picture it would be distorted by activity that was happening at some fraction or harmonic of the sampling frequency.  So you would alias out pieces of loops completely or get a bloated count for other areas.  Only by true randomness in the sampling timing could you see the reality &#8212; a paradox.</p>
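<p>The aliasing effect is easy to reproduce in a toy model.  The Python sketch below invents a program that loops over 10 addresses at 1us each, then profiles it once at a fixed 100kHz and once at random times.  The loop, the function names and the use of a seeded software generator for the sampling times are all assumptions for the sake of a repeatable demonstration.</p>

```python
import random
from collections import Counter

LOOP_LEN = 10  # hypothetical loop: 10 addresses, 1 us spent at each


def address_at(t_us: int) -> int:
    """Which address the imaginary CPU is executing at time t_us."""
    return t_us % LOOP_LEN


def fixed_rate_profile(period_us: int, samples: int) -> Counter:
    """Sample the address bus every period_us microseconds."""
    return Counter(address_at(i * period_us) for i in range(samples))


def random_profile(samples: int, seed: int = 0, span_us: int = 1_000_000) -> Counter:
    """Sample the address bus at randomly chosen times."""
    rng = random.Random(seed)
    return Counter(address_at(rng.randrange(span_us)) for _ in range(samples))


# 100 kHz sampling (every 10 us) locks onto the loop and sees one address.
print(sorted(fixed_rate_profile(10, 1000)))  # [0]
# Random sampling times reach every address in the loop.
print(len(random_profile(1000)))
```

<p>The fixed-rate profile claims the program spends all its time at one address, which is exactly the bloated count and aliased-out loop pieces described above.</p>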
<h3>Analogue RNG methodologies</h3>
<p>A Google or two around showed that most of the techniques are analogue one way or the other.  Many of the methods suffer from a problematic need to amplify some very tiny source of noise, a Zener diode or avalanche transistor junction, by really huge amounts, 90dB or more.  There are a couple of suppliers of RF &#8220;noise diodes&#8221; with flat spectra across a wide frequency range, but they are hard to source.</p>
<h3>Digital non-pseudorandom technique</h3>
<p>However there is one technique which, while still relying on analogue noise, is basically digital &#8212; to run multiple chains of unlocked inverting oscillators and XOR the outputs.  The unlocked oscillators have no reference at all, they&#8217;re basically an inverter fed back on its own input &#8212; in fact a chain of inverters.  Such a circuit oscillates according to the period of the total delay through the inverter chain&#8230; and that is highly sensitive to temperature.  Normally with synchronous digital design we choose a clock rate for a circuit that is just below the maximum possible at the worst temperature it is expected to operate at &#8212; and after that we can forget about temperature.  But with this asynchronous unlocked oscillator concept, the micro- and macro- temperature dependence is revealed in all its freaky glory, causing the oscillation to drift unpredictably by a small amount every cycle, and over larger periods with gross temperature fluctuations.</p>
<h3>RFC4086</h3>
<p><a href="http://tools.ietf.org/html/rfc4086">RFC4086</a> mentions a recommendation for a RNG based on unlocked inverter chains that is found in IEEE 802.11i.</p>
<blockquote><pre>
             |\     |\                |\
         +-->| >0-->| >0-- 19 total --| >0--+-------+
         |   |/     |/                |/    |       |
         |                                  |       |
         +----------------------------------+       V
                                                 +-----+
             |\     |\                |\         |     | output
         +-->| >0-->| >0-- 23 total --| >0--+--->| XOR |------>
         |   |/     |/                |/    |    |     |
         |                                  |    +-----+
         +----------------------------------+      ^ ^
                                                   | |
             |\     |\                |\           | |
         +-->| >0-->| >0-- 29 total --| >0--+------+ |
         |   |/     |/                |/    |        |
         |                                  |        |
         +----------------------------------+        |
                                                     |
             Other randomness, if available ---------+</pre>
</blockquote>
<p>This has three unlocked, wandering oscillator chains of different lengths being summed at an XOR gate.</p>
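<p>To get a feel for the behaviour before any hardware exists, the structure in the diagram can be modelled in software.  This is only a rough sketch: the chain lengths 19, 23 and 29 come from the diagram, but the gate delay, the jitter model and all of the names are my assumptions, and <code>rand()</code> merely stands in for the physical noise.  A ring of N inverters has a period of about 2 x N x gate delay, since an edge has to travel round the chain twice per cycle.</p>

```c
#include <stdlib.h>

#define N_OSC 3

/* One free-running oscillator: the output toggles whenever next_edge_ns passes. */
struct ring_osc {
	double half_period_ns;	/* nominal time between output toggles */
	double next_edge_ns;	/* time of the next toggle */
	int level;		/* current output, 0 or 1 */
};

/* Chain lengths are from the RFC4086 diagram; gate_delay_ns is an assumed figure. */
void rng_init(struct ring_osc osc[N_OSC], double gate_delay_ns)
{
	static const int stages[N_OSC] = { 19, 23, 29 };
	int i;

	for (i = 0; i < N_OSC; i++) {
		/* an edge goes once round the chain per half cycle */
		osc[i].half_period_ns = stages[i] * gate_delay_ns;
		osc[i].next_edge_ns = osc[i].half_period_ns;
		osc[i].level = 0;
	}
}

/* Advance every oscillator to time t_ns (which must not go backwards) and
 * return the XOR of the three outputs.  Each half cycle is stretched or
 * shrunk by up to +/- jitter, expressed as a fraction of the half period. */
int rng_sample(struct ring_osc osc[N_OSC], double t_ns, double jitter)
{
	int i, out = 0;

	for (i = 0; i < N_OSC; i++) {
		while (osc[i].next_edge_ns <= t_ns) {
			double r = (double)rand() / RAND_MAX * 2.0 - 1.0;

			osc[i].level ^= 1;
			osc[i].next_edge_ns +=
				osc[i].half_period_ns * (1.0 + jitter * r);
		}
		out ^= osc[i].level;
	}
	return out;
}
```

<p>Calling <code>rng_init()</code> with an assumed 3.5ns gate delay gives periods of 133ns to 203ns; sampling with a small nonzero <code>jitter</code> at widely spaced times then lets the three phases wander apart between samples.</p>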
<h3>Implementing the RFC4086 RNG</h3>
<p>Since it needs 71 inverters, you would need twelve 74HC04s or similar at six inverters per package, so it makes more sense to put it all in one CPLD.  I have an old XC95108 lying around, so I wrote up the design in VHDL and added a UART interface to issue the sampled random data.  This brings up the issue of how quickly it can be sampled and still get high quality randomness&#8230; clearly if we sampled it every 10ps it wouldn&#8217;t be very random at all, since it wouldn&#8217;t have time to change between samples.  On the other hand if we sample it at some high multiple of the fastest free-running oscillator period, then there is a lot of opportunity for each oscillator phase to have been affected over the longer time.  By using the UART we can control how often we sample the RNG by the baud rate&#8230; I initially set it to 9600 baud or 104us/sample.  The oscillators should have periods on the order of 150 &#8211; 200ns (roughly 5 &#8211; 6.7MHz), so this is allowing 500+ cycles of jitter to accumulate in each oscillator before the summed sample is taken.</p>
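<p>The numbers above are simple enough to check mechanically.  The helper names below are made up for illustration, and the sketch follows the post in treating one baud interval as one sample.</p>

```c
/* Packages needed for a number of inverters, rounding up. */
int packages_needed(int inverters, int per_package)
{
	return (inverters + per_package - 1) / per_package;
}

/* Time between samples in microseconds, assuming one sample per baud interval. */
double sample_interval_us(double baud)
{
	return 1e6 / baud;
}

/* Number of oscillator cycles that elapse between two samples. */
double cycles_per_sample(double interval_us, double osc_period_ns)
{
	return interval_us * 1000.0 / osc_period_ns;
}
```

<p>For the three chains, <code>packages_needed(19 + 23 + 29, 6)</code> gives 12, <code>sample_interval_us(9600)</code> gives about 104.2, and the slowest assumed oscillator at 200ns gets about 520 cycles per sample.</p>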
<p>I&#8217;m currently waiting for a programming tool to be delivered so I can program another device to allow programming the XC95108 &#8212; I no longer have any PCs with a printer port I realized yesterday.  I am very interested to see what the performance and quality of the randomness is like!</p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/11/07/adding-entropy-to-devrandom/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CE Technical Documentation</title>
		<link>http://warmcat.com/_wp/2007/10/25/ce-technical-documentation/</link>
		<comments>http://warmcat.com/_wp/2007/10/25/ce-technical-documentation/#comments</comments>
		<pubDate>Thu, 25 Oct 2007 18:36:31 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/2007/10/25/ce-technical-documentation/</guid>
		<description><![CDATA[The other week I went on a workshop to learn more about the new EMC regulations (implementing directive 2004/108/EC, which replaces 89/336/EEC) that came into force in the UK on 20th July 2007. Here are some points cribbed from my notes, they&#8217;re intended to be an overview: obviously for something this important you should get your own advice. Out with the [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/standards-compliance.png" align=left hspace=5>The other week I went on a workshop to learn more about the new EMC regulations (implementing directive 2004/108/EC, which replaces 89/336/EEC) that came into force in the UK on 20th July 2007.  Here are some points cribbed from my notes, they&#8217;re intended to be an overview: obviously for something this important you should get your own advice.</p>
<h3>Out with the old ways&#8230;</h3>
<p>For a long time it has been a requirement to certify that any product you manufacture for sale in the EU meets the &#8220;relevant standards&#8221;, so it can have a &#8220;CE&#8221; mark.  Until this July in the UK you could either do that by:</p>
<ul>
<li>paying for test at a &#8220;competent body&#8221;, a company with a ton of test gear that will empirically test your device against the emission and immunity standards, or</li>
<li>writing up a Technical Construction File, or TCF, which described the product design in depth, and included tests and logic showing why you are compliant</li>
</ul>
<blockquote><h3>Device torture at the House of Pain</h3>
<p>The last design I completed, for a smart 4-channel Analogue telephony device that can hook to the Internet, I went down the empirical test route.  At a total cost of several thousand pounds the production device was tested at a real competent body with calibrated receivers and emitters, blasted with wideband radio signals, zapped with +/- 8kV discharges.  The resulting report gave a very clear okay except on a minor issue to do with the AC power supply we had used.  We also had to do specialized testing for the analogue telephony end, which we again passed, although not until getting a component supplier to make a special that actually complies with the standard.</p>
<p>(Actually walking the device through this testing is a pretty sweaty business, since time is literally money at the test facility and wiggle room in the case of trouble is also in short supply.  In one instance for example I was able to patch the sources in realtime when an issue came up during ESD testing that broke normal operation but wasn&#8217;t enough to trigger the watchdog, turning a fail into a pass.  So I would never send a device unaccompanied for testing, or go to a test house without a laptop with full sources to expect the unexpected.)</p></blockquote>
<h3>In with the new ways&#8230;</h3>
<p>However the demands in the new regulations have changed significantly.  You must now generate &#8220;Technical Documentation&#8221; for any product you will be selling in the EU.  This is basically the old TCF route to compliance, but it doesn&#8217;t itself necessarily remove or even perhaps reduce the need for absolute tests for a given device.</p>
<p>Less well known is that if you are still selling devices first sold before 20 July 2007 come 20 July 2009, you will need to have made a new style Technical Documentation for them, or stop selling them.  A lot of tech products from 2007 will be old hat by 2009 solving the problem, but it is not true in all markets.</p>
<p>Typically the Technical Director of the company is the &#8220;responsible&#8221; as the French say who must sign off that the device meets the standards.  What you are signing off on is that WHEN it is properly installed and maintained, and used for the purpose it is intended for:</p>
<ul>
<li>the device creates an EM disturbance low enough that radio and telecoms equipment can operate as intended</li>
<li>it has a level of intrinsic immunity which is adequate to enable it to operate as intended</li>
</ul>
<h3>So what is done with this &#8220;Technical Documentation&#8221;?</h3>
<p>Nothing if you&#8217;re lucky.  The only people who can ask to look at it are the regulatory authorities, OFCOM in the UK.  You don&#8217;t publish it or register a copy of it.  But you have to keep it for ten years after the last sale of the device for the authorities to ask for.  It literally only exists to keep the signatory out of jail if the authorities ask for it.  Not kidding about the jail &#8212; if you don&#8217;t have a satisfactory Technical Documentation to show, the criminal penalties can include a GBP5,000 fine and/or 3 months in jail.</p>
<p>The key words about the Technical Documentation are that it should be &#8220;reasonable&#8221; and &#8220;duly diligent&#8221;, as in &#8220;All reasonable steps are exercised and all due diligence to avoid committing the offense&#8221;.  <strong>That really sums up the job of writing it, you are trying to have an answer for anything that could be said was unreasonable or not duly diligent.</strong>  While meeting budget constraints from the customer :-/</p>
<h3>Spread of outcomes</h3>
<p>The ways that problems might pan out were discussed informally.  The presenter reckoned that roughly a third of companies have their head totally in the sand about it, and can expect trouble.  Another third have made some effort in the right direction, and the final third have spent the money and are golden.  Another factor in how much shit would rain down in the event of problems was the number of devices sold: if it was millions and they were crap, expect maximum warp to jail.  If it is five and they don&#8217;t quite comply despite obvious efforts to prove it, maybe that won&#8217;t be so bad.  But who knows, some overkill is called for.</p>
<blockquote><h3>How likely is my Technical Documentation to be demanded?</h3>
<p>In Germany, we were told, the authorities have a system of testing 10,000 models of devices a year, spread over the various types of product.  In the first year (IIRC) it resulted in 105 prosecutions :-/</p>
<p>Another tidbit is in the UK, OFCOM are allegedly looking at training up 85 new enforcement officers.  The mobile phone companies, due to the ruinously expensive spectrum auctions of a few years ago, are apparently agitating for more enforcement of the cleanliness of their expensive 3G spectrum.</p></blockquote>
<h3>What goes in the Technical Documentation then?</h3>
<p>Here is the briefest outline:</p>
<ul>
<li>Description of apparatus &#8211; brand/model/manufacturer, intended function, limitations on operation&#8230; Technical description &#8211; block diagram, technical drawings, interconnections, variations, versions of design documents referenced</li>
<li>Procedures used to ensure conformity &#8211; Technical Rationale: what you&#8217;re testing against, why you did particular tests; Details of design: EMC features, component specifications, QA to control variation; Test data: Logical processes to decide if the tests are adequate, EMC tests and their results, external test reports on subassemblies/components</li>
</ul>
<p>You can also get a Competent Body to &#8220;comment&#8221; on your Technical Documentation, as some fairly convincing assurance that it is adequate.  This is really a seal on the &#8220;due diligence&#8221; aspect so you can really show you totally ticked every box to make sure it was compliant, but I guess only large companies can afford it.</p>
<h3>Conclusion</h3>
<p>If you manufacture or import stuff to sell in the EU, you are going to have to have Technical Documentation to keep yourself out of jail.  </p>
<p>For a standalone device, that means you&#8217;re really going to have to not only look to dealing with EMC early in the design, with some kind of inhouse testing ability, but find the budget of a few thousand pounds to take it for testing at a Competent Body so you have something convincing and calibrated to put into that Technical Documentation.  </p>
<p>Not only that, but even determining which are the applicable standards is a huge headache if you try to do it yourself, there are hundreds of them: a Competent body can also help select the basic issue of which tests you are targeting.</p>
<p>But it&#8217;s not all bad: if you make product variations around the same base, you can choose which variation to actually test as a baseline, and then for each other variation argue from its differences that they would not push the original base design over the edge.  I have done this in the last couple of weeks, creating for a customer Technical Documentation for a sister device to one that went through actual testing at a Competent Body, and using the very large similarities to limit the amount of retesting needed.</p>
<p>There are definite advantages to requiring this level of design scrutiny and justification, but the change to requiring Technical Documentation and the trend to increased enforcement over the ten years you must keep the documentation has definitely pushed the minimum cost and effort of bringing something to market up several notches.</p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/10/25/ce-technical-documentation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Magic correlator code analysis</title>
		<link>http://warmcat.com/_wp/2007/09/12/magic-correlator-code-analysis/</link>
		<comments>http://warmcat.com/_wp/2007/09/12/magic-correlator-code-analysis/#comments</comments>
		<pubDate>Wed, 12 Sep 2007 19:47:09 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Magic Correlator]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/?p=39</guid>
		<description><![CDATA[Intrigued by the magic correlator possibilities, I wrote some code to simulate a proper worst-case Monte Carlo analysis of the performance vs noise, with fascinating results. (Although I tried to choose reasonably large number of random runs considering the CPU time needed, please bear in mind the numbers identified in the rest of this are [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/waitress.png" align=left hspace=5>Intrigued by the magic correlator possibilities, I wrote some code to simulate a proper worst-case Monte Carlo analysis of the performance vs noise, with fascinating results.  (Although I tried to choose reasonably large number of random runs considering the CPU time needed, please bear in mind the numbers identified in the rest of this are only as accurate as the number of runs allows.)</p>
<h3>False indication rejection when given only noise</h3>
<p>What about the reaction to having no signal at all&#8230; can it tell there is no transmission or does it falsely detect correlation?  What is the highest false correlation result seen when challenged with noise?  Here is the distribution of highest correlation results for 50 Million runs feeding it only white-ish binary noise with no correlation sequence component.<br />
<img src="/corr-noise-dist.png" align=center><br />
The largest false response seen even once in the runs is +58, out of a full-scale match with a 0% bit-error rate of +128: put another way, the probability of noise alone producing a match better than +58 is lower than about 1 in 50M.  So we learn from this that we can&#8217;t trust any correlation result lower than, say, +64, to allow some margin.  (This +64 requirement is shown with a blue line in the following graphs.)</p>
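<p>A cut-down version of this noise-only experiment might look like the following.  It is a sketch rather than the code used for the graph: the bit table is the 128-bit sequence from the test program in the autocorrelation post, but scoring each noise window against all 128 cyclic rotations and the function names are my assumptions here, and <code>rand()</code> is a weak noise source for serious runs.</p>

```c
#include <stdlib.h>

#define AC_LEN 128

static const unsigned char ac_bits[AC_LEN / 8] = {
	0x19, 0xbf, 0xa2, 0x89, 0xf3, 0xf6, 0x58, 0xcd,
	0x2a, 0x81, 0x01, 0x4b, 0xab, 0x4c, 0xc2, 0xbf
};

/* Bit n of the sync sequence, wrapping modulo AC_LEN. */
int get_ac(int n)
{
	n &= AC_LEN - 1;
	return (ac_bits[n >> 3] >> (n & 7)) & 1;
}

/* +1 per matching bit, -1 per mismatch, pattern rotated by offset. */
int correlate(const unsigned char rx[AC_LEN], int offset)
{
	int i, sum = 0;

	for (i = 0; i < AC_LEN; i++)
		sum += (rx[i] == get_ac(i + offset)) ? 1 : -1;
	return sum;
}

/* Highest score over every offset for one window of pure noise. */
int noise_peak(void)
{
	unsigned char rx[AC_LEN];
	int i, off, best = -AC_LEN;

	for (i = 0; i < AC_LEN; i++)
		rx[i] = (rand() >> 8) & 1;
	for (off = 0; off < AC_LEN; off++) {
		int s = correlate(rx, off);

		if (s > best)
			best = s;
	}
	return best;
}

/* Largest false peak seen across a number of noise-only runs. */
int worst_false_peak(int runs)
{
	int i, worst = -AC_LEN;

	for (i = 0; i < runs; i++) {
		int p = noise_peak();

		if (p > worst)
			worst = p;
	}
	return worst;
}
```

<p>The detection threshold is then chosen some way above whatever <code>worst_false_peak()</code> reports for as many runs as you can afford.</p>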
<h3>Random bit-error rate response</h3>
<p>For this graph I ran the self-correlation 10,000 times per offset with different noise each time, picked the worst (lowest) correct &#8220;position 0&#8221; sync match value (red) and plotted it against the best (highest) wrong-offset match value (green) in absolute match quality.  The thin blue line shows the absolute correlation value of +64 we selected based on the first graph.</p>
<p>On the left where there is no noise, we can tell the correct sync by a wide margin.  Where the red line crosses the green, at around 0.2 bit-error probability, it means the correct sync position can no longer be distinguished from a false match.  But before then, the absolute correlation value for the correct offset has fallen below our +64 limit (selected because noise can create a +58 result) so detection is lost first at a 0.12 ber.<br />
<img src="/corr-noise.png" align=center><br />
Here is a plot of the ranking of the correct offset vs all of the other offsets.  I expected the correct one to start at #1 and then slip down the rankings, but instead it starts at #1 and falls right to the bottom when it can&#8217;t be selected as #1 any more.<br />
<img src="/corr-ranking.png" align=center><br />
What it means is that up to around 0.12 &#8211; 0.15 ber (equates to 15 &#8211; 19 randomly selected flipped bits of the 128 in the pattern) you can detect the pattern VERY reliably.  Any higher ber &#8211; with randomly selected bit errors &#8211; and your probability of detecting the pattern is very low.</p>
<h3>Multibit dropout tolerance</h3>
<p>From my WiFi work I know that a common failure mode in RF packets is a multibit continuous dropout, that&#8217;s different from the random bit errors introduced above.  These graphs show the effect on worst correct offset margin  from dropouts of all possible lengths randomly placed in the packet, where the dropout is filled with white noise, all zeroes or all ones.<br />
<img src="/corr-drop-white.png" align=center><br />
<img src="/corr-drop-0.png" align=center><br />
<img src="/corr-drop-1.png" align=center><br />
Clearly it is beautifully insensitive to multibit contiguous dropouts.  If the problem is that you have white noise crapping on the transmission, the loss of 39 contiguous bits can be sustained without dropping below the +64 result limit.  If the problem is events that cause continuous static 1 or 0 to be read during the disturbance, the code is <b>very insensitive</b> to this and can still be detected with fully half of the bits sequentially zeroed out or up to 50 set to &#8216;1&#8217;.  So the sync detection performance faced with contiguous dropouts actually exceeds that with random bit errors.</p>
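<p>The three dropout types can be injected with a small helper like this one.  It is a sketch with made-up names, not the simulation code behind the graphs, and it assumes a dropout running past the end of the packet is simply clipped.</p>

```c
#include <stdlib.h>

enum drop_fill { DROP_NOISE, DROP_ZEROES, DROP_ONES };

/* Overwrite drop_len bits starting at start, clipped at the packet end. */
void apply_dropout(unsigned char *bits, int len, int start, int drop_len,
		   enum drop_fill fill)
{
	int i;

	for (i = start; i < start + drop_len && i < len; i++) {
		switch (fill) {
		case DROP_NOISE:
			bits[i] = (rand() >> 8) & 1;
			break;
		case DROP_ZEROES:
			bits[i] = 0;
			break;
		case DROP_ONES:
			bits[i] = 1;
			break;
		}
	}
}

/* Matches minus mismatches between the received bits and the expected pattern. */
int match_score(const unsigned char *rx, const unsigned char *expected, int len)
{
	int i, sum = 0;

	for (i = 0; i < len; i++)
		sum += (rx[i] == expected[i]) ? 1 : -1;
	return sum;
}
```

<p>A static fill only damages the positions where the pattern differs from the stuck level, which is one way to see why all-zero or all-one dropouts cost less than noise of the same length.</p>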
<p>This last dropout graph shows performance when there are TWO dropped-out areas randomly (5,000 runs at each dropout length) placed in the packet at various dropout lengths (the dropout length is the same for both and they can overlap, explaining the noise at the end as they grow larger).<br />
<img src="/corr-drop-dual.png" align=center><br />
Again looking at the absolute result values on the graph (blue line), the absolute result cutoff of +64 is reached at two blocks of 18 contiguous bits contaminated with noise.  These are very severe insults that still allow a correct sync detection.</p>
<h3>Conclusion</h3>
<p>This means (to the accuracy of these simulations) if you draw a line at 15% bit-error rate, <b>if you ever see any offset of the correlator giving an absolute result of +64 or better, there is a very high probability that:</b></p>
<ul>
<li><b>there is a genuine transmission in progress</b></li>
<li><b>the offset reporting that result is the correct sync offset, and</b></li>
<li><b>your bit-error rate is 15% or less</b></li>
</ul>
<p><b>Conversely if no correlator offset gives +64 or better:</b></p>
<ul>
<li><b>the bit-error rate is higher than 15%, or</b></li>
<li><b>there is no transmission</b></li>
</ul>
<p>This is a very robust correlator pattern!  It can be improved further: at the moment the &#8220;score&#8221; for correlation adds 1 for a matched binary bit level and subtracts 1 for a binary mismatch.  If the demodulator that is providing these bits gives a probability of a &#8216;1&#8217; or a &#8216;0&#8217; instead of a hard binary &#8216;1&#8217; or &#8216;0&#8217;, then the result can be made from more information.  A few &#8220;looks a bit like a 0&#8221; inputs will weigh less against many &#8220;definitely a 1&#8221; inputs than hard mismatches would, for example.</p>
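<p>One way the soft scoring could work, as a minimal sketch: it assumes the demodulator can hand over a probability per bit, and the function name and the linear mapping from probability to confidence are my choices rather than anything tested here.</p>

```c
/* p_one[i] is the demodulator's estimated probability (0.0 .. 1.0) that
 * received bit i is a 1; expected[i] is the known pattern bit (0 or 1). */
double soft_correlate(const double *p_one, const unsigned char *expected,
		      int len)
{
	double sum = 0.0;
	int i;

	for (i = 0; i < len; i++) {
		/* map the probability to a signed confidence, -1.0 .. +1.0 */
		double conf = 2.0 * p_one[i] - 1.0;

		/* agreeing with the pattern adds, disagreeing subtracts */
		sum += expected[i] ? conf : -conf;
	}
	return sum;
}
```

<p>With hard decisions (probabilities of exactly 0 or 1) this reduces to the existing +1/-1 score; a bit reported as only 25% likely to be a 1 where a 1 was expected costs 0.5 instead of a full point.</p>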
<p>There is another great advantage to interleaving this pattern with the payload.  If the sync pattern can be recovered despite the 15% bit-error rate that is allowed, it is then possible to identify which bits of the pattern were corrupted.  Because the correlator code bits are interleaved with the payload, it suggests that if the payload is broken, the problem is coming from the payload bits next to the known-bad correlator code bits.  For example, if it is shown that say three contiguous bits of the correlator code channel are wrong, one has to wonder about the two payload bits that are in between them.  If there are a small number of bits involved, it can be possible to &#8220;fuzz&#8221; the suspected bad payload bits to see if an otherwise unrecoverable ECC error can be solved.</p>
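<p>Picking out the suspect payload bits could be sketched like this.  The layout is an assumption for illustration (payload bit k sits between correlator bits k and k+1), as are the names.</p>

```c
/* sync_bad[k] is nonzero when recovered correlator bit k disagreed with the
 * known pattern.  Assumed layout: payload bit k lies between correlator
 * bits k and k+1, so there are n_sync - 1 payload positions to consider.
 * Sets suspect[k] to 1 when both neighbours of payload bit k were bad and
 * returns the number of payload bits flagged. */
int flag_suspect_payload(const unsigned char *sync_bad, int n_sync,
			 unsigned char *suspect)
{
	int k, count = 0;

	for (k = 0; k + 1 < n_sync; k++) {
		suspect[k] = sync_bad[k] && sync_bad[k + 1];
		count += suspect[k];
	}
	return count;
}
```

<p>Three contiguous bad correlator bits flag the two payload bits between them, and the count tells you whether trying every combination of the flagged bits is affordable (2 to the power of the count).</p>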
<p>One more advantage is that the robustness margin of 15% allows channel bit-error rate to be continually assessed during reception.</p>
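<p>Because the score is matches minus mismatches, the number of corrupted pattern bits in a frame falls straight out of it: score = length - 2 x errors.  A small sketch with made-up helper names:</p>

```c
/* Number of mismatched pattern bits implied by a correlation score. */
int errors_from_score(int score, int len)
{
	return (len - score) / 2;
}

/* The same thing expressed as a bit-error rate over the pattern bits. */
double ber_from_score(int score, int len)
{
	return (double)(len - score) / (2.0 * len);
}
```

<p>This measures the errors in the pattern bits of one frame only; treating it as the channel bit-error rate assumes the interleaved payload bits suffer about the same.</p>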
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/09/12/magic-correlator-code-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Autocorrelation code and weak signal recovery</title>
		<link>http://warmcat.com/_wp/2007/09/12/autocorrelation-code-and-weak-signal-recovery/</link>
		<comments>http://warmcat.com/_wp/2007/09/12/autocorrelation-code-and-weak-signal-recovery/#comments</comments>
		<pubDate>Wed, 12 Sep 2007 11:40:51 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>
		<category><![CDATA[Magic Correlator]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/?p=38</guid>
		<description><![CDATA[Looking at weak signal capture at the moment, there has been considerable work done on this by Radio hams. The extreme cases for these guys are bouncing signals off the moon or meteors to reach other places on the planet. The most recent protocol I could find is called JT65, and it makes some pretty [...]]]></description>
			<content:encoded><![CDATA[<p><img src="/grin.png" align=left hspace=5>Looking at weak signal capture at the moment, there has been considerable work done on this by Radio hams.  The extreme cases for these guys are bouncing signals off the moon or meteors to reach other places on the planet.  The most recent protocol I could find is called <a href="http://www.arrl.org/FandES/field/regulations/techchar/18JT65.pdf">JT65</a>, and it makes some pretty extraordinary claims for data recovery: 100% recovery at -27dB SNR, ie, the noise floor is 27dB above the signal.  Unfortunately it seems the author of this otherwise cool and interesting protocol took it a step too far, and used <a href="http://www.sm2cew.com/jt65.html">&#8220;forbidden Black Magic&#8221;</a> in his implementation to get results at that level.</p>
<p>However removing the black magic the claim of 100% recovery at -22dB SNR using another &#8220;forbidden&#8221; but less magical technology is not being disputed.  This is a patent-encumbered &#8220;soft&#8221; Reed-Solomon decoder which is able to recover from more damage faster than the normal &#8220;hard decision&#8221; decoder: this means you have to give up another few dB to get a distributable implementation.  An open source implementation exists at <a href="http://developer.berlios.de/projects/wsjt/">berlios</a> but it&#8217;s written in freaking Fortran.  Multithreaded Fortran with a Python GUI.  This provides a normal Reed-Solomon FEC implementation which is used if you don&#8217;t have the external forbidden one.</p>
<p>One awfully limiting &#8220;trick&#8221; and two really interesting techniques are used in the protocol.  The bad news is that very very long symbol-times are used for transmission, 372ms per 6-bit symbol.  Considering the various bloatages it&#8217;s about one byte per two seconds.  They are sending one of 64 &#8220;tones&#8221; to encode the six bits during that time&#8230; obviously the symbol duration helps with recovery.  This &#8220;trick&#8221; is the core feature of weak signal recovery&#8230; repeat what you are doing a lot, in this case repeat the &#8220;tone&#8221; cycles a lot to &#8220;amplify&#8221; the signal at a receiver which knows how to take advantage of looking for something happening multiple times to increase probability of detection.</p>
<p>The first interesting trick is just the amount of Reed-Solomon used&#8230; this is not new to me since I used it as part of Penumbra.  But in this protocol, every 72-bit packet has an additional 306 bits of error correction attached to it :-O.  That&#8217;s more than 4 times as much ECC as data, and despite that it still pays off for capturing the signal.</p>
<p>The second cool technique is to interleave the payload data with a binary autocorrelation &#8220;clock&#8221;.  Since the noise level is so crazy, it&#8217;s of little use to expect a 1-bit channel in the data to be usable as a &#8220;start of frame&#8221; marker or somesuch as you would normally expect with digital serialized communication.  Instead, they spread the sync information in this interleaved &#8220;channel&#8221; using a 126-bit sequence which has a magically cool property&#8230; if you autocorrelate the sequence with itself, even in the presence of a fair bit of noise, every correlation offset except the right one matches MUCH worse than the 1:1 lineup.  Here is the sequence extended to 128 bits and correlating with itself.  The y axis is the correlation score, the number of bits that match minus the number that don&#8217;t&#8230; obviously that is 128 when it compares itself to itself at the 0 offset on the X axis.  The cool part is how low the self-correlation is everywhere else, no better than 20, or a 14dB &#8220;SNR&#8221; between a match and a non-match.<br />
<img src="/ac0.png" align=center hspace=5><br />
This remains the case even under pretty bad noise, up to 25% of the bits being trashed (still 9dB sync SNR):<br />
<img src="/ac25.png" align=center  hspace=5><br />
but at 30% of the bits being trashed, the performance falls off a cliff:<br />
<img src="/ac30.png" align=center hspace=5><br />
Not only does the noise floor rise due to falsely improved correlations, but the one true correlation is also falsely degraded.  After about 28% bit errors the reliability is gone.  (Note the noise is one-shot with the test program, rather than being Monte Carlo&#8217;d, but I ran it several times and the graphs shown are representative).</p>
<p>But that isn&#8217;t the end of the story for this code.  First, the correlation action is a filter for transmission presence all by itself.  And if you detect the transmission by the presence of the correlation code, you have also sync&#8217;d the receiver to the transmitted frame, since the correlation bits are interleaved with the actual data and the &#8220;0&#8221; offset marks the start of the frame.</p>
<p>With deep memory and a known period of retransmission from the source, temporally averaged autocorrelation can take place to increase the chances to find the presence of a transmitter and to sync up to its data.  After a transmitter &#8220;sync&#8221; has been found in the averaged data with high probability, the averaging memory can be turned to only store the times when a transmission was expected from the known schedule of the transmitter.</p>
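<p>The averaging idea can be sketched as a per-offset accumulator.  This is an illustration under assumptions, not a tested receiver: it presumes each retransmission has already been correlated at all 128 offsets and lined up to the known repeat period, and the names are invented.  False peaks from noise land on different offsets each time and tend to average away, while the true offset keeps adding up.</p>

```c
#include <string.h>

#define AC_LEN 128

struct corr_avg {
	long sum[AC_LEN];	/* running total of the scores at each offset */
	int frames;		/* number of frames accumulated so far */
};

void corr_avg_reset(struct corr_avg *a)
{
	memset(a, 0, sizeof(*a));
}

/* Add the per-offset correlation scores from one more retransmission. */
void corr_avg_add(struct corr_avg *a, const int scores[AC_LEN])
{
	int i;

	for (i = 0; i < AC_LEN; i++)
		a->sum[i] += scores[i];
	a->frames++;
}

/* Return the offset with the highest total; *mean_score gets its average. */
int corr_avg_best(const struct corr_avg *a, double *mean_score)
{
	int i, best = 0;

	for (i = 1; i < AC_LEN; i++)
		if (a->sum[i] > a->sum[best])
			best = i;
	*mean_score = a->frames ? (double)a->sum[best] / a->frames : 0.0;
	return best;
}
```

<p>An offset that scores 40 and 44 in two frames beats one that scores 50 once and -10 the next, even though the latter had the single highest reading.</p>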
<p>Here is the magic code with the 128-bit sequence and the test loops<br />
<code>
<pre>
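#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

/* 128-bit sync sequence, least significant bit of each byte first */
static const unsigned char u8Auto[] = {
	0x19, 0xbf, 0xa2, 0x89, 0xf3, 0xf6, 0x58, 0xcd,
	0x2a, 0x81, 0x01, 0x4b, 0xab, 0x4c, 0xc2, 0xbf
};

#define AC_LEN 128

/* return bit n of the sequence, wrapping n modulo AC_LEN */
char GetAc(int n)
{
	n = n &#038; (AC_LEN - 1);
	return (u8Auto[n &gt;&gt; 3] &gt;&gt; (n &#038; 7)) &#038; 1;
}

int main(int argc, char ** argv)
{
	int n, n1;
	int nSum;
	int nPercent = 0;
	int nNoise = 0;
	unsigned int nSeed = 1;
	FILE *f = fopen("/dev/urandom", "r");

	/* seed from the kernel if we can, else keep the fixed fallback seed */
	if (f) {
		if (fread(&#038;nSeed, sizeof(nSeed), 1, f) != 1)
			nSeed = 1;
		fclose(f);
	}
	srand(nSeed);

	/* noise percentage, scaled to the 0..1023 range compared with rand() */
	if (argc == 2) {
		nPercent = atoi(argv[1]);
		nNoise = (1024 * nPercent) / 100;
	}

	fprintf(stderr, "Noise: %d%%\n", nPercent);

	/* slide the sequence across itself and score every offset */
	for (n = -(AC_LEN - 1); n &lt; AC_LEN; n++) {
		nSum = 0;
		for (n1 = 0; n1 &lt; AC_LEN; n1++) {
			char c = GetAc(n + n1);
			/* simulate white noise */
			if ((rand() &#038; 1023) &lt; nNoise)
				c = c ^ 1;

			if (GetAc(n1) == c)
				nSum++;
			else
				nSum--;
		}
		/* one "offset score" pair per line for graph */
		printf("%d %d\n", n, nSum);
	}
	return 0;
}</pre>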
<p></code><br />
and the graph command that generated the graphs (the 28 is the percentage of noise to graph)</p>
<p><code>gcc test.c -o test ; ./test 28 | graph -Tpng --bitmap-size 1200x1200 -FHersheySans>temp.png &#038;&#038; convert temp.png -scale 300x300 png:temp1.png</code></p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2007/09/12/autocorrelation-code-and-weak-signal-recovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Next Generation</title>
		<link>http://warmcat.com/_wp/2006/09/14/next-generation/</link>
		<comments>http://warmcat.com/_wp/2006/09/14/next-generation/#comments</comments>
		<pubDate>Thu, 14 Sep 2006 10:24:26 +0000</pubDate>
		<dc:creator>andy</dc:creator>
				<category><![CDATA[Hardware design]]></category>

		<guid isPermaLink="false">http://warmcat.com/_wp/?p=21</guid>
		<description><![CDATA[In the Electronics and software treadmill world practitioners are constantly having to re-skill themselves to keep up with solutions that make sense in size, power and cost. Software has &#8216;bitrot&#8217; where things stop working properly if they are not maintained; while hardware designs should keep on chugging away once made, they may not remain manufacturable [...]]]></description>
			<content:encoded><![CDATA[<p>In the Electronics and software treadmill world practitioners are constantly having to re-skill themselves to keep up with solutions that make sense in size, power and cost.  Software has &#8216;bitrot&#8217; where things stop working properly if they are not maintained; while hardware designs should keep on chugging away once made, they may not remain manufacturable for long with components going obsolete and even if the components are available, hardware designs don&#8217;t stay competitive for long with the constant cycles of improvement in silicon.</p>
<p>Generally small design companies are pretty much as empowered as the very large companies in terms of designing and using the latest stuff.  But currently there are two major and important technologies in hardware terms that are out of the reach of small design companies.</p>
<p>First is BGA (Ball Grid Array) packaging.  Instead of legs sticking out of plastic shells, BGA has an array of solder balls on the underside of the package.  The PCB has exposed metal pads under each ball; the BGA is placed on top and heated slowly according to a temperature profile.  The solder balls melt and firmly attach all the connections through to the PCB pads.</p>
<p>The problems with BGA start with the PCB design, many BGA pinouts are far too dense to allow automated routing without taking up a crazy number of layers.  Modern BGA chips have solder balls on a 0.5mm pitch(!) which further demands the expense of laser-cut vias.  PCB autorouters which are perfectly fine for PQFP or other pin-based technologies make a miserable job of BGAs and they can need to be fanned out (to get the signals spread out from the pads) by hand.</p>
<p>Small design companies are typically making their own prototypes, but this is no longer possible either with BGA, since everything is on the underside of the device.  Instead an outside contractor must be used to place the BGAs on the PCB with an infra-red oven, and the result has to be inspected too with technology that is beyond a small company, using X-Rays to see through the chip and to confirm that all of the solder balls are melted and making contact.</p>
<p>The problem is that the most modern and desirable technologies are starting to appear ONLY in BGA form.  Unless a designer can specify world-class technologies available to his customers&#8217; competitors, clearly he is at a disadvantage.  So moving out of the pin-based ghetto into the BGA world is a major and growing concern here.</p>
<p>The second issue surrounds the problems of high-speed transmission-line based technology found for example in using DDR DRAM.  There are a bunch of stringent design rules surrounding DDR, the most difficult of which is length matching 70 or more nets.  Basically all the signals should arrive at the chip at the same time, which means ensuring that they all travel about the same distance on the PCB.  If you look at a modern motherboard, you will see some tracks perform strange &#8220;squiggles&#8221;: they are doing so to add length to themselves so they match the length of a signal that had to travel further.  Trying to do this for 70 or more nets in a small region, where each change can impact the length of other nets is&#8230; nontrivial&#8230; and completely beyond the midrange autorouters available to small companies.  Higher-end autorouters like Cadence Specctra are capable of automating this task, but run to GBP30,000, and demand a king&#8217;s ransom to keep the updates coming.  Failure to maintain a sufficiently tight relationship between the lengths, or to keep signal quality to the necessary level for other reasons, will result in a design that can&#8217;t operate at the intended frequency or is flaky at any speed.</p>
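<p>The bookkeeping behind length matching is simple even if the routing is not.  A minimal sketch with invented names: work out how much &#8220;squiggle&#8221; each net needs to reach the longest one, and what a leftover mismatch costs in arrival time.  The propagation delay is an assumption you supply; figures of roughly 6 to 7 ps/mm are commonly quoted for FR4, but check your own stackup.</p>

```c
/* Fill extra_mm[i] with the length each net must gain to match the longest
 * net in the group; returns that longest length.  n must be at least 1. */
double pad_to_longest(const double *len_mm, double *extra_mm, int n)
{
	double longest = len_mm[0];
	int i;

	for (i = 1; i < n; i++)
		if (len_mm[i] > longest)
			longest = len_mm[i];
	for (i = 0; i < n; i++)
		extra_mm[i] = longest - len_mm[i];
	return longest;
}

/* Arrival-time skew caused by a length mismatch; ps_per_mm is the assumed
 * propagation delay of the track. */
double skew_ps(double mismatch_mm, double ps_per_mm)
{
	return mismatch_mm * ps_per_mm;
}
```

<p>At an assumed 6.5 ps/mm, a 10mm mismatch is 65ps of skew.  The hard part the autorouter has to solve is adding those extra lengths without disturbing the neighbouring nets.</p>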
<p>A related problem is being even able to look at signals operating at these high speeds due to bandwidth restrictions on midrange oscilloscopes.</p>
<p>These problems are for the future: currently we master 90MHz SDRAM bussing and 180MHz ARM9 CPU technology on 4-layer which is more than sufficient for many of today&#8217;s and tomorrow&#8217;s designs.  But the very high end of today&#8217;s needs is demanding the ability to attack BGA and DDR and investing in it is going to have to be on the agenda in the coming months.</p>
]]></content:encoded>
			<wfw:commentRss>http://warmcat.com/_wp/2006/09/14/next-generation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

