Whirlygig PCB

May 21st, 2009

I built the first prototype Whirlygig PCB last weekend, it’s working well.  For testing I left out the noncritical inductors and some caps.  I also found the total current consumption at the USB side is 250mA with the CPLD macrocells in low power mode and 350mA with them in high power mode, comfortably within the 500mA USB budget.  I decided to use the higher power mode because it should increase the ring oscillator frequencies and hence the randomness.  The CPLD runs hot, around 40 degrees C.

Improvements

I took the opportunity to make some improvements:

- Added JTAG programming of the CPLD to the SiLabs microcontroller over USB.  This allows change or update of the CPLD logic from the host PC without any hardware needed.  However because the kernel module blocks the logical USB interface, it’s safe from being rewritten while in use.

- Changed the random logic.  I’ll explain the changes and results in the rest of this article.

- Decreased the polling rate of the CPLD but increased the total USB random throughput, 1.0MBytes/sec sustained (for as long as you like) by making the code in the microcontroller “multithreaded”.  You can also plug in more Whirlygig devices to linearly increase random production; the kernel module allows hotplug and unplug without problems and combines the output seamlessly all in /dev/hwrng.

- I was pleased to see the kernel module had hardly bitrotted at all, it only needed a one-line edit to build a working module against a current Fedora Rawhide kernel.

The second LED lights while the PC is requesting random packets from the device.  It lights briefly on plugging it in while the driver’s cache is filled, then it only lights when something is using the hard random numbers on the PC.

New random scheme

I had three main ideas about improving the random hardware inside the CPLD.

First I realized we can decrease predictability by having more oscillators than are used at one time to change an output bit.  We have 8 output bits, but we now have 16 oscillator sets.  Instead of combining them all, on average several will not be used on any given operation.

The second idea was that now we have a pool of oscillators greater than needed at any one time, we can randomly select from them for each output bit operation.  So I added an additional 32 oscillator sets (4 for each output bit) which are only used to select which of the pool of 16 we use for any operation.  The end result is that at least 8 oscillators from the pool will be unused for each operation, and which oscillators do get used for which bit are individually “random” with “no” correlation between output bits.  This makes any attacker’s attempt to model the pool oscillator states very tough because there’s no longer any knowledge about which bit contains information about which pool oscillator, or even if its state has affected any output bit.

Lastly we now operate from a clock (24MHz) that is 14 times faster than the sample rate.  This lets us mix 14 randomly chosen oscillator states by xor before the output is sampled for each bit.  Even if two output bits were mixed with the same 14 oscillators, the order would have to be the same as well to get the same result, since the oscillators are never standing still.  For this same reason selecting a pool oscillator more than once in the 14 operations is not equivalent to a NOP.

I added another small tweak, all of the random generators shift ther oiginal state by 1 generator on each clock.  This is intended to reduce the impact of any hard nonliniarity in individual generator routing on the CPLD.

There were no problems with the PCB, but to save myself a headache working with the crossbar in the CPU I blobbed together pins 26 and 27 on the CPU.

In the next article we look at the random performance again with the new scheme.

Exhaustion and the GPL

May 23rd, 2008

exhaustionSome years ago I came across a guy Alexander Terekhov who worked then for IBM and had outspoken views about the viability of the GPL.

If I understood it, his opinion was that the license terms of the GPL would not survive resale, due to the well established “first sale doctrine” and its EU equivalent “exhaustion”.  It basically means that the copyright holder cannot stop you reselling your software, and that the license terms will not apply to the guy receiving it.

I tried to understand this further, but Alexander was not always easy for me to comprehend and had then a habit of linking to his own posts elsewhere to bolster his position, leading to a kind of echo chamber of Terekhovs all nodding vigorously at each other.  He also back then and evidently more recently too explained legal decisions that did not fit his understanding by calling the Judges in question “morons”, etc.  Well the forum I met him at had a very high trolling quotient so it just joined the rest of the anti-GPL sentiment there for me in the end and I ignored it.

GPL is a license too

But I was reminded of this last night when I read about a recent decision against Autodesk which is being widely seen as a victory for Joe Softwarebuyer.  From the Patry blog post link above:

…many software companies have taken the position that they can convey the copy to the customer in an over-the-counter transaction for a one-time payment, but describe that transaction as a license; as a license, the first sale doctrine doesn’t apply, meaning copyright owners can prevent further distribution of the copy…

Doesn’t this vindicate Alexander’s position?  How can GPL terms stick past resale if Autodesk EULA ones don’t?  Nothing stops “built-in” or “automated” resale to clense software of any licensing restriction.

A lot of people seem to be happy about the paid-for world being freed from license conditions, are they going to be happy if it turns out that everyone is also freed from GPL conditions?

Civil infringement and Punishment

What effect would this have on contribution I wonder.  It seems to me the real-world advantages from being active in a project by contributing will still apply.  But it will enable private proprietary forking for products, the kind of thing that Harald Welte’s gp-violations.org has had success attacking and punishing to date.  Contributors will see their work used in commercial products without the changes being open.

But the BSD folks seem to survive this outrage without it removing their motivation.  And from time spent looking at music licensing over the years, I kind of recognize an element of proprietary vindictiveness in gpl-violations… of course the member companies hiding behind the RIAA attacks are also “perfectly within their rights” to embark on much worse vindictive destruction, but they are not entirely dissimilar and that always bothered me.

Playing ball or going home?

Well, this decision is subject to appeal, will only apply to the jurisdiction of that court, etc, so the sky didn’t fall in already.  But there is quite a bit of harmonization of copyright law thanks to the insistence of rich rightholder companies mainly from the US side.  But if this is upheld, it may come to contaminate most Western countries and turn GPL terms in unenforcable noise — the choices would be in effect public domain or closed.

I guess some people will go closed rather than have their work exploited, but I expect most people will just continue on, and contributions will continue to come perfectly fine.  The advantages from being a visible contributor and taking upstream directly are still going to apply, so will the bitrot that happens to any additional code put on top and maintained privately.

Too mature to care?

Maybe now we reached a point that the social, financial, engineering and public advantages from cooperation are ingrained enough that we don’t need a license to protect them anyway?  But I read this and I feel a sinking feeling about the naivity of such a proposal.

Whirlygig GPL’d HWRNG

November 24th, 2007

Hardware random for the masses

I made available the result of the ring oscillator random generator as a GPL project called Whirlygig. It’s a 2.75cm x 4cm PCB with a mini USB connector, it provides a sustained 5.5Mbps (~620KBytes/sec) of apparently very high quality random bits using the Linux hw_random API. The large amount of randomness should make it useful for statistical tests as well as hard crypto.

I prototyped it using a couple of boards I had lying around, so I know it works fine, but I am waiting for the PCBs to come back from fabrication to actually build a final one. I placed the CPLD VHDL, the board hardware design, the driver software and the firmware for the USB controller into http://git.warmcat.com.

Dieharder

I spent some time worrying about how to test the quality of the result — I found that “diehard” mentioned in an earlier post has been superceded by “dieharder”. This has a much tougher general testing regime, even though many of its test are reproductions of the diehard ones — it runs each test many times and forms histograms of the p-value results from the many runs, and gives an assessment of fail, poor, possibly weak or pass on the spread of results rather than a single result.

At first the RNG failed three of the 18 tests, but on looking closer one of the tests (#2) currently fails for all RNG input and is marked up as not for use with assessing RNG quality, and the two others required by default more than the 400MBytes of randomness I had prepared. Unfortunately in that case they simply rewind the randomness file and re-use the same data to make up the balance! Of course this is no longer quite “random”. When I adjusted those two tests to use a smaller sample that fitted into the 400MBytes without repetition, the output of the RNG get a “pass” on all 17 of the relevant dieharder suite tests.

Max Entropy

During the validation phase I changed the RNG algorithm in the CPLD significantly. The scheme is described on the project page, but basically I moved away from a bit-centric to a byte-centric design with 8 identical sets of 3 oscillators. To stop any characteristic of a particular oscillator’s routing from being associated with a particular bit of the result byte and creating a bias, I introduced a “mixer” that first generates 8 random bits by combining six oscillator outputs each with XOR, then rotates these oscillator sets between the result bits sequentially at 24MHz. I also removed the toggling action and used the random bit directly.

I also found the Linux rng-tools suite which repeatedly runs FIPS-140-2 tests on the bits, this fails 1 in 1200 or so packets of testing over 20 billion bits, I believe this is normal for a real random generator that it will produce sequences with low probability that don’t look very random in the short term.

Aside from passing dieharder and FIPS-140-2, the changes also got me a reported 8.000000 bits of entropy per byte from the ENT test, so there are reasons to imagine the quality of the output is very good.

FIPS-140-2 and ENT validation vs ring RNG

November 15th, 2007

NIST lists some more test suites. NIST also have their own suite, but it is now Windows-only, and lacks a necessary DLL to run there. The last UNIX version segfaulted here before giving any results… sigh.

I ran the last 10MByte sample against ENT and TestU01… to cut a long story short

$ ./ent ../die.c/dump3
Entropy = 7.999980 bits per byte.

Optimum compression would reduce the size
of this 10002432 byte file by 0 percent.

Chi square distribution for 10002432 samples is 281.26, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.4958 (127.5 = random).
Monte Carlo value for Pi is 3.140111525 (error 0.05 percent).
Serial correlation coefficient is -0.000212 (totally uncorrelated = 0.0).

7.9999 bits of entropy per byte! TestU01 is less turnkey than the other suites — it’s literally a test library with some example code. I amended an example to call the FIPS-140-2 tests:

============== Summary results of FIPS-140-2 ==============

 File:             dump3
 Number of bits:   20000

       Test          s-value        p-value    FIPS Decision
 --------------------------------------------------------
 Monobit               9933           0.83       Pass
 Poker                11.88           0.69       Pass

 0 Runs, length 1:     2482                      Pass
 0 Runs, length 2:     1227                      Pass
 0 Runs, length 3:      630                      Pass
 0 Runs, length 4:      319                      Pass
 0 Runs, length 5:      161                      Pass
 0 Runs, length 6+:     166                      Pass

 1 Runs, length 1:     2466                      Pass
 1 Runs, length 2:     1302                      Pass
 1 Runs, length 3:      620                      Pass
 1 Runs, length 4:      311                      Pass
 1 Runs, length 5:      140                      Pass
 1 Runs, length 6+:     146                      Pass

 Longest run of 0:       16           0.14       Pass
 Longest run of 1:       14           0.46       Pass
 ----------------------------------------------------------
 All values are within the required intervals of FIPS-140-2

So the design’s output is compliant to FIPS-140-2, a requirement for many uses.

Diehard validation vs ring RNG

November 14th, 2007

RNG Quality assessment

A timely article flew by on Reddit about the RANDU pseudo-random generator algorithm widely used in the 1960s, which it turns out was very flawed indeed. It was explained to one student that ”We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random”. Basically it produced numbers that belonged to one of 15 “planar” groupings and nothing in the gaps between the planes. It isn’t just a minor annoyance, because many statistical studies in the 60s and 70s used it, and it can easily have contaminated their results. That’s definitely not what I am trying to reproduce with the ring oscillator device — so how can I figure out how “good” the randomness is in an objective way?

RNG quality test suites

It turns out that empirically testing RNG outputs has been the subject of a lot of work for decades, and there are some established testing suites available online. A major one seems to be the “diehard” suite — I guess it is a pun on die as the plural of dice.

It needs you to fetch 10M bytes of random numbers or more and let it run a bunch of tests on them. The output was a little hard to assess initially: most tests issue a “p” number which only suggests something is bad if it is 0.000… OR 0.999…. All other numbers inbetween are to be taken as a good result as I understood it. Except there is a warning that even good RNGs can produce the occasional test fail.

Thus you should not be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p`s of 0 or 1 to six or more places. By all means, do not, as a Statistician might, think that a p < .025 or p> .975 means that the RNG has “failed the test at the .05 level”. Such p`s happen among the hundreds that DIEHARD produces, even with good RNGs. So keep in mind that “p happens”

I duly fetched 10M bytes of 115kbps randomness from the device and fed it to diehard. It seemed to give fine results except on “Count the 1s stream” and “Squeeze” (devastating p=0.000000), “Count the 1s specific” for bits 1-11 (p=0.000030) and 9-16 (p=0.000064), and QQSO 2-6 (p=0.000005). It passed the dozens of other tests but it was disappointing, looks like a big fat ‘failed’.

Triple Scoop

Well, since my test CPLD was an XC95288XL with 288 Macrocells to burn, I naturally wondered if I could improve matters by tripling the amount of ring oscillators getting Xor-ed — that is to implement the three varying sized oscillators 3 times each, totaling nine, and sum them with a big XOR. They’ll all be drifting around individually as much as together, it should be a mighty noise-fest.

I edited the VHDL and blew it into the CPLD… visually the summed RNG output “bit” was an awful lot more noisy than before. I pulled another 10M bytes from that setup: but just looking at the byte distribution as I did before told me something is still up.

That sawtooth type distribution is “not random” to coin a phrase. If you look at the large jump at 0×80 (128) it is telling us that we are more likely to get 1000000 binary than we are to get 01111111, in other words, since this is over 10M bytes, there is a distribution problem favouring ’0′. When I analyze the distributions of 1s and 0s I find

0: 40436204, 1: 39563804... delta=872400, skew=1.090500%

You can see the same thing even better looking at 0×00 (42,000 hits) vs 0xFF (36,000 hits), they are like 8% off the median of 39,000. Clearly that distribution of 1s and 0s has to have a very small skew to stop these kinds of effects showing up, and equally clearly this is telling us something deep about the RNG hardware.

Spiky

Although the individual oscillators are quite slow thanks to the number of inverter stages, at 4 – 6MHz, the way they are being summed makes for trouble from bandwidth limitations inside the CPLD. At the moment it just uses a dumb asynchronous XOR action, that means that potentially very fast spikes can be seen when one “slow” oscillator changes state very shortly after another “slow” oscillator. For example:

You can see on the left (this is 5ns/div notice) a runt pulse where this happened, the XOR was convinced to rise by one oscillator changing and then countermanded when another oscillator changed state less than 5ns later, resulting in a doubtful pulse that was probably not visible as a ’1′. This also happens when going from ’1′ to ’0′, but maybe the threshold for the transistors in the CPLD is not at exactly 50% of the 3.3V supply. So we suddenly have it seeing more ’0′s than ’1′s on average when spikes are involved.

This whole high bandwidth summing step is completely needless, it’s only there because it is a literal interpretation of the diagram in the original RFC. I changed it instead to have nine latches sample the nine oscillators every 125ns (there is an 8MHz clock on the prototype board) and sum those results with XORs into a single bit. In turn this output is sampled by another latch at 8MHz to hide any metastability.

Latched up

The latched summing version performs much better and has gotten rid of most of the bit skew, and the sawtooth behaviour:

…but there is still a problem with 0×00…. the bit skew looks like this

0: 39960076, 1: 40039932... delta=79856, skew=0.099820%

so the skew is now on the side of ’1′s but only by 0.1%. You can see the byte count spread is much tighter than before too — 1800 instead of 6000 counts before.

Balancing out the skew

Well if the remaining skew is something to do with the ratio of rise to fall times, or the non-squareness of the oscillator outputs for some other reason by something as low as 0.1%, that is hard to do much about, especially as it may vary on the specific silicon die.

But it shouldn’t matter — now the bandwidth situation at the XOR summer is sane, if we invert the summed output 50% of the time it should spread any excess on ’1′s or ’0′s to the opposite as well, cancelling any bias. I added a couple of terms to the summer to xor against the UART bit index LSB and a bit which toggles after every byte sent by the UART. It’s the equivalent of xor with 0×55 for the first byte and then 0xAA for the second byte, over and over.

That glitch in the middle is actually at 134 (0×86), maybe it is random but I guess we will see…. the skew is further reduced as anticipated

0: 39974218, 1: 40025790... delta=51572, skew=0.064465%

Diehard sequel

I ran 10M bytes from this version through Diehard again… the really bad p-value results are gone. For example Squeeze was a deadly 0.000000 before and is now 0.255260.

I made one last adjustment, I added the current state of the latched random value to the XOR term. That means it decides whether to keep or invert the latched value, it no longer directly accepts the value from the RNG. This got me to the promised land: 0.0005% skew between ’1′ and ’0′.

0: 40000206, 1: 39999802... delta=404, skew=0.000505%

This also gets me the apparently good diehard results with no obvious failures on any tests, you can see the actual results here. So it seems the current version can tentatively be called a “real RNG”.

Ring oscillator RNG performance

November 12th, 2007

Pretty random

After some scrabbling around porting my Jtag SVF interpreter to Octotux and creating a kernel module for the PIO end of it — and moving to a different board with a XC95288XL CPLD to prototype it, the triple ring oscillator RNG is working. It issues a 9600 baud result, but after some initial confusion I modified it 1/8th of the time to sit out a sample time leaving “break” on the serial line. This should make sure that the receiving UART does not get confused by the data as a start bit. The true data rate is something like 800 random bytes per second at 9600 baud.

Here are the three chains of inverters (19, 23 and 29 long) oscillating at the different fundamentals

… and here is what the xor summing looks like, first over 1s then sampled once.

Although the single shot sample doesn’t look very random, the oscillators are drifting around all the time. If you wait a little while between samples (currently it is 104us, a 9600 baud bit-period) it’s pretty hard to guess what phase all the oscillators have drifted to — at least, that’s the plan.

Distribution of binary levels

The first test I did was to see what the distribution of ’1′ and ’0′ in the results was… clearly if the device is really random it should on average be 50% each. I fetched 1M random bytes, or 8Mbits:

0: 4008913, 1: 3991095… delta=17818, skew=0.222725%

Its okay for a really random source to deviate to 50:50 at any given time, although on average it should be 50:50.

Octet distribution

Next I looked at the distribution of the results from 0×00 through 0xFF as the result “random byte”. This would show up if the RNG fails to ever issue some result or favours certain results over others — every result should on average have an equal chance of showing up and so an equal count. I ran it for 1M random bytes…

This is pretty decent, every possible result is seen with a frequency within +/-200 counts of the 3,900 average after 1M bytes.

115200 baud results

Encouraged by this I cranked the baud rate up to 115220 or 8.68us between samples and around 10K random bytes per second. The skew is increased somewhat and the spread of result counts is increased a little.

0: 4028746, 1: 3971262… delta=57484, skew=0.718549%

So far so good!

Adding entropy to /dev/random

November 7th, 2007

A hard RNG is good to find

The recent statistical analysis for drumbeat reminded me I could do with a proper source of random numbers, not generated by a pseudorandom feedback action. Back in the early 1990s I was looking at statistical profiling of execution on microcontrollers, I was surprised then to discover that only by making the sampling period random could I get a true picture of execution distribution. If the address bus was sampled at a fixed rate, say 100kHz, instead of a true picture it would be distorted by activity that was happening at some fraction or harmonic of the sampling frequency. So you would alias out pieces of loops completely or get a bloated count for other areas. Only by true randomness in the sampling timing could you see the reality — a paradox.

Analogue RNG methodologies

A Google or two around showed that most of the techniques are analogue one way or the other. Many of the methods suffer from a problematic need to amplify some very tiny source of noise, a Zener diode or avalanche transistor junction, by really huge amounts, 90dB or more. There are a couple of suppliers of RF “noise diodes” with flat spectra across a wide frequency range, but they are hard to source.

Digital non-pseudorandom technique

However there is one technique which while still relying on analogue noise is basically digital — to run multiple chains of unlocked inverting oscillators and xor the outputs. The unlocked oscillators have no reference at all, they’re basically an inverter fed back on its own input — in fact a chain of inverters. Such a circuit oscillates according to the period of the total delay through the inverter chain… and that is highly sensitive to temperature. Normally with synchronous digital design we choose a clock rate for a circuit that is just below the maximum possible at the worst temperature it is expected to operate at — and after that we can forget about temperature. But with this asynchronous unlocked oscillator concept, the micro- and macro- temperature dependence is revealed in all its freaky glory, causing the oscillation to drift unpredictably slightly every cycle and over larger period with gross temperature fluctuations.

RFC4086

RFC4086 mentions a recommendation for a RNG based on unlocked inverter chains that is found in IEEE 802.11i.

             |\     |\                |\
         +-->| >0-->| >0-- 19 total --| >0--+-------+
         |   |/     |/                |/    |       |
         |                                  |       |
         +----------------------------------+       V
                                                 +-----+
             |\     |\                |\         |     | output
         +-->| >0-->| >0-- 23 total --| >0--+--->| XOR |------>
         |   |/     |/                |/    |    |     |
         |                                  |    +-----+
         +----------------------------------+      ^ ^
                                                   | |
             |\     |\                |\           | |
         +-->| >0-->| >0-- 29 total --| >0--+------+ |
         |   |/     |/                |/    |        |
         |                                  |        |
         +----------------------------------+        |
                                                     |
             Other randomness, if available ---------+

This has three unlocked, wandering oscillator chains of different lengths being summed at an XOR gate.

Implementing the RFC4086 RNG

Since it needs 71 inverters, you would need 12 74hc04 or similar, it makes more sense to put it all in one CPLD. I have an old XC95108 lying around, so I wrote up the design in VHDL and added a UART interface to issue the sampled random data. This brings up the issue of how quickly it can be sampled and still get high quality randomness… clearly if we sampled it at 10ps it wouldn’t be very random at all, since it didn’t have time to change between samples. On the other hand if we sampled it at some high multiple of the fastest free-running oscillator period, then there is a lot of opportunity for each oscillator phase to have been affected over the longer time. By using the UART we can control how often we sample the RNG by the baud rate… I initially set it to 9600 baud or 104us/sample. The oscillators should have periods on the order of 150 – 200ns (5 – 6MHz), so this is allowing 500+ cycles of jitter to accumulate in each oscillator before the summed sample is taken.

I’m currently waiting for a programming tool to be delivered so I can program another device to allow programming the XC95108 — I no longer have any PCs with a printer port I realized yesterday. I am very interested to see what the performance and quality of the randomness is like!

CE Technical Documentation

October 25th, 2007

The other week I went on a workshop to learn more about the new 89/336/EEC regulations that came into force in the UK on 20th July 2007. Here are some notes cribbed from my notes, they’re intended to be an overview: obviously for something this important you should get your own advice.

Out with the old ways…

For a long time it has been a requirement to certify that any product you manufacture for sale in the EU meets the “relevant standards”, so it can have a “CE” mark. Until this July in the UK you could either do that by:

  • paying for test at a “competent body”, a company with a ton of test gear that will empirically test your device against the emission and immunity standards, or
  • Writing up a Technical Construction File, or TCF, which described the product design in a deep way, and included tests and logic showing why you are compliant

Device torture at the House of Pain

The last design I completed, for a smart 4-channel Analogue telephony device that can hook to the Internet, I went down the empirical test route. At a total cost of several thousand pounds the production device was tested at a real competent body with calibrated receivers and emitters, blasted with wideband radio signals, zapped with +/- 8kV discharges. The resulting report gave a very clear okay except on a minor issue to do with the AC power supply we had used. We also had to do specialized testing for the analogue telephony end, which we again passed, although not until getting a component supplier to make a special that actually complies with the standard.

(Actually walking the device through this testing is a pretty sweaty business, since time is literally money at the test facility and wiggle room in the case of trouble is also in short supply. In one instance for example I was able to patch the sources in realtime when an issue came up during ESD testing that broke normal operation but wasn’t enough to trigger the watchdog, turning a fail into a pass. So I would never send a device unaccompanied for testing, or go to a test house without a laptop with full sources to expect the unexpected.)

In with the new ways…

However the demands in the new regulations have changed significantly. You must now generate “Technical Documentation” for any product you will be selling in the EU. This is basically the old TCF route to compliance, but it doesn’t itself necessarily remove or even perhaps reduce the need for absolute tests for a given device.

Less well known is that if you are still selling devices first sold before 20 July 2007 come 20 July 2009, you will need to have made a new style Technical Documentation for them, or stop selling them. A lot of tech products from 2007 will be old hat by 2009 solving the problem, but it is not true in all markets.

Typically the Technical Director of the company is the “responsible” as the French say who must sign off that the device meets the standards. What you are signing off on is that WHEN it is properly installed and maintained, and used for the purpose it is intended for:

  • the device creates an EM disturbance low enough that radio and telecoms equipment can operate as intended
  • it has a level of intrinsic immunity which is adequate to enable it to operate as intended

So what is done with this “Technical Documentation”?

Nothing if you’re lucky. The only people who can ask to look at it are the regulatory authorities, OFCOM in the UK. You don’t publish it or register a copy of it. But you have to keep it for ten years after the last sale of the device for the authorities to ask for. It literally only exists to keep the signatory out of jail if the authorities ask for it. Not kidding about the jail — if you don’t have a satisfactory Technical Documentation to show, the criminal penalties can include a GBP5,000 fine and/or 3 months in jail.

The key words about the Technical Documentation are that it should be “reasonable” and “duly diligent”, as in “All reasonable steps are exercised and all due diligence to avoid committing the offense”. That really sums up the job of writing it, you are trying to have an answer for anything that could be said was unreasonable or not duly diligent. While meeting budget constraints from the customer :-/

Spread of outcomes

The ways that problems might pan out were discussed informally. It was proposed that roughly a third of companies, the presenter reckoned, have their head totally in the sand about it, and could expect trouble. Another third had made some effort in the right direction and another third spent the money and were golden. Another factor in how much shit would rain down in the event of problems was the number of devices sold, if it was millions and they were crap, expect maximum warp to jail. If it is five and they don’t quite comply despite obvious efforts to prove it, maybe that won’t be so bad. But who knows, some overkill is called for.

How likely is my Technical Documentation to be demanded?

In Germany, we were told, the authorities have a system of testing 10,000 models of devices a year, spread over the various types of product. In the first year (IIRC) it resulted in 105 prosecutions :-/

Another tidbit is in the UK, OFCOM are allegedly looking at training up 85 new enforcement officers. The mobile phone companies, due to the ruinously expensive spectrum auctions of a few years ago, are apparently agitating for more enforcement of the cleanliness of their expensive 3G spectrum.

What goes in the Technical Documentation then?

Here is the briefest outline:

  • Description of apparatus – brand/model/manufacturer, intended function, limitations on operation… Technical description – block diagram, technical drawings, interconnections, variations, versions of design documents referenced
  • Procedures used to ensure conformity – Technical Rationale: what you’re testing against, why you did particular tests; Details of design: EMC features, component specifications, QA to control variation; Test data: Logical processes to decide if the tests are adequate, EMC tests and their results, external test reports on subassemblies/components

You can also get a Competent Body to “comment” on your Technical Documentation, as some fairly convincing assurance that it is adequate. This is really a seal on the “due diligence” aspect so you can really show you totally ticked every box to make sure it was compliant, but I guess only large companies can afford it.

Conclusion

If you manufacture or import stuff to sell in the EU, you are going to have to have Technical Documentation to keep yourself out of jail.

For a standalone device, that means you’re really going to have to not only look to dealing with EMC early in the design, with some kind of inhouse testing ability, but find the budget of a few thousand pounds to take it for testing at a Competent Body so you have something convincing and calibrated to put into that Technical Documentation.

Not only that, but even determining which are the applicable standards is a huge headache if you try to do it yourself, there are hundreds of them: a Competent body can also help select the basic issue of which tests you are targeting.

But it’s not all bad — if you make product variations around the same base, you can choose which variation to actually test as a baseline, and then for each variation see if it stands up to show they would not push the original base design over the edge. I have done this in the last couple of weeks, creating for a customer Technical Documentation for a sister device to one that went through actual testing at a Competent Body, and using the very large similarities to limit the amount of retesting needed.

There are definite advantages to requiring this level of design scrutiny and justification, but the change to requiring Technical Documentation and the trend to increased enforcement over the ten years you must keep the documentation has definitely pushed the minimum cost and effort of bringing something to market up several notches.

Drumbeat

October 25th, 2007

The magic code project has gained a name and there are some new results to share.

The full sources for the experimental modem using the correlator codes is available at http://git.warmcat.com under GPL2+ license.

The big change is what I call “scrambling”.  The 126-bit magic correlator sequence is disordered and xor-ed in 258 random ways, which are selected to not correlate well with each other at any offset (less than 43/128 match).  This allows us to issue whole bytes in one code by selecting which disordered correlation code to transmit.  The other two codes are for start and end of packet markers. I first tried 22 scrambles to allow 4 bits per code and some extra codes, this worked fine but I was able to extend it to 258 without really damaging the “best” scramble-scramble false correlation score too much.

The gain here is that the self-ordering and threshold properties of the code now reaches up to entire bytes: you always get a clean, aligned byte or you don’t get anything: with high probability you don’t get a wrong byte.

I also changed the demodulator code to something that is currently quite expensive to run.  Instead of tracking a reference phase at the receiver, which is tough to do in the presence of extreme noise and phase wrapping, at each sample it tries to demodulate using that sample as “0 degrees” and a 50% phase slicer to get the bits.  It’s expensive because it has to do that against each of the 258 scrambles every sample — but on the plus side the correlator does not run at all for 124 symbols out of 126 (98.4% of the time) after there has been a successful decode, since it knows it is partway through a 126-symbol code. Still the number of demodulation attempts can definitely be reduced, maybe by subsequently just doing it on the last winning phase and a few either side.

I found that the noise performance of the demodulator is strongly dependent on the relationship between the sample rate and the carrier frequency.  It’s much less dependent on the number of carrier cycle periods per symbol, so long as you have 4 or 5 or above (2 works but is killed by hardly any noise).  Since there is plenty of filtering to the carrier going on I didn’t really expect that. Here is a graph of the noise performance when there are 96 samples per carrier (48kHz sample rate, 500Hz carrier, 100baud — these are intra-code symbols, it’s 6.25bits/sec effective):

The noise performance doesn’t look too bad, at least with these synthetic tests and unrealistic 48kHz sampling. It can recover all 7 bytes that are sent even at -28dB, when only 4% of the received power is the signal and the rest is white noise. And because of the code properties, these are definite bytes being captured with quite high probability, not the kind of uncertain decode that would normally be expected in such noise.

The “average quality %” shown in the graph is the percentage of demodulated bits in the matched code scramble that actually were “right”, averaged over all the bytes. If a byte is missing because its quality was below the threshold of 70/128, it counts as 0% quality. Up until -20dB, the demodulator is doing well and the recovering individual bits almost perfectly. Without the Correlator code, after -20dB you would be dealing with a rapid increase in bit errors from the demodulator and relying on ECC. By using the Correlator coding though, we are able to push performance another 8dB into the noise (we even recover half the symbols at -30dB, or 10dB further) and still maintain the alignment and high probability of correct decode advantages.

This shows the effect of keeping everything else the same, but bringing the sample rate down to 8kHz from 48kHz

You can see the BPSK demodulation starts to fail at -10dB instead of -20dB and the correlation code again buys you another 8 – 10dB into the noise after that.

Here is the spectrum after bandpass and lowpass filtering that is actually “transmitted”, the peak is the carrier at 500Hz in this case.

Another aspect of this setup is that although the demodulator is currently pretty expensive in CPU (and power), the modulator is much simpler. It just requires a 1KByte table for the scrambles and a precooked integer sine table if you are running a separate carrier than the RF one itself. Then it just needs enough logic to walk through the scramble table entry at the symbol rate (and the sine tables at the sample rate if you’re using it). That can fit in part of a tiny flash controller, using a 1-bit output of a shifter to switch the phase of the transmitter. It can be simplified even further by not using scrambles but just sending a bit per code sending the code forwards or backwards to signal a ’1′ or ’0′.

It seems that I can begin to understand where this system fits into the existing high noise codings already used by ham radio folks. The correlator code is a special case of using an ECC code, in this case where 8-ish bits are exploded into an 126-bit “space” filled with correct decodes, damaged but recoverable decodes and invalid decodes. The error correction performance of adding 8-10dB “coding gain” is probably a bit (not so much, from what I can work out) poorer than an optimal use of 126 bits to code for 8, but the advantages that attracted me to the code in the first place can offset that for some applications, the guaranteed self alignment right down from demodulation to byte boundaries, and a quite firm decode success threshold. In addition, it seems some of the more optimal decoding schemes for stuff like Viterbi can be very compute-intensive, whereas although at times we do a lot of it, the correlation action is simple and lends itself to being done in parallel. Turbo codes are patented. So it’s not a one size fits all technique, but it has its niche.

I guess the next stage is looking to BPSK modulate and recover an RF carrier directly, but that will need some thinking on because it will be a very frequency-specific design, unlike these baseband tests where you can just edit the important variables and recompile. For example if it can be done initially at the UK 40MHz ISM band it will be possible to consider logic looking at carrier zero-crossings for phase assessment easily enough, and to autocorrelate just the averaged recovered phase at 4 times the symbol rate.

Heading deeper into the noise

September 28th, 2007

QPSK abandoned for BPSK

The noise performance of the QPSK decoder wasn’t what I was hoping after several iterations. I pretty quickly threw out the Costas loop because loss of phase lock was the weakest link in the chain, and moved to a system with a tight bandpass filter at the carrier frequency, and measuring zero crossings of the carrier against a local oscillator. But still the noise performance was not impressive at 300baud / 3kHz carrier.

I had hoped to go back to a damaged QPSK code recovery and to correct the data phase on the broken code bits, because we know what they should be according to the code. But there was no visible commonality between what was happening on the phase information that carried the code and the phase information carrying the data.

So I have moved back for now to a two-phase BPSK system interleaving the code and the payload symbol by symbol, in order to understand better how far the code can be pushed with noise.

Improved noise performance

Currently I use a 4.8kHz carrier and 1200baud symbols. Here is the performance today:

The RED line is the best decode seen for the correct match offset, the BLUE line is the best decode for all other (wrong) offsets. The PURPLE line is the mean decode for the correct match offset over the 100 runs. GREEN is the “quality cutoff” basically keeping us from getting near the blue line’s false hits: because of the properties of the code it is extremely unlikely noise will get near the green line.

Here is a close-up on the noisy end of things:

So far in terms of detecting the code, we can on average do so when the input energy is 82% noise (-15dB SNR). We can still detect codes ~1% of the time even at 92% noise (-21dB SNR). Here is a further close-up (it is 1000 runs, not 100)

suggesting you can still get good recoveries ~0.1% of the time at 95% noise (-25.5dB SNR). And of course we are now measuring the whole system performance here including the demodulator part, not just damaging the code bits directly.

Current BPSK receiver

Here is the receiver for the current BPSK method:

I spent several days meddling with the QPSK version and arriving at the carrier zero-crossing method for phase detection. The original plan to have a symbol sampler running from a locked LO hung around causing lots of problems. The indications of a change in symbol — detected by zero crossings — were variously delayed by the phase itself and the filters used, making it difficult to convert their jittery indications into a guide for the symbol recovery clock. This resulted in double bits being sampled.

The code is the symbol clock

Because of this I eventually realized that no symbol clock was needed, by running the correlator at the sample frequency, and sampling symbols at a fixed period, because of the false match rejection properties of the code it would “discover” the correct phase and offset from the behaviour of the correlator output. And when the code was recovered best then the data interleaved with it will be recovered best too. This sounds power-unfriendly, but after the first offset is used, you don’t run the correlator until enough time has passed for 256 symbols to be acquired, and then you should still be locked from when you did the first 256 symbols: if not you run the correlator a few dozen times to find lock again.

Also of note is that what is stored in the ringbuffer is a weighted average of the last four phase results, this includes information from the phase of multiple carrier cycles for the same symbol, helping to reduce the effect of noise on the decode.

The code is the data!

Well recovering the code at high SNR is interesting, but how useful is it if the data bits interleaved with it have been subject to the same beating without the properties of the code to protect them? We can use the autosyncing properties of the code to help with trying to get some payload signal gain through averaging, but I think I have seen where this is headed now… the code IS the signalling system for the payload data. That means throwing out the interleaved data concept.

A ’0′ can be signalled as the normal code, and a ’1′ as a time-reversed code. Because it’s intended for low data rate communication at VERY low signal levels compared to noise, it’s okay if we are reduced to 1 byte/sec, which will be the end result of this at 1200 baud. Basically with this we carry over all of the great robustness qualities of the code to be attributes of the payload data.

So what is the point compared to just blasting 128 symbol times of the same phase carrier, which is a hell of a lot simpler?

  • Three high accuracy results, “no result”, a ’0′ or a ’1′ detected. The carrier-only method will happily return a bogus result if there is noise energy at the carrier — if the code method ever claims a ’0′ or a ’1′ you can be almost certain it is genuine
  • “Fuzzy” robust damage-tolerant signal detection, better performance than a simple threshold comparator
  • Automatic bit sync (“bit clock recovery”) with almost no chance of wrong sync, sync recaptured each symbol; bit sync for the carrier-only version in high noise is unreliable
  • Absolute phase polarity can be recovered from one symbol despite the 180 degree lock uncertainty for BPSK — you can even lose lock one or more times inside the symbol and the code with absolute phase can still be recovered; the carrier-only concept needs a coding at a higher level to determine absolute recovered phase, in turn needing multiple correct symbols recovered without losing lock