Archive for the ‘Linux peripherals’ Category

New NXP LPC32x0 in Qi bootloader

Monday, November 29th, 2010

LPC3250 from scratch

NXP’s new LPC32x0 is a very cheap and feature-filled ARM926.  According to Digikey anyway, it’s the cheapest ARM chip with at least v5 instruction set that’s going.  That’s important not just because of the extra processor strength over older ARM9 core, but because ARM Fedora is built requiring armv5 or newer instruction set.  Being able to use ARM Fedora and RPM as a basis means freedom from compromise and having to own the building of an integrated, self-consistent rootfs; you can just focus on doing your specialized code on top using the reliable Fedora quality basis.

There are four chips in the series, they differ in having an LCD controller and Ethernet MAC or not; also the smallest guy LPC3220 has “only” 128KBytes of Static IRAM and the others 256KBytes.  Well, having worked with the 2KBytes of internal static RAM on the iMX31 for SD boot on Qi, having to shoehorn an SD card driver in there, even 128KBytes is crazy amounts.

They have support for resistive touchscreen, USB OTG, NAND controller and Mobile DDR, and up to 266MHz CPU clock at 1.4V Vcore (208MHz at 1.2V Vcore but as we will see that is not entirely true).  They don’t support SD Card boot from ROM, but that can be solved for about US$0.30 as will be shown.

In short they’re ready to do some serious embedded work at a budget price.

Embedded Artists EA3250 Dev kit

There are a few dev kits around for LPC32x0, Hitex have a cheap USB stick format one that has been permanently two weeks away from availability since I first looked at it a month or so ago, and it still is two weeks away.

NXP anointed two real dev boards that they evidently worked on with the vendors during development; they don’t actually make an NXP-branded dev board, so it’s Phytec and Embedded Artists.  Since the EA one is in Digikey, that’s what I ended up with.

The dev board is well made but there are some problems with it: like many dev boards it comes in two halves, a cheaper, large breakout board and an 8-layer DIMM type board that has the actual CPU BGA and memory.  In an act of supreme lunk-headedness, the large breakout board re-uses the Pn.m nomenclature that the CPU uses for GPIO, with no care to retain the CPU mapping.  So for example a header is marked as having a pin P1.27; very confusingly this has nothing to do with the CPU GPIO P1.27.  This is also true in the schematics for the baseboard and CPU board, so it is complete confusion trying to trace a signal between the two boards or looking for a misnamed signal on the baseboard.

DDR trouble #1

There’s also a more serious problem, the DDR on the CPU card is marginal and Embedded Artists have made a recall where they will replace the board with one with a different DDR DRAM for free.  The CPU board I got was affected but not at room temperature; they want the old card sending back and I am not finished with it yet, so I will take advantage of this recall later.

DDR trouble #2

There’s another problem with DDR, NXP issued an errata confessing their inverted signal for the differential DDR clock is skewed by no less than 1.2ns from the uninverted partner of the differential pair, a huge skew.  This issue removes a lot of comfort zone from designing with DDR and means only some memory devices will tolerate it.  However in the EA board case, they have not used the workaround suggested by NXP which is to nuke the inverted output entirely and make the clock unipolar, so the situation can’t be that bad.

DDR trouble #3

The last problem with DDR… operation at 208MHz with 1.2V Vcore is fine for the CPU, in fact while screwing with the PLL I had the CPU running fine at 400MHz, although there is no way to divide anything useful down for the memory clock at that speed and it’s illegal for the PLL over temperature, which tops out at 320MHz.  However at 1.2V and 208MHz, the CPU side of the DDR bus is unreliable: it requires cranking to 1.4V to operate DDR even at 104/208MHz.  That’s annoying because since 1.2V is needed anyway for other circuitry, it could have saved a regulator.

Unbrickability of LPC32x0

LPC32x0 chips feature UART-based bootloader injection… if you pull down the SERVICE_N pin, then next boot the ROM in the CPU will bring up UART5 at 115200 n81 and issue a simple protocol byte allowing for bootloader download.

Since I couldn’t find a Linux tool for injecting bootloaders, just a Windows one, I wrote a commandline tool for it and added it to Qi build.

http://git.warmcat.com/cgi-bin/cgit/qi/tree/tools/lpcboot.c?h=lpc

No matter how broken your nonvolatile image gets, it’s still possible to recover the device via this UART scheme with a USB <-> LVTTL serial cable.

Bootloader Hell

The LPC32x0 bootloader situation is ugly.  Basically NXP provided a huge suite used for chip verification called CDL (“common driver library”), this is a sort of chopped down OS in bootloader form.  It has all kinds of functions to drive the chip peripherals and test memory, but nothing to actually boot Linux!

What EA shipped, and what you are meant to do as a system integrator, is get an implementation of CDL in the form of “S1L” — stage one bootloader — to load U-Boot, which will then load Linux.  Both U-Boot and S1L — itself like 130KBytes! — store “state” on the board.  It leads to this insane situation that two bootloaders with two kinds of state must be right in order to boot.  Things are further complicated that SPI boot only allows the first 56KBytes to be loaded by ROM into IRAM and executed, but the bloated bootloaders are too big to do this in one step.

Bootloader Heaven

I added support for LPC32x0 to Qi last week, this is a single < 30KBytes image that can boot itself from SPI Flash or UART 5 injection and pull Linux from SD Card in VFAT partition or also via SPI Flash.  Boot from cold, with Qi and Kernel in SPI Flash to Fedora 12 bash prompt is less than 4 seconds.

http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=lpc

This replaces both S1L and U-Boot, and in accordance with Qi philosophy it holds no state at all on the device.

Its strategy is if it finds that it is running via injection on UART5, it copies itself into SPI Flash / EEPROM so it will run next boot from there, and if it finds an SD Card kernel image it will also copy that into SPI Flash.

When it finds it is running from a non-injection source, ie, a normal boot from SPI Flash, it favours any kernel it can find on the first, VFAT, partition of an SD Card if found, otherwise it boots from the kernel also in SPI Flash.
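The decision logic in the last two paragraphs can be written down as a small table-like function. This is only a sketch of the behaviour as described in this post, not code from Qi: the function name and the action strings are made up for illustration, and what gets booted after an injected run is left out because the text above does not say.

```python
def qi_actions(injected_over_uart, sd_kernel_found):
    """Model of the Qi strategy described above (names are hypothetical)."""
    actions = []
    if injected_over_uart:
        # Injected over UART5: make the next cold boot self-sufficient.
        actions.append("copy Qi into SPI flash")
        if sd_kernel_found:
            actions.append("copy SD Card kernel into SPI flash")
    elif sd_kernel_found:
        # Normal boot: an SD Card kernel wins if one is present.
        actions.append("boot kernel from SD Card VFAT partition")
    else:
        actions.append("boot kernel from SPI flash")
    return actions


for injected in (True, False):
    for sd in (True, False):
        print(injected, sd, qi_actions(injected, sd))
```

Note there is no stored setting anywhere in this: the outcome depends only on how the bootloader was started and what media it can see.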

This is why the lack of ROM -> SD Card boot is not critical, the cheapest, smallest SPI EEPROM can be used to contain Qi, which will then load the kernel and rootfs from SD Card if that’s what’s needed as during development.  If SD Card is overkill for the job, then Qi, Kernel and initrd can all be pushed into a single US$2 32MBit SPI Flash.

Since I only have the Embedded Artists board right now it wants to see a kernel image called k-ea3250.img on the SD Card; the way Qi works you add a new file for each supported board in ./src/cpu/lpc32x0/ copied from embart-steppingstone.c in that directory; the bootloaders need some way to identify what they’re running on at runtime since there is only a single image per cpu that supports all devices.  See  http://git.warmcat.com/cgi-bin/cgit/qi/tree/src/cpu/lpc32x0/embart-steppingstone.c?h=lpc for an idea of what’s involved to support a new board in the bootloader image.

Bootloader Envy

Monday, February 8th, 2010

Lesson #2:  A bootloader is to load and boot Linux

On the first day of FOSDEM I sat through a presentation on what could be called another “U-Boot derivative”.  One of the greatest asspains at Openmoko was the various kinds of Hell caused by the U-Boot bootloader and its philosophy, which can be summed up as “I wanna be Linux when I grow up”.

Configure system is a bad alternative to good bootloader design

First, it has a config system.  That should be good though, right?  The problem with the config system is that if anything differs from your current config, you must build another incompatible binary with another config and take care of that.  When you have more than a handful of different boards, you are in a maze of incompatible bootloaders.  Openmoko took it one step further, they mandated a different bootloader binary per PCB revision, so left unchecked there would have been a continuous proliferation of incompatible bootloaders, all basically the same.

All persistent bootloader private state is EVIL

Second, U-Boot thinks it’s a good idea to have these environment “scripts”, because it’s “configurable”.  Actually, the job of a bootloader is to Load, then Boot Linux.  You don’t need any configurability for that if the bootloader can figure out what it’s running on and therefore where the memory is and how much there is.  These scripts expose a really deadly trap I call “private bootloader state”.  It means the bootloader stores stuff in nonvolatile memory on the PCB and acts different according to what it hides there.  The end result is that two boards from the same factory may act totally different even with the same rootfs due to “bootloader secrets”.  This is totally needless and ALL private bootloader state can be eliminated by correct design of the bootloader leading to completely deterministic boot action per rootfs.

A good example of how that leads you down the path to hell is hardcoding in the U-Boot environment the amount of kernel image you will copy from somewhere.  People commonly set it to 2MBytes, forget about it and one day they generate a 2.1MB kernel image and wonder why decompress blows up.  Actually, that whole procedure is insane, the kernels are uImages that report their length in a header.  The bootloader should examine the header and compute the length of image to pull.  But that doesn’t fit with this “environment” nonsense.
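To show how little is involved in reading the length from the header, here is a minimal sketch in Python rather than bootloader C. It assumes the legacy 64-byte big-endian uImage header layout (magic, header CRC, timestamp, data size, load address, entry point, data CRC, four type bytes, 32-byte name); the function name is made up and CRC checking is left out.

```python
import struct

UIMAGE_MAGIC = 0x27051956
# Seven big-endian 32-bit words, four single bytes, 32-byte name = 64 bytes.
UIMAGE_HEADER = struct.Struct(">7I4B32s")


def uimage_total_length(header_bytes):
    """Return header + payload length in bytes, taken from the header itself."""
    if len(header_bytes) < UIMAGE_HEADER.size:
        raise ValueError("need at least 64 bytes of header")
    fields = UIMAGE_HEADER.unpack_from(header_bytes)
    magic, data_size = fields[0], fields[3]
    if magic != UIMAGE_MAGIC:
        raise ValueError("not a uImage header")
    return UIMAGE_HEADER.size + data_size


# A fabricated header for a payload a bit over 2MBytes, for illustration only.
example = UIMAGE_HEADER.pack(UIMAGE_MAGIC, 0, 0, 2200000, 0, 0, 0, 0, 0, 0, 0, b"example")
print(uimage_total_length(example))
```

The bootloader then pulls exactly that many bytes, so a kernel that grows past 2MBytes needs no change anywhere.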

Do Linux Stuff In Linux

In any of these bloated U-Boot style bootloaders, is there even one feature they do better than the same feature in Linux?  The startup time should be better by a few 100ms.  Other than that, no, every single bloated “I will add it to the bootloader because I can” feature is shittier than you get in Linux.  Every single feature!

If you need some advanced capability or backup / recovery boot action, check for a button held down at boot-time in the bootloader and go fetch a different Linux partition + kernel.  Use standard Linux tools and shells.  In return, get really high quality network stack, proper USB support, NAND access that’s compatible to your main Linux system access in BBT / ECC terms, and all the other advantages of Linux.

Do your peripheral bringup in drivers in Linux

Typically you do not need ANY bringup in the bootloader except SDRAM controller and chip init, since it’s a prerequisite to put Linux in the RAM that it’s initialized.

That’s right, all the megabytes of source spent in U-Boot providing support for so many kinds of peripheral is a waste of time, effort and maintenance.  I am being kind saying “maintenance”, because the drivers in U-Boot are typically “dumbed down” versions of the equivalent Linux driver that were forked irretrievably the moment all the Linux APIs were ripped out, so there’s no coherent effort to keep them up to date with the Linux ones.  Lately I saw that they try to ape some Linux APIs there… why not go the whole hog and just load and boot real Linux?  After all, modern CPUs can be running your driver probes in Linux in ~2 seconds from power using a bootloader that doesn’t get in the way.

You typically don’t even need to talk to the PMU in the bootloader, after all, you are running code fine already, right?  Otherwise you wouldn’t be able to run the bootloader code itself.

Fat girl in Ibiza

At least at Openmoko, code quality inside U-Boot was awful bad.  I called U-Boot on the lists there “the fat girl in Ibiza” because you know she’s going to do anything you want.  All kinds of constant-only code, weird new scripting keywords were added for test undocumented, you name it.  Hardware guys felt up to writing such code secretly by themselves once they learned the software engineering marvel that is *((unsigned int *)0x…) = 0x…;

Your bootloader just tests SDRAM

There’s only one test action your bootloader is suited to do, and that is SDRAM test.  Once you are in Linux, it can’t perform a full SDRAM test while it’s running.  But the bootloader is typically starting from on-CPU SRAM, so it can actually run a true SDRAM test from there.  Otherwise, the bootloader should be completely absent from the test plan.  All other tests should be performed in Linux via standard driver and rootfs tools.

More about board and test and board bringup will feature in another report of a lesson learned.

Qi

While at Openmoko (mainly) I wrote a bootloader that meets these ideals, you can find it in git here.  One of the nicest things about it is that unlike the bloated bootloaders whose job never finishes trying to become Linux cargo cult style, Qi has been pretty much complete for a few months.  It’s a new job to support a new CPU, a much smaller job to add a new board and it doesn’t want to talk to your peripherals anyway so no problem there.

Qi creates one binary per CPU, that supports all boards with that CPU.  That sounds like a big job but we don’t care about your peripherals so all boards with the same CPU look almost identical.  You have to find something that can detect your particular board at runtime, for example NOR device ID read check.  So there is zero build-time config and Qi generates all CPU support when it’s built, it takes 3 sec or so typically.
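The "one binary, detect the board at runtime" idea amounts to a table of boards, each with its own probe, walked in order at boot. The sketch below is a Python model of that shape only: the board names, ID values, image names and the `hw` dictionary standing in for real hardware reads are all hypothetical, and Qi itself is C.

```python
from collections import namedtuple

# One entry per supported board; is_this_board is a runtime probe.
Board = namedtuple("Board", "name kernel_image is_this_board")

BOARDS = [
    Board("board-a", "k-board-a.img", lambda hw: hw.get("nor_id") == 0x1234),
    Board("board-b", "k-board-b.img", lambda hw: hw.get("nor_id") == 0x5678),
]


def detect_board(boards, hw):
    """Return the first board whose probe matches, or None if nothing does."""
    for board in boards:
        if board.is_this_board(hw):
            return board
    return None


found = detect_board(BOARDS, {"nor_id": 0x5678})
print(found.name, found.kernel_image)
```

Adding a board means adding one table entry and one probe, with no new build configuration.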

Typical bootloader binary size per CPU is 28-30KBytes.  That supports VFAT, ext2/3/4 and typically the SD controller as well.  The single Qi image also supports being booted from NAND, JTAG or SD Card on processors that support it just by being copied into place and without any changes.

There is zero bootloader private state, however Qi can look in the rootfs and append kernel commandline text from the content of a filesystem file.  This maintains the rule that boot should be completely deterministic per rootfs.

Whirlygig GPL’d HWRNG

Saturday, November 24th, 2007

Hardware random for the masses

I made available the result of the ring oscillator random generator as a GPL project called Whirlygig. It’s a 2.75cm x 4cm PCB with a mini USB connector, it provides a sustained 5.5Mbps (~620KBytes/sec) of apparently very high quality random bits using the Linux hw_random API. The large amount of randomness should make it useful for statistical tests as well as hard crypto.

I prototyped it using a couple of boards I had lying around, so I know it works fine, but I am waiting for the PCBs to come back from fabrication to actually build a final one. I placed the CPLD VHDL, the board hardware design, the driver software and the firmware for the USB controller into http://git.warmcat.com.

Dieharder

I spent some time worrying about how to test the quality of the result, and found that “diehard” mentioned in an earlier post has been superseded by “dieharder”. This has a much tougher general testing regime, even though many of its tests are reproductions of the diehard ones: it runs each test many times and forms histograms of the p-value results from the many runs, and gives an assessment of fail, poor, possibly weak or pass on the spread of results rather than a single result.
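The point of judging the spread of many p-values is that a good RNG should produce p-values that are themselves uniformly spread between 0 and 1. One common way to put a number on that is a Kolmogorov-Smirnov distance against the uniform distribution. The sketch below is an illustrative choice of statistic and is not meant as a description of dieharder's internals.

```python
def ks_uniform_statistic(p_values):
    """Largest gap between the empirical CDF of p_values and uniform(0, 1)."""
    ps = sorted(p_values)
    n = len(ps)
    d = 0.0
    for i, p in enumerate(ps):
        # The empirical CDF steps from i/n to (i+1)/n at p; check both sides.
        d = max(d, (i + 1) / n - p, p - i / n)
    return d


evenly_spread = [(i + 0.5) / 100 for i in range(100)]
all_near_zero = [0.001] * 100
print(ks_uniform_statistic(evenly_spread))   # small: looks uniform
print(ks_uniform_statistic(all_near_zero))   # close to 1: badly clumped
```

A single p-value of 0.001 means little, but a hundred of them in a row gives a distance near 1, which is the kind of result worth calling a fail.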

At first the RNG failed three of the 18 tests, but on looking closer one of the tests (#2) currently fails for all RNG input and is marked up as not for use with assessing RNG quality, and the two others required by default more than the 400MBytes of randomness I had prepared. Unfortunately in that case they simply rewind the randomness file and re-use the same data to make up the balance! Of course this is no longer quite “random”. When I adjusted those two tests to use a smaller sample that fitted into the 400MBytes without repetition, the output of the RNG got a “pass” on all 17 of the relevant dieharder suite tests.

Max Entropy

During the validation phase I changed the RNG algorithm in the CPLD significantly. The scheme is described on the project page, but basically I moved away from a bit-centric to a byte-centric design with 8 identical sets of 3 oscillators. To stop any characteristic of a particular oscillator’s routing from being associated with a particular bit of the result byte and creating a bias, I introduced a “mixer” that first generates 8 random bits by combining six oscillator outputs each with XOR, then rotates these oscillator sets between the result bits sequentially at 24MHz. I also removed the toggling action and used the random bit directly.

I also found the Linux rng-tools suite which repeatedly runs FIPS-140-2 tests on the bits, this fails 1 in 1200 or so packets of testing over 20 billion bits, I believe this is normal for a real random generator that it will produce sequences with low probability that don’t look very random in the short term.

Aside from passing dieharder and FIPS-140-2, the changes also got me a reported 8.000000 bits of entropy per byte from the ENT test, so there are reasons to imagine the quality of the output is very good.
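For reference, an "entropy in bits per byte" figure of this kind can be computed as the Shannon entropy of the byte histogram. The sketch below is my own few lines, assuming that is what the reported number means; it is not ENT's source.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data):
    """Shannon entropy of the byte-value distribution, from 0.0 up to 8.0."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_byte(bytes(range(256)) * 4))   # every value equally often
print(entropy_bits_per_byte(b"\x00\xff" * 50))        # only two values in use
```

The figure only looks at how often each byte value occurs, so a plain counter from 0 to 255 also scores the maximum. It complements the other tests rather than replacing them.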

Diehard validation vs ring RNG

Wednesday, November 14th, 2007

RNG Quality assessment

A timely article flew by on Reddit about the RANDU pseudo-random generator algorithm widely used in the 1960s, which it turns out was very flawed indeed. It was explained to one student that ”We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random”. Basically it produced numbers that belonged to one of 15 “planar” groupings and nothing in the gaps between the planes. It isn’t just a minor annoyance, because many statistical studies in the 60s and 70s used it, and it can easily have contaminated their results. That’s definitely not what I am trying to reproduce with the ring oscillator device — so how can I figure out how “good” the randomness is in an objective way?
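The planar structure is easy to demonstrate. The sketch below uses the usual textbook definition of RANDU, x(n+1) = 65539 * x(n) mod 2^31 with an odd seed; those constants come from the standard description of the generator, not from the article mentioned above. Because 65539 = 2^16 + 3, every value is a fixed linear combination of the previous two, which is what pins consecutive triples onto a handful of planes.

```python
M = 2 ** 31


def randu(seed, count):
    """Return `count` successive RANDU outputs starting from an odd seed."""
    x = seed
    out = []
    for _ in range(count):
        x = (65539 * x) % M
        out.append(x)
    return out


values = randu(1, 1000)
# Every value obeys x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31).
relation_holds = all(
    (6 * values[k + 1] - 9 * values[k] - values[k + 2]) % M == 0
    for k in range(len(values) - 2)
)
print(values[:3], relation_holds)
```

Each number taken alone looks fine, but given any two in a row the third is fully determined by that relation.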

RNG quality test suites

It turns out that empirically testing RNG outputs has been the subject of a lot of work for decades, and there are some established testing suites available online. A major one seems to be the “diehard” suite — I guess it is a pun on die as the plural of dice.

It needs you to fetch 10M bytes of random numbers or more and let it run a bunch of tests on them. The output was a little hard to assess initially: most tests issue a “p” number which only suggests something is bad if it is 0.000… OR 0.999…. All other numbers inbetween are to be taken as a good result as I understood it. Except there is a warning that even good RNGs can produce the occasional test fail.

Thus you should not be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p`s of 0 or 1 to six or more places. By all means, do not, as a Statistician might, think that a p < .025 or p> .975 means that the RNG has “failed the test at the .05 level”. Such p`s happen among the hundreds that DIEHARD produces, even with good RNGs. So keep in mind that “p happens”

I duly fetched 10M bytes of 115kbps randomness from the device and fed it to diehard. It seemed to give fine results except on “Count the 1s stream” and “Squeeze” (devastating p=0.000000), “Count the 1s specific” for bits 1-11 (p=0.000030) and 9-16 (p=0.000064), and QQSO 2-6 (p=0.000005). It passed the dozens of other tests but it was disappointing, looks like a big fat ‘failed’.

Triple Scoop

Well, since my test CPLD was an XC95288XL with 288 Macrocells to burn, I naturally wondered if I could improve matters by tripling the amount of ring oscillators getting Xor-ed — that is to implement the three varying sized oscillators 3 times each, totaling nine, and sum them with a big XOR. They’ll all be drifting around individually as much as together, it should be a mighty noise-fest.

I edited the VHDL and blew it into the CPLD… visually the summed RNG output “bit” was an awful lot more noisy than before. I pulled another 10M bytes from that setup: but just looking at the byte distribution as I did before told me something is still up.

That sawtooth type distribution is “not random” to coin a phrase. If you look at the large jump at 0x80 (128) it is telling us that we are more likely to get 10000000 binary than we are to get 01111111, in other words, since this is over 10M bytes, there is a distribution problem favouring ‘0’. When I analyze the distributions of 1s and 0s I find

0: 40436204, 1: 39563804... delta=872400, skew=1.090500%

You can see the same thing even better looking at 0x00 (42,000 hits) vs 0xFF (36,000 hits), they are like 8% off the median of 39,000. Clearly that distribution of 1s and 0s has to have a very small skew to stop these kinds of effects showing up, and equally clearly this is telling us something deep about the RNG hardware.

Spiky

Although the individual oscillators are quite slow thanks to the number of inverter stages, at 4 – 6MHz, the way they are being summed makes for trouble from bandwidth limitations inside the CPLD. At the moment it just uses a dumb asynchronous XOR action, that means that potentially very fast spikes can be seen when one “slow” oscillator changes state very shortly after another “slow” oscillator. For example:

You can see on the left (this is 5ns/div notice) a runt pulse where this happened, the XOR was convinced to rise by one oscillator changing and then countermanded when another oscillator changed state less than 5ns later, resulting in a doubtful pulse that was probably not visible as a ‘1’. This also happens when going from ‘1’ to ‘0’, but maybe the threshold for the transistors in the CPLD is not at exactly 50% of the 3.3V supply. So we suddenly have it seeing more ‘0’s than ‘1’s on average when spikes are involved.

This whole high bandwidth summing step is completely needless, it’s only there because it is a literal interpretation of the diagram in the original RFC. I changed it instead to have nine latches sample the nine oscillators every 125ns (there is an 8MHz clock on the prototype board) and sum those results with XORs into a single bit. In turn this output is sampled by another latch at 8MHz to hide any metastability.

Latched up

The latched summing version performs much better and has gotten rid of most of the bit skew, and the sawtooth behaviour:

…but there is still a problem with 0x00…. the bit skew looks like this

0: 39960076, 1: 40039932... delta=79856, skew=0.099820%

so the skew is now on the side of ‘1’s but only by 0.1%. You can see the byte count spread is much tighter than before too: 1800 counts instead of 6000 before.

Balancing out the skew

Well if the remaining skew is something to do with the ratio of rise to fall times, or the non-squareness of the oscillator outputs for some other reason by something as low as 0.1%, that is hard to do much about, especially as it may vary on the specific silicon die.

But it shouldn’t matter: now the bandwidth situation at the XOR summer is sane, if we invert the summed output 50% of the time it should spread any excess of ‘1’s or ‘0’s to the opposite as well, cancelling any bias. I added a couple of terms to the summer to xor against the UART bit index LSB and a bit which toggles after every byte sent by the UART. It’s the equivalent of xor with 0x55 for the first byte and then 0xAA for the second byte, over and over.
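In software terms the effect of those two extra XOR terms can be modelled on whole bytes. This is a sketch of the "equivalent" described above (0x55 on the first byte, 0xAA on the second, repeating), not a transcription of the VHDL; the function names are mine.

```python
def whiten(data):
    """XOR successive bytes with 0x55, 0xAA, 0x55, 0xAA, ..."""
    masks = (0x55, 0xAA)
    return bytes(b ^ masks[i % 2] for i, b in enumerate(data))


def ones_fraction(data):
    """Fraction of all bits in data that are 1."""
    ones = sum(bin(b).count("1") for b in data)
    return ones / (8 * len(data))


stuck_low = bytes(100)                    # worst case: a source that only gives 0
print(ones_fraction(stuck_low))           # all zeros going in
print(ones_fraction(whiten(stuck_low)))   # half the bits inverted coming out
```

Every bit position gets inverted exactly half of the time over a pair of bytes, so a constant lean towards one level is split evenly between both levels. It does nothing for any other kind of non-randomness.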

That glitch in the middle is actually at 134 (0x86), maybe it is random but I guess we will see…. the skew is further reduced as anticipated

0: 39974218, 1: 40025790... delta=51572, skew=0.064465%

Diehard sequel

I ran 10M bytes from this version through Diehard again… the really bad p-value results are gone. For example Squeeze was a deadly 0.000000 before and is now 0.255260.

I made one last adjustment, I added the current state of the latched random value to the XOR term. That means it decides whether to keep or invert the latched value, it no longer directly accepts the value from the RNG. This got me to the promised land: 0.0005% skew between ‘1’ and ‘0’.

0: 40000206, 1: 39999802... delta=404, skew=0.000505%
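The keep-or-invert step can be modelled in a few lines to see why it pulls the level balance towards 50:50. This is a simplified software model with a deliberately exaggerated 60:40 input bias and a seeded pseudorandom source standing in for the oscillators; it leaves out the other XOR terms in the real design.

```python
import random


def keep_or_invert(bits):
    """Each input bit decides whether the latched output is kept (0) or inverted (1)."""
    latched = 0
    out = []
    for b in bits:
        latched ^= b
        out.append(latched)
    return out


def skew_percent(bits):
    """Imbalance between 0s and 1s as a percentage of all bits."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return 100.0 * abs(zeros - ones) / len(bits)


rng = random.Random(1)                        # stand-in source, assumption only
raw = [1 if rng.random() < 0.6 else 0 for _ in range(200000)]
print(skew_percent(raw), skew_percent(keep_or_invert(raw)))
```

In this model a bias in the input only changes how often the output toggles, not which level it spends its time at, so the output counts end up close to even.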

This also gets me the apparently good diehard results with no obvious failures on any tests, you can see the actual results here. So it seems the current version can tentatively be called a “real RNG”.

Ring oscillator RNG performance

Monday, November 12th, 2007

Pretty random

After some scrabbling around porting my Jtag SVF interpreter to Octotux and creating a kernel module for the PIO end of it — and moving to a different board with a XC95288XL CPLD to prototype it, the triple ring oscillator RNG is working. It issues a 9600 baud result, but after some initial confusion I modified it 1/8th of the time to sit out a sample time leaving “break” on the serial line. This should make sure that the receiving UART does not get confused by the data as a start bit. The true data rate is something like 800 random bytes per second at 9600 baud.

Here are the three chains of inverters (19, 23 and 29 long) oscillating at the different fundamentals

… and here is what the xor summing looks like, first over 1s then sampled once.

Although the single shot sample doesn’t look very random, the oscillators are drifting around all the time. If you wait a little while between samples (currently it is 104us, a 9600 baud bit-period) it’s pretty hard to guess what phase all the oscillators have drifted to — at least, that’s the plan.

Distribution of binary levels

The first test I did was to see what the distribution of ‘1’ and ‘0’ in the results was… clearly if the device is really random it should on average be 50% each. I fetched 1M random bytes, or 8Mbits:

0: 4008913, 1: 3991095… delta=17818, skew=0.222725%

It’s okay for a really random source to deviate from 50:50 at any given time, although on average it should be 50:50.
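For anyone wanting to repeat the measurement, the counting is simple. The sketch below is a reconstruction, not the tool used here: the skew formula (delta as a percentage of the total bit count) is inferred from the figures quoted in this post, which it reproduces.

```python
def bit_counts(data):
    """Return (zeros, ones) over every bit of a bytes object."""
    ones = sum(bin(b).count("1") for b in data)
    zeros = 8 * len(data) - ones
    return zeros, ones


def skew(zeros, ones):
    """Return (delta, skew in percent of all bits)."""
    delta = abs(zeros - ones)
    return delta, 100.0 * delta / (zeros + ones)


# Feeding in the counts quoted above reproduces the quoted delta and skew.
delta, percent = skew(4008913, 3991095)
print("delta=%d, skew=%f%%" % (delta, percent))
```

On a capture file, `bit_counts(open(path, "rb").read())` gives the two counts to pass to `skew`, where `path` is wherever the random bytes were saved.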

Octet distribution

Next I looked at the distribution of the results from 0x00 through 0xFF as the result “random byte”. This would show up if the RNG fails to ever issue some result or favours certain results over others: every result should on average have an equal chance of showing up and so an equal count. I ran it for 1M random bytes…

This is pretty decent, every possible result is seen with a frequency within +/-200 counts of the 3,900 average after 1M bytes.
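The octet count behind a plot like this is a 256-bin histogram. A minimal sketch, with function names of my own choosing:

```python
from collections import Counter


def octet_histogram(data):
    """Return a list of 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts[value] for value in range(256)]


def spread(hist):
    """Return (lowest count, highest count, average count)."""
    return min(hist), max(hist), sum(hist) / len(hist)


sample = bytes(range(256)) * 3 + b"\x00"   # tiny made-up sample, not RNG output
hist = octet_histogram(sample)
print(spread(hist))
```

For 1M bytes the average bin holds about 3,906 counts, which is the "3,900 average" above. A value that never appears shows up as a zero bin.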

115200 baud results

Encouraged by this I cranked the baud rate up to 115200, or 8.68us between samples and around 10K random bytes per second. The skew is increased somewhat and the spread of result counts is increased a little.

0: 4028746, 1: 3971262… delta=57484, skew=0.718549%

So far so good!

Adding entropy to /dev/random

Wednesday, November 7th, 2007

A hard RNG is good to find

The recent statistical analysis for drumbeat reminded me I could do with a proper source of random numbers, not generated by a pseudorandom feedback action. Back in the early 1990s I was looking at statistical profiling of execution on microcontrollers, I was surprised then to discover that only by making the sampling period random could I get a true picture of execution distribution. If the address bus was sampled at a fixed rate, say 100kHz, instead of a true picture it would be distorted by activity that was happening at some fraction or harmonic of the sampling frequency. So you would alias out pieces of loops completely or get a bloated count for other areas. Only by true randomness in the sampling timing could you see the reality — a paradox.
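The aliasing effect is easy to reproduce in a toy model. The sketch below invents a loop of ten one-microsecond steps and samples which step is executing, first at fixed periods and then at random intervals. The loop, its timing and the seeded pseudorandom generator are all assumptions for illustration; the original profiling was done on real hardware.

```python
import random

LOOP_STEPS = 10   # hypothetical loop: ten steps of 1us each, repeating forever


def address_at(t_us):
    """Which loop step is executing at time t_us."""
    return int(t_us) % LOOP_STEPS


def sample_fixed(period_us, n):
    """Sample the executing step every period_us microseconds."""
    return [address_at(i * period_us) for i in range(n)]


def sample_random(mean_period_us, n, rng):
    """Sample at random intervals averaging mean_period_us."""
    t = 0.0
    out = []
    for _ in range(n):
        t += rng.uniform(0, 2 * mean_period_us)
        out.append(address_at(t))
    return out


print(sorted(set(sample_fixed(10, 1000))))                     # 100kHz sampling
print(sorted(set(sample_fixed(5, 1000))))
print(sorted(set(sample_random(10, 1000, random.Random(0)))))
```

With the sampling period locked to the loop period the profiler only ever sees one step, and at half the period it sees two. The randomly timed samples land on every step.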

Analogue RNG methodologies

A Google or two around showed that most of the techniques are analogue one way or the other. Many of the methods suffer from a problematic need to amplify some very tiny source of noise, a Zener diode or avalanche transistor junction, by really huge amounts, 90dB or more. There are a couple of suppliers of RF “noise diodes” with flat spectra across a wide frequency range, but they are hard to source.

Digital non-pseudorandom technique

However there is one technique which while still relying on analogue noise is basically digital — to run multiple chains of unlocked inverting oscillators and xor the outputs. The unlocked oscillators have no reference at all, they’re basically an inverter fed back on its own input — in fact a chain of inverters. Such a circuit oscillates according to the period of the total delay through the inverter chain… and that is highly sensitive to temperature. Normally with synchronous digital design we choose a clock rate for a circuit that is just below the maximum possible at the worst temperature it is expected to operate at — and after that we can forget about temperature. But with this asynchronous unlocked oscillator concept, the micro- and macro- temperature dependence is revealed in all its freaky glory, causing the oscillation to drift unpredictably slightly every cycle and over larger period with gross temperature fluctuations.

RFC4086

RFC4086 mentions a recommendation for a RNG based on unlocked inverter chains that is found in IEEE 802.11i.

             |\     |\                |\
         +-->| >0-->| >0-- 19 total --| >0--+-------+
         |   |/     |/                |/    |       |
         |                                  |       |
         +----------------------------------+       V
                                                 +-----+
             |\     |\                |\         |     | output
         +-->| >0-->| >0-- 23 total --| >0--+--->| XOR |------>
         |   |/     |/                |/    |    |     |
         |                                  |    +-----+
         +----------------------------------+      ^ ^
                                                   | |
             |\     |\                |\           | |
         +-->| >0-->| >0-- 29 total --| >0--+------+ |
         |   |/     |/                |/    |        |
         |                                  |        |
         +----------------------------------+        |
                                                     |
             Other randomness, if available ---------+

This has three unlocked, wandering oscillator chains of different lengths being summed at an XOR gate.

Implementing the RFC4086 RNG

Since it needs 71 inverters, you would need 12 74hc04 or similar, it makes more sense to put it all in one CPLD. I have an old XC95108 lying around, so I wrote up the design in VHDL and added a UART interface to issue the sampled random data. This brings up the issue of how quickly it can be sampled and still get high quality randomness… clearly if we sampled it at 10ps it wouldn’t be very random at all, since it didn’t have time to change between samples. On the other hand if we sampled it at some high multiple of the fastest free-running oscillator period, then there is a lot of opportunity for each oscillator phase to have been affected over the longer time. By using the UART we can control how often we sample the RNG by the baud rate… I initially set it to 9600 baud or 104us/sample. The oscillators should have periods on the order of 150 – 200ns (5 – 6MHz), so this is allowing 500+ cycles of jitter to accumulate in each oscillator before the summed sample is taken.
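The arithmetic in that paragraph can be checked in a few lines. Everything here comes from the numbers above except the six inverters per 74HC04 package, which is the standard part definition.

```python
import math

CHAIN_LENGTHS = (19, 23, 29)       # inverters per ring, from the RFC diagram
INVERTERS_PER_74HC04 = 6           # hex inverter package


def bit_period_us(baud):
    """Time between samples when one sample is taken per UART bit."""
    return 1e6 / baud


def cycles_between_samples(baud, osc_period_ns):
    """How many oscillator cycles elapse between two samples."""
    return bit_period_us(baud) * 1000.0 / osc_period_ns


total = sum(CHAIN_LENGTHS)
print(total, math.ceil(total / INVERTERS_PER_74HC04))
print(bit_period_us(9600))
print(cycles_between_samples(9600, 200), cycles_between_samples(9600, 150))
```

So even the slower estimate of the oscillator period leaves over 500 cycles for jitter to build up between samples at 9600 baud.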

I’m currently waiting for a programming tool to be delivered so I can program another device to allow programming the XC95108 — I no longer have any PCs with a printer port I realized yesterday. I am very interested to see what the performance and quality of the randomness is like!

Out of your tree

Saturday, March 3rd, 2007

The willingness of the kernel devs to refactor stuff is both a huge strength and weakness for the kernel. The strength is in the extraordinary continual optimization and improvement in the codebase, not just locally to an area of code but for cross-kernel concepts, like the recent workqueue changes.

But this has a pretty harsh cost for people writing or maintaining code that is outside the kernel tree and which therefore does not get the reworking applied to it as part of the core kernel. Whatever code they put out is invalidated and broken again and again, sometimes in just the space of a few weeks.

The freedom to refactor despite breaking external code is a huge luxury for the devs seldom seen elsewhere in the coding world. Some projects take some care to allow compilation of their drivers for all recent kernels, using conditional compilation based on the kernel tree it is being compiled against, but other projects have an attitude that it will only compile against the current Linus tree.
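The conditional compilation mentioned here normally keys off the kernel's LINUX_VERSION_CODE, compared against KERNEL_VERSION(a, b, c) in a C #if. A rough Python sketch of that encoding and comparison follows; the 2.6.20 threshold for the workqueue change is my understanding of when INIT_WORK lost its third argument, and the helper names are invented for illustration:

```python
def kernel_version(major, minor, patch):
    """Mirror of the kernel's KERNEL_VERSION(a, b, c) macro."""
    return (major << 16) + (minor << 8) + patch


def parse_release(release):
    """Turn a release string such as '2.6.20-rc3' into a version code.

    Anything after the first '-' is dropped, the same way
    LINUX_VERSION_CODE carries no -rc or vendor suffix.
    """
    numeric = release.split("-")[0]
    major, minor, patch = (int(part) for part in numeric.split(".")[:3])
    return kernel_version(major, minor, patch)


# Assumed threshold: the release where the workqueue API changed.
WORKQUEUE_CHANGE = kernel_version(2, 6, 20)


def init_work_style(release):
    """Say which INIT_WORK form a driver would have to select."""
    if parse_release(release) >= WORKQUEUE_CHANGE:
        return "two-argument"
    return "three-argument"


if __name__ == "__main__":
    for rel in ("2.6.17-1.2157_FC5", "2.6.19", "2.6.20-rc3", "2.6.20"):
        print(rel, "->", init_work_style(rel))
```

Note that an -rc kernel gets the same code as the final release, so a driver cannot use this mechanism to tell apart API changes that land between two -rc versions.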

The foaming churn of change makes for pretty hard work trying to make any kernel code that is not in the main tree work for any length of time. Greg KH at least is on record that his concept of the solution is to bring everything inside the kernel tree, but I don’t know how that will ever scale, and it loads the devs with having to understand and work with an ever growing amount of device-specific code. Aside from that, it makes the kernel devs gatekeepers for what will be accepted, and since not everything that can exist will be deemed acceptable, there will always be a class of device driver that is living out of the comforts of the main tree.

Anyway the end result is that for many projects that people need drivers for, the shelf-life of any snapshot of the driver sources is extremely short. A Wifi driver for example touches many subsections of the kernel that have a history of changes in the recent past, yet requires a pretty recent kernel to compile at all with the stuff that it actually needs. So each driver tree has a quite narrow slot of kernel versions that it will work with. Annoyingly, current CVS from many drivers will not compile against current kernel source, and by that I don’t even mean -git snapshots, just -rc versions. It means that out of tree drivers are a lottery for any recent kernel, and any kind of driver is a high commitment project that needs constant revisiting to keep it alive.

There doesn’t seem to be an answer except that over time more and more critical subsystems in the kernel will surely mature to the point that they get fiddled with less and less, and things should therefore die down on the breakage front. But in truth the adolescent codebase of Linux shows no signs at the moment of slowing down its crazed foaming froth of reinvention and massive damage and breakage to the code around it.

Conexant ADSL Binary driver damage

Monday, August 14th, 2006

A couple who are friends with Jenny and me asked what must be getting on for two years ago about what could be done to remove the constant virus problems they were having with their Windows box. Naturally after making sure they did not need anything that Linux was poor at, i.e. 3D games and so on, I recommended FC2 at that time. I nuked their box with it and the guy has been very happy all this time. I updated him to FC4 a while back.

But now he is upgrading from dialup to ADSL, he needed this taking care of. He had a Zoom PCI ADSL card, model 5506, with a Conexant chipset. I found a driver here for it:
http://patrick.spacesurfer.com/linux_conexant_pci_adsl.html

Hm so the first sign all was not well was the age of the page and the results from Google, they are all from circa 2003. The project has continued to be worked on in the last couple of months though. After some struggle trying to avoid the 4kBytes/sec modem download we got the driver and the kernel-devel sorted out and compiled it. It quickly blew chunks, on a #error that our kernel had CONFIG_REGPARM defined. Well we run the stock Fedora kernel and are not much interested in moving off it, why on earth should the driver care about this detail? Hm closer inspection of the site showed:

“Note: Linux 2.6.* users should note that their kernel must be compiled without the ‘use register arguments’ (CONFIG_REGPARM) option. This is an experimental option that will almost certainly never work reliably with this driver or any other driver that uses proprietary object code. Newer versions of Fedora and SuSE come with kernels that use this option, in these cases you will have to recompile the kernel.”

Ugh, so the reason it couldn’t survive CONFIG_REGPARM is because it has a binary blob which demands stack args! No chance apparently to get two binary blobs compiled with and without. This is a stupid situation, because the site itself documents that Fedora kernels after 2.6.9 on FC3 are compiled with CONFIG_REGPARM, since it should speed things up at no cost. His solution is to insist on a vanilla kernel.org kernel solely to support the needs of the binary blob :-(

We had to give up trying to get it cooking, and instead the guy blew GBP20 on an ADSL router from ebuyer. Just what awesome secrets do they think that binary blob is concealing? What astounding concepts that would set the world on fire if their sources were known?
Binary blobs, causing trouble and bitrotting wherever you find them.

I sent the guy running the project a polite email:

Hi Patrick – First thanks for your work on the Conexant ADSL project. I was trying to install a Zoom ADSL PCI card for a friend, we are both running Fedora Core 5. I saw after some time that I was on a loser because there is a binary blob in the project which was basically compiled with different compiler switches to cut a long story short. What is the situation with Conexant and this blob as you understand it? It seems that the chipset dates from 2002 or 2003, is there no chance that this far down the road they might be willing to be more liberal with the sources for it? My friend and I gave up on the PCI card and ordered a GBP20 ADSL router from ebuyer instead, simply due to there being a binary blob. -Andy

I got a reply a couple of hours later, the guy does not have a relationship with Conexant and says they are ignoring his mails.

RT73 Belkin stick depression

Wednesday, August 9th, 2006

Sadly I have thrown three days down the toilet on trying to get a Belkin “Wireless G Network Adapter”, F5D7050, containing a Ralink RT2571 chip to work using either the Ralink RT73 driver or the newer serialmonkey rt2x00 driver which contains the rt73usb.ko driver, this is on my AT91 platform.

Initially I started, like a happy idiot, trying to get either to work with wpa_supplicant, since we have a WPA2 802.11g network here. The Ralink RT73 sources did not initially crosscompile cleanly, there is a bad reference to asm/i386/… in an include, but after that it went better. However, at least when crosscompiled on gcc 4.0.2, this driver is a useless piece of crap, I outlined the problems here but naturally there was zero response from Ralink.

Well okay, I knew about the alternative serialmonkey driver from getting my elder stepson’s laptop working, which incorporated another Ralink chipset. They did not seem to have any support in the form of the modified Ralink drivers, but they do have a beta rt2x00 driver which supports the RT73 chipset. This got a lot further, the MAC address was correctly initialized and in the end, with some coaxing, it can be made to show results from iwlist wlan0 scan that include our AP. But it won’t associate and stay associated. After I removed the encryption from the AP temporarily, I was once – one time only – able to contact the DHCP server long enough to get an IP, but then it immediately disassociated again. And this is with no encryption! Again I posted to the forums here and again there was zero response. Perhaps it is the ARM crosscompile that is freaking the devs out, but since it is little-endian and 32 bits, it’s really not so wild to expect it to just work.

Another issue – actually here is the one bit of good news from the work – is there are two versions of firmware for the RT73 I found, in the form rt73.bin. One is shipped with the Ralink driver and is also available on their site, which claims to be version 1.7. The other was provided in the Win98 directory on the CDROM that came with the Belkin device and is referred to as version 1.0 in the debug output. The Ralink-supplied driver has its own code to grab the file from a specific path – /etc/Wireless/something – and also has a private copy of the firmware in the sources of the driver itself in case it can’t find the file in its magic path. The serialmonkey driver does it the proper Linux way using the firmware API in the kernel. Anyway this was the good news, I learned how this worked and created a hotplug script that is compatible with it, allowing it to load the firmware successfully from /lib/firmware.
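For reference, here is a minimal sketch of what such a hotplug handler does, written in Python rather than shell and not the actual script used on the AT91. It assumes the kernel passes ACTION, SUBSYSTEM, DEVPATH and FIRMWARE in the environment and exposes "loading" and "data" files under the device's sysfs directory, which is how I understand the firmware class interface to work; the function name and the directory parameters are only there for illustration and testing.

```python
import os


def load_firmware(env, sysfs_root="/sys", firmware_dir="/lib/firmware"):
    """Handle one firmware hotplug event; return True if an image was fed in."""
    if env.get("SUBSYSTEM") != "firmware" or env.get("ACTION") != "add":
        return False  # not a firmware request, nothing to do

    dev = os.path.join(sysfs_root, env["DEVPATH"].lstrip("/"))
    image = os.path.join(firmware_dir, env["FIRMWARE"])
    loading = os.path.join(dev, "loading")

    if not os.path.isfile(image):
        # Tell the kernel to give up straight away instead of timing out.
        with open(loading, "w") as f:
            f.write("-1")
        return False

    with open(loading, "w") as f:
        f.write("1")  # start of transfer
    with open(image, "rb") as src, open(os.path.join(dev, "data"), "wb") as dst:
        dst.write(src.read())
    with open(loading, "w") as f:
        f.write("0")  # transfer complete, the driver may use the image
    return True


if __name__ == "__main__":
    load_firmware(os.environ)
```

With rt73.bin copied into /lib/firmware, the driver's firmware request would be answered by a handler like this without the driver needing any private search path.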

Anyway, while I have been saying recently that the wifi driver problems are largely resolved in Linux, which has been my experience on x86 laptops, they sure as hell aren’t resolved for crosscompile usage :-(((

Edit 2006-08-13: I posted to the serialmonkey project mailing list about it, it’s too tantalizingly close to forget about it.  Head serialmonkey replied, “send hardware”.  Trying to see how feasible that is, since they need a build env and so on.

Broadcom and WPA

Tuesday, July 11th, 2006

About 18 months ago on a periodic trip to gawk at PC World (a superstore for PC stuff here in the UK) I purchased a Belkin PC Card 54g adapter with a Broadcom 4306 chipset. Of course I took a flyer on the chipset, it was relatively cheap and I figured I would have some fun trying to get it to work with Linux. Yes, the same madness that grips me every time in PC World. The cheaper peripherals that do not have standardized interfaces (unlike, say, USB Audio devices like headsets, which always just work) always have a very new chip from a company that regards the interface to it as part of what makes their IP such a special flower and Must Never Be Told. Webcams seem to be the chief culprit at the moment.

Periodically I took it down from its box of dead things and tried to get it working with a new version of Fedora. Well I read that the BCM43xx driver was integrated into 2.6.17 and that is where Fedora are at (Fedora do a good job of tracking the latest kernels, there is a chart in a Linux magazine here in the UK this month showing Fedora has much later kernels than other distros except SuSE). Since I was going to upgrade the laptop Rohan uses here to FC5, I did this and at the same time without too much hope tried the old Broadcom bookstop.

To my pleasure I was able to get it working here after extracting some firmware and sorting out wpa_supplicant, which I gained some experience in from getting this Samsung laptop working. I sat there loading webpages and looking at its power and data lights, which I was never before able to light. Good old Linux!

Hum later that evening the behaviour became intermittent. I ran wpa_supplicant with a debug switch and I see it is having problems maintaining sync with the crypto. Bringing the (eth1) interface down and up got it working for a while but then it would stutter into silence again. I modprobe -r’d the bcm43xx driver and pulled out the card, it was hot but not so hot. I know that wpa_supplicant is working fine on FC5 because this laptop’s wifi is super stable (ipw3945-based). So the problem is either in the bcm43xx driver, or is a physical (heat?) problem with the adapter, I guess it makes sense it can show up in WPA breakage if it is a low level problem.

Edit: couple of days later, I changed the /etc/wpa_supplicant/wpa_supplicant.conf contents and that seemed to resolve the problem, we will have to see if the improvement is permanent.  Here are the contents:

ctrl_interface=/var/run/wpa_supplicant
network={
ssid="myssid"
scan_ssid=1
key_mgmt=WPA-PSK
proto=WPA
pairwise=CCMP TKIP
group=CCMP TKIP
psk=xxxxxxxx…xxxx
priority=3
}
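A note on the psk= line: when the value is unquoted like this, wpa_supplicant expects the 64 hex digit key rather than the passphrase itself. That key is PBKDF2-HMAC-SHA1 over the passphrase with the SSID as salt, 4096 iterations and 32 bytes of output, which is what the wpa_passphrase tool computes. A minimal Python sketch, with a made-up passphrase and a made-up function name:

```python
import hashlib


def wpa_psk(ssid, passphrase):
    """Derive the 64 hex digit WPA pre-shared key for an SSID and passphrase."""
    if not 8 <= len(passphrase) <= 63:
        raise ValueError("WPA passphrases are 8 to 63 characters long")
    key = hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)
    return key.hex()


if __name__ == "__main__":
    # Hypothetical passphrase, for illustration only.
    print("psk=" + wpa_psk("myssid", "not my real passphrase"))
```

Because the SSID is the salt, the key has to be regenerated if the AP is renamed, even when the passphrase stays the same.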