libwebsockets.org

January 12th, 2013

It turns out that in 2012 there was a great deal of interest in libwebsockets, the lightweight, portable C websockets library, that I did not entirely stay on top of due to work commitments.  To take better care of that going forward I have created a trac and mailing list at http://libwebsockets.org/ where support and cooperation on improvements should be easier.

Recently libwebsockets has absorbed almost all of the patch delta in Dave Galeano’s github branch and added new features like configurable logging; the approach to autotools has also been fixed so it is much more compatible.  If you’re interested in libwebsockets please consider visiting http://libwebsockets.org/ and joining in.  Edit: http://git.libwebsockets.org and git://git.libwebsockets.org are both working now.  http://git.warmcat.com/cgi-bin/cgit/libwebsockets/ is a symlink of the same repo so you can use either.

libwebsockets new features

March 6th, 2011

More platforms

In addition to Linux, there are now users of libwebsockets on win32, OSX and iOS who have contributed patches, now in git (at http://git.warmcat.com/cgi-bin/cgit/libwebsockets ).  OSX wasn’t too hard thanks to some patches from Darin Willits.  win32 was a bit tougher despite a big patch from Peter Hinz: I was really surprised to find that the Microsoft compiler doesn’t support simple C99 features like named member initialization of structs (designated initializers).  Anyway, after dumbing down the few structs that used it and adapting a few other things, it’s now possible to use libwebsockets on win32 thanks to Peter’s work.
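To make the C99 point concrete, here is a minimal sketch of the two initialization styles.  The struct and its members are made up for illustration and are not taken from the libwebsockets sources; the second form is the kind of "dumbed down" initializer a C89-only compiler will accept.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct for illustration only; not a libwebsockets type. */
struct proto_entry {
    const char *name;
    int per_session_size;
    int index;
};

/* C99 designated initializers: members are named, order is free, and
 * anything left out is zero-initialized. */
static const struct proto_entry c99_style = {
    .name = "dumb-increment",
    .per_session_size = 16,
};

/* C89-compatible positional form: values must follow declaration order. */
static const struct proto_entry c89_style = {
    "dumb-increment", /* name */
    16,               /* per_session_size */
    0                 /* index */
};

/* Returns 1 if both forms produced the same contents, 0 otherwise. */
int proto_entries_match(void)
{
    assert(c99_style.name && c89_style.name);
    return strcmp(c99_style.name, c89_style.name) == 0 &&
           c99_style.per_session_size == c89_style.per_session_size &&
           c99_style.index == c89_style.index;
}
```

The positional form works everywhere but silently breaks if somebody reorders the struct members, which is why the named form is nicer when the compiler allows it.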

v05 and v06 protocol support

Support for both v05 and v06 was added within 12 hours of each spec being released; at the time of writing v06 is the current version.

I also added v00 client support by request from a developer using the socket.io server library which doesn’t support anything except 76 / 00. That means that in libwebsockets, both server and client support now covers all combinations of v00, v04, v05 and v06.

Extensions and deflate-stream

There’s a new extension support infrastructure added, including an implementation of zlib compression to provide the standardized “deflate-stream” extension.

The benefit of running with compression depends a lot on the payload size and content; in order to reduce latency the compression buffers are partially flushed every frame. But it is an important part of the websockets standard that libwebsockets now supports.

Integration with external poll arrays

Libwebsockets is being integrated into a fork of ircd, the daemon which runs the IRC network. The challenge there is to interoperate with the existing single poll() loop in a way that maximizes blocking and maintains serialization of poll service in a single thread to avoid the need for locking.

In addition to its default private poll array management libwebsockets now provides poll array callbacks into the user code which enables integration of websocket event loop functionality into an existing, master poll() array.

Callbacks occur into user code when file descriptors must be added, removed or have their event masks changed. There are hash tables implemented in libwebsockets to allow everything to remain opaque to the host code using just the file descriptor as an index.
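A rough sketch of what the host side of such callbacks might look like follows.  The function names, limits and the direct fd-indexed lookup table are all assumptions made for this example (the library itself uses hash tables and its own callback reasons); the point is only to show add, remove and event-mask changes against one master poll() array.

```c
#include <assert.h>
#include <poll.h>

#define MAX_FDS      64    /* illustrative limits, not libwebsockets values */
#define MAX_FD_VALUE 1024

static struct pollfd master_fds[MAX_FDS]; /* the host's single poll() array */
static int master_count;
static int fd_to_slot[MAX_FD_VALUE];      /* fd -> slot + 1; 0 means absent */

/* The library asks the host to start polling fd.  Returns 0 on success. */
int host_add_fd(int fd, short events)
{
    assert(fd >= 0 && fd < MAX_FD_VALUE);
    if (master_count == MAX_FDS || fd_to_slot[fd])
        return -1;
    master_fds[master_count].fd = fd;
    master_fds[master_count].events = events;
    master_fds[master_count].revents = 0;
    fd_to_slot[fd] = ++master_count;
    return 0;
}

/* The library asks the host to stop polling fd; the last entry fills the hole. */
int host_del_fd(int fd)
{
    int slot;

    assert(fd >= 0 && fd < MAX_FD_VALUE);
    if (!fd_to_slot[fd])
        return -1;
    slot = fd_to_slot[fd] - 1;
    master_count--;
    if (slot != master_count) {
        master_fds[slot] = master_fds[master_count];
        fd_to_slot[master_fds[slot].fd] = slot + 1;
    }
    fd_to_slot[fd] = 0;
    return 0;
}

/* The library asks the host to set and / or clear event bits for fd. */
int host_change_events(int fd, short set, short clear)
{
    int slot;

    assert(fd >= 0 && fd < MAX_FD_VALUE);
    if (!fd_to_slot[fd])
        return -1;
    slot = fd_to_slot[fd] - 1;
    master_fds[slot].events =
        (short)((master_fds[slot].events & ~clear) | set);
    return 0;
}
```

The host then calls poll(master_fds, master_count, timeout) as it always did, and hands any descriptor it does not recognize as its own back to the library for service.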

With this technique, it has been possible to integrate websocket functionality into an existing irc server while keeping all details about websocket functionality and protocol versioning in the library.

Nokia failure

February 11th, 2011

Nokia has seemed to be in a downward spiral for some years, now they really jumped the shark in losing sovereignty over their software stack.

http://conversations.nokia.com/2011/02/11/open-letter-from-ceo-stephen-elop-nokia-and-ceo-steve-ballmer-microsoft/

Can you imagine more worthless Microsoft-servitude bullshit than this (from the nokia link above)?

There are other mobile ecosystems. We will disrupt them.

There will be challenges. We will overcome them.

Success requires speed. We will be swift.

Together, we see the opportunity, and we have the will, the resources and the drive to succeed.

Quelle leadership, Nokia employees.

Qt is looking a bit pale too since it has no path forward after the Microsoft takeover of Nokia.

In other news, libwebsockets was updated to -05 within a few hours of the new spec being released.

libwebsockets now with 04 protocol and simultaneous client / server

January 22nd, 2011

76 + 4 = 80

The websockets protocol reached its 80th version recently; after v76 was widely implemented they renamed it 00 and continued meddling with it.  Many of the changes for 04 make a lot of sense, like moving keys and nonces into base64 encoded headers rather than raw 8-bit data.  One particularly expensive change is to require a SHA1 per payload frame and XOR munging of all payload data in the client -> server direction; the server -> client frames remain unmunged.
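For a feel of what the XOR munging step amounts to, here is a minimal sketch of a cyclic XOR over a payload.  It deliberately leaves out how the -04 draft derives its mask (the per-frame SHA1 mentioned above), and the function name and the idea of passing the key in directly are assumptions for the example only.

```c
#include <assert.h>
#include <stddef.h>

/* XOR each payload byte with a repeating key, in place.
 * Applying the same key a second time restores the original data. */
void xor_munge(unsigned char *payload, size_t len,
               const unsigned char *key, size_t key_len)
{
    size_t i;

    assert(payload && key && key_len > 0);
    for (i = 0; i < len; i++)
        payload[i] ^= key[i % key_len];
}
```

The XOR itself is cheap; it is the hash per frame and the extra pass over every client -> server byte that add up.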

Standards politics

The reasoning behind that particular change makes no sense and is an entirely political decision AFAICT; after a “security problem” was identified, Firefox and Opera disabled websockets by default in their dev builds, putting the whole effort in danger of collapse.  However on closer inspection, this “security problem” does not appear to identify any actual problem that exists in this world, and if it did exist it would be in the form of a broken intermediary / proxy that would remain broken and open to abuse no matter what websockets did to avoid enabling its theoretical exploitation (should it ever be discovered to exist).  The end result is a pointless and expensive payload munging scheme that doesn’t protect anything inserted into the standard, in order to encourage the browser vendors to re-enable support for it.

libwebsockets client support

I don’t have a browser with 04 support yet, but it is implemented into libwebsockets and tested via libwebsockets’ new client websocket support which is 04-only; the server support is 76/00 and -04 depending on what the individual client asks for on each connection.

The client support means you can connect to an -04 server as if you were a browser.

With support for 04 client transmission, the prepadding constant LWS_SEND_BUFFER_PRE_PADDING is increased to 14 reflecting the maximum needed to contain the new frame nonce along with the length coding, but you shouldn’t have to worry about that if you are using “LWS_SEND_BUFFER_PRE_PADDING”.
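The reason the constant matters is that the payload has to sit that many bytes into the buffer, so the framing can be written in front of it without a copy.  A minimal sketch of that layout, using a local stand-in constant so it compiles by itself (real code takes the value from the library header, and the helper name here is made up):

```c
#include <assert.h>
#include <string.h>

/* Local stand-in so the sketch compiles alone; real code uses the constant
 * from the library header rather than hardcoding the number. */
#define SKETCH_PRE_PADDING 14

/*
 * Copy payload into buf, leaving SKETCH_PRE_PADDING bytes free in front of
 * it for framing.  Returns a pointer to where the payload starts in buf, or
 * NULL if buf is too small.
 */
unsigned char *place_payload(unsigned char *buf, size_t buf_len,
                             const void *payload, size_t payload_len)
{
    assert(buf && payload);
    if (buf_len < SKETCH_PRE_PADDING + payload_len)
        return NULL;
    memcpy(buf + SKETCH_PRE_PADDING, payload, payload_len);
    return buf + SKETCH_PRE_PADDING;
}
```

Because the code only ever refers to the constant by name, a later change in its value costs a recompile and nothing else.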

Since the client support is so integrated into the server support, I changed the name of the common init API to “libwebsocket_create_context()” instead of …create_server.

Breaking out the service loop

A libwebsockets user hit a problem with the forked service loop approach that the library was using because on his platform, iOS, fork() is not supported.  To allow libwebsockets to work in single-threaded environments I exposed a new api libwebsockets_service() that just needs to be called periodically to perform the poll() on all the sockets and handle incoming websocket traffic then.  There is a new configure option --enable-nofork which disables any references to fork() and similar in the sources and implies the user will call the service api periodically (as shown in the test server sources).

Support for client sockets is integrated into the server stuff, it means that even if your application is simultaneously a websocket server and client to other servers, there is still just a single service call / poll action.

Because of the integration, it means that a single protocol callback can handle both client and server callback reasons; I added a new reason LWS_CALLBACK_CLIENT_RECEIVE for client rx payloads so they are handled separately from the server rx payload callback reason LWS_CALLBACK_RECEIVE.
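As a sketch of the shape this gives user code: one callback, with the role told apart only by the reason.  The enum, the callback signature and the counters below are stand-ins invented for this example so it compiles alone; only the two reason names are modelled on the ones mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins named after the reasons in the text; the values and the
 * callback signature are made up for this sketch. */
enum sketch_reason {
    SKETCH_CALLBACK_RECEIVE,        /* server side received a payload */
    SKETCH_CALLBACK_CLIENT_RECEIVE  /* client side received a payload */
};

struct rx_counters {
    size_t server_bytes;
    size_t client_bytes;
};

/* One callback handles both roles, told apart only by the reason code.
 * Returns 0 if the reason was handled, -1 otherwise. */
int sketch_callback(enum sketch_reason reason, struct rx_counters *user,
                    const void *in, size_t len)
{
    assert(user);
    (void)in; /* payload content is not inspected in this sketch */

    switch (reason) {
    case SKETCH_CALLBACK_RECEIVE:
        user->server_bytes += len;
        break;
    case SKETCH_CALLBACK_CLIENT_RECEIVE:
        user->client_bytes += len;
        break;
    default:
        return -1;
    }
    return 0;
}
```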

You can see all these changes in the client and server test apps that are part of the sources and built along with the library.

The test client connects to the test server with two websockets, one using the “dumb-increment” protocol where the server just keeps sending the client an incrementing number in ascii, and the other uses the “lws-mirror” protocol to draw circles in the canvas of any browser that is also connected to the same server.  On both the server and client side this is done fork and threadlessly, and the same server is able to deal with say a -76 version browser and an -04 libwebsocket client connected and interoperate between them.

New NXP LPC32x0 in Qi bootloader

November 29th, 2010

LPC3250 from scratch

NXP’s new LPC32x0 is a very cheap and feature-filled ARM926.  According to Digikey anyway, it’s the cheapest ARM chip with at least v5 instruction set that’s going.  That’s important not just because of the extra processor strength over older ARM9 core, but because ARM Fedora is built requiring armv5 or newer instruction set.  Being able to use ARM Fedora and RPM as a basis means freedom from compromise and having to own the building of an integrated, self-consistent rootfs; you can just focus on doing your specialized code on top using the reliable Fedora quality basis.

There are four chips in the series, they differ in having an LCD controller and Ethernet MAC or not; also the smallest guy LPC3220 has “only” 128KBytes of Static IRAM and the others 256KBytes.  Well, having worked with the 2KBytes of internal static RAM on the iMX31 for SD boot on Qi, having to shoehorn an SD card driver in there, even 128KBytes is crazy amounts.

They have support for resistive touchscreen, USB OTG, NAND controller and Mobile DDR, and up to 266MHz CPU clock at 1.4V Vcore (208MHz at 1.2V Vcore but as we will see that is not entirely true).  They don’t support SD Card boot from ROM, but that can be solved for about US$0.30 as will be shown.

In short they’re ready to do some serious embedded work at a budget price.

Embedded Artists EA3250 Dev kit

There are a few dev kits around for LPC32x0, Hitex have a cheap USB stick format one that has been permanently two weeks away from availability since I first looked at it a month or so ago, and it still is two weeks away.

NXP anointed two real dev boards, from Phytec and Embedded Artists, whose vendors they evidently worked with during development; they don’t actually make an NXP branded dev board.  Since the EA one is in Digikey, that’s what I ended up with.

The dev board is well made but there are some problems with it: like many dev boards it comes in two halves, a cheaper, large breakout board and an 8-layer DIMM type board that has the actual CPU BGA and memory.  In an act of supreme lunk-headedness, the large breakout board re-uses the Pn.m nomenclature that the CPU uses for GPIO, with no care to retain the CPU mapping.  So for example a header is marked as having a pin P1.27; very confusingly this has nothing to do with the CPU GPIO P1.27.  This is also true in the schematics for the baseboard and CPU board, so it is complete confusion trying to trace a signal between the two boards or looking for a misnamed signal on the baseboard.

DDR trouble #1

There’s also a more serious problem, the DDR on the CPU card is marginal and Embedded Artists have made a recall where they will replace the board with one with a different DDR DRAM for free.  The CPU board I got was affected but not at room temperature; they want the old card sending back and I am not finished with it yet, so I will take advantage of this recall later.

DDR trouble #2

There’s another problem with DDR: NXP issued an erratum confessing their inverted signal for the differential DDR clock is skewed by no less than 1.2ns from the uninverted partner of the differential pair, a huge skew.  This issue removes a lot of comfort zone from designing with DDR and means only some memory devices will tolerate it.  However in the EA board case, they have not used the workaround suggested by NXP, which is to nuke the inverted output entirely and make the clock unipolar, so the situation can’t be that bad.

DDR trouble #3

The last problem with DDR… operation at 208MHz with 1.2V Vcore is fine for the CPU, in fact while screwing with the PLL I had the CPU running fine at 400MHz, although there is no way to divide anything useful down for the memory clock at that speed and it’s illegal for the PLL over temperature, which tops out at 320MHz.  However at 1.2V and 208MHz, the CPU side of the DDR bus is unreliable: it requires cranking to 1.4V to operate DDR even at 104/208MHz.  That’s annoying because since 1.2V is needed anyway for other circuitry, it could have saved a regulator.

Unbrickability of LPC32x0

LPC32x0 chips feature UART-based bootloader injection… if you pull down the SERVICE_N pin, then next boot the ROM in the CPU will bring up UART5 at 115200 n81 and issue a simple protocol byte allowing for bootloader download.

Since I couldn’t find a Linux tool for injecting bootloaders, just a Windows one, I wrote a commandline tool for it and added it to Qi build.

http://git.warmcat.com/cgi-bin/cgit/qi/tree/tools/lpcboot.c?h=lpc

No matter how broken your nonvolatile image gets, it’s still possible to recover the device via this UART scheme with a USB <-> LVTTL serial cable.
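On the host side, the first step for any such tool is getting the serial port into raw 115200 8N1.  The following is a generic POSIX termios sketch of that step only; it is not copied from lpcboot.c, says nothing about the download protocol itself, and the function name is made up.

```c
#include <assert.h>
#include <termios.h>

/*
 * Fill in t for a raw 115200 8N1 link.  The caller would normally obtain t
 * with tcgetattr() on the opened serial fd and apply it with tcsetattr().
 * Returns 0 on success, -1 if the speed could not be set.
 */
int setup_115200_8n1(struct termios *t)
{
    assert(t);

    /* raw input: no break handling, CR/NL translation or flow control */
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON | IXOFF);
    /* no output post-processing */
    t->c_oflag &= ~(tcflag_t)OPOST;
    /* no echo, line editing or signal characters */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    /* 8 data bits, no parity, 1 stop bit */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);
    t->c_cflag |= CS8 | CREAD | CLOCAL;
    /* blocking reads return as soon as one byte is available */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;

    if (cfsetispeed(t, B115200) || cfsetospeed(t, B115200))
        return -1;
    return 0;
}
```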

Bootloader Hell

The LPC32x0 bootloader situation is ugly.  Basically NXP provided a huge suite used for chip verification called CDL (“common driver library”), this is a sort of chopped down OS in bootloader form.  It has all kinds of functions to drive the chip peripherals and test memory, but nothing to actually boot Linux!

What EA shipped, and what you are meant to do as a system integrator, is get an implementation of CDL in the form of “S1L” — stage one bootloader — to load U-Boot, which will then load Linux.  Both U-Boot and S1L — itself like 130KBytes! — store “state” on the board.  It leads to this insane situation that two bootloaders with two kinds of state must be right in order to boot.  Things are further complicated that SPI boot only allows the first 56KBytes to be loaded by ROM into IRAM and executed, but the bloated bootloaders are too big to do this in one step.

Bootloader Heaven

I added support for LPC32x0 to Qi last week; this is a single < 30KBytes image that can boot itself from SPI Flash or UART 5 injection and pull Linux from SD Card in VFAT partition or also via SPI Flash.  Boot from cold, with Qi and Kernel in SPI Flash, to Fedora 12 bash prompt is less than 4 seconds.

http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=lpc

This replaces both S1L and U-Boot, and in accordance with Qi philosophy it holds no state at all on the device.

Its strategy is if it finds that it is running via injection on UART5, it copies itself into SPI Flash / EEPROM so it will run next boot from there, and if it finds an SD Card kernel image it will also copy that into SPI Flash.

When it finds it is running from a non-injection source, ie, a normal boot from SPI Flash, it favours any kernel it can find on the first, VFAT, partition of an SD Card if found, otherwise it boots from the kernel also in SPI Flash.
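The strategy in the last two paragraphs boils down to a small decision function.  This is only a sketch of the logic as described here, with made-up names, and not the actual Qi source; in particular it leaves out whatever Qi goes on to boot after an injected run has finished copying.

```c
#include <assert.h>

enum boot_source { BOOT_UART_INJECTION, BOOT_SPI_FLASH };

/* Bit flags describing what one boot should do (names are illustrative). */
enum boot_action {
    ACT_COPY_SELF_TO_SPI   = 1 << 0,
    ACT_COPY_KERNEL_TO_SPI = 1 << 1,
    ACT_BOOT_SD_KERNEL     = 1 << 2,
    ACT_BOOT_SPI_KERNEL    = 1 << 3
};

/* Decide the actions for this boot from where we were started and whether
 * a kernel image was found on the SD Card. */
int plan_boot(enum boot_source src, int sd_kernel_found)
{
    int actions = 0;

    assert(src == BOOT_UART_INJECTION || src == BOOT_SPI_FLASH);

    if (src == BOOT_UART_INJECTION) {
        /* injected: persist ourselves, and the SD kernel if there is one */
        actions |= ACT_COPY_SELF_TO_SPI;
        if (sd_kernel_found)
            actions |= ACT_COPY_KERNEL_TO_SPI;
        return actions;
    }

    /* normal boot: the SD Card kernel wins, SPI Flash is the fallback */
    return sd_kernel_found ? ACT_BOOT_SD_KERNEL : ACT_BOOT_SPI_KERNEL;
}
```

Note that nothing here reads any stored setting: the inputs are only where the code came from and what is physically present.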

This is why the lack of ROM -> SD Card boot is not critical, the cheapest, smallest SPI EEPROM can be used to contain Qi, which will then load the kernel and rootfs from SD Card if that’s what’s needed as during development.  If SD Card is overkill for the job, then Qi, Kernel and initrd can all be pushed into a single US$2 32MBit SPI Flash.

Since I only have the Embedded Artists board right now it wants to see a kernel image called k-ea3250.img on the SD Card; the way Qi works you add a new file for each supported board in ./src/cpu/lpc32x0/ copied from embart-steppingstone.c in that directory; the bootloaders need some way to identify what they’re running on at runtime since there is only a single image per cpu that supports all devices.  See  http://git.warmcat.com/cgi-bin/cgit/qi/tree/src/cpu/lpc32x0/embart-steppingstone.c?h=lpc for an idea of what’s involved to support a new board in the bootloader image.

libwebsockets now with SSL / WSS

November 8th, 2010

SSL encrypted websockets

The websocket protocol allows for two kinds of transport, unencrypted ws:// sockets and encrypted wss:// ones.  The server on a given port is either listening unencrypted initially for http:// connections, or encrypted for https:// ones using SSL.

Today I added optional SSL support for libwebsockets using OpenSSL, so it now supports encrypted or unencrypted types.  When you connect by encrypted, you simply use a https:// URL to the server.  The server returns the script over the encrypted link, and the script on the client side opens a wss:// websocket on the server.  Otherwise the encryption is completely transparent.  In particular, the callback the library makes back into the user code for the server is totally unaware if it is being used over SSL or not.

I adapted the javascript that the test server sends to open ws:// or wss:// according to whether its own URL was http:// or https://.

The test server builds its own test https:// certificate, browsers correctly warn that the CA is not recognized but otherwise the certs work correctly in Firefox 4.0b6 and Chrome 8.0.552.28 beta, both current on Fedora F15 rawhide.

Changed license to lgpl2.1

I realized that GPL2 isn’t the best idea for this as a library so I changed the terms to LGPL-2.1 making it easier to integrate with systems using other licenses.

Autotools

The build system has also been moved to autotools / libtool so it has a traditional ./configure structure that should survive crossplatform builds better.  It now has an --enable-openssl switch to control if openssl is needed.

You can get libwebsockets via git by:

git clone git://git.warmcat.com/libwebsockets

libwebsockets – HTML5 Websocket server library in C

November 1st, 2010

Browser vs Apps

It’s been clear since browsers first started becoming popular in the 90s that they were going to be the answer to standardized cross-platform support, but somehow there were never quite enough pieces of the puzzle to replace applications outright. Java or Flash or me-toos like Silverlight were needed and despite Flash solving the problem of video delivery, there hasn’t really been a shift away from old-style apps to the browser. (When I wrote Penumbra in 2007, I was able to use an exclusively https browser interface, but that’s only because it was fundamentally a filesharing app that didn’t challenge simple HTML).

The issue has never been more urgent because the number of incompatible platforms in wide use has been increasing, with iPhone, Android, Macs and Linux boxes alongside Windows. Making native apps for each platform is still possible, but it’s now a very large effort to cover and support all the platforms well natively.

HTML5 vs flash

HTML5 looks like it might have enough firepower to eliminate Flash; it has already proven with WebM that it will be able to replace Flash for the most critical job it does for the internet as a whole, video delivery, without having to worry too much about patents. Because of that, it has increasing mindshare and there’s already a lot of support in place in recent browsers, eg, Chrome and Firefox 4.0b6 at the time of writing, and considering Chrome is webkit, that covers many embedded scenarios too; Apple have committed themselves to HTML5 support in order to screw over Adobe… uh… I mean as part of their love of open standards.

Adobe did make actionscript a standard, but they have never been able to get away from being denounced as the main cause of browser crashes.  HTML5 moves all the hard work Adobe tried to do by themselves in terms of cross-platform media support to the people writing the browser and eliminates the need for Flash.

Websockets

Websockets are a new part of HTML5 that allow the client to get away from the ancient bias of browsers that any network connection is ultimately there to serve some kind of …ML, HTML or XML or whatever.  Websockets start off life as an HTTP connection, but the client immediately sends a request to the HTTP server to “upgrade” the protocol to websocket protocol.

After a complex handshake confirming both sides really speak websocket, websocket protocol is MUCH simpler than HTTP.  In the case of UTF-8 text packets, it’s as simple as sending 0x00 <vari-size payload> 0xff to terminate.  Binary payload packets have a slightly more complex length descriptor and then the payload with no terminator.

The value of it over http is the javascript on the client side can just get the raw binary or UTF-8 payload, and the socket stays open for async traffic in either direction.  There is no HTTP header overhead on each packet; as mentioned, for UTF-8 the protocol overhead is 2 bytes per packet only.  There’s no huge XML encode / decode overhead either, so this is a great transport for low-latency data like speech, and its no-messing async nature lets it carry event information too, ajax-style.
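The text framing really is that small.  A minimal sketch of it in C follows; the function name is made up, and it assumes the caller has already made sure the text is valid UTF-8 (the byte 0xff can never appear in valid UTF-8, which is what makes it usable as a terminator).

```c
#include <assert.h>
#include <string.h>

/*
 * Frame text as 0x00 <payload> 0xff into out.
 * Returns the framed length, or 0 if out is too small.
 */
size_t frame_text_v00(unsigned char *out, size_t out_len,
                      const char *text, size_t text_len)
{
    assert(out && text);
    if (out_len < text_len + 2)
        return 0;
    out[0] = 0x00;                   /* start-of-text marker */
    memcpy(out + 1, text, text_len); /* payload, unmodified */
    out[1 + text_len] = 0xff;        /* terminator */
    return text_len + 2;
}
```

So a 5-byte message costs 7 bytes on the wire once the connection is up.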

Because (once the connection is established) the protocol overhead is so low, it’s very suitable for weak embedded devices that have some kind of network connectivity but no real UI capability or CPU cycles for bloating data into formats browsers otherwise prefer.

Websocket servers

Sounds good right?  Well, to use it practically you need server-side support, because you are literally using a new socket-level protocol other than http.  There are Java and Python implementations suitable for Apache… but… unlike http there are no C library implementations suitable for embedded devices.  So, I wrote libwebsockets to allow embedded devices to participate in the new UIs possible with HTML5 and websockets.

Introducing libwebsockets

libwebsockets (in git at http://git.warmcat.com/cgi-bin/cgit/libwebsockets/ ) is a lightweight GPL2 http and websocket server that hides all the protocol handshakes and detail from the user code driving the server.

Because it supports file serving on http, it is able to provide a single listening socket that can serve your html script page normally and then when the browser starts running your script, come back and make websocket connections to the same port.

A test server is provided

http://git.warmcat.com/cgi-bin/cgit/libwebsockets/tree/test-server/test-server.c

because everything to do with the protocols is handled by the library, it’s very simply able to serve http and websockets using a single callback.

Don’t let Production Test Be Special

February 12th, 2010

Lesson 3: Test is not special

Commonly in embedded work test is the “red-haired stepchild”, nobody wants to take care of it and by common, silent consent it is always left until last.  Eventually the need for a test plan becomes overwhelming as the date to go to the factory nears, and the task is assigned to the most junior engineers available, since everybody knows that test is the death knell of your career.

Coming cold to and excluded from being inside an already-existing project, the engineers try to create some kind of test coverage the best way they can.  At openmoko two giant test suites were created, DM1 and DM2, written by people who were learning C for the first time.  I got the job of modernizing this code so I know from experience the code was already truly terrible and bitrotted at an alarming rate.  However I had to admire the guys who wrote it, with everything against them and little experience they did manage to create something that did provide test coverage at the factory, however much it was on life-support.

Totentanz

Similarly, Openmoko used production test jigs, special additional PCBs that formed a kind of custom test environment for the PCB under test.  At one version of GTA03 there were so many test points added it was a serious concern that the board would break down under the overall pressure needed to mate the spring-loaded test probes to the test points.

Jigs and test points have an obvious advantage in terms of test throughput, but there are some big disadvantages.

First, you have to design and build the jig, and track changes to the actual device with it.  This effort is completely disconnected from moving your actual product on, except that it’s meant to help in production.

Second, test points don’t test your connectors; the test point may be connected OK but not the connector pin the user actually accesses.

Third, you need something else outside the device to assess what is happening on the test points, the code for that also has to be written and maintained against changes in the actual product.  It also means that it’s not possible for the tests to be casually performed outside the factory, or maybe by the original engineers if they have access to the ATE gear themselves.

Pain into torture

Additionally the bringup of GTA02 required special versions of U-Boot and kernel which had added “test magic” created by the test guys and unknown to anyone else.  These versions were seldom uplevelled.

Since GTA02 had raw NAND, it needed filling up at the factory with the rootfs.  The way to do this was via a very fragile OpenOCD using a custom USB – serial based device that was bitbanged.  It only worked with certain versions of the usb library needed to talk to it.

All of these quirks and requirements at the factory made production runs difficult and expensive to get right.

I only hurt you because I love you

I spent a lot of time thinking about how to avoid this end result next time I would design something.  The mistakes started in having anything special for test I concluded.  The jig: special, and so evil.  Test kernels or bootloader: special -> evil.  Test rootfs -> Evil.  test software, like Openmoko’s DM1 and DM2, evil.  The device should naturally be able to test itself with the arrangements that already exist inside it to operate at all.

The answer to the problem of “production test” is to completely subsume it into the rest of the design.  So it is the responsibility of Linux drivers to provide enough functionality by probe errors, or sysfs features, that one can perform test and diagnosis.  The “test suite” should boil down to a bash script that is using features exposed in a normal shipping rootfs and kernel.  Bash is ideal because most of the test action will be calling existing commandline tools like ifconfig, ping, l2ping and grepping or looking at their return code, this is what bash is best at.  It’s also easily understood and edited by anyone who has worked with Linux for a while.

The bootloader is required for test in only one capacity, it is the only part of the system that is capable to run the SDRAM tests; once you enter Linux you can’t perform a full SDRAM test any more.  But even that should be done by the one shipping bootloader image.

In many cases, device interfaces can be tested by external loopback connectors, this proves connectivity through the connectors and it leaves open the possibility of end-users being able to run the same tests on the shipping rootfs.

Bootloader Envy

February 8th, 2010

Lesson #2:  A bootloader is to load and boot Linux

On the first day of FOSDEM I sat through a presentation on what could be called another “U-Boot derivative”.  One of the greatest asspains at Openmoko was the various kinds of Hell caused by the U-Boot bootloader and its philosophy, which can be summed up as “I wanna be Linux when I grow up”.

Configure system is a bad alternative to good bootloader design

First, it has a config system.  That should be good though, right?  The problem with the config system is that if anything differs from your current config, you must build another incompatible binary with another config and take care of that.  When you have more than a handful of different boards, you are in a maze of incompatible bootloaders.  Openmoko took it one step further, they mandated a different bootloader binary per PCB revision, so left unchecked there would have been a continuous proliferation of incompatible bootloaders, all basically the same.

All persistent bootloader private state is EVIL

Second, U-Boot thinks it’s a good idea to have these environment “scripts”, because it’s “configurable”.  Actually, the job of a bootloader is to Load, then Boot Linux.  You don’t need any configurability for that if the bootloader can figure out what it’s running on and therefore where the memory is and how much there is.  These scripts expose a really deadly trap I call “private bootloader state”.  It means the bootloader stores stuff in nonvolatile memory on the PCB and acts different according to what it hides there.  The end result is that two boards from the same factory may act totally different even with the same rootfs due to “bootloader secrets”.  This is totally needless and ALL private bootloader state can be eliminated by correct design of the bootloader leading to completely deterministic boot action per rootfs.

A good example of how that leads you down the path to hell is hardcoding in the U-Boot environment the amount of kernel image you will copy from somewhere.  People commonly set it to 2MBytes, forget about it and one day they generate a 2.1MB kernel image and wonder why decompress blows up.  Actually, that whole procedure is insane, the kernels are uImages that report their length in a header.  The bootloader should examine the header and compute the length of image to pull.  But that doesn’t fit with this “environment” nonsense.
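Computing the length from the header is only a few lines.  This sketch assumes the legacy uImage layout as I understand it: a 64-byte big-endian header starting with the magic 0x27051956, with the data size in the fourth 32-bit word.  The function and macro names are made up, and a real loader would also want to check the header CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_MAGIC       0x27051956u
#define UIMAGE_HEADER_LEN  64u
#define UIMAGE_SIZE_OFFSET 12u /* size follows magic, header crc, timestamp */

/* Read a big-endian 32-bit value regardless of host endianness. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Returns the total number of bytes to copy (header plus image data),
 * or 0 if hdr does not look like a uImage header.
 */
uint32_t uimage_total_len(const unsigned char *hdr, size_t avail)
{
    assert(hdr);
    if (avail < UIMAGE_HEADER_LEN || be32(hdr) != UIMAGE_MAGIC)
        return 0;
    return UIMAGE_HEADER_LEN + be32(hdr + UIMAGE_SIZE_OFFSET);
}
```

With that, a kernel that grows from 2.0MB to 2.1MB just gets copied in full, with no setting anywhere to forget about.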

Do Linux Stuff In Linux

In any of these bloated U-Boot style bootloaders, is there even one feature they do better than the same feature in Linux?  The startup time should be better by a few 100ms.  Other than that, no, every single bloated “I will add it to the bootloader because I can” feature is shittier than you get in Linux.  Every single feature!

If you need some advanced capability or backup / recovery boot action, check for a button held down at boot-time in the bootloader and go fetch a different Linux partition + kernel.  Use standard Linux tools and shells.  In return, get really high quality network stack, proper USB support, NAND access that’s compatible to your main Linux system access in BBT / ECC terms, and all the other advantages of Linux.

Do your peripheral bringup in drivers in Linux

Typically you do not need ANY bringup in the bootloader except SDRAM controller and chip init, since it’s a prerequisite to put Linux in the RAM that it’s initialized.

That’s right, all the megabytes of source spent in U-Boot providing support for so many kinds of peripheral is a waste of time, effort and maintenance.  I am being kind saying “maintenance”, because the drivers in U-Boot are typically “dumbed down” versions of the equivalent Linux driver that were forked irretrievably the moment all the Linux APIs were ripped out, so there’s no coherent effort to keep them up to date with the Linux ones.  Lately I saw that they try to ape some Linux APIs there… why not go the whole hog and just load and boot real Linux?  After all, modern CPUs can be running your driver probes in Linux in ~2 seconds from power using a bootloader that doesn’t get in the way.

You typically don’t even need to talk to the PMU in the bootloader, after all, you are running code fine already, right?  Otherwise you wouldn’t be able to run the bootloader code itself.

Fat girl in Ibiza

At least at Openmoko, code quality inside U-Boot was awful bad.  I called U-Boot on the lists there “the fat girl in Ibiza” because you know she’s going to do anything you want.  All kinds of constant-only code, weird new scripting keywords were added for test undocumented, you name it.  Hardware guys felt up to writing such code secretly by themselves once they learned the software engineering marvel that is *((unsigned int *)0x…) = 0x…;

Your bootloader just tests SDRAM

There’s only one test action your bootloader is suited to do, and that is SDRAM test.  Once you are in Linux, it can’t perform a full SDRAM test while it’s running.  But the bootloader is typically starting from on-CPU SRAM, so it can actually run a true SDRAM test from there.  Otherwise, the bootloader should be completely absent from the test plan.  All other tests should be performed in Linux via standard driver and rootfs tools.
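For illustration, here are two of the classic building blocks of such a memory test: a walking-ones data bus check and an address uniqueness check.  This is a generic sketch and not the test Qi runs; the function names are made up, and on real hardware base would be the start of SDRAM with the code itself running from on-chip SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walking-ones on a single word: catches stuck or shorted data lines.
 * Returns 0 on pass, otherwise the first pattern that failed to read back.
 */
uint32_t test_data_bus(volatile uint32_t *addr)
{
    uint32_t pattern;

    assert(addr);
    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/*
 * Fill every word with a value derived from its own index, then read all of
 * them back: catches address line faults and aliasing between locations.
 * Returns the index of the first bad word, or nwords on pass.
 */
size_t test_address_unique(volatile uint32_t *base, size_t nwords)
{
    size_t i;

    assert(base);
    for (i = 0; i < nwords; i++)
        base[i] = (uint32_t)i ^ 0xa5a5a5a5u;
    for (i = 0; i < nwords; i++)
        if (base[i] != ((uint32_t)i ^ 0xa5a5a5a5u))
            return i;
    return nwords;
}
```

The second test is destructive over the whole range, which is exactly why it cannot be run from inside a live Linux that is itself sitting in that memory.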

More about board test and board bringup will feature in another report of a lesson learned.

Qi

While at Openmoko (mainly) I wrote a bootloader that meets these ideals; you can find it in git here.  One of the nicest things about it is that, unlike the bloated bootloaders whose job never finishes trying to become Linux cargo cult style, Qi has been pretty much complete for a few months.  It’s a new job to support a new CPU and a much smaller job to add a new board, and it doesn’t want to talk to your peripherals anyway so no problem there.

Qi creates one binary per CPU that supports all boards with that CPU.  That sounds like a big job but we don’t care about your peripherals so all boards with the same CPU look almost identical.  You have to find something that can detect your particular board at runtime, for example a NOR device ID read check.  So there is zero build-time config and Qi generates all CPU support when it’s built, which takes 3 sec or so typically.
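The runtime detection idea can be sketched roughly as below.  This is only an illustration and not Qi's actual source: the struct layout, the board names, the ID values and the simulated NOR read are all assumptions invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for reading the NOR flash device ID over the bus. */
uint16_t simulated_nor_id = 0x1111;

static uint16_t read_nor_device_id(void)
{
	return simulated_nor_id;
}

/* One entry per board that shares this CPU (hypothetical layout). */
struct board {
	const char *name;
	int (*is_this_board)(void);	/* nonzero if we are running on it */
};

/* The ID values here are invented purely for illustration. */
static int is_board_a(void) { return read_nor_device_id() == 0x1111; }
static int is_board_b(void) { return read_nor_device_id() == 0x2222; }

static const struct board boards[] = {
	{ "board-a", is_board_a },
	{ "board-b", is_board_b },
};

/*
 * Ask each known board in turn whether it recognises the hardware.
 * Returns the first match, or NULL if nothing claims the hardware.
 */
const struct board *detect_board(void)
{
	size_t n;

	for (n = 0; n < sizeof(boards) / sizeof(boards[0]); n++)
		if (boards[n].is_this_board())
			return &boards[n];

	return NULL;
}
```

Because every board's probe is compiled into the same image, adding a board means adding one table entry and one probe function rather than a new build configuration.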

Typical bootloader binary size per CPU is 28-30KBytes.  That typically supports VFAT, ext2/3/4 and the SD controller as well.  The single Qi image also supports being booted from NAND, JTAG or SD Card on processors that support it just by being copied into place and without any changes.

There is zero bootloader private state, however Qi can look in the rootfs and append kernel commandline text from the content of a filesystem file.  This maintains the rule that boot should be completely deterministic per rootfs.
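A minimal sketch of that append step follows, assuming the file content has already been read from the rootfs into a buffer.  The function name and its error convention are assumptions made up for the example rather than Qi's own code, and which file gets read is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append `len` bytes of file content to the NUL-terminated kernel command
 * line in `cmdline`, whose total capacity is `size` bytes including the NUL.
 * Trailing newlines and spaces from the file are dropped and a single space
 * separates the existing text from the appended text.
 * Returns 0 on success, or -1 if the result would not fit, in which case
 * `cmdline` is left unchanged.
 */
int append_cmdline(char *cmdline, size_t size, const char *file, size_t len)
{
	size_t used = strlen(cmdline);
	size_t sep = used ? 1 : 0;	/* separator only if text exists */

	while (len && (file[len - 1] == '\n' || file[len - 1] == '\r' ||
		       file[len - 1] == ' '))
		len--;

	if (!len)
		return 0;		/* empty file: nothing to add */

	if (used + sep + len + 1 > size)
		return -1;

	if (sep)
		cmdline[used++] = ' ';
	memcpy(cmdline + used, file, len);
	cmdline[used + len] = '\0';

	return 0;
}
```

Since the extra text comes only from the rootfs being booted, the same rootfs always produces the same command line, which is the determinism rule above.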

Fosdem and the Linux Cross Niche

February 8th, 2010

fosdem

I was at FOSDEM over the weekend.  There were several interesting talks I attended but the most interesting one for me was a roundtable about the future of Cross distributions.  I was invited to give a 5 minute talk there which I gave, but unfortunately it was right at the end and the people before had overrun, so there was no time to make much of a coherent case.  So I am going to write some articles covering the issues involved here.

Cross as a niche

Cross itself remains absolutely necessary for systems below a certain level of horsepower.  For example, 8051, ARM7 and Cortex-M3 are not really capable of native build.  But processors get faster each year: a lot of things we would have used an 8051 for use an ARM7 or Cortex-M3 now, and in a few years it is likely that the baseline will have moved further up to an ARM9 equivalent.  What I am suggesting then is that over time, the niche where you need cross is shrinking.

All four of the cross distros at FOSDEM target a CPU that’s powerful enough to run Linux, but not powerful enough to build its own binaries.  That is the niche that I believe will shrink to the point that it won’t support all these cross Linux distro projects, possibly none of them in the end.

My background with cross Linux

A few years ago I created an RPM-based cross distro singlehanded, and used it on a product for a customer.  This was AT91RM9200-based, a 200MHz ARM9 with 32MBytes of SDRAM.  The amount of effort needed to create a set of cross packages sufficient to create a workable rootfs was huge; it took me many weeks.  Some packages like perl were just so cross-unfriendly that they were basically out of reach (although I later saw other people have done the invasive magic necessary).  It did work well, and I added patches for busybox RPM support that allowed it to do more useful things like erase and keep a package database.  The packaging was valuable in itself but a nice advantage was the source RPMs it generated ensuring GPL compliance.

My background with Openmoko

Subsequently I spent 14 months as (mainly) the kernel maintainer for Openmoko.  Openmoko had an OpenEmbedded basis for its rootfs, also a cross system.  I attempted to use it for “hello world” while I was at Openmoko, but it broke because I was on a newly released Fedora.  How it broke was very revealing: the official way to get started with it was to run a huge script that wgetted and locally built 1100 packages.  It died due to some assumption somewhere breaking while it tried to build host dbus libraries.

What I wanted was a cross toolchain that would let me package “hello world”.  What I got was a massive host build action including host dbus libs.  I have perfectly good host dbus libs in my Fedora install, I enquired about it and was told they were the “wrong” libs for the expectation of the rest of the packages, so they had to be rebuilt.

I gave up on trying to use OpenEmbedded, as I guess most of Openmoko’s customers did.

After Openmoko imploded, I designed the software architecture (and influenced the hardware design in some aspects) for the txtr reader device.  On this device, I put into action various lessons I had learned in how not to do things from Openmoko.  I will write further about the other lessons in future articles, but here’s the first one:

Lesson #1: Don’t compile your own rootfs

I was told by a manager at Openmoko that Openmoko had hired most of the main devs of OpenEmbedded and were paying for that accordingly.  This was a pretty big drain on their resources over a long period.

In contrast, nowadays you can head over to http://fedoraproject.org/wiki/Architectures/ARM and download a generic rootfs tarball of prebuilt binaries for ARMv5 and above[1].  It’s made from unpacking prebuilt binary packages.  Once you boot into it, you can install further packages with the usual yum install type action.  You can be up in a high quality rootfs in five minutes flat.

You do not need to go around compiling everything personally when binary packages exist from a reputable distro already.  Normal distros provide -dev and -devel packages for you to link against too, so you do not need to recompile the universe just because you want to build “hello world” either.  That’s how we do things on desktop and server systems, as the processors involved get stronger embedded does not have to be different.

If you want to cross-build specific packages, you install the Fedora ARM Cross Toolchain RPMs on your host via yum and you are ready to go in a couple of minutes.  This is very useful for cooking the kernel on your host both to get started and during development; you can’t native-build the bootstrap stuff needed to boot your platform.  But that’s just a cross compiler and related pieces, it’s not a cross distro.  (The guy from emdebian at this FOSDEM talk also made the point that you do not need to get into making your own toolchain, your distro should have one you can just install).

Fedora ARM’s strategy is native build.  So you install gcc and other dependencies into the actual device, and use standard rpmbuild to build your package there; you can also just configure ; make ; make install for development too down there.  If something’s missing on the rootfs you can yum install it.

([1] To make the comparison fair to Openmoko, Fedora ARM came along too late for them to choose it from the start, and the GTA02 s3c2442 was not a v5 class processor, so they would have been into a distro recook after changing the distro-level compile options.  However my worry is not repeating Openmoko’s errors and today Fedora ARM is available.)

Quality and Quantity

Another major issue is distro quality.  I was so surprised to hear at Fosdem Dr Mickey Lauer of OpenEmbedded boast about the number of devices that managed to use that distro (including the sad shape of the GTA02) and say that unlike the other cross distros, OpenEmbedded focused on “Quantity not Quality”.  From my experience I think he’s right alright about not focusing on quality, and he did go on to explain there are problems with OpenEmbedded they are trying to address.

In the near future, there will be a car crash between these difficult cross distros that have relatively poor quality and strange requirements to use them and standard, “proper distros” like Fedora ARM, because on higher-end ARMv5s, say 400MHz and above, it is already perfectly possible to compare the two worlds on the same device.  I think many devs currently are trained by their experience with buildroot type systems to assume they have to personally build everything Gentoo style.  However as CPUs increase in power at the same price point, the ways of working with these systems efficiently change, and desktop / server “treat it like a PC” lessons like the value of packaging start to really show their traditional advantages over rootfs tarballs.

Like Debian, Fedora has all kinds of rules and requirements about packaging to ensure high quality, and the huge number of users of these two normal distributions leads to tested and debugged basic packages and their dependencies.  OpenEmbedded’s boast about number of users is not even a blip in comparison to Fedora or Debian’s consumers and contributors.

Cross distros are locked into local patch hell

A worse problem for their quality even than not having many users is the patch load these projects are carrying.  I think all of the cross distro projects bemoaned that they were carrying huge patchsets across a large number of packages to get them to build cross at all, and that most upstreams did not care to take them (I assume they don’t want to have to get into testing them).  Upleveling packages, which distros have to do daily when they have a large package universe, can become a nightmare of breakage because of the private patchsets being dragged around.

(BTW I also saw in another presentation that the LiMo Foundation are carrying around more than 80MBytes of diff between their distro and the upstream projects, and these are the guys who sent out a whitepaper explaining the massive cost of delaying sending patches upstream in dollar terms.)

A unified effort to promote crossbuild patches upstream was proposed, but it seemed only to consist of a domain like “sends-patches.org” that you could use when sending patches instead of your own project name, which seems to just be tea and sympathy rather than a solution.

It’s clear that quality will tend to be higher if you are getting packages built with normal distro specfiles and no pile of local patches to get them to build cross (because they were built native).  Combined with higher quality thresholds at the project level and sheer number of users, native Fedora (or Debian) rootfs basis will provide Quantity and Quality if your processor is appropriate.

A couple of hours after the talk I had an interesting conversation with OpenInkpot dev Mikhail Gusarov, who I found also shared my lack of enthusiasm for OpenEmbedded, although he is trapped still in the cross niche generally by the weak processors he targets at the moment.

[update Feb 10 09:00] Mikhail has written his own response, he still likes the speed of cross (and still hates OpenEmbedded).  But there’s some confusion about what Fedora ARM offers, it’s a generic ARMv5 rootfs, it doesn’t care what exact kind of CPU, vendor or peripherals available.  Build farms are less of a requirement when you are no longer building your rootfs but installing it from distro binary packages.  Sheevaplug makes available a 1.2GHz Marvell ARM compatible with 512MBytes of SDRAM that Fedora ARM can work on if you need a native build machine.  Shortly fast dual processor Cortex A9 machines will become available.