Archive for November, 2010

New NXP LPC32x0 in Qi bootloader

Monday, November 29th, 2010

LPC3250 from scratch

NXP’s new LPC32x0 is a very cheap and feature-filled ARM926.  According to Digikey anyway, it’s the cheapest ARM chip going with at least the v5 instruction set.  That’s important not just because of the extra processor strength over older ARM9 cores, but because ARM Fedora is built requiring an armv5 or newer instruction set.  Being able to use ARM Fedora and RPM as a basis means freedom from compromise, and from having to own the building of an integrated, self-consistent rootfs; you can just focus on writing your specialized code on top of the reliable Fedora basis.

There are four chips in the series; they differ in whether they have an LCD controller and Ethernet MAC.  The smallest, the LPC3220, has “only” 128KBytes of static IRAM and the others have 256KBytes.  Having worked with the 2KBytes of internal static RAM on the iMX31 for SD boot on Qi, and having had to shoehorn an SD card driver in there, even 128KBytes is a crazy amount.

They have support for resistive touchscreen, USB OTG, NAND controller and Mobile DDR, and up to 266MHz CPU clock at 1.4V Vcore (208MHz at 1.2V Vcore but as we will see that is not entirely true).  They don’t support SD Card boot from ROM, but that can be solved for about US$0.30 as will be shown.

In short they’re ready to do some serious embedded work at a budget price.

Embedded Artists EA3250 Dev kit

There are a few dev kits around for LPC32x0.  Hitex have a cheap USB stick format one that has been permanently two weeks away from availability since I first looked at it a month or so ago, and it is still two weeks away.

NXP anointed two real dev boards, from Phytec and Embedded Artists, and evidently worked with those vendors during development; they don’t actually make an NXP branded dev board.  Since the EA one is in Digikey, that’s what I ended up with.

The dev board is well made but there are some problems with it.  Like many dev boards it comes in two halves: a cheaper, large breakout board and an 8-layer DIMM type board that has the actual CPU BGA and memory.  In an act of supreme lunk-headedness, the large breakout board re-uses the Pn.m nomenclature that the CPU uses for GPIO, with no care to retain the CPU mapping.  So for example a header is marked as having a pin P1.27, but very confusingly this has nothing to do with the CPU GPIO P1.27.  This is also true in the schematics for the baseboard and CPU board, so it is complete confusion trying to trace a signal between the two boards or looking for a misnamed signal on the baseboard.

DDR trouble #1

There’s also a more serious problem: the DDR on the CPU card is marginal, and Embedded Artists have issued a recall where they will replace the board for free with one using a different DDR DRAM.  The CPU board I got is affected, but not at room temperature; they want the old card sent back and I am not finished with it yet, so I will take advantage of this recall later.

DDR trouble #2

There’s another problem with DDR: NXP issued an erratum confessing that the inverted signal of their differential DDR clock is skewed by no less than 1.2ns from the uninverted partner of the differential pair, a huge skew.  This issue removes a lot of comfort zone from designing with DDR and means only some memory devices will tolerate it.  However in the EA board case, they have not used the workaround suggested by NXP, which is to nuke the inverted output entirely and make the clock unipolar, so the situation can’t be that bad.

DDR trouble #3

The last problem with DDR… operation at 208MHz with 1.2V Vcore is fine for the CPU, in fact while screwing with the PLL I had the CPU running fine at 400MHz, although there is no way to divide anything useful down for the memory clock at that speed and it’s illegal for the PLL over temperature, which tops out at 320MHz.  However at 1.2V and 208MHz, the CPU side of the DDR bus is unreliable: it requires cranking to 1.4V to operate DDR even at 104/208MHz.  That’s annoying because since 1.2V is needed anyway for other circuitry, it could have saved a regulator.

Unbrickability of LPC32x0

LPC32x0 chips feature UART-based bootloader injection… if you pull down the SERVICE_N pin, then on the next boot the ROM in the CPU will bring up UART5 at 115200 8N1 and issue a simple protocol byte allowing for bootloader download.
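
As a rough sketch of the host side of that link, this is how a Linux tool might set up a serial port for 115200 8N1 using the standard termios calls.  It is not taken from lpcboot.c; the function names lpc_uart_settings and lpc_uart_open are made up for illustration, and the ROM's protocol byte and download sequence are deliberately left out since they are not described here.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Fill in a termios struct for raw 115200 8N1 with no flow control.
 * Hypothetical helper name.  Returns 0 on success, -1 on failure. */
int lpc_uart_settings(struct termios *t)
{
    assert(t != NULL);

    memset(t, 0, sizeof(*t));
    /* CS8 = 8 data bits; PARENB clear = no parity; CSTOPB clear = 1 stop bit */
    t->c_cflag = CS8 | CREAD | CLOCAL;
    t->c_cc[VMIN] = 1;   /* block until at least one byte arrives */
    t->c_cc[VTIME] = 0;  /* no inter-byte timeout */

    if (cfsetispeed(t, B115200) < 0 || cfsetospeed(t, B115200) < 0)
        return -1;
    return 0;
}

/* Open a serial device (for example a USB <-> LVTTL cable) and apply the
 * settings above.  Returns the file descriptor, or -1 on failure. */
int lpc_uart_open(const char *path)
{
    struct termios t;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    if (lpc_uart_settings(&t) < 0 || tcsetattr(fd, TCSANOW, &t) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would pass whatever device node the cable shows up as, something like /dev/ttyUSB0 on a typical Linux box, and then read and write the returned descriptor.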

Since I couldn’t find a Linux tool for injecting bootloaders, just a Windows one, I wrote a commandline tool for it and added it to Qi build.

http://git.warmcat.com/cgi-bin/cgit/qi/tree/tools/lpcboot.c?h=lpc

No matter how broken your nonvolatile image gets, it’s still possible to recover the device via this UART scheme with a USB <-> LVTTL serial cable.

Bootloader Hell

The LPC32x0 bootloader situation is ugly.  Basically NXP provided a huge suite used for chip verification called CDL (“common driver library”), this is a sort of chopped down OS in bootloader form.  It has all kinds of functions to drive the chip peripherals and test memory, but nothing to actually boot Linux!

What EA shipped, and what you are meant to do as a system integrator, is get an implementation of CDL in the form of “S1L” — stage one bootloader — to load U-Boot, which will then load Linux.  Both U-Boot and S1L — itself like 130KBytes! — store “state” on the board.  It leads to this insane situation that two bootloaders with two kinds of state must be right in order to boot.  Things are further complicated that SPI boot only allows the first 56KBytes to be loaded by ROM into IRAM and executed, but the bloated bootloaders are too big to do this in one step.

Bootloader Heaven

I added support for LPC32x0 to Qi last week.  This is a single < 30KBytes image that can boot itself from SPI Flash or UART5 injection and pull Linux from a VFAT partition on an SD Card or from SPI Flash.  Boot from cold, with Qi and kernel in SPI Flash, to a Fedora 12 bash prompt takes less than 4 seconds.

http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=lpc

This replaces both S1L and U-Boot, and in accordance with Qi philosophy it holds no state at all on the device.

Its strategy is if it finds that it is running via injection on UART5, it copies itself into SPI Flash / EEPROM so it will run next boot from there, and if it finds an SD Card kernel image it will also copy that into SPI Flash.

When it finds it is running from a non-injection source, ie, a normal boot from SPI Flash, it favours any kernel it can find on the first, VFAT, partition of an SD Card if found, otherwise it boots from the kernel also in SPI Flash.
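
The decision logic in the last two paragraphs boils down to something like the sketch below.  The type and function names are invented for illustration and are not Qi's own.  One point is an assumption on my part: which kernel gets booted straight after an injection run is not spelled out above, so the sketch simply applies the same SD-first preference there.

```c
#include <assert.h>
#include <stdbool.h>

/* How this copy of the bootloader got into IRAM */
enum boot_source { BOOT_FROM_UART_INJECTION, BOOT_FROM_SPI_FLASH };

/* Where the kernel to boot should come from */
enum kernel_source { KERNEL_FROM_SD, KERNEL_FROM_SPI };

/* Hypothetical summary of what the bootloader decides to do */
struct boot_plan {
    bool copy_self_to_spi;       /* install the bootloader for next boot */
    bool copy_sd_kernel_to_spi;  /* install the SD Card kernel as well */
    enum kernel_source kernel;   /* which kernel to boot now */
};

struct boot_plan plan_boot(enum boot_source src, bool sd_kernel_present)
{
    struct boot_plan plan = { false, false, KERNEL_FROM_SPI };

    assert(src == BOOT_FROM_UART_INJECTION || src == BOOT_FROM_SPI_FLASH);

    if (src == BOOT_FROM_UART_INJECTION) {
        /* Injected over UART5: make the install permanent */
        plan.copy_self_to_spi = true;
        plan.copy_sd_kernel_to_spi = sd_kernel_present;
    }

    /* A kernel on the SD Card wins over the one in SPI Flash.
     * Doing this after injection too is an assumption. */
    if (sd_kernel_present)
        plan.kernel = KERNEL_FROM_SD;

    return plan;
}
```

Nothing here is stored between boots: the plan is recomputed from what the bootloader can observe each time, which is the point of holding no state on the device.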

This is why the lack of ROM -> SD Card boot is not critical, the cheapest, smallest SPI EEPROM can be used to contain Qi, which will then load the kernel and rootfs from SD Card if that’s what’s needed as during development.  If SD Card is overkill for the job, then Qi, Kernel and initrd can all be pushed into a single US$2 32MBit SPI Flash.

Since I only have the Embedded Artists board right now, it wants to see a kernel image called k-ea3250.img on the SD Card.  The way Qi works, you add a new file for each supported board in ./src/cpu/lpc32x0/, copied from embart-steppingstone.c in that directory.  The bootloader needs some way to identify what it is running on at runtime, since there is only a single image per CPU that supports all devices.  See http://git.warmcat.com/cgi-bin/cgit/qi/tree/src/cpu/lpc32x0/embart-steppingstone.c?h=lpc for an idea of what’s involved in supporting a new board in the bootloader image.
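
To give a feel for the "one image, many boards" idea, here is a minimal sketch of a board table with a runtime probe per entry.  Everything in it is hypothetical apart from the k-ea3250.img filename: struct board_desc, find_board and the probe functions are made-up names, not Qi's real structures, and the real per-board file carries a lot more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-board descriptor; not Qi's actual structure */
struct board_desc {
    const char *name;             /* human-readable board name */
    const char *kernel_filename;  /* kernel image to look for on SD Card */
    bool (*is_this_board)(void);  /* runtime probe: are we on this board? */
};

/* Return the first board whose probe succeeds, or NULL if none match */
const struct board_desc *find_board(const struct board_desc *boards,
                                    size_t count)
{
    size_t i;

    assert(boards != NULL);

    for (i = 0; i < count; i++)
        if (boards[i].is_this_board && boards[i].is_this_board())
            return &boards[i];

    return NULL;
}

/* Stand-in probes.  Real ones would check something board-specific,
 * such as a GPIO strap or an ID register. */
bool probe_other_board(void) { return false; }
bool probe_ea3250(void) { return true; }

/* Example table: a made-up second board plus the EA3250 */
const struct board_desc example_boards[] = {
    { "other-board", "k-other.img",  probe_other_board },
    { "ea3250",      "k-ea3250.img", probe_ea3250 },
};
```

Adding a board in this scheme means adding one table entry and one probe, which is roughly the shape of the work even though the details in the real source differ.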

libwebsockets now with SSL / WSS

Monday, November 8th, 2010

SSL encrypted websockets

The websocket protocol allows for two kinds of transport, unencrypted ws:// sockets and encrypted wss:// ones.  The server on a given port is either listening unencrypted initially for http:// connections, or encrypted for https:// ones using SSL.

Today I added optional SSL support to libwebsockets using OpenSSL, so it now supports both encrypted and unencrypted connections.  To connect encrypted, you simply use an https:// URL to the server.  The server returns the script over the encrypted link, and the script on the client side opens a wss:// websocket on the server.  Otherwise the encryption is completely transparent.  In particular, the callback the library makes back into the user code for the server is totally unaware of whether it is being used over SSL or not.

I adapted the javascript that the test server sends to open ws:// or wss:// according to whether its own URL was http:// or https://.
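
The rule itself is tiny.  The real thing is a few lines of javascript in the page the test server sends; the sketch below just restates the same mapping in C, with a made-up function name, for anyone who needs to make the same choice in a native client.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map the scheme of the page URL to the matching websocket scheme.
 * Returns "wss://" for https pages, "ws://" for http pages, and NULL
 * for anything else.  Hypothetical helper, not part of libwebsockets. */
const char *websocket_scheme_for(const char *page_url)
{
    assert(page_url != NULL);

    if (strncmp(page_url, "https://", 8) == 0)
        return "wss://";
    if (strncmp(page_url, "http://", 7) == 0)
        return "ws://";

    return NULL;
}
```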

The test server builds its own test https:// certificate; browsers correctly warn that the CA is not recognized, but otherwise the certs work correctly in Firefox 4.0b6 and Chrome 8.0.552.28 beta, both current on Fedora F15 rawhide.

Changed license to LGPL-2.1

I realized that GPL2 isn’t the best idea for this as a library so I changed the terms to LGPL-2.1 making it easier to integrate with systems using other licenses.

Autotools

The build system has also been moved to autotools / libtool so it has a traditional ./configure structure that should survive crossplatform builds better.  It now has an --enable-openssl switch to control if openssl is needed.

You can get libwebsockets via git by:

git clone git://git.warmcat.com/libwebsockets

libwebsockets – HTML5 Websocket server library in C

Monday, November 1st, 2010

Browser vs Apps

It’s been clear since browsers first started becoming popular in the 90s that they were going to be the answer to standardized cross-platform support, but somehow there were never quite enough pieces of the puzzle to replace applications outright. Java or Flash or me-toos like Silverlight were needed and despite Flash solving the problem of video delivery, there hasn’t really been a shift away from old-style apps to the browser. (When I wrote Penumbra in 2007, I was able to use an exclusively https browser interface, but that’s only because it was fundamentally a filesharing app that didn’t challenge simple HTML).

The issue has never been more urgent because the number of incompatible platforms in wide use has been increasing, with iPhone, Android, Macs and Linux boxes alongside Windows. Making native apps for each platform is still possible, but it’s now a very large effort to cover and support all the platforms well natively.

HTML5 vs flash

HTML5 looks like it might have enough firepower to eliminate Flash.  It has already proven with WebM that it will be able to replace Flash for the most critical job it does for the internet as a whole, video delivery, without having to worry too much about patents.  Because of that it has increasing mindshare, and there’s already a lot of support in place in recent browsers, eg, Chrome and Firefox 4.0b6 at the time of writing; considering Chrome is webkit, that covers many embedded scenarios too.  Apple have committed themselves to HTML5 support in order to screw over Adobe… uh… I mean as part of their love of open standards.

Adobe did make actionscript a standard, but they have never been able to get away from being denounced as the main cause of browser crashes.  HTML5 moves all the hard work Adobe tried to do by themselves in terms of cross-platform media support to the people writing the browser and eliminates the need for Flash.

Websockets

Websockets are a new part of HTML5 that allow the client to get away from the ancient bias of browsers that any network connection is ultimately there to serve some kind of …ML, HTML or XML or whatever.  Websockets start off life as an HTTP connection, but the client immediately sends a request to the HTTP server to “upgrade” the protocol to websocket protocol.

After a complex handshake confirming both sides really speak websocket, websocket protocol is MUCH simpler than HTTP.  In the case of UTF-8 text packets, it’s as simple as sending 0x00 <vari-size payload> 0xff to terminate.  Binary payload packets have a slightly more complex length descriptor and then the payload with no terminator.
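
Here is a minimal sketch of the text framing just described, as it stands in the draft protocol at the time of writing.  The function name ws_frame_text is made up for illustration and is not the libwebsockets API.  The 0xff terminator is unambiguous because that byte never occurs in valid UTF-8, so the sketch assumes the caller passes valid UTF-8 and does not check it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap a UTF-8 payload as 0x00 <payload> 0xff.
 * Returns the number of bytes written to out, or 0 if out is too small.
 * Hypothetical helper; assumes text is valid UTF-8. */
size_t ws_frame_text(const char *text, size_t len,
                     unsigned char *out, size_t out_size)
{
    assert(text != NULL && out != NULL);

    /* Need room for the payload plus the two framing bytes */
    if (out_size < 2 || len > out_size - 2)
        return 0;

    out[0] = 0x00;               /* start-of-text-frame marker */
    memcpy(out + 1, text, len);  /* payload, copied unmodified */
    out[len + 1] = 0xff;         /* terminator */

    return len + 2;
}
```

Framing the two-byte payload "hi" this way gives the four bytes 0x00 'h' 'i' 0xff, so the fixed cost is two bytes regardless of payload size.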

The value of it over http is that the javascript on the client side can just get the raw binary or UTF-8 payload, and the socket stays open for async traffic in either direction.  There is no HTTP header overhead on each packet; as mentioned, for UTF-8 the protocol overhead is only 2 bytes per packet.  There’s no huge XML encode / decode overhead either, so this is a great transport for low-latency data like speech, and its no-messing async nature lets it carry event information too, ajax-style.

Because (once the connection is established) the protocol overhead is so low, it’s very suitable for weak embedded devices that have some kind of network connectivity but no real UI capability or CPU cycles for bloating data into formats browsers otherwise prefer.

Websocket servers

Sounds good right?  Well, to use it practically you need server-side support, because you are literally using a new socket-level protocol other than http.  There are Java and Python implementations suitable for Apache… but… unlike http there are no C library implementations suitable for embedded devices.  So, I wrote libwebsockets to allow embedded devices to participate in the new UIs possible with HTML5 and websockets.

Introducing libwebsockets

libwebsockets (in git at http://git.warmcat.com/cgi-bin/cgit/libwebsockets/ ) is a lightweight GPL2 http and websocket server that hides all the protocol handshakes and detail from the user code driving the server.

Because it supports file serving on http, it is able to provide a single listening socket that can serve your html script page normally and then when the browser starts running your script, come back and make websocket connections to the same port.

A test server is provided

http://git.warmcat.com/cgi-bin/cgit/libwebsockets/tree/test-server/test-server.c

Because everything to do with the protocols is handled by the library, the test server is able to serve both http and websockets very simply, using a single callback.