The Alignment Monster
Currently I am working on an embedded Linux ISDN-2 device I have designed... the hardware works fine but it's clear that the challenges lie in the software stack. ISDN runs a cryptic, stateful signalling protocol over its LAPD link layer to manage call state, and features many layers of protocol stack to get the job done. You know you are dealing with the old school when they refer to 64kbps log coded PCM as "3.1kHz voice", meaning the audio bandwidth.
Naturally at this.. mature... stage of ISDN development (i.e., I am plundering the ancient dusty tombs of a dead protocol that happens to be in wide use) I am not anxious to become a guru capable of winning arguments at the bar on ISDN protocol trivia; I just need the freaking thing to work. If it's a new technology, exploring the byways and understanding it closely can often pay off in the future, but there is much less chance of that when dealing with something old and basically deprecated (cf ADSL). So I chose to use mISDN, an attractive proposition: it has a driver for the chipset I am using and can work in both NT and TE modes -- basically allowing the device to act as either the exchange or the customer.
mISDN is getting some reasonable use as part of Asterisk via chan_mISDN, so I hoped for an easy ride. However it is clear that mISDN has not had much of a life outside of x86. The Makefile is not set up for crosscompile, and the code from git as it stood when I started would not even compile against a current kernel source. In fact it caused a segfault in the kernel build process on contemporary kernels, the two-line fix for which represents my first contribution to the actual Linux kernel tree. (Rather a weird bug... the text CONFIG_MODULE appearing in any source file would cause the build to fail with a segfault in a script. This text appeared accidentally in mISDN -- CONFIG_MODULES was meant, which does not trigger the bug.)
However, at least two other strugglers had suffered before me with ARM crosscompile, and they offered some support on the list. The actual crosscompile I managed fine... the problem was that the resulting build did not work properly, in a very curious way. The noticeable downstream symptom was that an opaque handle for a resource was wrong: when the code later tried to resolve the handle to an actual in-memory object, no object matched it. But the handle was broken in a curious way. Here is the actual packet that returned the handle:
0000: 00 00 00 00 81 23 0F 00 00 00 00 00 08 00 00 00 .....#..........
0010: 80 01 00 40 00 00 00 00 ...@....
This chunk of data is represented in mISDN with an explicit struct for the first 16 bytes, and then an unformatted "argument" block of data follows, in this case a further 8 bytes of it.
typedef struct _iframe {
    u_int addr;
    u_int prim;
    int dinfo;
    int len;
    union {
        u_char b[4];
        void *p;
        int i;
        u_int ui;
        u_int uip[0];
    } __attribute__((packed)) data;
} __attribute__((packed)) iframe_t;
The opaque handle where the problem comes from is found in the first 4 bytes of the arg region at +0x10 in the dump above, represented by ->uip[0]. The correct result is to walk away from the packet understanding that the opaque handle is 0x40000180, the ARM9 I am using being little-endian.
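To make the arithmetic concrete, here is a minimal sketch that decodes the dump one byte at a time. The names packet and le32_at are made up for this illustration and are not part of mISDN; it assumes the fields are stored little-endian, as they are on the ARM9 here.

```c
#include <stddef.h>
#include <stdint.h>

/* The 24 bytes from the dump above: 16 bytes of header, 8 bytes of argument. */
static const uint8_t packet[24] = {
    0x00, 0x00, 0x00, 0x00, 0x81, 0x23, 0x0F, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00,
    0x80, 0x01, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00,
};

/* Hypothetical helper: assemble a little-endian 32-bit value from four
 * single-byte loads. Byte loads have no alignment requirement, so this
 * gives the same answer at any offset and on any CPU. */
static uint32_t le32_at(const uint8_t *buf, size_t off)
{
    return (uint32_t)buf[off]
         | (uint32_t)buf[off + 1] << 8
         | (uint32_t)buf[off + 2] << 16
         | (uint32_t)buf[off + 3] << 24;
}
```

With the bytes above, le32_at(packet, 0x0c) gives the len of 8 and le32_at(packet, 0x10) gives the handle 0x40000180.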
To my surprise, however, the code came away with the impression that the opaque int handle was 0x80. I confirmed that the pointer was at the right offset from the start of the struct: somehow dereferencing a 32-bit int pointer that looked at memory containing 80 01 00 40 gave the result 0x80!
Further, if instead of using my own int * based on ->ui to do the dereferencing, I used ->uip[0] directly from the struct I got the correct result of 0x40000180. And I confirmed that & ->ui and & ->uip[0] are exactly the same!
Diego Serafin on the mISDN mailing list had seen this crap before. He provided the solution: on ARM (at least on the ARM9 I am using), misaligned 32-bit reads, that is, reads where bits 1 and 0 of the address are not both zero, are silently BROKEN. What happens in this case is that the read is performed at address & 0xfffffffc, BUT the result is rotated according to the original address & 3. An example: if at address 0x0 one finds the bytes 11 22 33 44 55 66, then dereferencing an int * pointing to 0x0 will result in 0x44332211 on both x86 and ARM. On x86, dereferencing an int * pointing to 0x1 will give you 0x55443322. BUT on ARM, the same dereference of an int * pointing to 0x1 gives you 0x11443322 instead: the aligned word 0x44332211 rotated right by one byte.
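And indeed that is what had happened in the example case above, where (pointer to the start of the struct) & 0x3 was 0x3... it was some address 0x...f. The handle at +0x10 therefore also sat at an address with both low bits set, so the read fetched the aligned word covering byte offsets +0xd to +0x10 and rotated it: the 0x80 from +0x10 landed in the low byte, and the upper bits were filled in from offsets +0xd to +0xf, which are all zeros.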
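The rotation is easy to model in portable C, which makes it possible to check both examples without an ARM board. This is only a sketch of the behaviour as described above, not a statement about every ARM core; aligned_le32, arm_rotated_load and bytewise_load are hypothetical names, and "addresses" are simply offsets into a byte array.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit word starting at offset off of a byte array. */
static uint32_t aligned_le32(const uint8_t *mem, size_t off)
{
    return (uint32_t)mem[off]
         | (uint32_t)mem[off + 1] << 8
         | (uint32_t)mem[off + 2] << 16
         | (uint32_t)mem[off + 3] << 24;
}

/* Model of the misaligned load described in the text: the low two address
 * bits are dropped, the aligned word is fetched, and the result is rotated
 * right by 8 bits for each byte of misalignment. */
static uint32_t arm_rotated_load(const uint8_t *mem, size_t addr)
{
    uint32_t word = aligned_le32(mem, addr & ~(size_t)3);
    unsigned rot = (unsigned)(addr & 3) * 8;

    /* Guard rot == 0: shifting a 32-bit value by 32 is undefined in C. */
    return rot ? (word >> rot) | (word << (32 - rot)) : word;
}

/* What a byte-exact little-endian read returns (the x86 result). */
static uint32_t bytewise_load(const uint8_t *mem, size_t addr)
{
    return aligned_le32(mem, addr);
}
```

For the bytes 11 22 33 44 55 66 at address 0, arm_rotated_load at address 1 yields 0x11443322 where bytewise_load yields 0x55443322. Placing the packet from the dump at an address ending in 0xf and loading from +0x10 yields 0x80 in the model, matching what the code saw.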
The reason that ->uip[0] gave the right result is purely down to the union being marked as __attribute__ ((packed)). In such a case, the compiler understands that it cannot rely on a single 32-bit bus access, so it has to issue four byte reads and OR them together into the 32-bit result.
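The same effect can be had for an arbitrary pointer by reading through a packed wrapper type, or more portably with memcpy. A sketch, assuming GCC or a compatible compiler for the attributes; unaligned_u32, load_u32_packed and load_u32_memcpy are hypothetical names, not mISDN API.

```c
#include <stdint.h>
#include <string.h>

/* A 32-bit value with no alignment guarantee. 'packed' tells the compiler
 * the member may sit at any address, so it must not assume a plain aligned
 * word load is safe; 'may_alias' permits viewing arbitrary bytes through
 * this type. Both attributes are GCC extensions. */
struct unaligned_u32 {
    uint32_t v;
} __attribute__((packed, may_alias));

/* Read a host-endian 32-bit value from a possibly misaligned address. */
static uint32_t load_u32_packed(const void *p)
{
    return ((const struct unaligned_u32 *)p)->v;
}

/* Standard C alternative: copy the four bytes into an aligned local. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

Both functions return the value in host byte order, so on a little-endian machine a buffer holding 80 01 00 40 at an odd address still reads back as 0x40000180. What neither of them can fix is a plain int * that has already been made from a misaligned address and is dereferenced directly.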
So: the takeaway from this is that it is not enough that the C code be "correct" for x86. If it is to be portable, ALL int accesses must be aligned to int boundaries. It is NOT enough that the compiler pads ints inside a struct definition to an int boundary either, because the start address of the struct itself may not be aligned to an int boundary.
I resolved this by adding macros that check pointers for int alignment (to find the places where sensitive pointers are misaligned and can cause trouble) and macros that allow single-step allocation of stack storage aligned to int boundaries, ruling out misalignment. Still, it is an education to find that perfectly sane C code that works on x86 can blow violent chunks on ARM or other processors that insist on width-based bus alignment.
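For illustration, macros of that kind might look roughly like this. These are not the actual macros from my tree: IS_INT_ALIGNED and DECLARE_INT_ALIGNED_BUF are hypothetical names, and the sketch assumes GCC (for the aligned attribute) and a power-of-two sizeof(int).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: nonzero when p is a safe address for an int access.
 * Relies on sizeof(int) being a power of two so that size - 1 is a mask. */
#define IS_INT_ALIGNED(p) \
    ((((uintptr_t)(const void *)(p)) & (sizeof(int) - 1)) == 0)

/* Hypothetical single-step declaration of a byte buffer whose start address
 * is aligned for int access. Meant for stack buffers that later get cast to
 * a struct pointer, such as a frame header followed by its argument bytes. */
#define DECLARE_INT_ALIGNED_BUF(name, size) \
    unsigned char name[(size)] __attribute__((aligned(sizeof(int))))
```

A buffer declared with DECLARE_INT_ALIGNED_BUF(frame, 32) passes IS_INT_ALIGNED(frame), while frame + 1 fails it; dropping the check in front of suspect casts is a cheap way to find the offenders.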