Coolest Mailserver
SATA-eating Monster
Losing my 24/7 local box to the SATA-eating monster left an immediate problem: nothing was taking my mail. The MX for warmcat.com ends up at my cablemodem, so incoming mail was just timing out with nobody to talk to. I thought for a bit about setting up an external Postfix and moving the MX, but I didn't want that as a permanent solution, so building it as a stopgap would have been wasted effort. Annoyingly, the AMD64 box went down just before I had to take a trip to Spain with Jenny for a few days. I shrugged my shoulders and hoped anyone with important mail would retry, and that the mailing lists I am on would be understanding of the rather unrefined behaviour.
When I returned from the few days of overheated childcare-in-another-land (formerly, 16 years ago now, known as "holiday") I immediately fell ill with some kind of bad cold that laid me up in bed for a couple more days. So after nearly a week the pressure was on to get a permanent solution to the missing mailserver problem.
Replacing the fridge with something cooler
The huge fridge-with-fans that is the AMD64 box had never been a good fit for being on 24/7; it must have drawn significant power over the year it was on all the time. Since I am working on embedded ARM systems at the moment, it had been in my mind for a while to port Postfix and run one of these low-power AT91-based systems as my mailserver, with a USB memory stick as the storage. As soon as I got back I began laying the groundwork.
The first problem was that my backed-up, transplanted buildroot gcc toolchain, which used to run on the AMD64 box, was an x86_64 -> ARM crosscompiler, but it had been restored onto what was basically a 32-bit x86 box. So I could no longer even run the crosscompiler toolchain I had been relying on for many months to generate ARM packages. That was an even bigger problem than the mailserver, since I can't do any work without a working toolchain. I decided to regenerate the crosscompilers from the original gcc 4.0.2 sources, which I had kept. This went fairly smoothly. I moved the old toolchain directory aside, replaced it with the new x86-hosted compilers, and started porting Postfix, both to get my mail up and to test that the x86 -> ARM compilers generated code that would interoperate with the piles of existing packages, built by the x86_64 -> ARM compilers, that were out in the world.
Postfix with added signal 11
Postfix needed db4, or so I thought; later it seemed there was some kind of HAS_DB define, IIRC. Anyway, I packaged db4 into an RPM, installed it into the crosscompile devel filesystem, and compiled and packaged Postfix. When I scp'd the packages onto the ARM box, all the Postfix apps blew chunks with segfaults. Run through strace, they still segfaulted immediately, before any dynamic libs were touched. This was a bit depressing... I assumed that the problem was an incompatibility between the code generated by the 32-bit-hosted crosscompilers and the 64-bit-hosted ones. Such differences shouldn't exist, but they easily could if there were bugs in the compiler sources. Well, I had half expected it, so I recompiled busybox, uClibc and the gcc libs under the new compiler and updated the ARM box. I was surprised to get the same segfaults from Postfix but good behaviour from everything else, even things I hadn't recompiled that were now running on newly compiled libs. It was clear the segfaults were not coming from differences between the new and old compilers after all.
Postfix Diet
Postfix is not set up to be thin and trim. It seems that, because of the desire to have it run on OSes that don't support dynamic libs (something I would be applauding if I were stuck with uClinux), it builds five static libraries, and each of the dozen or so apps that make up Postfix links a bunch of them into itself. The result is that the Postfix package built like this totals 10MB compressed, since copies of one or more libraries are repeated in every app. As I only have a meager 8MB root filesystem on these ARM boxes, that was less than optimal.
As part of the porting action I took the time to convert those five static libs into dynamic libs, so they would exist just the once, and to make all the apps bring them in as runtime libs. This reduced the Postfix package to under 500KBytes. So once the compilers were cleared, suspicion fell on the pretty widespread Makefile.in edits that had enabled this. After some futzing around, I learned an interesting lesson that it seems many people already knew well: putting -shared in the LDFLAGS used to generate an executable does not tell the linker to use .so files instead of any .a's that might be lying around. What it does instead is not good, and it does it without generating any warnings or errors.
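The lesson above can be sketched as a dry run that only prints the commands. With gcc, -shared asks for a shared object as the output, so it belongs on the step that builds the library and nowhere else. The executable is linked normally, and -L/-l picks up the .so in preference to a .a of the same name. The compiler name, libdemo.so, demo.c and app.o are made-up names for illustration, not Postfix's own.

```shell
# Dry-run sketch: print the build commands instead of running them.
# CC and all file names here are hypothetical placeholders.
CC=${CC:-arm-linux-gcc}

link_plan() {
    # 1. Compile the library objects as position-independent code.
    echo "$CC -fPIC -c demo.c -o demo.o"
    # 2. -shared belongs on this step only: it makes the output a .so.
    echo "$CC -shared -o libdemo.so demo.o"
    # 3. Link the executable WITHOUT -shared; -L/-l finds libdemo.so.
    echo "$CC -o app app.o -L. -ldemo"
}

link_plan
```

If -shared leaks into step 3, the "executable" is itself built as a shared object, which fits the symptom of apps that die before any dynamic library is loaded.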
With this removed, half of postfix started up, the half that collects mail from port 25 and stores it in the internal postfix queues. The other half of it, "local", which takes mail from the internal queue and delivers to Mbox or Maildir for normal mail apps to see, continued to segfault. But I was very encouraged! I was at least collecting my mail again even if I couldn't easily read it yet. It was very cool to tail -f /var/log/messages and to see the familiar Postfix actions happening down on the little embedded ARM.
IRC -> GDB
I hit IRC, the #postfix channel on freenode, to try to get some clue as to what would make Postfix's "local" process blow chunks. For whatever reason (I think because my questions are either stupid or difficult) I never have much luck with IRC support. #kernelnewbies helped me out really well once, but in the other channels I have gone to, my explanations and questions tend to be met with silence, simply because nobody feels they have a good answer, I think. If you stick around and try to help with other people's questions, usually someone will suggest the mailing list, or give some general advice, so it's not unfriendliness.
I broke out the gdb package that I had prepared some months ago but had not had a use for, and used Postfix's /etc/postfix/master.cf to specify that the "local" processes being spawned should be debugged, using the -D switch. In /etc/postfix/main.cf you can define a bit of shell script around gdb to account for the fact that there is no real console for the spawned process. This one worked for me:
debugger_command =
    PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
    echo where ; sleep 10 ) | /media/usbstick/usr/bin/gdb $daemon_directory/$process_name $process_id 2>&1
    >$config_directory/$process_name.$process_id.log & sleep 5
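On the master.cf side, the -D switch is appended to the command at the end of the service's line. A sketch, assuming the stock entry for the local service (the column values on a given system may differ):

```
# service type  private unpriv  chroot  wakeup  maxproc command + args
local     unix  -       n       n       -       -       local -D
```

After a "postfix reload", each newly spawned local process is handed to debugger_command. Taking the -D out again once the problem is found avoids a gdb log per delivery.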
Burning CFLAGS
The result was that a text file with a gdb backtrace was generated each time the "local" Postfix process blew chunks. Looking through these showed that each time, it was dying while calling a function that lived in libdb4. This was a very strong hint that something was broken between the way libdb4 was compiled and the way the Postfix app thought it was compiled.
I started looking through the db4 sources and saw some preprocessor code to catch a mismatch between the db4 include file versioning and that of the library. Then I realized where the problem was coming from... the libdb4 build was okay because it was using its own includes. But the Postfix build was getting db.h from the default host path of /usr/include, so the db4 ABI in there reflected the version on the FC5 box, not the version that was going to be down on the ARM box. I added an -I to the CFLAGS in the Postfix Makefile to direct it to look at the devel-filesystem path, where I had installed the db-devel package with the matching db.h, and the problem was gone.
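A mismatch like this can be spotted before building by comparing the version macros in the two headers. The sketch below relies on the DB_VERSION_MAJOR, DB_VERSION_MINOR and DB_VERSION_PATCH defines that Berkeley DB's db.h carries. The function names and both default paths are placeholders for illustration, not anything from the original build.

```shell
# Print "major.minor.patch" from a Berkeley DB header, using the
# DB_VERSION_* macros in db.h. Exits non-zero if no major version is found.
db_header_version() {
    awk '
        $1 == "#define" && $2 == "DB_VERSION_MAJOR" { major = $3 }
        $1 == "#define" && $2 == "DB_VERSION_MINOR" { minor = $3 }
        $1 == "#define" && $2 == "DB_VERSION_PATCH" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$1"
}

# Succeed only if both headers parse and report the same version.
same_db_headers() {
    a=$(db_header_version "$1") || return 2
    b=$(db_header_version "$2") || return 2
    [ "$a" = "$b" ]
}

# Placeholder paths: the host header and the one in the target's
# devel filesystem. Override them from the environment.
HOST_DB_H=${HOST_DB_H:-/usr/include/db.h}
TARGET_DB_H=${TARGET_DB_H:-/path/to/devel-filesystem/usr/include/db.h}

if [ -r "$HOST_DB_H" ] && [ -r "$TARGET_DB_H" ]; then
    if same_db_headers "$HOST_DB_H" "$TARGET_DB_H"; then
        echo "db.h versions match"
    else
        echo "db.h versions differ: check the -I paths in CFLAGS"
    fi
fi
```

Matching versions do not prove the right header is being used, but differing ones are a cheap early warning that the crosscompile is reading host includes.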
Dovecot
I then noticed I had accidentally set /etc/postfix/main.cf to use the Mbox "all in one file" format. Once I had converted the collected mail to Maildir and switched the setting to Maildir as well, the MTA side of the equation was complete. Mail was getting collected and stored on the USB memory stick in Maildir format. The next job was to set up Dovecot so I could access the Maildir contents over secure SSL IMAP.
Happily I had already packaged OpenSSL, and installed the -devel package into the host devel-filesystem. So all I needed to do was to turn off most everything else in the Dovecot ./configure action and make sure it was looking at the devel-filesystem includes and libraries. Dovecot pretty much worked first time, but it took a little messing to get the SSL cert generated cleanly. Then to my surprise Dovecot sat on ssl-build-param for a very long time, maybe 30 minutes, generating a 120 byte crypto signature of some kind. Afterwards it wrote the signature to disk as a cache, and now it reads that back and starts immediately. It seems that at intervals (settable by the user in /etc/dovecot.conf) Dovecot wants to regenerate these very precious bytes, for reasons I don't understand.
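For reference, one common way to select Maildir delivery in Postfix's local agent is a single main.cf line. This is a sketch that assumes delivery into a directory under each user's home. The directory name is an assumption, and the original setup may have used a different parameter to put mail on the USB stick.

```
# /etc/postfix/main.cf
# A pathname ending in "/" selects Maildir-style delivery;
# without the trailing slash, local writes a single mbox file.
home_mailbox = Maildir/
```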
After this, I only had to configure /etc/dovecot.conf to listen for IMAP/S on port 993, add a user and password, and I was able to collect my mail securely in Thunderbird once again.
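A rough idea of the relevant dovecot.conf settings, written from memory of the Dovecot 1.x example config and not taken from the actual file used here. The certificate paths are placeholders, and setting names changed between releases (early 1.0 betas used default_mail_env where later versions use mail_location), so treat every line as an assumption to check against the installed version.

```
# /etc/dovecot.conf (Dovecot 1.x style)
# Serve IMAP over SSL only; imaps listens on port 993 by default.
protocols = imaps
# Placeholder paths for the generated certificate and key.
ssl_cert_file = /path/to/dovecot-cert.pem
ssl_key_file = /path/to/dovecot-key.pem
# Hours between regenerations of the SSL parameters; 0 disables it.
ssl_parameters_regenerate = 168
# Where the Maildir written by Postfix lives.
mail_location = maildir:~/Maildir
```

On a slow ARM box, stretching or disabling the regeneration interval avoids repeating the long ssl-build-param run.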
No shades of grey
The only thing missing is postgrey, which I am working on. But it requires perl, which is proving difficult to package in crosscompiled form, although it does have support of some sort for crosscompile. It seems to build miniperl and then use it as part of the build action, which is a dumb idea when the miniperl you just built is compiled to run on the target ARM arch, not the host.
Anyway, I do have a silent, low power mailserver up 24/7 with no moving parts now, which is very much where I wanted to end up.