<a name="Xilinx-Ultrascale-Vitis-on-Rocky-9"></a>
<h2>Xilinx Ultrascale Vitis on Rocky 9</h2>
<p><img src="20231213_122322.jpg" alt="ZCU104" /></p>
<p>I’ve been using Xilinx chips and toolchains since the late 1980s. They have
moved on a lot, but are still tied together with Tcl underneath and push the
boundaries of what your PC can handle… the 2023.2 default “Vitis” download
for Linux is nearly 100GB, plus other needed pieces.</p>
<p>They’ve moved on from Java to Electron, both made sense at the time as cross-
platform solutions that feel native, and added this “Vitis” app on top in an
attempt to bring the whole flow-management GUI thing to cross development as
well.</p>
<p>They officially support specific distros, RHEL-type and Ubuntu, but I was always
able to coax it to run OK on Fedora, and now that my machines have switched to
Rocky 9, I tried installing on that.</p>
<a name="Vitis-vs-Rhel-9"></a>
<h3>Vitis vs RHEL 9</h3>
<a name="Difficulties-with-Vitis-cross-2d-distro-packaging"></a>
<h4>Difficulties with Vitis cross-distro packaging</h4>
<p>Although the download recommended an earlier RHEL, it mostly worked out of the
box. The first sign of madness was Vitis complaining that it couldn’t start
“gitlens”. Git was installed fine, but poking around with strace, Vitis has
taken the approach that everything it runs should inherit a special
<code>LD_LIBRARY_PATH</code> environment that allows it to override some key libs, namely
OpenSSL 1.0.2k. This EOL’d in 2019 and has had no security updates since then;
it’s not a great way around the multi-distro packaging problem in 2023.</p>
<p>Putting the security failure to one side, the bigger problem is that the
install chose to use the override path <code><install base dir>/Vitis/2023.2/tps/lnx64/cmake-3.24.2/libs/Default</code>,
which does not exist, so anything wanting OpenSSL was confused to see the
current OpenSSL 3 pieces from the distro instead of the trick ones the Vitis
pieces were linked against, and failed.</p>
<p>The solution for these problems was to create a symlink from the <code>Default</code> name
it already chose to the <code>Rhel/9</code> one that contains the insecure TLS libs it
was built to use.</p>
<pre><code class="bash">$ cd ~/Xilinx/Vitis/2023.2/tps/lnx64/cmake-3.24.2/libs
$ ln -sf Rhel/9 Default
</code></pre>
<p>Restarting Vitis, that leaves two much smaller problems:</p>
<ul>
<li>if you open a shell console in Vitis, anything in your <code>.bashrc</code> that wants
to bind to OpenSSL will inherit the Vitis <code>LD_LIBRARY_PATH</code>, see the 1.0.2k-era
libraries, and die after being unable to bind the imports that only exist in the
distro OpenSSL version (like flatpak on my box), but it’s basically cosmetic</li>
</ul>
<pre><code class="bash">flatpak: symbol lookup error: /lib64/libldap.so.2: undefined symbol: EVP_md2, version OPENSSL_3.0.0
</code></pre>
<ul>
<li>the links to the internet on the Vitis “Welcome” page don’t do anything,
probably again because it spawns something that was actually built for
OpenSSL 3 and gets surprised to be handed the 1.0.2k API. They’re just a
convenience and not necessary.</li>
</ul>
<p>The business end of Vivado (which is still its own standalone thing although
Vitis is intimately aware of its output products) and Vitis seem to work fine
so far.</p>
<a name="Serial-terminal-arrangements-on-ZCU-2d-104"></a>
<h3>Serial terminal arrangements on ZCU-104</h3>
<p>ZCU-104 has a micro-B USB connector which comes out as four ttyUSB devices. These
unfortunately move around between suspends, but there’s a workaround, which is
to go by <code>/dev/serial/by-id/usb-Xilinx_JTAG+3serial_81472-if01-port0</code>.</p>
<p>The Vitis docs claim you have to use root in Linux to access the serial ports,
but typically you’ll only need to add your user to the <code>dialout</code> group one time with</p>
<pre><code>$ sudo usermod -a -G dialout myuser
</code></pre>
<p>… and then reboot.</p>
<p>The documentation shows an integrated serial terminal in Vitis, but I couldn’t
find it in the 2023.2 release; I built <code>gtkterm</code> from source since it’s not in Rocky.</p>
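<p>For example, a standalone terminal can be pointed at the stable by-id path
directly; a sketch, assuming the id string from above and the usual 115200 8N1
UART settings:</p>
<pre><code class="bash">$ gtkterm --port /dev/serial/by-id/usb-Xilinx_JTAG+3serial_81472-if01-port0 --speed 115200
</code></pre>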
<p>When you get to downloading things over JTAG, the synchronization of connectivity
between the host and the dev board is a bit shaky, as described by this
complaint from four years ago</p>
<p>https://support.xilinx.com/s/question/0D52E00006hprI5SAI/error-while-launching-program-invalid-target?language=en_US</p>
<p>For me initially it was really shaky, because powertop was enabling autosuspend
on my host’s USB controllers, causing them to reset randomly. I disabled the
powertop service and it just became what is apparently “normally shaky”.</p>
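<p>If you hit the same thing, disabling the powertop service, so it stops
re-applying USB autosuspend at boot, is one line:</p>
<pre><code class="bash">$ sudo systemctl disable --now powertop.service
</code></pre>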
<a name="Running-Vitis-etc-over-a-network"></a>
<h3>Running Vitis etc over a network</h3>
<p>I recently upgraded my network here to 2.5Gbps; although the license is node-
locked (MAC address locked), it’s perfectly feasible to run Vitis and Vivado
over <code>ssh -Y</code> at 4K, at least locally. So you can use a noisy desktop with lots
of RAM + cores to run it somewhere else away from you, and control it
perfectly well from a laptop.</p>
<p>There are other problems specific to Rocky 9’s Gnome 40, like there being no UI
for creating a headless desktop session; in Gnome 44 it’s supposedly possible to
use RDP instead of VNC. That would be desirable because the state of what
you’re working on with the remote box would then survive suspending either side;
with <code>ssh -Y</code> the app instances die and need saving / restoring.</p>
<a name="Vivado-pieces"></a>
<h3>Vivado pieces</h3>
<p>I followed the flow here</p>
<p>https://xilinx.github.io/Embedded-Design-Tutorials/docs/2023.1/build/html/docs/Introduction/ZynqMPSoC-EDT/3-system-configuration.html</p>
<p>to do the Vivado work to create a default .xsa hardware description (you’ll
need it later) and bitstream. This was relatively straightforward, with no quirks
on Rocky.</p>
<a name="Petalinux"></a>
<h3>Petalinux</h3>
<p>More downloads are needed to work with Linux; Xilinx provide their own Yocto
distro, needed in two parts, a commandline “installer” and a “BSP” for your
specific dev board.</p>
<p>The download page says:</p>
<pre><code class="bash">Supported OS:
Completely removed RHEL and CENTOS to align with upstream Yocto.
</code></pre>
<p>https://docs.xilinx.com/r/en-US/ug1144-petalinux-tools-reference-guide/Installation-Requirements</p>
<p>But I have used Yocto on Rocky for years, so whatever it’s trying to say there, I
doubt it can’t work. As we’ll see after “install”, there is RHEL 9 support in
there, just not linked up.</p>
<p>Petalinux / Yocto is also a discontinuity for cross-platform Vitis… Vitis
supports Windows via Electron, but Yocto doesn’t, so they just announce that
you “need a Linux box to use Petalinux”.</p>
<p>The “.bsp” file for your platform is actually just a .tar.gz without that file
suffix, but don’t unpack it - the petalinux tools consume these “.bsp” files
directly as compressed files.</p>
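<p>If you’re curious, you can peek at the contents without unpacking it:</p>
<pre><code class="bash">$ tar tzf xilinx-zcu104-v2023.2-10140544.bsp | head
</code></pre>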
<p>1) First create a directory for petalinux to live in at <code>~/Xilinx/petalinux</code>.</p>
<p>2) Move the .bsp file(s) you downloaded into there.</p>
<p>3) Run the “petalinux installer” packed shell script you downloaded so it
installs into <code>~/Xilinx/petalinux</code>, and since it also does not really understand
that Rocky is RHEL, you must also create the necessary link, similar to Vitis:</p>
<pre><code class="bash">$ cd ~/Xilinx/petalinux/tools/xsct/lib/lnx64.o
$ mv Default old-Default
$ ln -sf Rhel/9 Default
</code></pre>
<p>4) It also needs <code>xterm</code> installed on the host to proceed.</p>
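<p>On Rocky that’s just:</p>
<pre><code class="bash">$ sudo dnf install -y xterm
</code></pre>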
<p>5) After that you simply source a toplevel script to customize your shell,
Yocto style:</p>
<pre><code class="bash">$ cd ~/Xilinx/petalinux
$ . settings.sh
PetaLinux environment set to '/home/agreen/Xilinx/petalinux'
WARNING: This is not a supported OS
INFO: Checking free disk space
INFO: Checking installed tools
INFO: Checking installed development libraries
INFO: Checking network and other services
WARNING: No tftp server found - please refer to "UG1144 2023.2 PetaLinux Tools Documentation Reference Guide" for its impact and solution
$
</code></pre>
<p>I looked at UG1144 but didn’t see anything useful for this; I guess it’s only
needed for network boot, which isn’t today’s problem.</p>
<a name="Practical-meaning-of-integration-for-Vitis"></a>
<h4>Practical meaning of integration for Vitis</h4>
<p>The scope of the tools needed for development on Ultrascale is pretty scary: not
only Linux, Yocto packages, ATF and userland libs, but also the hardware definition,
toolchains, debugger, JTAG and the whole HDL -> bitstream flow in Vivado.
Knowing and having experience in each of these separately (in particular having
the VHDL + Vivado flow knowledge) initially doesn’t help much in dealing with how
Vitis takes the rest away from you and provides access to them under its own
rules.</p>
<p>Vitis “integration” doesn’t always mean that it brings them inside the Vitis UI or
idioms in the same way the source debugger is subsumed into Vitis; Vivado
remains a separate app with its own UI, “integrated” mainly in the sense that a
“hardware description” <code>.xsa</code> file produced from Vivado-land is the source of
truth about the platform in the rest of Vitis.</p>
<p>The jury’s out on whether this is overall better than just providing idiomatic
customization for each tool - eg, I’m very familiar with Yocto, and the old way
was to just provide the necessary Yocto config for the dev boards and let me
get on with it.</p>
<p>Petalinux is again its own wrapped, unintegrated external stack, and again
requires its own glue flow outside Vitis.</p>
<a name="Creating-a-linux--22-bsp-22--project-and-adapting-it-to-the-hardware-.xsa"></a>
<h4>Creating a linux “bsp” project and adapting it to the hardware .xsa</h4>
<p>First source the yocto/petalinux environment in a terminal</p>
<pre><code class="bash">$ cd ~/Xilinx/petalinux
$ . settings.sh
...
$
</code></pre>
<p>Then create a “petalinux project” from the BSP</p>
<pre><code class="bash">$ cd ~/xilinx-workspace
$ petalinux-create -t project -n plx-zcu104-1 -s ~/Xilinx/petalinux/xilinx-zcu104-v2023.2-10140544.bsp
INFO: create project plx-zcu104-1
INFO: new project successfully created in /home/agreen/
$
</code></pre>
<p>The next step, customizing the “petalinux project”, needs to run from inside
the petalinux project directory, so enter the project we just created as the
cwd and then “configure” the petalinux project we created from the BSP reference
just now</p>
<pre><code class="bash">$ cd ~/xilinx-workspace/plx-zcu104-1
$ petalinux-config --get-hw-description ~/xilinx-tutorials/test1/design_2_wrapper.xsa
</code></pre>
<p>This now does a kernel menuconfig / buildroot style menu so we can see what’s
going to be included in the build</p>
<p><img src="petalinux-config.png" alt="Petalinux config" /></p>
<p>… on exiting the menu, having made changes or not, it will spend a few seconds
modifying the petalinux project to follow the .xsa we just gave it, and any
config overrides from the menuconfig.</p>
<p>Finally you can start the Yocto build</p>
<pre><code class="bash">$ petalinux-build
</code></pre>
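<p>Assuming defaults, the build artifacts land under the project’s
<code>images/linux/</code> directory (the exact file set depends on the configuration,
but typically includes the kernel <code>Image</code>, <code>image.ub</code>, device tree and rootfs
archives):</p>
<pre><code class="bash">$ ls images/linux/
</code></pre>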
<a name="Improving-reliability-and-power-efficency-of-my-CI-rack"></a>
<h2>Improving reliability and power efficiency of my CI rack</h2>
<p>This article is about hacking a UPS so it can manage power to my CI rack (which
has 30 physical and virtual machines on it), letting it handle outages cleanly
and be switched off when there are no jobs to do.</p>
<p><img src="Eunice_2022-02-17_0822.jpg" alt="Storm Eunice" />
Public Domain. By NASA - https://worldview.earthdata.nasa.gov/, Public Domain, https://commons.wikimedia.org/w/index.php?curid=115344722</p>
<p>In mid-Feb 2022 there were a <a href="https://en.wikipedia.org/wiki/Storm_Eunice">series of storms in the UK</a> that caused widespread
damage. Our physical damage was limited to our bin shed being blown over and
reduced to matchwood. But due to a power outage, which is unusual in the UK,
there was quite a bit more virtual damage.</p>
<p>The LWS CI rack runs a mixture of 30 physical and virtual platforms, and some of
those did not react well to the outage. After a day figuring out what was
broken in each virtual context I fixed most of it, but today, over a month later,
there are still several platforms out. After updating the host OS for the
nspawns, I can no longer share the related ttyUSB device nodes for the embedded
platforms into the VM, and Xenial and another platform can no longer boot. So it’s
in a bit of a degraded state, and although it will be fixed, I don’t want to
have to run around dealing with the same amount of fallout again from events I
can’t predict or control.</p>
<p>In addition to that, energy security is going to be a global problem in the
medium term, with prices that will only be going up; it’s no longer reasonable
to just burn the power this uses 24h a day when it is idle.</p>
<p>So Something Must Be Done.</p>
<a name="Step-1:--Get-a-suitable-UPS"></a>
<h2>Step 1: Get a suitable UPS</h2>
<p>Since power is normally very reliable in the UK, I did not bother with a UPS
until now; but nothing else can be improved until there’s a way both to handle
outages cleanly and programmatically power off the rack as a whole. I chose a
<a href="https://www.amazon.co.uk/gp/product/B07JQC1CB1">3U rackmount 1200VA one</a>, at
under GBP200 it’s a pretty good deal. They’re available from other vendors than
Amazon if you look around. It has an LCD display showing instantaneous power
usage as a percentage of the 1200VA budget, which proved very interesting. But
if you read on, you’ll see it has problems for this use-case and we’ll be deep-
diving into the guts of that and hacking our way around it.</p>
<p>The UPS comes with a mini CD which is about as useful as a stone tablet in 2022,
you can find a copy of the windows 8 driver that’s on it if you look around the
net, but that’s no use to me. There’s no vendor Linux support.</p>
<p>I expected UPSes to all use the USB UPS device class nowadays, but no, the UPS
market is a big stinking mess of proprietary hacks on silicon vendor reference
designs, with each one doing things slightly differently. The FOSS project for
UPS management, NUT <a href="https://networkupstools.org">Network UPS Tools</a>, is therefore in a difficult position,
and the userland driver part of that is also by necessity a big mess of
duplicated onetime hacks nobody is really able to unify, since they don’t own
all the 100+ models of UPS they support for testing, and the original user lost
interest when it worked well enough for him.</p>
<p>The typical flow is discussion on a mailing list between someone with the UPS
and a dev, who guesstimates the required changes and gets the user to test it
iteratively. He can never test it directly since he doesn’t have it and was
never in the same room with one either.</p>
<p>This particular UPS model is not explicitly supported by NUT, and is somewhat
bizarrely set to use a USB vid:pid of 0001:0000, which Fry’s Electronics got
associated with first; this is not the sign of competence you would hope to see.
It also reports a product ID MEC0003, which it seems is also found in many other
UPS products, although with the variety of configurations that are supposed to
work with products reporting like that, clearly that refers to the inner
protocol and not the USB-layer one. After some trial and error using Fedora’s
packaged NUT I was able to get it to report all zeros as its status, which is
something, but not much use.</p>
<pre><code>read: (000.0 000.0 000.0 000 00.0 0.00 00.0 00000000
</code></pre>
<p>With the CD, Windows 8 driver and stone-age proprietary USB protocol, it looks like it was
designed in the early 2000s against a silicon vendor reference design and they
just keep churning them out, since they are good enough to be competitive.</p>
<p>I cloned the NUT sources and started fiddling in there. In the sources is a
comment suggesting that Powercool users might have luck with the <code>nutdrv_qx</code>
userland driver set to use the <code>hunnox</code> subdriver. This subdriver does not even
exist in the packaged Fedora NUT version. That did indeed work; it buys me sane
low-level status</p>
<pre><code>read: (238.0 000.0 240.0 023 50.1 27.5 29.0 00001000
</code></pre>
<p>The magic ups.conf for that (remember hunnox only exists in the git NUT) is</p>
<pre><code>[powercool]
driver=nutdrv_qx
vendorid=0001
productid=0000
product=MEC0003
subdriver=hunnox
langid_fix=0x409
port = auto
</code></pre>
<p>Using <code>upsc</code> to cook the low-level driver status, we get</p>
<pre><code>battery.voltage: 27.60
device.type: ups
driver.name: nutdrv_qx
driver.parameter.langid_fix: 0x409
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: MEC0003
driver.parameter.productid: 0000
driver.parameter.subdriver: hunnox
driver.parameter.synchronous: auto
driver.parameter.vendorid: 0001
driver.version: 2.7.4-5059-ga8e3687a
driver.version.data: Q1 0.07
driver.version.internal: 0.32
driver.version.usb: libusb-1.0.23 (API: 0x1000107)
input.frequency: 50.0
input.voltage: 243.0
input.voltage.fault: 0.0
output.voltage: 245.0
ups.beeper.status: disabled
ups.delay.shutdown: 30
ups.delay.start: 180
ups.load: 22
ups.productid: 0000
ups.status: OL
ups.temperature: 29.0
ups.type: online
ups.vendorid: 0001
</code></pre>
<p>… which looks sane.</p>
<a name="Step-2:-No-more-Always-On"></a>
<h2>Step 2: No more Always On</h2>
<p>Until now I just left the CI rack on all the time, hooked up to the Sai
server ready to go. But actually the rack doesn’t have anything to do except
CI jobs, and although those sometimes come thick and fast, typically the rack
is in fact idle.</p>
<p>The LCD on the UPS shows the power being used as a percentage of the 1200VA
capacity; after arranging that the contents of the rack, and its DUT LAN switch,
are powered via the UPS, I can see the rack idles at around 30%, roughly 400VA.
This… is a lot of power when it’s on 24h a day and mainly idle.</p>
<p>What would be desirable is if the rack was off - the UPS can turn it all off
programmatically - except when there is something to do on the Sai server. That’s
easy to describe, but harder to implement.</p>
<p>To do this, there needs to be a “UPS manager” device that the UPS is plugged
into, which is the only thing that is “always ON”. It needs to be on and have
access to the internet normally when mains power is available, to check for
remote jobs, regardless of whether the UPS has the rest of the rack powered or
not. And it needs to be ON, with local DUT LAN access, when mains power has
failed, so it can inform the different devices in the rack they need to perform
an orderly shutdown.</p>
<p><img src="ups-monitoring.png" alt="UPS monitoring architecture" /></p>
<p>Rack physical devices are on their own ethernet switch + DUT LAN subnet with noi
(which is the big PC we hope to be mainly OFF) acting as a router on to the home
LAN, so it means on backup power the UPS Manager RPi can talk to anything on
the DUT subnet, which will also be taking backup power. During an outage, the
UPS Manager RPi is just trying to inform all the machines in the rack they
should shut down in an orderly fashion due to an outage, and then turn off the
UPS Backup and wait for happier days.</p>
<p>It boils down to this: the “UPS manager” device must be powered from both sides and
have dual ethernet interfaces, one to access the internet when there is mains
power and connectivity, and the other to access the rack devices on their LAN to
inform them when they must cleanly shut down.</p>
<a name="Step-3:-Create-the-USB-manager-RPi-with-Rocky"></a>
<h2>Step 3: Create the UPS manager RPi with Rocky</h2>
<p>I redeployed an RPi4 in the rack with Rocky Linux 8.5 on it, and rebuilt NUT
from git on that, to the point I could reproduce the UPS operation the same on
the RPi4. The rackmount kit that I used adds a 2.4mm DC jack to the RPi at the
back, so there are already two power sources, the USB-C on the RPi and this
jack, and I checked I can run both and unplug either without crashing the RPi.</p>
<p>Switchmode PSUs naturally adapt their output to the voltage at the load
dynamically, so they don’t find it strange if their load is already close to
or above their target voltage, they just stop driving the load until it falls
below their target voltage.</p>
<p>The Rocky image for RPi can be found here</p>
<p>https://download.rockylinux.org/pub/rocky/8/rockyrpi/aarch64/images/</p>
<p>Don’t bother trying to use an SD card for this, they are too unreliable.
Update the RPi4 bootloader to support USB boot and <code>xzcat | dd</code> the image on to
a USB3 flash drive and use that as your storage. Hopefully the next gen of
RPi boards will have an eMMC.</p>
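<p>A sketch of the flashing step; the image filename here is an example,
substitute whatever you downloaded, and triple-check the target device node:</p>
<pre><code>$ xzcat RockyLinuxRpi_8-latest.img.xz | sudo dd of=/dev/sdX bs=4M status=progress conv=fsync
</code></pre>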
<p>After install, don’t forget to add your own user and <code>userdel</code> the default
<code>rocky</code> user, since it has a fixed password <code>rockylinux</code>. Similarly, set up
<code>.ssh/authorized_keys</code> and the associated chmod for your user with your main PC
user key so you can ssh in; check it works, then also set
<code>PasswordAuthentication no</code> in <code>/etc/ssh/sshd_config</code> and restart the sshd service.</p>
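<p>A sketch of that sequence, with <code>myuser</code> standing in for your actual user:</p>
<pre><code>$ sudo useradd -m -G wheel myuser
$ sudo passwd myuser
# from your main PC, install your pubkey and confirm you can log in
$ ssh-copy-id myuser@ups-monitor
# then back on the RPi, drop the default user and password logins
$ sudo userdel -r rocky
$ sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
$ sudo systemctl restart sshd
</code></pre>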
<p>The Rocky xz image expands to a fixed ~4GB size, run this script included with
Rocky to resize the partition and expand the fs to fill your storage device.</p>
<pre><code>$ sudo rootfs-expand
</code></pre>
<p>Building NUT from git requires adding the nut user with <code>useradd nut</code>; the
group already exists in Rocky.</p>
<p>I added a udev rule in <code>/lib/udev/rules.d/52-nut-usbups.rules</code></p>
<pre><code>ATTR{idVendor}=="0001", ATTR{idProduct}=="0000", ATTRS{product}=="MEC0003", MODE="0774", GROUP="nut", SYMLINK+="usb-ups"
</code></pre>
<p>then</p>
<pre><code>$ sudo udevadm control --reload-rules && sudo udevadm trigger
</code></pre>
<p>to coldplug it and get the correct group on the device node.</p>
<p>After that, enable the PowerTools repo at <code>/etc/yum.repos.d/Rocky-PowerTools.repo</code>
by setting <code>enabled=1</code>, and do a <code>dnf update -y</code>.</p>
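<p>Equivalently, without hand-editing the repo file:</p>
<pre><code>$ sudo dnf config-manager --set-enabled powertools
$ sudo dnf update -y
</code></pre>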
<p>Open the default nut port so clients will be able to connect to us</p>
<pre><code>$ sudo firewall-cmd --permanent --add-port 3493/tcp
$ sudo firewall-cmd --reload
</code></pre>
<p>Set the hostname in <code>/etc/hostname</code> to something like <code>ups-monitor</code>.</p>
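<p><code>hostnamectl</code> does the same thing in one step:</p>
<pre><code>$ sudo hostnamectl set-hostname ups-monitor
</code></pre>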
<p>The DUT LAN side ethernet must use a manual / static address / subnet / DNS /
gateway, because otherwise it may not be able to reacquire DHCP properly if
the RPi stays up while the DHCP server goes down and stays down.</p>
<pre><code>$ sudo nmcli c m "Wired connection 2" ipv4.addresses 192.168.xx.xx/24
$ sudo nmcli c m "Wired connection 2" ipv4.dns 192.168.xx.1
$ sudo nmcli c m "Wired connection 2" ipv4.gateway 192.168.xx.1
$ sudo nmcli c m "Wired connection 2" ipv4.method manual
</code></pre>
<p>Reboot with <code>sudo shutdown -r now</code></p>
<a name="Step-4:-Building-NUT-from-git"></a>
<h2>Step 4: Building NUT from git</h2>
<p>First bring in the build prerequisites</p>
<pre><code>$ sudo dnf install usbutils make git autoconf automake libtool libusb-devel openssl-devel python39
</code></pre>
<p>(yes, nut brings in a whole dependency on python, just to parse a config file)
then</p>
<pre><code>$ git clone https://github.com/networkupstools/nut.git
$ cd nut
$ ./autogen.sh
$ ./configure
$ make -j8 && sudo make install
$ sudo mkdir -p /var/state/ups
$ sudo chgrp nobody /var/state
$ sudo chgrp nobody /var/state/ups
</code></pre>
<p>It defaults to installing its stuff in <code>/usr/local/ups/...</code></p>
<a name="Step-5:-Realize-the-UPS-has-some-problems-and-working-around-them..."></a>
<h2>Step 5: Realize the UPS has some problems, and work around them…</h2>
<p>Generally the UPS was workable as a UPS with <code>hunnox</code> and git NUT. Although we
do need it to act like a traditional UPS and provide battery backup and failure
indication so we can shut down, for us the main use of it is to power down the
rack cleanly, either programmatically or upon an outage, keep it mostly powered
off, and power it back up again automatically when there is work to do.</p>
<p>The UPS has two problems with that:</p>
<p>1) it will not stay powered down for longer than ~30m .. 2hr (shutdown.stayoff);
it will autonomously repower itself, presumably due to hardware bugs, and</p>
<p>2) it will not come back up again on command (load.on), or indeed by sending it
any variation thereof; I have to press the front-panel button for 3s to bring it
back up with the load on.</p>
<p>As we will see, once it enters “PWR DN”, communication becomes sporadic,
presumably due to powersaving sleeps inside the UPS, since it may be effectively
“running from the last dregs of battery” if it’s like that due to an outage and no
mains. As shipped, it works only as an always-on battery backup that drains its
battery every time; it is not able to work as a programmatic switch for the
load.</p>
<table>
<thead>
<tr>
<th>State</th>
<th>Button action</th>
<th>Result</th>
</tr>
</thead>
<tbody>
<tr>
<td>Load ON</td>
<td>Press</td>
<td>Instant OFF</td>
</tr>
<tr>
<td>Load OFF</td>
<td>Press 1s</td>
<td>Show display backlight for a few secs, keep load OFF</td>
</tr>
<tr>
<td>Load OFF</td>
<td>Press 3s</td>
<td>Bring load ON</td>
</tr>
</tbody>
</table>
<p>After understanding the flow in NUT and not finding the main problems there, I
removed the power, waited a bit and removed the 8 screws holding the front
panel.</p>
<p>WARNING - I don’t recommend you do this unless you understand the dangers from
having a 240VAC generator exposed to your hands… there is a “cold” side to
the UPS that’s referenced to Earth ground, the USB connector and the metal case
are on the safe, cold side. But inside, there is a “hot” side that is referenced
to 240VAC as its “hot 0V”, touching this or anything referenced to it is
hazardous to life. The internal wiring and PCBs are on this “hot” side. Since
it is battery powered, simply having unplugged it has not made it safe at all.</p>
<p>I was initially a bit nonplussed: there are two sealed lead-acid batteries, with
space for two more, a large wound transformer and a board with heatsinked
discretes, but I could not see any ICs that might contain the smarts. I
realized later that, for power domain isolation standards reasons, all the PCBs
are single-sided, and the traces for the “hot side” PCB are below it where you
can’t see any SMT. So the smarts are under the “hot” PCB, which I didn’t want to
touch while there is a 24V battery powering a 240V inverter on that board.</p>
<p><img src="20220401_065802.jpg" alt="daughterboard" /></p>
<p>There’s a daughterboard at the back that has the USB connector and two RJ45s,
but the RJ45s are not connected to anything except each other and a couple of
diodes; it’s trying to be some kind of pointless surge protector or similar.</p>
<p>Four wires come back from that to the hot board.</p>
<p><img src="20220401_070039.jpg" alt="daughterboard" /></p>
<p>The chip is a CY7C63313 lowspeed HID controller (<a href="https://www.infineon.com/dgdl/Infineon-CY7C63310_CY7C638xx_enCoRe_II_Low_Speed_USB_Peripheral_Controller-DataSheet-v21_00-EN.pdf?fileId=8ac78c8c7d0d8da4017d0ecc994f46c9">PDF</a>)</p>
<p>The -13 variant seems to have 8KB flash; there are two opto-isolators mounted
there too. These turned out to be RX and TX for a 2400bps link.</p>
<p>I studied what travels on the link from the “cold” side, it’s literally the
Q1 type protocol sent over the serial link by the nut <code>hunnox</code> driver, what
comes back from the hot-side controller chip is the <code>(000.0 ...</code> stuff at 2400
bps 8/N/1.</p>
<p>I had thought I might replace or reflash the CY7C63313 but since it’s just
dumbly passing through the serial protocol from USB <-> 2400bps UART, that’s
not the source of the problems.</p>
<p><img src="./ups-hack.png" alt="ups innards from USB" /></p>
<p>After musing for a bit I brought out an LTV816 5kV-rated optoisolator I had lying
around and added a parallel optoisolated way to programmatically “press the
front panel button”. Holding the button for 3s does bring us out of “PWR DN”
with the load powered; the isolator hack allows us to control that from the RPi.</p>
<p>There are still problems… when the UPS is in “PWR DN”, the hot side does not
issue status data except at > 30s intervals; the USB HID controller then seems
to reply with a <code>0x05</code> byte, indicating that it is still there but did not get
any response from the hot side to forward back to the USB host.</p>
<p><code>hunnox</code> does not understand this, since it’s not the <code>(</code> it expected for status.
However, the NUT driver continues to report the last actual status that it had
as if it were current: but this is stale garbage.</p>
<p>However, even allowing for this hot-side narcolepsy, where it wakes only briefly
once per 30s when in PWR DN, it does not respond to being sent even
continuous <code>load.on</code> from NUT (which sends <code>C</code> at the protocol level, to cancel
the shutdown) for over a minute. So there is no way to switch the UPS back on
programmatically over USB as it stands.</p>
<p>It is also part of the reason why you see complaints around the net that the “FSD”
state reported by <code>upsc</code> is “sticky”: it was the last state received by the USB
HID controller from the now-OFF “hot side controller”, and it just keeps repeating
it in the absence of any new information. The main reason the FSD state is sticky
is that <code>nut-server</code> holds the state and won’t stop reporting that everything is
about to shut down until you restart the service.</p>
<p>So bringing the load back up and having it stay up can be done by</p>
<ul>
<li><code>sudo systemctl restart nut-server</code> on the Rpi4</li>
<li>have the Rpi4 “press the fontpanel UPS button” for 3s</li>
</ul>
<p>(Update 2022-04-06: when in PWR DN, unfortunately sending it load.on over and
over is not effective in bringing the load back on by itself).</p>
<p>And there is a second problem: even with the NUT RPi powered down, the UPS
will restart the load by itself after somewhere between 30m .. 2h. It looks like
the “hot side controller” has a leakage problem when it is OFF: eg, the power rail
used to pull up the button interrupt gradually leaks when depowered, until it
looks like the button is pressed and it reapplies the load autonomously.</p>
<p>After some experimentation, when in PWR DN, sending shutdown.stayoff every 10m
is an effective way to keep it in PWR DN for as long as you want.</p>
<p>Tying it all together, all the hacks are aimed at this state diagram:</p>
<p><img src="ups-states.png" alt="ups states" /></p>
<p>Helper scripts:</p>
<p><code>/usr/local/bin/ups-up.sh</code></p>
<pre><code>#!/bin/sh
# bring the UPS out of PWR DN and repower the rack load
sudo rm -f /tmp/last-stayoff
sudo touch /tmp/last-on
# restart nut-server so it stops reporting any stale / sticky FSD state
sudo systemctl restart nut-server
sleep 4s
sudo /usr/local/ups/bin/upsrw -s ups.delay.start=1 -u nutuser -p nutpassword powercool
sudo /usr/local/ups/bin/upsrw -s ups.delay.shutdown=1 -u nutuser -p nutpassword powercool
# "press the frontpanel button" for 3s via the optoisolator hack on GPIO 26
sudo /usr/local/bin/gpioset --mode=time -s 3 0 26=1
# poll for up to ~36s until the UPS reports the load is powered again
for i in 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 ; do
sleep 2s
OUT=`/usr/local/ups/bin/upsc powercool output.voltage | cut -d'.' -f1`
if [ $OUT -gt 90 ] ; then
echo "back up"
exit 0
fi
done
exit 0
</code></pre>
<p><code>/usr/local/bin/ups-down.sh</code></p>
<pre><code>#!/bin/sh
# cleanly power down the UPS load and ask it to stay off
sudo rm -f /tmp/last-on
sudo systemctl restart nut-driver@powercool
sleep 5s
# give the load 20s before power is cut
sudo /usr/local/ups/bin/upsrw -s ups.delay.shutdown=20 -u nutuser -p nutpassword powercool
sudo touch /tmp/last-stayoff
# issued twice for good measure, since comms can be flaky around power-down
sudo /usr/local/ups/bin/upscmd -u nutuser -p nutpassword powercool shutdown.stayoff
sudo /usr/local/ups/bin/upscmd -u nutuser -p nutpassword powercool shutdown.stayoff
</code></pre>
<p><code>/usr/local/bin/ups-poll.sh</code></p>
<pre><code>#!/bin/sh
# powercool UPS will auto-wake after 30m - 2h if left alone in PWR-DN;
# reminding it to stay off every 10m, while that's what we want, avoids this
# creation timestamps of the marker files left by ups-up.sh / ups-down.sh
ON=0
if [ -e /tmp/last-on ] ; then
ON=`stat /tmp/last-on -c %W`
fi
OFF=0
if [ -e /tmp/last-stayoff ] ; then
OFF=`stat /tmp/last-stayoff -c %W`
fi
# we were asked to be ON more recently than OFF: nothing to do
if [ $ON -gt $OFF ] ; then
echo "exiting: ON more recently than OFF"
exit 0
fi
OUT=`/usr/local/ups/bin/upsc powercool output.voltage | cut -d'.' -f1`
if [ -z "$OUT" ] ; then
echo "Empty output.voltage"
OUT=0
fi
if [ "$ON" -eq 0 -a "$OFF" -eq 0 -a "$OUT" -gt 90 ] ; then
echo "exiting: No on or off info, and output.voltage > 90"
exit 0
fi
# load is off: remind the UPS it should stay in PWR DN
if [ $OUT -lt 90 ] ; then
echo "telling it to stay off"
/usr/local/ups/bin/upscmd -u nutuser -p nutpassword powercool shutdown.stayoff
fi
</code></pre>
<p><code>/etc/crontab</code></p>
<pre><code> 0,10,20,30,40,50 * * * * root /usr/local/bin/ups-poll.sh
</code></pre>
<p>… with this set of workarounds, we’re finally back in business: a hardware
hack and a software hack bound to an external controller solve two faults the
UPS shipped with. It’s a bit messier than expected, but that’s what you get for
your cheapo GBP200 rackmount UPS.</p>
<a name="Step-5a:-Building-libgpiod"></a>
<h2>Step 5a: Building libgpiod</h2>
<p>The button hack needs libgpiod, which naturally is not packaged in Rocky. Mostly
it uses the pieces already needed to build NUT from git.</p>
<pre><code>$ sudo dnf install autoconf-archive
$ git clone git://git.kernel.org/pub/scm/libs/libgpiod/libgpiod.git
$ cd libgpiod
$ ./autogen.sh
$ ./configure --enable-tools
$ make && sudo make install
</code></pre>
<a name="Step-6:-Set-up-the-networking-on-NUT"></a>
<h2>Step 6: Set up the networking on NUT</h2>
<p>NUT is a bit complicated, but it does deal with the distributed shutdown process
that is necessary when we have a lot of physical devices hanging off the UPS.
There are several different systemd services (config paths are for the git make
install default configuration). “Server side” means it runs on the device that
has the connection to the UPS; “client side” means a device that is powered via
the UPS but does not have a connection to it, and wants to be told about power
status by the server.</p>
<p>For the Powercool UPS I have, the git version of NUT is needed on the server.
For the clients, though, it’s possible to use older distro NUT. The main
difference is that older NUT uses master / slave nomenclature where newer uses
primary / secondary. Non-git, distro NUT is also likely built to use distro
path conventions, like <code>/etc/ups/</code>.</p>
<table>
<thead>
<tr>
<th>Side</th>
<th>Service</th>
<th>Config</th>
<th>Functionality</th>
</tr>
</thead>
<tbody>
<tr>
<td>Server</td>
<td>nut-driver@upsname</td>
<td><code>/usr/local/ups/etc/ups.conf</code></td>
<td>Userland driver for connection to UPS</td>
</tr>
<tr>
<td>Server</td>
<td>nut-server</td>
<td><code>/usr/local/ups/etc/upsd.conf</code></td>
<td>Network listener that accepts clients and informs them about UPS status</td>
</tr>
<tr>
<td>Client</td>
<td>nut-monitor</td>
<td><code>/usr/local/ups/etc/upsmon.conf</code></td>
<td>Network client that connects to a nut-server and reacts locally to UPS status there</td>
</tr>
</tbody>
</table>
<p>For Server <code>/usr/local/ups/etc/ups.conf</code>:</p>
<pre><code>[powercool]
driver=nutdrv_qx
vendorid=0001
productid=0000
product=MEC0003
subdriver=hunnox
langid_fix=0x409
port = auto
</code></pre>
<p>Then start the nut userland driver</p>
<pre><code>$ sudo systemctl enable nut-driver@powercool
$ sudo systemctl start nut-driver@powercool
</code></pre>
<p>For Server <code>/usr/local/ups/etc/upsd.conf</code>:</p>
<pre><code>LISTEN 0.0.0.0 3493
</code></pre>
<p>For Server <code>/usr/local/ups/etc/upsd.users</code>:</p>
<pre><code>[nutuser]
password = nutpassword
actions = set
actions = fsd
instcmds = all
upsmon primary
</code></pre>
<p>… and for the clients <code>/usr/local/ups/etc/upsmon.conf</code> (if older NUT, then
use <code>slave</code> instead of <code>secondary</code>)</p>
<pre><code>MONITOR powercool@myserver 1 nutuser nutpassword secondary
SHUTDOWNCMD "/sbin/shutdown -h +0"
</code></pre>
<p>… on the UPS monitor, set it instead to</p>
<pre><code>MONITOR powercool@localhost 1 nutuser nutpassword primary
SHUTDOWNCMD "/usr/local/ups/bin/shutdown-if-no-mains.sh"
</code></pre>
<p>… and create a file <code>/usr/local/ups/bin/shutdown-if-no-mains.sh</code> containing</p>
<pre><code>#!/bin/sh
# only actually shut down if mains has really failed: while mains is
# present, input.frequency reports ~50Hz
INFREQ=`/usr/local/ups/bin/upsc powercool input.frequency | cut -d'.' -f1`
if [ $INFREQ -gt 47 ] ; then
echo "Skipping shutdown as input freq $INFREQ"
exit 0
fi
/sbin/shutdown -h +0
</code></pre>
<p>Also <code>chmod +x</code> that.</p>
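<p>That is:</p>
<pre><code>$ sudo chmod +x /usr/local/ups/bin/shutdown-if-no-mains.sh
</code></pre>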
<p>Then on the UPS monitor, all of these; on the clients just the last two</p>
<pre><code>$ sudo systemctl enable nut-server
$ sudo systemctl start nut-server
$ sudo systemctl enable nut-driver@powercool
$ sudo systemctl start nut-driver@powercool
$ sudo systemctl enable nut-monitor
$ sudo systemctl start nut-monitor
</code></pre>
<p>From the clients or the UPS monitor RPi, you should be able to run upsc to check
the status of the UPS remotely; if the NUT server IP is in /etc/hosts as
<code>ups-monitor</code>, then</p>
<pre><code>$ upsc powercool@ups-monitor
</code></pre>
<p>at the client machine (which is not hooked up to the UPS USB, but uses UPS
backup power) should show something like</p>
<pre><code>battery.voltage: 27.50
device.type: ups
driver.name: nutdrv_qx
driver.parameter.langid_fix: 0x409
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: MEC0003
driver.parameter.productid: 0000
driver.parameter.subdriver: hunnox
driver.parameter.synchronous: auto
driver.parameter.vendorid: 0001
driver.version: 2.7.4-5059-ga8e3687a
driver.version.data: Q1 0.07
driver.version.internal: 0.32
driver.version.usb: libusb-1.0.23 (API: 0x1000107)
input.frequency: 50.0
input.voltage: 244.0
input.voltage.fault: 0.0
output.voltage: 245.0
ups.beeper.status: disabled
ups.delay.shutdown: 30
ups.delay.start: 180
ups.load: 22
ups.productid: 0000
ups.status: OL
ups.temperature: 29.0
ups.type: online
ups.vendorid: 0001
</code></pre>
<a name="NUT-problem:-monitor-deps"></a>
<h3>NUT problem: monitor deps</h3>
<p>NUT has / had a bug in the nut-monitor.service file: it says it goes <code>After:
nut-server</code>, but on a client that is not true. So you may have to hack the
service file to remove this on client-only installs.</p>
<p>Otherwise NUT will not auto-start on the next reboot.</p>
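<p>Since systemd dependencies like <code>After=</code> can’t be removed via a drop-in,
editing the full unit is the way; a sketch (the exact line wording depends on
your NUT version):</p>
<pre><code>$ sudo systemctl edit --full nut-monitor
  (delete the nut-server entry from the After= line, save and exit)
$ sudo systemctl daemon-reload
</code></pre>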
<a name="NUT-problem:-target-enables"></a>
<h3>NUT problem: target enables</h3>
<p>Related to the above, you must also</p>
<pre><code>$ sudo systemctl enable nut.target nut-driver.target
</code></pre>
<p>… so the deps in the NUT service files can be satisfied.</p>
<a name="Steps-to-use-distro-Centos-9-Stream-on-RPI4"></a>
<h2>Steps to use distro Centos 9 Stream on RPI4</h2>
<p><img src="/2021/12/12/20211212_163045.jpg" alt="UEFI firmware booting splash" /></p>
<p>CentOS 9 Stream is out, with a native aarch64 ISO, but it doesn’t directly support
the RPi4, which is an annoyance. With a bit of fiddling, it can be made to work
nicely.</p>
<p>I didn’t find any articles about it yet, so here is the necessary flow.</p>
<a name="L1.-Start-the-CentOS-9-Stream-ISO-download-going"></a>
<h3>1. Start the CentOS 9 Stream ISO download going</h3>
<p>It’s 6.5GiB so best to get that started first.</p>
<p><code>wget -O c9.iso "https://mirrors.centos.org/mirrorlist?path=/9-stream/BaseOS/aarch64/iso/CentOS-Stream-9-latest-aarch64-dvd1.iso&redirect=1&protocol=https"</code></p>
<a name="L2.-Use-the-latest-eeprom-bootloader-on-the-rpi4"></a>
<h3>2. Use the latest eeprom bootloader on the rpi4</h3>
<p>Download <a href="https://www.raspberrypi.com/software/">rpi-imager</a>, on Fedora at least
this is actually available as a distro package. It automates getting the latest
bootloader and creates a bootable SD image that flashes your rpi4 with it.</p>
<p>A sacrificial SD card is needed; if you only have one RPi4 to do, it can be the
same one you will put CentOS on later. If you have a few RPi4s or expect to get
more, you probably want to dedicate an SD card to this, so you can later avoid
having to repeat all this and just boot the RPi with it one time to update the
EEPROM as needed.</p>
<ul>
<li>insert a sacrificial SD card in your development machine</li>
<li>run <code>rpi-imager</code></li>
<li>Select “Operating System” pulldown</li>
<li>“Misc utility images”</li>
<li>SD Card Boot (assuming that’s your usual boot pattern)</li>
<li>Select “Storage” pulldown</li>
<li>Select the destination SD Card</li>
<li>WRITE</li>
<li>remove the card</li>
</ul>
<p>Boot the Rpi4 one time using this card, it will flash the EEPROM to the latest
bootloader.</p>
<a name="L3.-Use-the-RPI4-2d-aware-UEFI--22-Firmware-22-"></a>
<h3>3. Use the RPI4-aware UEFI “Firmware”</h3>
<p>We have to go a little “off-piste”, but not much. The issue is that the special
boot flow requirements of the RPi4 bootloader are not handled by CentOS out of the
box. So we need to own the first partition and put RPi-aware UEFI pieces in
there, and later tell CentOS it can have the rest of the SD card. CentOS gets
the situation (once we tell it at install time later) and it will all fly well.</p>
<ul>
<li>insert the sd card intended for the install in your development box</li>
<li><code>fdisk /dev/<device></code>, eg, <code>/dev/sda</code></li>
<li><code>p</code> to double-check you are looking at the expected device</li>
<li>delete any existing partitions (d 1, d 2) etc</li>
<li>add a new p1 with <code>n</code>, set size to <code>+200M</code></li>
<li><code>t 0c</code> to set the type to FAT32</li>
<li><code>w</code> to write</li>
<li><code>partprobe</code> to make sure we re-read the new partition table</li>
<li><code>mkfs.vfat /dev/sda1</code> to format our new partition</li>
<li><code>sudo mount /dev/sda1 /mnt</code></li>
<li>Browse <a href="https://github.com/pftf/RPi4/releases">here</a> to see the newest available UEFI firmware</li>
<li>Download <code>RPi4_UEFI_Firmware_vX.YZ.zip</code> and unzip it</li>
<li>Copy the unzipped files into your partition <code>sudo cp -rp RPi4_UEFI_Firmware_vX.YZ/* /mnt</code></li>
<li>sudo umount /mnt</li>
<li>remove the sd card and plug in the RPi4</li>
</ul>
<p>UEFI is kind of awful, but it does impose some order on the boot flow rather
than an increasing amount of random SBC flows to support, and lets us more or
less look like a PC install. As part of that, it wants to use ACPI instead of
DT to start the kernel, but that works on upstream kernels nowadays… I don’t
think it’s what we would be using if we could do it over, but that’s how it is.</p>
<a name="L4.-Copy-the-ISO-on-to-a-USB-stick"></a>
<h3>4. Copy the ISO on to a USB stick</h3>
<ul>
<li>insert your USB stick that will hold the Centos ISO image</li>
<li>check with <code>dmesg | tail</code> what sdX your USB device is using</li>
<li>check with <code>mount</code> if anything that was on it got automounted, if so use
<code>sudo umount xxx</code> to unmount them all</li>
<li>Use your favourite tool to copy the iso on to the mass storage device, or
just use <code>sudo dd if=c9.iso bs=1M of=/dev/sdX</code> where sdX will be <code>sda</code> or
whatever.</li>
<li>remove the USB stick and put it in a USB3 port of your RPi4</li>
</ul>
<a name="L5.-Boot-for-the-install"></a>
<h3>5. Boot for the install</h3>
<p><img src="/2021/12/12/20211212_163045.jpg" alt="UEFI firmware booting splash" /></p>
<ul>
<li>It should be enough to just boot; you will see the RPi logo and a 5 second
countdown… you should be able to just leave it and it will show grub for
the CentOS install.</li>
<li>Select Install Centos Stream 9 and wait a bit.</li>
</ul>
<p> <img src="/2021/12/12/20211212_163100.jpg" alt="Install Grub menu" /></p>
<ul>
<li>The graphical install failed for me, I used text install.</li>
</ul>
<p> <img src="/2021/12/12/20211212_163247.jpg" alt="Select text install" /></p>
<ul>
<li>Select the Install Destination item <code>5</code>, then <code>c</code> to continue with the
default device, and on the next “partitioning options” tell it to use the
unused space on the SD card. This will leave the RPi4-aware p1 you
prepared completely alone</li>
</ul>
<p> <img src="/2021/12/12/20211212_163517.jpg" alt="Select Use Free Space" /></p>
<ul>
<li>Personally I don’t think LVM is needed for this kind of thing; at the next
“partitioning scheme options” I selected <code>1</code> to just use standard partitions.
Anaconda on CentOS creates XFS partitions (as opposed to Fedora’s BTRFS).</li>
<li>Set up the other missing items like a user (best to mark as “administrator”
== entered into sudoers and leave root disabled nowadays) and timezone</li>
<li>I selected minimal and no extras for package selection, you can choose more
things as you like</li>
<li>Start the install, it takes 10 - 20 minutes or so and then you can hit enter
to reboot. It’s logging packages to the display as it goes.</li>
</ul>
<a name="L6.-Remove-the-3GB-restriction-at-UEFI"></a>
<h3>6. Remove the 3GB restriction at UEFI</h3>
<ul>
<li>hit <code>ESC</code> at the UEFI boot part to get to the UEFI config menu</li>
</ul>
<p> <img src="/2021/12/12/20211212_184227.jpg" alt="UEFI toplevel menu" /></p>
<ul>
<li>Select “Device Manager | Raspberry Pi Configuration | Advanced Configuration”</li>
</ul>
<p> <img src="/2021/12/12/20211212_184327.jpg" alt="UEFI toplevel menu" /></p>
<ul>
<li>Change “Limit RAM to 3GB” to disabled; the kernel in CentOS 9 Stream is
modern enough to not need it (not sure about the installer kernel, which is
usually a bit older, which is why this is done now that we are done with the
install).</li>
<li><code>F10</code> and <code>Y</code> to save</li>
</ul>
<a name="L7.-Set-the-boot-order-to-be-SD-card-first"></a>
<h3>7. Set the boot order to be SD card first</h3>
<ul>
<li><code>ESC</code> back up to the top level menu and select “Boot Maintenance Manager |
Boot Options | Change Boot Order” and check if Centos 9 is listed as the
first, if not, change the boot order to prefer the install on the SD card,
then <code>F10</code> and <code>Y</code> again</li>
<li><code>ESC</code> up to the top level and select restart.</li>
</ul>
<a name="L8.-Boot-into-CentOS-Stream-9"></a>
<h3>8. Boot into CentOS Stream 9</h3>
<ul>
<li>You should see the GRUB menu come up after the UEFI one times out after 5s;
select the top entry</li>
<li>There’s no feedback to the video console during boot! Just wait a couple of
minutes and it will show the login on the video tty, with everything up.</li>
<a name="L9.-Jobs-from-inside-the-OS"></a>
<h3>9. Jobs from inside the OS</h3>
<ul>
<li>You will want to ssh in using the user:pw you told it during the
install, and set your ssh pubkey, then disable pw based login.</li>
<li><p>CentOS always had a bizarre packaging split between its core repos (eg, with
the libuv package) and what on CentOS 8 was the “powertools” repo (this
contained libuv-devel amongst others). On CentOS 9 Stream, they randomly
changed the repo with this stuff to “crb”, just to keep life interesting.
Enabling it consists of</p>
<pre><code>$ dnf config-manager --set-enabled crb
$ dnf install \
https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
https://dl.fedoraproject.org/pub/epel/epel-next-release-latest-9.noarch.rpm
</code></pre></li>
</ul>
<p>Enjoy!</p>
<a name="Sai-CI"></a>
<h1>Sai CI</h1>
<ul>
<li>Git repo: https://warmcat.com/git/sai</li>
<li>Lws Sai: https://libwebsockets.org/sai</li>
</ul>
<p>Sai is a libwebsockets-based cross-platform CI server and distributed builder
aimed at self-hosting your build testing. It integrates with git hooks and
gitohashi gitweb: it’s behind the mass CI testing used by libwebsockets.</p>
<p><img src="https://warmcat.com/sai-screenshot.png" alt="Sai Screenshot" /></p>
<p>For lws, in all it currently orchestrates 582 builds per push of libwebsockets,
on 30 platforms with a variety of OSes, including big-endian NetBSD. Almost
all the builds also run ctest on the native platform to confirm functionality.</p>
<p>This extreme testing lets us develop and ship using <code>-Wall -Wextra -Werror</code>
even though we support a huge number of toolchains and platforms.</p>
<p>This article discusses</p>
<ul>
<li>Sai</li>
<li>How projects use a <code>.sai.json</code> file to describe their tests</li>
<li>The 19-inch rack that contains the lws builders</li>
<li>How Sai is configured on the builders</li>
<li>Delights and limitations of ctest</li>
</ul>
<a name="Sai"></a>
<h1>Sai</h1>
<p><img src="https://warmcat.com/sai-overview.png" alt="Sai Overview" /></p>
<p>Sai is a CMake / C project, dependent on lws, that creates three daemons and some
helpers: <code>sai-server</code> + <code>sai-web</code>, which should run somewhere convenient to
serve https to interested parties, and <code>sai-builder</code>, which runs inside each
environment that wants to offer build service, eg, inside a VM or systemd-nspawn
container, and connects to the <code>sai-server</code> over a wss link that speaks JSON.</p>
<p>To deploy it, you set up hooks in your git repo that inform <code>sai-server</code> of a
new push along with the <code>.sai.json</code> file from the pushed tree. The <code>.sai.json</code>
file lists platforms you want to build on with platform-specific build scripting
and a list of builds / tests you want to run on which platforms.</p>
<p>Sai-server will distribute tests to matching platforms and collect the logs that
come back, along with any artifacts like RPM packages or zip files, all
accessible via sai-web, so, eg you can watch the build and test logs in realtime
from your browser.</p>
<a name="How-projects-use-a--3c-code-3e-.sai.json-3c--2f-code-3e--file-to-describe-their-tests"></a>
<h2>How projects use a <code>.sai.json</code> file to describe their tests</h2>
<p>The project contains a “saifile” <code>.sai.json</code> at the top level, which lists the
global set of platforms and build scenarios it wants to be tested on.</p>
<p>This is from the <a href="https://libwebsockets.org/git/libwebsockets/tree/.sai.json">saifile for lws main</a>,
first you define what platforms your project targets, using structured names
in the form OS/arch/toolchain:</p>
<pre><code>{
"schema": "sai-1",
# We're doing separate install into destdir so that the test server
# has somewhere to go to find its /usr/share content like certs
"platforms": {
"linux-ubuntu-1804/x86_64-amd/gcc": {
"build": "mkdir build destdir;cd build;export CCACHE_DISABLE=1;export SAI_CPACK=\"-G DEB\";cmake .. ${cmake} && make -j && make -j DESTDIR=../destdir install && ctest -j4 --output-on-failure ${cpack}"
},
"linux-ubuntu-2004/x86_64-amd/gcc": {
"build": "mkdir build destdir;cd build;export CCACHE_DISABLE=1;export SAI_CPACK=\"-G DEB\";cmake .. ${cmake} && make -j && make -j DESTDIR=../destdir install && ctest -j4 --output-on-failure ${cpack}"
},
"linux-fedora-32/x86_64-amd/gcc": {
...
</code></pre>
<p>… for each of these tuples, there are build machines that connect to <code>sai-server</code>
and offer build and test services on that platform.</p>
<p>For each platform listed, platform-specific build instructions are provided,
these contain macros like <code>${cmake}</code> and <code>${ctest}</code> that are filled in by the
“configuration” section next in the Saifile.</p>
<p>The list of configurations or “build scenarios” provides definitions for the
macros above, eg, selection of project configuration option sets, and lists out
which of the above platforms to build (and test) it on.</p>
<p>Unless platforms were marked with <code>"default": false</code>, any listed platforms are
selected by default for building entries in the <code>"configurations"</code> section; the
“platforms” member of the configuration lets you strip any defaults (by starting
with “none”), or modify the platform list (with “thisplatform” or
“not thatplatform”) in a comma-separated list.</p>
<pre><code> ...
},
"configurations": {
"default": {
"cmake": "",
"platforms": "w10/x86_64-amd/msvc, w10/x86_64-amd/noptmsvc, freertos-linkit/arm32-m4-mt7697-usi/gcc, linux-ubuntu-2004/aarch64-a72-bcm2711-rpi4/gcc, w10/x86_64-amd/mingw32, w10/x86_64-amd/mingw64, netbsd/aarch64BE-bcm2837-a53/gcc, netbsd/x86_64-amd/gcc, w10/x86_64-amd/wmbedtlsmsvc, openbsd/x86_64-amd/llvm, solaris/x86_64-amd/gcc"
},
"default-noudp": {
"cmake": "-DLWS_WITH_UDP=0",
"platforms": "w10/x86_64-amd/msvc, w10/x86_64-amd/noptmsvc, freertos-linkit/arm32-m4-mt7697-usi/gcc, linux-ubuntu-2004/aarch64-a72-bcm2711-rpi4/gcc, w10/x86_64-amd/mingw32, w10/x86_64-amd/mingw64, netbsd/aarch64BE-bcm2837-a53/gcc, netbsd/x86_64-amd/gcc, w10/x86_64-amd/wmbedtlsmsvc"
},
"fault-injection": {
"cmake": "-DLWS_WITH_SYS_FAULT_INJECTION=1 -DLWS_WITH_MINIMAL_EXAMPLES=1 -DLWS_WITH_CBOR=1",
"platforms": "w10/x86_64-amd/msvc"
},
"esp32-heltec": {
"cmake": "-DLWS_IPV6=0",
"cpack": "esp-heltec-wb32",
"platforms": "none, freertos-espidf/xl6-esp32/gcc"
},
...
</code></pre>
<p>This ends up defining a sparse matrix of platforms vs configurations that
should be built; at the time of writing the lws saifile describes 47 distinct
configuration build scenarios (lws has a lot of build options) and 30 platforms,
in all 581 builds for each push.</p>
<p>It may seem excessive, but historically some of the build scenarios were very
prone to silent breakage, for example <code>-DLWS_WITH_NETWORK=0</code> that builds lws
without any networking related code, or <code>-DLWS_WITH_CLIENT=0</code>.</p>
<p>Similarly, since all code ships with <code>-Wall -Wextra -Werror</code>, due to variations
in toolchain warnings and, eg, the natural types used for virtual ones like <code>size_t</code>,
it’s possible to add code that blows a warning promoted to an error just on one
specific toolchain, eg, an embedded one that has <code>uint32_t</code> as a <code>long</code> rather than
an <code>int</code>.</p>
<p>By making sure most code goes through most platform builds, these kinds of things
can be solved before they reach users.</p>
<a name="Security-approach-of-Sai"></a>
<h1>Security approach of Sai</h1>
<p>The basic approach is that the code being built is somewhat trusted, since we
will allow it to execute build actions on our infrastructure. But there are
steps taken to limit what it can do.</p>
<a name="Security-in-the-build-environment-and-hosts"></a>
<h2>Security in the build environment and hosts</h2>
<p>Inside the build environment, sai builds execute under an unprivileged <code>sai</code>
user and do not have sudo or other access to root.</p>
<p>Network access on the builder is needed so <code>sai-builder</code> can connect out to the
<code>sai-server</code>, but network access is not required inside the builder, although
for lws we use it during ctest.</p>
<p>Many of the build contexts used for lws are implemented as systemd-nspawn, these
are not very hardened against attack on the host from inside. So Sai is aimed
at testing your own code and code you trust.</p>
<a name="Security-for-build-injection"></a>
<h2>Security for build injection</h2>
<p>The git build hook that informs <code>sai-server</code> of new jobs signs the JSON it posts
(which contains the .sai.json extracted from the project) with a secret key
that <code>sai-server</code> has the public key for in its configuration JSON; only jobs
correctly signed are accepted.</p>
<a name="The-hardware-Sai-runs-on-for-lws"></a>
<h1>The hardware Sai runs on for lws</h1>
<p><img src="https://warmcat.com/20210828_091925-2.jpg" alt="Lws Sai Rack" /></p>
<p>For lws, the builders are living in a 19 inch rack. Most of the builders are
implemented as systemd-nspawns or qemu VMs in one beefy PC living underneath the
rack, noi. But there are also a bunch of “real” physical builders, eg, for
Apple OSX and iOS.</p>
<table>
<thead>
<tr>
<th>name</th>
<th>type</th>
<th>Physical</th>
</tr>
</thead>
<tbody>
<tr>
<td>linux-ubuntu-xenial/x86_64-amd/gcc/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-ubuntu-2004/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-fedora-32/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-gentoo/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-centos-7/x86_64-amd/gcc</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-centos-8/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-centos-8/aarch64-a72-bcm2711-rpi4/gcc</td>
<td>Real</td>
<td>RPi4/8GB</td>
</tr>
<tr>
<td>linux-ubuntu-2004/aarch64-a72-bcm2711-rpi4/gcc</td>
<td>Real</td>
<td>RPi4/4GB</td>
</tr>
<tr>
<td>linux-android/aarch64/llvm</td>
<td>systemd-nspawn / Cross</td>
<td>Noi</td>
</tr>
<tr>
<td>netbsd-iOS/aarch64/llvm</td>
<td>Real / Cross</td>
<td>Mac Mini Intel</td>
</tr>
<tr>
<td>netbsd-OSX-catalina/x86_64-intel-i3/llvm</td>
<td>Real</td>
<td>Mac Mini Intel</td>
</tr>
<tr>
<td>netbsd-OSX-catalina/x86_64-intel-i3/llvm</td>
<td>Real</td>
<td>Mac Mini M1</td>
</tr>
<tr>
<td>freertos-linkit/arm32-m4-mt7697-usi/gcc</td>
<td>systemd-nspawn / Cross</td>
<td>Noi</td>
</tr>
<tr>
<td>windows-10/x86_64-amd/msvc</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>windows-10/x86_64-amd/mingw32</td>
<td>systemd-nspawn / Cross</td>
<td>Noi</td>
</tr>
<tr>
<td>windows-10/x86_64-amd/mingw64</td>
<td>systemd-nspawn / Cross</td>
<td>Noi</td>
</tr>
<tr>
<td>freertos-espidf/xl6-esp32/gcc</td>
<td>systemd-nspawn / Cross / Real</td>
<td>Noi + Heltec ESP32</td>
</tr>
<tr>
<td>freertos-espidf/xl6-esp32/gcc</td>
<td>systemd-nspawn / Cross / Real</td>
<td>Noi + WROVER KIT ESP32</td>
</tr>
<tr>
<td>linux-fedora-32/riscv64-virt/gcc</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-debian-11/x86_64/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-debian-buster/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-debian-buster/x86_64-amd32/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-debian-sid/x86_64-amd/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>linux-debian-sid/x86_64-amd32/gcc</td>
<td>systemd-nspawn</td>
<td>Noi</td>
</tr>
<tr>
<td>netbsd/x86_64-amd/llvm</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>openbsd/x86_64-amd/llvm</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>freebsd/x86_64-amd/llvm</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>solaris/x86_64-amd/gcc</td>
<td>VM</td>
<td>Noi</td>
</tr>
<tr>
<td>netbsd-BigEndian/x86_64/llvm</td>
<td>Real</td>
<td>RPi3</td>
</tr>
</tbody>
</table>
<p>Each of these OSes runs <code>sai-builder</code>, which makes an outgoing client
connection to the remote <code>sai-server</code> over wss. When <code>sai-server</code> sees
there are jobs needing doing on a particular platform, it farms out the build
tasks to connected builders that offer build services on that platform.</p>
<a name="The-mighty-Noi"></a>
<h2>The mighty Noi</h2>
<p>Noi is a 32-core, 64-thread AMD 3970X box with 64GB RAM. All of the Linux
variations run on it as systemd-nspawn containers, and the cross-build platforms
run in those too; it also hosts several QEMU instances, eg, running Windows 10
and Fedora on RISC-V64.</p>
<p>It also provides a subnet for devices under test, both on Ethernet and by
presenting its wireless as a local wlan AP for the devices to connect to.</p>
<a name="Apple"></a>
<h2>Apple</h2>
<p>OSX and iOS cross builds take place on two physical Mac Minis, one Intel i3 and
one newer M1-based. These are mounted inside a 1U rack mount case.</p>
<a name="aarch64-RPi3-2f-4"></a>
<h2>aarch64 RPi3/4</h2>
<p>The rack includes several RPi3/4 in a rack mount; these are modern “jellybean”
boards you can casually get for a reasonable price, with many available ISOs.</p>
<p>We run Ubuntu 20.10 and CentOS 8 on RPi4s, and NetBSD Big-Endian on an RPi3.
That means we are building and testing on a true physical BE machine, so we
can know we are endian-clean.</p>
<a name="ESP32--2f--Freertos"></a>
<h2>ESP32 / Freertos</h2>
<p>ESP32 is cross-built in a systemd-nspawn on Noi, but there are two physical
ESP32 devices plugged in via USB, one Heltec and another WROVER-KIT. Lws
includes minimal examples specifically for these:</p>
<p>https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/embedded/esp32</p>
<p>These are flashed with the results of the CI build, which are then run on the
physical device via ctest each time.</p>
<p>Sai provides a device access mediator <code>sai-device</code>, which has a config file
describing device resource connections and types; the ctest script uses it to
queue for access to a physical device of a specific type on the builder.</p>
<p>Another helper, <code>sai-expect</code>, handles monitoring the console UART and
determining the pass, fail or timeout result.</p>
<a name="Windows"></a>
<h2>Windows</h2>
<p>There’s a Windows 10 VM running on Noi that builds and tests using msvc; there
are also 32- and 64-bit mingw builds, but these don’t run ctest.</p>
<a name="Building-Sai-on-30-platforms"></a>
<h1>Building Sai on 30 platforms</h1>
<a name="Sai-relies-on-lws-services"></a>
<h2>Sai relies on lws services</h2>
<p>Sai-builder, which runs on the build platforms, relies on lws and Secure Streams
to communicate with sai-server on the Internet, so the first step is to configure
and build lws for the platform.</p>
<p>Lws provides a lot of services to sai, such as lws_spawn, which manages spawning
sub-processes in a cross-platform way, as well as preparing child wsi to handle
stdin, stdout and stderr on the spawn. Lws Threadpool is used to encapsulate
the spawned process in a way that allows communication to be synchronized with
the lws event loop cleanly, so logs are uploaded in realtime.</p>
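<p>As a rough illustration (a minimal sketch rather than sai’s actual code; the
<code>lws_spawn_piped_info</code> member names follow lws-spawn.h in lws v4.x, so check the
header in your lws version), spawning a build step looks something like this:</p>
<pre><code>/* minimal sketch: spawn a child process with piped stdin/out/err wsi */

static void
reap_cb(void *opaque, lws_usec_t *accounting, siginfo_t *si, int we_killed_him)
{
	/* the child exited, or we timed it out and killed it */
}

int
spawn_build_step(struct lws_vhost *vh)
{
	/* argv-style, NULL-terminated; the command is illustrative */
	static const char * const ea[] = { "/usr/bin/make", "-j8", NULL };
	struct lws_spawn_piped_info info;

	memset(&info, 0, sizeof(info));
	info.vh         = vh;           /* vhost whose event loop we bind to */
	info.exec_array = ea;
	info.reap_cb    = reap_cb;
	info.timeout_us = 30 * LWS_US_PER_SEC; /* kill the child after 30s */

	/* creates the child plus wsi wrapping its stdin / stdout / stderr */
	return !lws_spawn_piped(&info);
}
</code></pre>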
<p>When dealing with so many platforms, there are many quirks that are handled by
these lws apis; for example, spawning and handling stdin/out/err on Windows,
BSD or OSX is different from Linux, as is killing processes.</p>
<p>For embedded physical platforms, Sai uses lws to convert UART / USB UART traffic
into wsi, and bring those into a realtime log channel feeding sai-server.</p>
<p>Sai-server is also built around lws and Secure Streams; it serves the
builders using JSON and Secure Streams over wss.</p>
<a name="Sai-Configuration"></a>
<h2>Sai Configuration</h2>
<p>All of the sai daemons and tools take their configuration from <code>/etc/sai/</code>,
these are further separated by daemon, so <code>/etc/sai/builder/</code> etc.</p>
<p>JSON is used to define which logical platforms are offered to sai-server, and
how to reach out from the platform to the sai-servers that the platform wants
to provide services for.</p>
<p>For sai-builder, it lists platform objects in the JSON, each with its own tuple
name, and these are instantiated at sai-server while the connection to the
builder remains up.</p>
Libwebsockets lws_structhttps://warmcat.com/2020/03/27/libwebsockets-lws_struct.html2020-04-28T20:22:07+01:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake options: <code>-DLWS_WITH_STRUCT_JSON=1</code>, <code>-DLWS_WITH_STRUCT_SQLITE3=1</code></li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-struct.h">include/libwebsockets/lws-struct.h</a></li>
<li>Implementation (JSON): <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/misc/lws-struct-lejp.c">lib/misc/lws-struct-lejp.c</a></li>
<li>Implementation (sqlite): <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/misc/lws-struct-sqlite.c">lib/misc/lws-struct-sqlite.c</a></li>
<li>Api test (JSON): <a href="https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/api-tests/api-test-lws_struct-json">minimal-examples/api-tests/api-test-lws_struct-json</a></li>
<li>Api test (sqlite): <a href="https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/api-tests/api-test-lws_struct_sqlite">minimal-examples/api-tests/api-test-lws_struct_sqlite</a></li>
</ul>
<a name="Background"></a>
<h2>Background</h2>
<p>Similar to Javascript’s relationship with JSON, where JSON can directly be
converted into JS objects, a lot of C code consuming JSON is basically trying to
turn the JSON into a struct that matches the schema. It’s not trivial to do
that either using traditional JSON -> object model -> struct or a stream parser
like lejp… members may appear in any order and there may be deeper structure
like lists of other objects. The other direction is also common, you have the
information in a struct, and need to translate that to JSON conveniently.</p>
<p>For example, a common pattern is receive JSON representing an object / struct,
validate it and then want to apply it in an sql database. Or, receive a request
in a URL or JSON, and need to return the sql database query results as JSON.
These patterns boil down to <code>JSON -> struct -> SQL</code>, or <code>SQL -> struct -> JSON</code>.</p>
<p><img src="./lws_struct.png" alt="lws_struct overview" /></p>
<p>Doing it by hand is possible if it’s one or two instances, but if it’s the
basic bread-and-butter of your application and is done dozens or hundreds of
times throughout the code with different structs and schema, having to deal with
it at a low level quickly overwhelms any chance to be able to maintain it.
Trying to manually deal with schemas where, eg, the struct contains a list of
different structs that contain lists of different structs, gets out of hand in
terms of the amount of custom code needed.</p>
<a name="Features"></a>
<h2>Features</h2>
<p><code>lws_struct</code> lets you describe the struct members you want to convert in a table
that can then be used by generic apis to serialize and deserialize your actual
structs or list of structs between JSON, on-heap structs, and sqlite3
interchangeably… lws_dll2 support is built into it, so, eg, it handles
linked-lists of subobjects that can be manipulated before consuming them as
structs. And it also natively uses lwsac for heap storage, so deserialized
objects exist inside a single logical chained heap allocation that can be
destroyed in one step, no matter how complex or how many objects were
allocated inside, without having to walk the objects inside. Strings pointed
to by struct members are also allocated inside the same lwsac.</p>
<p>JSON itself and the <code>lws_struct</code> approach of producing explicit schemas burn some
transmission efficiency. But it’s really easy to look at packets and understand
what is going on, and much easier to produce and understand code translating
between JSON / structs / sqlite3. If your usecase is nontrivial, you may care
a lot more about keeping the complexity manageable than about some bloat on the data.</p>
<a name="Glossary"></a>
<h2>Glossary</h2>
<a name="Serialization"></a>
<h4>Serialization</h4>
<p>The process of encoding some or all members of a struct suitable for storage
or transmission in Sqlite3 or JSON, be they strings, numbers, arrays of objects
etc.</p>
<a name="Deserialization"></a>
<h4>Deserialization</h4>
<p>The process of decoding a Sqlite3 record or JSON back into a C struct, together
with copies of any strings or other objects it referenced.</p>
<a name="Preparing-your-objects-for-lws_struct"></a>
<h2>Preparing your objects for lws_struct</h2>
<p>First you would define your C struct as usual… it can have other members that
are not part of the serialization or, eg, are absent in a particular JSON
object… any structs that are instantiated are zeroed down by default, so
unspecified and unmapped members become zero or NULL.</p>
<pre><code> typedef struct mystruct {
lws_dll2_t list; /* not serialized, optional list we are part of */
char fixstring[30];
const char *varstring;
int value;
} mystruct_t;
</code></pre>
<p>To use lws_struct, you would first mark up the serializable members using
mapping helpers that lws defines for you. It’s OK if some members have no
markup, they will be skipped for serialization and deserialized to NULL / 0
until you set them. You only need one of these “maps” per struct type that
you will serialize or deserialize.</p>
<table>
<thead>
<tr>
<th>Member Helper</th>
<th>Functionality</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSM_SIGNED</td>
<td>Signed integer… size will be discovered by sizeof</td>
</tr>
<tr>
<td>LSM_UNSIGNED</td>
<td>Unsigned integer… size will be discovered by sizeof</td>
</tr>
<tr>
<td>LSM_BOOLEAN</td>
<td>true or false… size will be discovered by sizeof</td>
</tr>
<tr>
<td>LSM_CARRAY</td>
<td>C String array… size will be discovered by sizeof</td>
</tr>
<tr>
<td>LSM_STRING_PTR</td>
<td><code>const char *</code> string pointer</td>
</tr>
<tr>
<td>LSM_LIST</td>
<td>A <code>lws_dll2_t</code> list of other objects (ie, <code>[ {...}, ... ]</code>)</td>
</tr>
<tr>
<td>LSM_CHILD_PTR</td>
<td>A single pointer to an object of a given type</td>
</tr>
</tbody>
</table>
<p>The general format maps the type and the member name in the type to an
export name for the member; this is used as the JSON field name and as the
sqlite3 schema field name for the member.</p>
<pre><code>const lws_struct_map_t lsm_mystruct[] = {
LSM_CARRAY (mystruct_t, fixstring, "fixstring"),
LSM_STRING_PTR (mystruct_t, varstring, "varstring"),
LSM_UNSIGNED (mystruct_t, value, "value"),
};
</code></pre>
<p>It is possible to model serializable, typed, lists-of-objects in members, but
for simplicity let’s just stick with these simple types. The arguments to each
helper name the member twice: the first is the member name in the struct,
and the second is the name in the JSON or sqlite column.</p>
<p>The toplevel types used with lws_struct need at least one “schema” description.</p>
<pre><code>const lws_struct_map_t lsm_schema_map_mystruct[] = {
LSM_SCHEMA (mystruct_t, NULL,
lsm_mystruct, "mystruct-schema-name"),
};
</code></pre>
<p>The last entry is used as a member <code>.schema</code> in JSON, or the table name to use
for this type of object in sqlite3. It’s possible to have different SCHEMA
structs using the same <code>lws_struct_map_t</code> so you can use different names for
the JSON schema and sqlite3 table cases.</p>
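<p>For example, a sketch using the same macros (the names here are illustrative):
you could bind the same member map under two schema names, one for the JSON
<code>.schema</code> and another as the sqlite3 table name.</p>
<pre><code>/* same lsm_mystruct member map, two schema bindings */
const lws_struct_map_t lsm_schema_json_mystruct[] = {
	LSM_SCHEMA	(mystruct_t, NULL, lsm_mystruct, "mystruct-schema-name"),
};
const lws_struct_map_t lsm_schema_sq3_mystruct[] = {
	LSM_SCHEMA	(mystruct_t, NULL, lsm_mystruct, "mystructs"), /* table */
};
</code></pre>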
<p>The toplevel schema naming allows patterns like receiving arbitrary <code>lws_struct</code>
messages for different purposes, and using the schema to understand what kind
of message you have and what type of struct it would instantiate to, basically
a polymorphic deserialization to the correct C type.</p>
<a name="lws_struct-for-JSON"></a>
<h2>lws_struct for JSON</h2>
<p>When lws_struct produces JSON output, it includes a “schema” entry with the
name given above, “mystruct-schema-name”. When it’s asked to parse JSON back
into an object, it checks through the array of schemas it was given to find out
which matching object to instantiate, for the above, a <code>mystruct_t</code>.</p>
<p>The code to parse the incoming JSON object into a struct is</p>
<pre><code> struct lejp_ctx ctx;
lws_struct_args_t a;
 mystruct_t *ms;
 int m;
memset(&a, 0, sizeof(a));
a.map_st[0] = lsm_schema_map_mystruct;
a.map_entries_st[0] = LWS_ARRAY_SIZE(lsm_schema_map_mystruct);
a.ac_block_size = 512;
lws_struct_json_init_parse(&ctx, NULL, &a);
m = (int)(signed char)lejp_parse(&ctx, in, len);
if (m < 0) {
lwsl_notice("%s: JSON decode failed '%s'\n",
__func__, lejp_error_to_string(m));
return m;
}
if (!a.dest)
return 1;
/* parsed object is pointed-to by a.dest, a.top_schema_index says
* which schema it is, 0 = first in map array, etc */
switch (a.top_schema_index) {
case 0:
ms = (mystruct_t *)a.dest;
...
break;
}
...
lwsac_free(&a.ac); /* destroy everything from the parse action */
</code></pre>
<p>You can see it easily extends to being able to parse a bunch of different
schemas into different structs if the map array contained more SCHEMA entries.</p>
<p>Conversely, the code to emit a single JSON object from a struct is like this</p>
<pre><code> ... mystruct_t *ms ...
uint8_t buf[1024], *start = buf, *end = buf + sizeof(buf) - 1, *p = start;
lws_struct_serialize_t *js = lws_struct_json_serialize_create(
lsm_schema_map_mystruct,
LWS_ARRAY_SIZE(lsm_schema_map_mystruct), 0, ms);
size_t w;
if (!js)
return -1;
lws_struct_json_serialize(js, p, end - p, &w);
lws_struct_json_serialize_destroy(&js);
/* w = number of bytes used from p */
</code></pre>
<p>This will emit something like</p>
<pre><code>{
"schema": "mystruct-schema-name",
"fixstring": "whatever",
"varstring": "blah",
"value": 1
}
</code></pre>
<a name="JSON-serialization-of-lists"></a>
<h2>JSON serialization of lists</h2>
<p>You can use a slightly different schema type to indicate that you will give an
lws_dll2 list of the structures (<code>list</code> is the member name of the list in
mystruct_t) instead of a pointer to the structure itself</p>
<pre><code>
typedef struct mystruct_owner {
lws_dll2_owner_t mylist; /* list of mystructs */
} mystruct_owner_t;
static const lws_struct_map_t lsm_mystruct_owner[] = {
LSM_LIST (mystruct_owner_t, mylist, mystruct_t,
list, NULL, lsm_mystruct, "mylist"),
};
const lws_struct_map_t lsm_schema_map_mystruct_list[] = {
LSM_SCHEMA_DLL2 (mystruct_owner_t, mylist, NULL, lsm_mystruct_owner,
"mystruct-list-schema-name"),
};
</code></pre>
<p>It still points to the same underlying lsm_mystruct definition above, but
instead of just binding the schema name to that it introduces a top level
lws_dll2_t list and identifies the list in the objects. This way it’s the
list owner that is passed in as the thing that’s actually being dumped. It will
produce output like this (with the number of elements in the mylist <code>[...]</code>
reflecting the number of entries in the list)</p>
<pre><code>{
"schema": "mystruct-list-schema-name",
"mylist": [
{
"schema": "mystruct-schema-name",
"fixstring": "whatever",
"varstring": "blah",
"value": 1
},
{
"schema": "mystruct-schema-name",
"fixstring": "something",
"varstring": "something else",
"value": 2
}
]
}
</code></pre>
<p>Using the same <code>lws_struct</code> metadata that produced this, the recipient can
turn it back into the same structs, including the object list as an lws_dll2.
And again reusing the same metadata, it can store those structs in sqlite3, and
recover them back from there later into structs.</p>
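<p>A minimal sketch of that last round trip, using the sq3 helpers declared in
<code>lws-struct.h</code> (the database path and owner names are illustrative, and the
exact signatures may differ between lws versions):</p>
<pre><code>sqlite3 *pdb = NULL;
lws_dll2_owner_t results;
struct lwsac *ac = NULL;

if (lws_struct_sq3_open(context, "my.sqlite3", 1, &pdb))
	return 1;

/* write every mystruct_t on our list into the table named in the schema */
lws_struct_sq3_serialize(pdb, lsm_schema_map_mystruct_list,
			 &myowner.mylist, 0);

/* ... later, read up to 100 of them back, instantiated inside an lwsac */
lws_struct_sq3_deserialize(pdb, NULL /* filter */, NULL /* order */,
			   lsm_schema_map_mystruct_list, &results, &ac,
			   0, 100);

/* `results` now owns mystruct_t objects living inside the lwsac */

lwsac_free(&ac);
lws_struct_sq3_close(&pdb);
</code></pre>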
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<ul>
<li><p>If your data is following a lifecycle of JSON for transport, in a struct
for processing, and maybe Sqlite for storage, <code>lws_struct</code> can help formalize
handling it at each step and drastically reduce the code involved</p></li>
<li><p>Although the member description is overhead, you only have to do it once
and it works for both JSON and sqlite cases. It’s also smaller and much easier
to maintain than the equivalent code in all 3 cases in both directions.</p></li>
</ul>
Libwebsockets lwsachttps://warmcat.com/2020/03/26/libwebsockets-lwsac.html2020-03-29T19:15:14+01:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake options: <code>LWS_WITH_LWSAC</code> (default on)</li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-lwsac.h">include/libwebsockets/lws-lwsac.h</a></li>
<li>Implementation: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/misc/lwsac/lwsac.c">lib/misc/lwsac/lwsac.c</a></li>
</ul>
<p>Lwsac (“Allocated Chunks”) is a chained heap allocator that can be very useful
when you know you need heap for:</p>
<ul>
<li>an unknown but possibly large number of objects</li>
<li>objects that may be different sizes</li>
<li>storage for, eg, strings pointed to by objects</li>
<li>allocations that are for a related purpose</li>
<li>allocations that may tend to reference each other</li>
<li>allocations that share the same lifetime: when the job is done they all need
to be freed</li>
</ul>
<p>The basic trick is that in one lwsac object, you can chain together multiple
<code>malloc</code> type heap allocations and suballocate inside that in an automanaged
linked-list; if it happens you don’t have enough space for the next
suballocation (or “use” in lwsac language), it simply malloc’s up another chunk
for the chain that’s at least as big as you need (chunks in the lwsac can all be
different sizes without problems). The individual “uses” are contiguous blocks
without gaps, although there can’t be assumed to be any relationship between
the addresses of two different “uses” other than they’re all in the one logical
lwsac.</p>
<p><img src="./lwsac.png" alt="lwsac overview" /></p>
<p>lwsac comes into its own when a) you don’t know at the start how much heap
you’re going to have to use, and b) your suballocations have relationships to
each other, for example, structs that live in the lwsac point to other structs
also using the same lwsac… the “use” suballocations simply move forward the
chunk high water mark to the next alignment boundary and are not tracked
otherwise at all. You cannot free() individual uses, since nothing tracks them,
but only allocate more at the head of the lwsac chunk list.</p>
<p>When it’s time to destroy the whole related complex of suballocations, lwsac
just goes through the chunk list freeing those… it doesn’t track what’s inside,
it just gets rid of the lwsac chunks everything was sitting on in one step. So
it’s very simple to bulk-destroy the related allocations: you don’t have to
walk everything to the nth degree, just make one call to wrap it all up.</p>
<a name="lwsac-initialization"></a>
<h2>lwsac initialization</h2>
<p>The user footprint is only an <code>lwsac_t *</code> that is initially NULL… when it’s
used, this points to the first chunk which also has a header that describes
the lwsac itself. When it’s eventually freed, all the chunks are freed and
the user <code>lwsac_t *</code> set to NULL again. So there is no init necessary, just
make sure your <code>lwsac_t *</code> starts off as NULL.</p>
<a name="Using-space-inside-the-lwsac"></a>
<h2>Using space inside the lwsac</h2>
<p>The main api is</p>
<pre><code>void *
lwsac_use(struct lwsac **head, size_t ensure, size_t chunk_size);
</code></pre>
<p>You simply pass a pointer to your <code>lwsac_t *</code> and ask for <code>ensure</code> bytes of
contiguous memory, optionally setting the size for any new chunk (0 defaults to
a 4KB chunk, or the size of ensure if that’s bigger). You’ll either
get <code>NULL</code> returned if it’s OOM, or a pointer aligned to <code>sizeof(void *)</code> that you
can immediately use.</p>
<p>Because there’s no overhead except alignment, unlike malloc() where there is
considerable overhead for small-sized allocations, there is basically no
overhead to “use” the lwsac for, eg, string bodies pointed to by
<code>const char *</code> members in a struct you are instantiating in the lwsac.</p>
<p>It’s quite possible you only need <code>lwsac_use()</code> and <code>lwsac_free()</code> below, but
there’s also a useful variant</p>
<pre><code>void *
lwsac_use_zero(struct lwsac **head, size_t ensure, size_t chunk_size);
</code></pre>
<p>…which does the same as <code>lwsac_use()</code>, but zeroes down the memory before
handing the pointer back to the caller.</p>
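<p>Pulling these together, a minimal sketch: a struct and the string body it
points to, both suballocated in the same lwsac (the struct is illustrative):</p>
<pre><code>struct mything {
	const char	*name;	/* will point into the same lwsac */
	int		value;
};

struct lwsac *ac = NULL;	/* whole user footprint, starts NULL */
const char *src = "hello";
struct mything *t;
char *s;

t = lwsac_use_zero(&ac, sizeof(*t), 0);	/* 0 = default chunk size */
s = lwsac_use(&ac, strlen(src) + 1, 0);
if (!t || !s)
	return 1;	/* OOM... lwsac_free(&ac) still cleans up */

memcpy(s, src, strlen(src) + 1);
t->name = s;

/* ... when the whole related complex is finished with ... */
lwsac_free(&ac);	/* struct + string freed together, ac back to NULL */
</code></pre>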
<p>If memory is really tight, you can use another variant</p>
<pre><code>void *
lwsac_use_backfill(struct lwsac **head, size_t ensure, size_t chunk_size);
</code></pre>
<p>If a requested “use” won’t fit into the tail chunk’s free space, this causes
lwsac to go and check whether the free space in any of the prior chunks could
take it, without having to do any new allocation. If you have a lot of chunks
this is slower, but if some of your “use” sizes are small this can save you some
heap when that is critical.</p>
<a name="Freeing-the-whole-lwsac"></a>
<h2>Freeing the whole lwsac</h2>
<p>There is no api to free individual uses, they are only tracked with a high-
water mark inside the chunk they are in, so there is no overhead.</p>
<pre><code>void
lwsac_free(struct lwsac **head);
</code></pre>
<p><code>lwsac_free()</code> frees all the chunks in the chain in the lwsac and sets your
pointer to NULL. Depending on what your code does, it may “use” tens of thousands of
objects in only a few dozen lwsac chunks… when it comes time to discard them
all, it means you are done after a few dozen calls to free().</p>
<p>You may want to pass the lwsac around for it to be processed by different
pieces of code; for that it might be convenient to reference-count who still
has things in the lwsac they don’t want deleted. You can use <code>lwsac_reference()</code>
and <code>lwsac_unreference()</code> to make the free happen only when the last
reference is released.</p>
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<ul>
<li><p>Having an open-ended amount of structures or data in heap does not mean
having to use <code>strdup()</code> and individual <code>malloc()</code> type semantics, nor having
to walk what might be a very complex set of objects at <code>free()</code>-time.</p></li>
<li><p><code>lwsac</code> gives you a chained, chunked allocator within which you can make
suballocations without taking any care about tracking or planning ahead for
alloc sizes. Just ask to “use” whatever size you need next.</p></li>
<li><p>When you are done with the whole related allocation, you can just free the
underlying chained chunks, like cleaning up a kid’s birthday party by gathering
up the disposable tablecloth and throwing it and whatever was in it into the
garbage in one step.</p></li>
</ul>
Libwebsockets lws_dll2https://warmcat.com/2020/03/25/libwebsockets-lws_dll2.html2020-03-29T11:32:57+01:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake options: part of core lws</li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-dll2.h">include/libwebsockets/lws-dll2.h</a></li>
<li>Implementation: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/core/lws_dll2.c">lib/core/lws_dll2.c</a></li>
</ul>
<p>lws_dll2 is a smart doubly-linked list that simplifies working with lists
while avoiding implementation restrictions or bloat.</p>
<p>lws_dll2 is used throughout lws and has been a public export for a
few versions now. It’s mature, flexible, robust and leads to
restriction-free, readable and maintainable code. And it’s more
featureful than generic linked-list abstractions.</p>
<a name="Basic-approach"></a>
<h2>Basic approach</h2>
<p><img src="./lws_dll2.png" alt="lws_dll2" /></p>
<p>There are two objects involved: a singular <code>lws_dll2_owner_t</code>, and,
in every object that will join the list, a <code>lws_dll2_t</code>. This provides some
useful characteristics:</p>
<ul>
<li><p>you compose <code>lws_dll2_owner_t</code> and <code>lws_dll2_t</code> objects into
your objects that own and participate in lists. The lws_dll2 apis take care of
maintaining all the relationships between objects.</p></li>
<li><p>all <code>lws_dll2...</code> objects themselves point to <code>lws_dll2...</code>
objects, not your objects you composed them into. You can recover
your object pointer using <code>lws_container_of()</code> explained below</p></li>
<li><p>your objects can exist on multiple lists with different owners,
you need one <code>lws_dll2_t</code> in your object per list it will
participate on</p></li>
<li><p><code>lws_dll2_</code> objects can go anywhere in your struct, they don’t have to go
at the start or anything like that</p></li>
<li><p>you must remove your object from any lists it is on before
destroying it</p></li>
<li><p>you can remove an <code>lws_dll2_t</code> from its owner without having to track or
know who the owner is. Objects that may belong to different owners (eg, pending
and done lists) are simple to implement.</p></li>
<li><p>if your object is on a list owner in another object, you can always recover
a pointer to the owning object for free</p></li>
</ul>
<a name="lws_dll2-common-Apis"></a>
<h2>lws_dll2 common Apis</h2>
<table>
<thead>
<tr>
<th>api</th>
<th>description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>lws_dll2_add_head</strong>(lws_dll2_t *d, lws_dll2_owner_t *owner)</td>
<td>insert at start of owner list</td>
</tr>
<tr>
<td><strong>lws_dll2_add_tail</strong>(lws_dll2_t *d, lws_dll2_owner_t *owner)</td>
<td>insert at end of owner list</td>
</tr>
<tr>
<td><strong>lws_dll2_remove</strong>(lws_dll2_t *d)</td>
<td>remove from owner list</td>
</tr>
<tr>
<td><strong>lws_dll2_get_head</strong>(lws_dll2_t *d)</td>
<td>get pointer to first object’s lws_dll2_t on list</td>
</tr>
<tr>
<td><strong>lws_dll2_get_tail</strong>(lws_dll2_t *d)</td>
<td>get pointer to last object’s lws_dll2_t on list</td>
</tr>
<tr>
<td><strong>lws_dll2_foreach_safe</strong>(lws_dll2_owner_t *owner, void *user, int (*cb)(struct lws_dll2 *d, void *user))</td>
<td>Callback for each entry on owner’s list</td>
</tr>
<tr>
<td><strong>lws_dll2_add_sorted</strong>(lws_dll2_t *d, lws_dll2_owner_t *own, int (*compare)(const lws_dll2_t *d, const lws_dll2_t *i))</td>
<td>Use compare to figure out where to insert object into owner’s list</td>
</tr>
</tbody>
</table>
<a name="Example"></a>
<h2>Example</h2>
<p>Say you have your own struct already that you want to “own” a list, you just
need to embed an <code>lws_dll2_owner_t</code> member in it.</p>
<pre><code> struct mystruct {
...
lws_dll2_owner_t mylistowner;
...
};
</code></pre>
<p>you can have n of those, and there are no requirements about where they go in the
struct.</p>
<p>Then in the struct type that will appear on an lws_dll2 list</p>
<pre><code> struct myitems {
...
lws_dll2_t list;
...
}
</code></pre>
<p>Again you can have n of these, ie, an object can appear on multiple lws_dll2
lists, so long as there is a different <code>lws_dll2_t</code> per list it’s on; there is
no ordering requirement.</p>
<p>No init is required to use the list objects except that they should be zeroed-down
from the start… you can explicitly ensure this by using <code>lws_dll2_clear()</code>
and <code>lws_dll2_owner_clear()</code> if necessary.</p>
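<p>For example, if the objects are on the stack or otherwise not already zeroed:</p>
<pre><code>struct mystruct ms;	/* stack instance: contents indeterminate */
struct myitems mi;

lws_dll2_owner_clear(&ms.mylistowner);	/* owner becomes an empty list */
lws_dll2_clear(&mi.list);		/* item is marked as on no list */
</code></pre>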
<p>To add to the list</p>
<pre><code> struct mystruct *mystruct;
struct myitems *myitem;
myitem = zalloc(sizeof(*myitem));
if (!myitem)
return 1;
lws_dll2_add_tail(&myitem->list, &mystruct->mylistowner);
</code></pre>
<p>To walk the items on the list (using a callback)</p>
<pre><code>static int
my_cb(lws_dll2_t *p, void *user)
{
struct myitems *myitem = lws_container_of(p, struct myitems, list);
...
return 0;
}
...
lws_dll2_foreach_safe(&mystruct->mylistowner, somepointer, my_cb);
</code></pre>
<p>This calls back <code>my_cb</code> with a pointer to every <code>struct myitems</code> on the list…
<code>somepointer</code> is passed to the callback’s <code>user</code> parameter; it can be NULL or
something of interest to the callback.</p>
<p>Notice that inside the callback, the pointer to <code>.list</code> used by lws_dll2 is
converted back into a pointer to the <code>struct myitems</code> that contains the list,
where it can be used as normal.</p>
<p>The <code>_safe</code> in <code>lws_dll2_foreach_safe()</code> refers to it being safe to remove the
object from the owner during the callback (and, if desired, destroy it), without
disturbing the list walk action.</p>
<p>In the case you want to walk the list inline, there’s a slightly more complex
way using a helper, it looks like</p>
<pre><code> lws_start_foreach_dll_safe(struct lws_dll2 *, p, p1, mystruct->mylistowner.head)
struct myitems *myitem = lws_container_of(p, struct myitems, list);
...
lws_end_foreach_dll_safe(p, p1)
</code></pre>
<p>the functionality is the same as <code>lws_dll2_foreach_safe()</code> but you may find it
more convenient to access local vars that are in scope without passing them
into a callback. p and p1 are arbitrary temp symbols that should not conflict
with anything already in scope and do not need additional init or declaration.</p>
<a name="Sorted-insertion"></a>
<h2>Sorted insertion</h2>
<p>Although it’s done linearly, under some conditions <code>lws_dll2_add_sorted()</code>
may save you a lot of difficulty. It scans the existing list from the head
forward, calling a provided callback to determine if the new item should be
inserted there… how it assesses the sorting order is arbitrary and is up to
the callback you provide… it could compare strings in the objects, or numbers,
or some more complex function.</p>
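<p>A short sketch, assuming <code>struct myitems</code> gained an illustrative <code>int value</code>
member to sort by:</p>
<pre><code>static int
compare_items(const lws_dll2_t *d, const lws_dll2_t *i)
{
	const struct myitems *a = lws_container_of(d, struct myitems, list),
			     *b = lws_container_of(i, struct myitems, list);

	/* negative: the new item d sorts before existing item i */
	return a->value - b->value;
}
...
	lws_dll2_add_sorted(&myitem->list, &mystruct->mylistowner, compare_items);
</code></pre>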
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<ul>
<li><p>linked-lists are highly desirable abstractions because they don’t waste
unused space or suffer from the limits of preallocated arrays.</p></li>
<li><p><code>lws_dll2</code> offers bidi lists and a single object can appear on multiple
lists.</p></li>
<li><p>Objects know which list their <code>lws_dll2</code> is on and how many items are on
a list cheaply</p></li>
<li><p>Walking lists and recovering object pointers is cheap</p></li>
<li><p>Sorted link-list insertion with user-defined comparison callback is
supported.</p></li>
</ul>
Libwebsockets Secure Streamshttps://warmcat.com/2020/03/15/libwebsockets-secure-streams.html2020-03-15T18:14:30+00:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake options: <code>-DLWS_WITH_SECURE_STREAMS=1</code>, others</li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-secure-streams.h">include/libwebsockets/lws-secure-streams.h</a></li>
<li>Implementation: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/secure-streams">./lib/secure-streams/</a></li>
<li>Examples: <a href="https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/secure-streams">./minimal-examples/secure-streams</a></li>
</ul>
<p>LWS Secure Streams is a method for abstracting out client connection policy
information from your code and allowing it to be set in (typically devicewide)
JSON. One way to think of it is the “client version” of <a href="https://libwebsockets.org/git/libwebsockets/tree/lwsws">lwsws</a>, where a generic
server is configured by JSON to be whatever mix of vhosts, proxies, cgi, etc you
want, placed where you like in the server url space without having to write
code to do it. (Everything on libwebsockets.org and warmcat.com is served by
lwsws with a JSON specification down <code>/etc/lwsws/conf.d/</code>).</p>
<p>Secure Streams goes further in that it even abstracts away the <strong>protocol
choice</strong> as well as the endpoint, tls validation stack, backoff / retry tables
etc; protocol-specific details are set in the JSON policy, not in the code and
that includes whether it happens to be transferring payloads in MQTT, or h1, or
h2, or ws.</p>
<p>Most of the boilerplate code needed for lws level code goes away, like the
protocol and protocol callbacks. And since the code doesn’t have to care what
wire protocol it’s using any more, the policies may be remotely loaded at
startup and what and how your device connects to cloud or other peers becomes
something that can be changed after the fact without changing any code.</p>
<p>One important change is <strong>the logical Secure Stream represents the desire for a
connection of a particular type, its lifecycle is not affected by actual tcp
connections coming up or going down and being reconnected</strong>, it exists for as long
as there’s a desire for the type of connection. This is in contrast to working
directly at lws layer, where <code>wsi</code> have a lifecycle that is completely bound up
with the socket connection they represent.</p>
<a name="Example-policy"></a>
<h2>Example policy</h2>
<p>You can find an example of a JSON policy here</p>
<p>https://warmcat.com/policy/minimal-proxy.json</p>
<p>The basic idea is that the system can use various named “stream types” from the
JSON policy; all the details about how that stream type connects and acts are
in the policy for it. Yes, including the protocol choice. Eg,</p>
<pre><code> "avs_metadata": {
"endpoint": "alexa.na.gateway.devices.a2z.com",
"port": 443,
"protocol": "h2",
"http_method": "POST",
"http_url": "v20160207/events",
"opportunistic": true,
"h2q_oflow_txcr": true,
"http_auth_header": "authorization:",
"http_auth_preamble": "Bearer ",
"http_multipart_name": "metadata",
"http_mime_content_type": "application/json; charset=UTF-8",
"http_no_content_length": true,
"rideshare": "avs_audio",
"retry": "default",
"tls": true,
"tls_trust_store": "avs_via_starfield"
}
</code></pre>
<a name="Secure-streams-user-apis"></a>
<h2>Secure streams user apis</h2>
<p>If there’s no lws protocol stuff, and it’s supposed to be independent of the
wire protocol… what does the remaining code look like then? When you create
a stream, you provide three callbacks in a const struct describing the
streamtype name in the policy and how your code interfaces to it:</p>
<ul>
<li><p><code>rx()</code> receives whatever deframed payload has appeared on the connection</p></li>
<li><p><code>tx()</code> provides a buffer, of a size chosen by Secure Streams but typically
around 1400 bytes, which you may write payload into for immediate sending</p></li>
<li><p><code>state()</code> is updated with the connection situation and remote acks</p></li>
</ul>
<p>There’s an api <code>lws_ss_request_tx()</code> which will be a familiar idea to lws users,
it schedules a <code>tx()</code> callback for the stream when possible. There are some other
misc helpers, but that is basically it. You don’t tell it where to connect in
the code or if it’s MQTT or h1 or h2 or ws, you tell it you want a stream of
a particular name, it looks it up in the JSON policy to find out how and where
to connect it.</p>
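<p>A minimal sketch of what that looks like, assuming a streamtype named
“mystream” exists in the policy (callback signatures here follow the lws
v4.0-era api; check <code>lws-secure-streams.h</code> in your version):</p>
<pre><code>typedef struct myss {
	struct lws_ss_handle	*ss;		/* lws fills this in */
	void			*opaque_data;
	/* ... your per-stream members ... */
} myss_t;

static int
myss_rx(void *userobj, const uint8_t *buf, size_t len, int flags)
{
	/* deframed payload arrived, whatever the wire protocol was */
	return 0;
}

static int
myss_tx(void *userobj, lws_ss_tx_ordinal_t ord, uint8_t *buf, size_t *len,
	int *flags)
{
	/* write up to *len bytes of payload into buf, set *len to used */
	return 0;
}

static int
myss_state(void *userobj, void *sh, lws_ss_constate_t state,
	   lws_ss_tx_ordinal_t ack)
{
	if (state == LWSSSCS_CONNECTED)
		; /* eg, lws_ss_request_tx() to ask for a tx() slot */

	return 0;
}

static const lws_ss_info_t ssi = {
	.handle_offset		 = offsetof(myss_t, ss),
	.opaque_user_data_offset = offsetof(myss_t, opaque_data),
	.rx			 = myss_rx,
	.tx			 = myss_tx,
	.state			 = myss_state,
	.user_alloc		 = sizeof(myss_t),
	.streamtype		 = "mystream",	/* looked up in the policy */
};

struct lws_ss_handle *h;

/* the policy, not this code, decides the protocol and endpoint */
if (lws_ss_create(context, 0, &ssi, NULL, &h, NULL, NULL))
	/* failed */ ;
</code></pre>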
<p>You send and receive <strong>payloads</strong>, how they get framed and deframed isn’t really
relevant. Protocol-specific details go in the JSON policy, like special http
headers.</p>
<a name="Dealing-with-dynamic-metadata"></a>
<h2>Dealing with dynamic metadata</h2>
<p>This is enough to cover basic situations well… fundamentally if you are
sending JSON back and forth, that JSON payload is the same deal whether it went
by h1 or h2 or whatever. Anything other than the JSON is just a distraction.</p>
<p>However, although you might not choose to architect things that break this model
if it were up to you, sometimes you don’t control the remote endpoint,
and its apis may insist on making more than full use of whatever horrible quirks
your protocol offers, like nonstandard http headers containing dynamic info or
multipart mime.</p>
<p>Secure Streams allows the JSON stream type definitions to declare metadata
names, which may be set dynamically on the stream using <code>lws_ss_set_metadata()</code>
and then used in headers or other protocol-specific policy information like the
endpoint name itself using <code>${metadata}</code> type connection- and transfer- time
string substitutions.</p>
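<p>For example, assuming the policy for the streamtype declared a metadata name
“ctype” that it references as <code>${ctype}</code> somewhere:</p>
<pre><code>/* the dynamic value is substituted wherever the policy uses ${ctype} */
if (lws_ss_set_metadata(h, "ctype", "application/json", 16))
	/* failed */ ;
</code></pre>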
<a name="Updating-the-JSON-policy"></a>
<h2>Updating the JSON policy</h2>
<p>Secure Streams defines a special streamtype “fetch_policy”… if this is
defined in the hardcoded policy at context creation time, then when the
network is up and the ntp time acquired so tls can work, lws will follow the
policy in that stream type to fetch updated policy JSON and switch to that.
This allows modification of the devicewide communications policy after devices
are shipped (eg, switch cloud provider) without needing an OTA / reflash.</p>
<a name="Hardcoded-policy"></a>
<h2>Hardcoded policy</h2>
<p>In lws v4.1+, Secure Streams also supports a policy converted from JSON to
explicit structs to be built into code directly, if in some cases the platform
is too restricted to handle JSON. That way you can still maintain the policy
in JSON but autoconvert it to structs in a header file at build time, <a href="https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/secure-streams/minimal-secure-streams-staticpolicy/static-policy.h">like this</a>.</p>
<a name="Endpoint-tls-validation"></a>
<h2>Endpoint tls validation</h2>
<p>The JSON policy allows you to define x.509 certs (in Base64 DER strings) and
create named stacks of certificates in a “trust store”. These trust store
names can be associated with individual streamtypes to control the tls
validation process entirely from the JSON policy. It’s also possible to
skip specifying explicit trust stores if there’s a system trust store available
as there typically is with OpenSSL, but it may be preferable to directly
control which one or two issuer root certs can be accepted.</p>
<p>Because this is handled in the policy which may be updated remotely, managing
root cert updates is made simple and doesn’t need an OTA or reflash.</p>
<a name="Using-Secure-Streams-with-multiple-processes"></a>
<h2>Using Secure Streams with multiple processes</h2>
<p>In the case there’s a single process doing the communication, because there’s
a “communication daemon” kind of architecture or it’s a single statically-linked
RTOS image, then Secure Streams connections can be fulfilled directly.</p>
<p>However if it’s a Linux-class device with multiple separate processes that want
to send stuff, there are important pressures to make the best of muxed protocols
so different processes can potentially share a single tcp connection and tls
tunnel for, eg, h2 or even MQTT. One Linux-class client device might not feel
the pressure but at the server side, it doesn’t scale the same if each device is
making two or four connections back each with its own tls tunnel, and you pay a
bill for the additional resources.</p>
<p>For that reason, lws supports a Secure Stream Proxy mode, where the proxy that
has the policy and actually fulfils the connections runs as its own process, and
applications forward their serialized payloads to it over Unix Domain Sockets.
This can be selected at cmake with <code>-DLWS_WITH_SECURE_STREAMS_PROXY_API=1</code> and
additionally <code>LWS_SS_USE_SSPC</code> must be defined when the client applications are
built, so they transparently use a second implementation of the same apis that
uses the Secure Streams Proxy to get things done.</p>
<p>The proxy itself is provided in the minimal examples.</p>
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<ul>
<li><p>For devices that are predicated on client connections, Secure Streams is a
layer on top of lws that separates out all connection policy into JSON… that
includes endpoint selection and tls validation certs etc but even the transport
protocol is decided by the JSON policy.</p></li>
<li><p>The stuff that’s left related to the connection is radically simplified and
just deals with payloads</p></li>
<li><p>Logical Secure Streams outlast any specific connection underneath, and can
reacquire underlying connections when needed</p></li>
</ul>
Libwebsockets string processing helpershttps://warmcat.com/2020/03/09/libwebsockets-tokenizer-strexp.html2020-03-09T09:16:45+00:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake option: part of core lws</li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-misc.h">include/libwebsockets/lws-misc.h</a></li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-purify.h">include/libwebsockets/lws-purify.h</a></li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-tokenize.h">include/libwebsockets/lws-tokenize.h</a></li>
<li>Implementation: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/core/libwebsockets.c">./lib/core/libwebsockets.c</a></li>
<li>Api unit tests: <a href="https://libwebsockets.org/git/libwebsockets/tree/minimal-examples/api-tests/api-test-lws_tokenize">./minimal-examples/api-tests/api-test-lws_tokenize</a></li>
</ul>
<a name="Introduction"></a>
<h2>Introduction</h2>
<p>Writing string processing in C is a dangerous and unrewarding occupation; you
have to always be mindful of how the code will respond to malicious input. And
to operate well on machines with very limited resources, without limiting the
size of the input you can handle, sometimes it’s desirable to operate on C
strings in memory, but other times it’s desirable to be able to
operate in a stateful way on partial blocks of input without restriction, ie,
with complete immunity to any fragmentation attack because it’s designed for
this case from the start.</p>
<p>Libwebsockets has a bunch of important safe helpers like wrappers on strncpy
and snprintf that guarantee there won’t be overflows and there will be a NUL, as
well as more complex helpers… let’s look at the simple ones first.</p>
<a name="L-3c-code-3e-lws_strncpy-28--29--3c--2f-code-3e-"></a>
<h2><code>lws_strncpy()</code></h2>
<p>You might be surprised to learn that libc <code>strncpy()</code> fails at being safe: it
can return without having applied a NUL at the end. <code>lws_strncpy()</code> is an
alternative that will always safely truncate at the limit.</p>
<p>lws_strncpy() adjusts back the size limit by 1 to always make space for the
NUL. For that reason, you can just feed it <code>sizeof(dest)</code> as the length limit
safely.</p>
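<p>For example:</p>
<pre><code>char dest[16];

/* safe even though the source is longer than dest: the result is
 * truncated and always NUL-terminated */
lws_strncpy(dest, "potentially-much-longer-string", sizeof(dest));
</code></pre>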
<a name="L-3c-code-3e-lws_strnncpy-28--29--3c--2f-code-3e-"></a>
<h2><code>lws_strnncpy()</code></h2>
<p>This is a variation of <code>strncpy()</code> that takes two limit numbers at the end,
the lowest of the two numbers is used to restrict or truncate the copy and a
NUL is always applied at the end of the destination. This is useful when you
want to copy a string where you know the length of the source string, but there
is no terminating NUL in the source. With this you can do, eg,</p>
<pre><code> lws_strnncpy(dest, src_no_NUL, srclen, destlen);
</code></pre>
<p>and under all conditions end up with a NUL-terminated, possibly truncated copy
of the string in dest. This is used widely in lws to cover for the fact that
some platforms do not provide the “%.*s” format string option that allows
printing non-NUL-delimited strings where you give the length.</p>
<p><code>destlen</code> is corrected back by 1 to allow for the NUL at the end, so again it’s
safe to set this to <code>sizeof(dest)</code>.</p>
<a name="L-3c-code-3e-lws_snprintf-28--29--3c--2f-code-3e-"></a>
<h2><code>lws_snprintf()</code></h2>
<p>This is the swiss army knife of string generation, it’s a safe version of
snprintf(). It uses the platform <code>vsnprintf()</code> but guarantees destination NUL
termination. It’s very convenient for additively composing on to a large string,
where the return value, which is the length written to the destination, is used
to advance where the next <code>lws_snprintf()</code> will write to.</p>
<p>In the event we already reached the end of the destination, it will return 0; at the
end of the sequence the result string length can be compared to the destination
size to discover if we “crumpled up at the end”. But it won’t crash or blow
past the end of the destination in the meanwhile.</p>
<pre><code> char dest[1234];
size_t n = 0;
n += lws_snprintf(dest + n, sizeof(dest) - n, "%s etc", etc);
n += lws_snprintf(dest + n, sizeof(dest) - n, "%s etc", etc);
n += lws_snprintf(dest + n, sizeof(dest) - n, "%s etc", etc);
if (n >= sizeof(dest) - 1)
/* truncated */
</code></pre>
<a name="L-3c-code-3e-lws_purify_-3c--2f-code-3e-"></a>
<h2><code>lws_purify_</code></h2>
<p>Lws provides “purification” helpers for arguments that will be expressed in
JSON, sqlite, or filenames including externally-provided input. For the sqlite
and JSON versions, they use escaping so their arguments are formatted like
<code>strncpy()</code>, ie, dest, source, dest len.</p>
<p>For <code>lws_filename_purify_inplace()</code>, as the name suggests, it purifies the file
name in place, by replacing scary characters like <code>..</code> or <code>/</code> in the filename
with <code>_</code>.</p>
<p>Notice that in the worst case, JSON escaping can turn one character into six
and the whole string may consist of those if it’s an attack. So you should
allow that the destination is 6x the size of the input.</p>
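<p>A short sketch of typical use (the <code>untrusted_*</code> inputs and buffer sizes are
illustrative):</p>
<pre><code>char esc[96], q[160], fname[64];

/* sqlite: escape the externally-provided string before embedding it */
lws_snprintf(q, sizeof(q), "select * from mytable where name='%s'",
	     lws_sql_purify(esc, untrusted_name, sizeof(esc)));

/* filenames: neutralize things like ".." and "/" in place */
lws_strncpy(fname, untrusted_fname, sizeof(fname));
lws_filename_purify_inplace(fname);
</code></pre>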
<a name="L-3c-code-3e-lws_tokenize-3c--2f-code-3e--Overview"></a>
<h2><code>lws_tokenize</code> Overview</h2>
<p>Given a UTF-8 string, <code>lws_tokenize</code> robustly and consistently separates it into
syntactical units, following flags given to it to control how ambiguous things
should be understood by it. By default, contiguous characters in a token are
only alphanumeric and <code>_</code>, but the flags can modify that.</p>
<p>Whitespace outside of quoted strings is swallowed by the parser, so it is
immune to different behaviours based on different types or amounts of whitespace
inbetween tokens. It silently consumes it where it’s valid and just reports the
delimiter or token abutting it. Similarly if comments are enabled, the comments
are silently and wholly swallowed.</p>
<p><code>lws_tokenize</code> is designed to operate on an all-in-memory chunk that typically
covers “one line”; using it chunked is possible, but code outside lws_tokenize
must collect enough chars to cover whole tokens first, whatever that means for
your use-case.</p>
<p>Its use-cases cover decoding small or large strings easily and robustly, where
lws_tokenize has already taken care of syntax-level error checking like
correctness of comma-separated lists or float format, and rejected nonsense…
user code just has to look at the flow of tokens and delimiters and decide if
that’s valid for its purpose. For example, lws_tokenize is helpful decoding
header content where the header has some structure but is otherwise quite
free-form… it’s difficult for user code to parse from scratch without missing
some validation or introducing bugs, but much easier to deal with a stream of
tokenized tokens and delimiters that already restricted the syntax.</p>
<p><code>lws_tokenize</code> also covers complex usage like parsing config files robustly,
including comments.</p>
<a name="Token-types"></a>
<h3>Token types</h3>
<table>
<thead>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWS_TOKZE_ENDED</td>
<td>We found a NUL and parsing has completed successfully</td>
</tr>
<tr>
<td>LWS_TOKZE_DELIMITER</td>
<td>Some character that can’t be in a token appeared, like <code>,</code></td>
</tr>
<tr>
<td>LWS_TOKZE_TOKEN</td>
<td>A token appeared, like <code>my_token</code>, this is reported as a unit</td>
</tr>
<tr>
<td>LWS_TOKZE_INTEGER</td>
<td>A token that seems to be an integer appeared, like <code>1234</code></td>
</tr>
<tr>
<td>LWS_TOKZE_FLOAT</td>
<td>A token that seems to be a float appeared, like <code>1.234</code></td>
</tr>
<tr>
<td>LWS_TOKZE_TOKEN_NAME_EQUALS</td>
<td>A token followed by <code>=</code> appeared</td>
</tr>
<tr>
<td>LWS_TOKZE_TOKEN_NAME_COLON</td>
<td>A token followed by <code>:</code> appeared (only if flag <code>LWS_TOKENIZE_F_AGG_COLON</code> enabled)</td>
</tr>
<tr>
<td>LWS_TOKZE_QUOTED_STRING</td>
<td>A quoted string appeared, like <code>"my,s:t=ring"</code></td>
</tr>
</tbody>
</table>
<a name="Parsing-Errors"></a>
<h3>Parsing Errors</h3>
<table>
<thead>
<tr>
<th>Error</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWS_TOKZE_ERR_COMMA_LIST</td>
<td>We were told to expect a comma-separated list, but we saw things like “,tok” or “tok,,”</td>
</tr>
<tr>
<td>LWS_TOKZE_ERR_NUM_ON_LHS</td>
<td>We encountered nonsense like 123=</td>
</tr>
<tr>
<td>LWS_TOKZE_ERR_MALFORMED_FLOAT</td>
<td>We saw a floating point number with nonsense, like “1..3” or “1.2.3” (float parsing can be disabled by flag)</td>
</tr>
<tr>
<td>LWS_TOKZE_ERR_UNTERM_STRING</td>
<td>We saw a <code>"</code> and started parsing a quoted string, but the string ended before the close quote</td>
</tr>
<tr>
<td>LWS_TOKZE_ERR_BROKEN_UTF8</td>
<td>We encountered a UTF-8 sequence that is invalid</td>
</tr>
</tbody>
</table>
<a name="Parser-modification-flags"></a>
<h3>Parser modification flags</h3>
<p>There are many different conventions for tokenizing depending on what you’re
doing… the default is restrictive in that only alphanumeric and <code>_</code> can be in
a token, but for different cases you want to modify this. There are several
flags allowing selection of a suitable parsing regime for what you’re doing</p>
<table>
<thead>
<tr>
<th>Flag</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>LWS_TOKENIZE_F_MINUS_NONTERM</td>
<td>treat - as part of a token, so <code>my-token</code> is reported as one token, not <code>my</code> - <code>token</code></td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_AGG_COLON</td>
<td><code>token:</code> or <code>token :</code> should be reported as a special token type <code>LWS_TOKZE_TOKEN_NAME_COLON</code>, not <code>token</code> <code>:</code></td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_COMMA_SEP_LIST</td>
<td>Enforce comma-separated list syntax, eg “a”, or “a, b” but not “,a” or “a, b,”</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_RFC7230_DELIMS</td>
<td>Allow more chars in a token following http style</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_DOT_NONTERM</td>
<td>Allows, eg, “warmcat.com” to be treated as one token</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_NO_FLOATS</td>
<td>This allows you to process, eg, “192.168.0.1” as a token instead of a floating point format error</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_NO_INTEGERS</td>
<td>Don’t treat strings consisting of numbers as integers, just report them as a string token</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_HASH_COMMENT</td>
<td>Take a <code>#</code> on the line as meaning the rest of the line is a comment</td>
</tr>
<tr>
<td>LWS_TOKENIZE_F_SLASH_NONTERM</td>
<td>Allow <code>/</code> inside string tokens, so <code>multipart/related</code> is a single token</td>
</tr>
</tbody>
</table>
<a name="Typical-usage"></a>
<h3>Typical usage</h3>
<pre><code>{
struct lws_tokenize ts;
const char *str;
...
str = "mytoken1, mytoken, my-token";
lws_tokenize_init(&ts, str, LWS_TOKENIZE_F_NO_INTEGERS |
LWS_TOKENIZE_F_MINUS_NONTERM);
do {
ts.e = lws_tokenize(&ts);
switch (ts.e) {
case LWS_TOKZE_TOKEN:
/* token is in ts.token, length ts.token_len */
break;
case LWS_TOKZE_DELIMITER:
/* delimiter is in ts.token[0] */
...
break;
case LWS_TOKZE_ENDED:
/* reached end of string and tokenizer had no objections */
...
break;
default:
break;
}
} while (ts.e > 0 /* && still space in output buffer */);
...
}
</code></pre>
<a name="L-3c-code-3e-lws_strexp-3c--2f-code-3e--Overview"></a>
<h2><code>lws_strexp</code> Overview</h2>
<p><code>lws_strexp</code> implements generic streaming, stateful, string expansion for
embedded symbols like <code>${mysymbol}</code> in an input of unlimited size chunked to
arbitrary sizes for both input and output. It doesn’t deal with the symbols
itself but passes instances of the symbol name that needs substitution to a
user-provided callback as they are found, where it privately looks up the symbol
and emits the substituted data inline.</p>
<p>Neither the input nor the output needs to be all in one place at one time, and
either can be arbitrarily fragmented down to single-byte buffers safely, so this
api is immune to fragmentation type attacks. Any size input and output can be
processed without using any heap other than a ~64-byte context object and the
input and output chunk buffers; depending on what you’re doing all of these
can be on the stack.</p>
<table>
<thead>
<tr>
<th>expansion api return</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSTRX_DONE</td>
<td>We reached the end OK</td>
</tr>
<tr>
<td>LSTRX_FILLED_OUT</td>
<td>We filled up the output buffer, but once you spilled it, we need to continue</td>
</tr>
<tr>
<td>LSTRX_FATAL_NAME_TOO_LONG</td>
<td>Met a name longer than 31 chars</td>
</tr>
<tr>
<td>LSTRX_FATAL_NAME_UNKNOWN</td>
<td>Callback reported it doesn’t know the symbol name</td>
</tr>
</tbody>
</table>
<a name="Example-usage"></a>
<h2>Example usage</h2>
<p>The symbol substitution callback should look like this, to be able to deal with
the arbitrary output chunking</p>
<pre><code>int
exp_cb1(void *priv, const char *name, char *out, size_t *pos, size_t olen,
size_t *exp_ofs)
{
const char *replace = NULL;
size_t total, budget;
if (!strcmp(name, "test")) {
replace = "replacement_string";
total = strlen(replace);
goto expand;
}
return LSTRX_FATAL_NAME_UNKNOWN;
expand:
budget = olen - *pos;
total -= *exp_ofs;
if (total < budget)
budget = total;
memcpy(out + *pos, replace + (*exp_ofs), budget);
*exp_ofs += budget;
*pos += budget;
if (budget == total)
return LSTRX_DONE;
return LSTRX_FILLED_OUT;
}
</code></pre>
<p>… and performing the substitution…</p>
<pre><code> size_t in_len, used_in, used_out;
lws_strexp_t exp;
char obuf[128];
int n;
lws_strexp_init(&exp, NULL, exp_cb1, obuf, sizeof(obuf));
/* for large input, you would do this in a loop */
n = lws_strexp_expand(&exp, in, in_len, &used_in, &used_out);
if (n != LSTRX_DONE) {
lwsl_err("%s: lws_strexp failed: %d\n", __func__, n);
return 1;
}
</code></pre>
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<ul>
<li><p>If you deal with strings that have internal structure, C can require a lot
of code that is unforgiving with security issues and difficult to switch around
after it’s written, or extend without it turning into a rat’s nest.</p></li>
<li><p>The tokenizer provides your code with robust, well-formed tokens and
delimiters, and hides details like whitespace and if selected, comma-separated
list sequencing.</p></li>
<li><p>You can configure it at runtime for a variety of kinds of situation</p></li>
<li><p>You can very easily deploy ${symbol} string substitution without needing the
input or output in one place at one time and even if the substitution is huge.</p></li>
</ul>
Libwebsockets Lightweight Embedded JSON Stream Parserhttps://warmcat.com/2020/03/08/libwebsockets-lejp.html2020-03-07T13:07:23+00:00<a name="Overview"></a>
<h2>Overview</h2>
<ul>
<li>CMake option: <code>LWS_WITH_LEJP</code> (default ON)</li>
<li>Public header: (included by libwebsockets.h) <a href="https://libwebsockets.org/git/libwebsockets/tree/include/libwebsockets/lws-lejp.h">include/libwebsockets/lws-lejp.h</a></li>
<li>Implementation: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/misc/lejp.c">./lib/misc/lejp.c</a></li>
<li>Helper app: <a href="https://libwebsockets.org/git/libwebsockets/tree/test-apps/test-lejp.c">./test-apps/test-lejp.c</a></li>
<li>Example: <a href="https://libwebsockets.org/git/libwebsockets/tree/lib/secure-streams/policy.c">./lib/secure-streams/policy.c</a></li>
</ul>
<a name="Introduction"></a>
<h2>Introduction</h2>
<p>JSON is deservedly very popular, and there are a lot of JSON parsers to choose from.</p>
<p>With JSON, one trades away minimal representation of the data in exchange for
readability and extensibility. Often this tradeoff is gratefully accepted by
programmers with tears in their eyes, compared to having to deal with, log,
debug and maintain some proprietary binary coding directly. If you bloated
some binary coding from 4 bytes sent very infrequently to 400 bytes, but you
or the next guys will easily be able to understand, debug and extend it 20
years from now, plus casually read the logs and understand who was trying to
do what, it’s a great deal.</p>
<p>But depending on what you’re doing, while you may be able to accept increasing the
size of the data transferred accordingly, you may not have the memory at the receiving
side to even store all the JSON in one place at one time, let alone instantiate parser
objects for what might be a deep hierarchy.</p>
<p>Most JSON parser libraries require the JSON to be in one linear array and then
transform it into objects which can be walked by the user code to transform it once
again, before destroying the parser objects. In this model, the existence of the
JSON all in one place and the generation of the JSON object model overlap, meaning
the peak heap usage is both together… you can free the JSON after creating the
model, but then you are creating your user representation in heap too.</p>
<p>The JSON parsing action can’t commence until all the JSON has been received and is
in one place, and the transformation into the parser object model has happened.</p>
<p>In addition many implementations recurse, with potentially large impact on stack
usage.</p>
<a name="lws-LEJP-model"></a>
<h2>lws LEJP model</h2>
<p><img src="./lejp-1.png" alt="lejp approach" /></p>
<p>lws offers a JSON stream parser with insanely amazing characteristics compared to
“the usual”.</p>
<ul>
<li><p>LEJP is a stateful <strong>stream parser</strong>, meaning it processes whatever data is coming
in as it arrives, ie, it does not need all the JSON in one place at one time but
processes and discards each chunk as it becomes available. The “chunk” size can be as low
as 1 byte, so it’s completely immune to fragmentation issues.</p></li>
<li><p>it does not allocate any heap, at all, and has a fixed-size parsing context object of
around 560 bytes on a 32-bit machine that exists while the parsing is ongoing. As JSON
objects are parsed, a user callback is informed and can produce the related user
objects directly. Peak heap is drastically reduced compared to having a JSON parser
model in memory as the user’s object is created… there is no JSON parser model.</p></li>
<li><p>it does not recurse on the stack, at all, it manages its own parsing stack inside
the LEJP parsing object</p></li>
<li><p>it handles floats as a type of string, ie, does not bring float types into the picture
itself; your user code can choose whether to use floating point itself or, eg, fractional
scaling using integers</p></li>
<li><p>In the case of long strings, the strings are “chunked” into a 254-byte buffer (already
allocated in the parsing context object) and passed to user code together with information
about if this chunk is the beginning and / or end. So huge strings are supported cleanly
without huge buffers or needing it all in one place at one time.</p></li>
<li><p>the code size for all this generic functionality on 32-bit ARM is 2.1KB!</p></li>
</ul>
<a name="Understanding-the-parsing-model"></a>
<h2>Understanding the parsing model</h2>
<p>lws includes a helpful test app for LEJP that’s built and installed with lws when LEJP is
enabled at cmake. This allows you to parse arbitrary strings from stdin and decompose
them to LEJP’s parsing events and paths, so you can see the correct paths to “program”
the lejp context with for your schema. For example, with this in <code>/tmp/my.json</code></p>
<pre><code>{
"schema":"xxx",
"uid":1004,
"len":194,
"timestamp":1641458307868,
"channel":2,
"finished":0,
"task_uuid":"2a31db22f1180d77734ccaee5af18472c733bce4078405eadc8568d8173eb855"
}
</code></pre>
<p>You can find out the paths and events that lejp will use to parse it with the test tool</p>
<pre><code>$ cat /tmp/my.json | libwebsockets-test-lejp
[2020/03/07 10:55:49:3890] N: libwebsockets-test-lejp (C) 2017 - 2018 andy@warmcat.com
[2020/03/07 10:55:49:3891] N: usage: cat my.json | libwebsockets-test-lejp
[2020/03/07 10:55:49:3891] N: LEJPCB_CONSTRUCTED: path match 0 statckp 0
[2020/03/07 10:55:49:3891] N: LEJPCB_START: path match 0 statckp 0
[2020/03/07 10:55:49:3892] N: LEJPCB_OBJECT_START: path match 0 statckp 0
[2020/03/07 10:55:49:3892] N: path: 'schema' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3892] N: LEJPCB_PAIR_NAME: path schema match 0 statckp 6
[2020/03/07 10:55:49:3892] N: LEJPCB_VAL_STR_START: path schema match 0 statckp 6
[2020/03/07 10:55:49:3892] N: value 'xxx' (LEJPCB_VAL_STR_END)
[2020/03/07 10:55:49:3892] N: path: 'uid' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3892] N: LEJPCB_PAIR_NAME: path uid match 0 statckp 3
[2020/03/07 10:55:49:3892] N: value '1004' (LEJPCB_VAL_NUM_INT)
[2020/03/07 10:55:49:3892] N: path: 'len' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3893] N: LEJPCB_PAIR_NAME: path len match 0 statckp 3
[2020/03/07 10:55:49:3893] N: value '194' (LEJPCB_VAL_NUM_INT)
[2020/03/07 10:55:49:3893] N: path: 'timestamp' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3893] N: LEJPCB_PAIR_NAME: path timestamp match 0 statckp 9
[2020/03/07 10:55:49:3893] N: value '1641458307868' (LEJPCB_VAL_NUM_INT)
[2020/03/07 10:55:49:3893] N: path: 'channel' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3893] N: LEJPCB_PAIR_NAME: path channel match 0 statckp 7
[2020/03/07 10:55:49:3893] N: value '2' (LEJPCB_VAL_NUM_INT)
[2020/03/07 10:55:49:3893] N: path: 'finished' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3893] N: LEJPCB_PAIR_NAME: path finished match 0 statckp 8
[2020/03/07 10:55:49:3894] N: value '0' (LEJPCB_VAL_NUM_INT)
[2020/03/07 10:55:49:3894] N: path: 'task_uuid' (LEJPCB_PAIR_NAME)
[2020/03/07 10:55:49:3894] N: LEJPCB_PAIR_NAME: path task_uuid match 0 statckp 9
[2020/03/07 10:55:49:3895] N: LEJPCB_VAL_STR_START: path task_uuid match 0 statckp 9
[2020/03/07 10:55:49:3895] N: value '2a31db22f1180d77734ccaee5af18472c733bce4078405eadc8568d8173eb855' (LEJPCB_VAL_STR_END)
[2020/03/07 10:55:49:3895] N: LEJPCB_OBJECT_END: path task_uuid match 0 statckp 9
[2020/03/07 10:55:49:3895] N: Parsing Completed (LEJPCB_COMPLETE)
[2020/03/07 10:55:49:3895] N: LEJPCB_COMPLETE: path task_uuid match 0 statckp 9
[2020/03/07 10:55:49:3896] N: okay
[2020/03/07 10:55:49:3896] N: LEJPCB_DESTRUCTED: path task_uuid match 0 statckp 9
</code></pre>
<a name="Setting-up-the-lejp-context"></a>
<h2>Setting up the lejp context</h2>
<p>There are three pieces to the puzzle… first, specify the paths you want to easily
identify from your callback. These are matched before the callback gets called
and reduced to a single uint8_t in <code>ctx.path_match</code>, so if there are just some
patterns you are interested in, you can list them here. 0 in <code>path_match</code> means
no match, and any other value means the path at index <code>path_match - 1</code> matched.</p>
<pre><code>static const char * const paths[] = {
	"release",
	"product",
	"schema-version",
	"via-socks5",
	"retry[].*.backoff",
	"retry[].*.conceal",
	"retry[].*.jitterpc",
	...
};

typedef enum {
	LSSPPT_RELEASE,
	LSSPPT_PRODUCT,
	LSSPPT_SCHEMA_VERSION,
	LSSPPT_VIA_SOCKS5,
	LSSPPT_BACKOFF,
	LSSPPT_CONCEAL,
	LSSPPT_JITTERPC,
	...
} lssppt_t; /* typedef name illustrative */
</code></pre>
<p>Patterns like <code>name[].*.entry</code> follow the path scheme lejp uses to track
its path during parsing. You can feed the test example your JSON and
watch what happens in <code>ctx.path</code> to see how it works.</p>
<p>Specify your callback that handles parsing events from lejp</p>
<pre><code>static signed char
cb(struct lejp_ctx *ctx, char reason)
{
...
return 0;
}
</code></pre>
<p><code>reason</code> is one of <code>enum lejp_callbacks</code>, describing the reason for the callback.</p>
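<p>A minimal sketch of such a callback, matching the paths and enum above; the
handling shown is illustrative, with <code>ctx->path_match</code> and the value chunk in
<code>ctx->buf</code> following the lejp context layout:</p>
<pre><code>static signed char
cb(struct lejp_ctx *ctx, char reason)
{
	/* we only care about completed string and integer values here */
	if (reason != LEJPCB_VAL_STR_END && reason != LEJPCB_VAL_NUM_INT)
		return 0;

	if (!ctx->path_match)
		return 0; /* not one of our listed paths */

	switch (ctx->path_match - 1) {
	case LSSPPT_RELEASE:
		/* the value (or a chunk of it) is in ctx->buf */
		lwsl_notice("release: %s\n", ctx->buf);
		break;
	default:
		break;
	}

	return 0; /* nonzero would stop the parse */
}
</code></pre>
<p>Then construct the context, feed it the JSON, and destroy it when you’re done:</p>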
<pre><code> struct lejp_ctx ctx;
 int m;

 lejp_construct(&ctx, cb, NULL, paths, LWS_ARRAY_SIZE(paths));
 m = lejp_parse(&ctx, (uint8_t *)buf, n);
 if (m < 0 && m != LEJP_CONTINUE) {
 	/* parsing failed */
 	...
 }
 lejp_destruct(&ctx);
</code></pre>
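<p>Since it’s a stream parser, you don’t have to hand it the whole document at
once. A sketch of the chunked case, assuming a hypothetical <code>read_chunk()</code>
supplying JSON fragments; <code>LEJP_CONTINUE</code> means the parser wants more input:</p>
<pre><code> struct lejp_ctx ctx;
 uint8_t buf[256];
 int n, m = LEJP_CONTINUE;

 lejp_construct(&ctx, cb, NULL, paths, LWS_ARRAY_SIZE(paths));

 /* feed fragments until the parser completes or objects */
 while (m == LEJP_CONTINUE && (n = read_chunk(buf, sizeof(buf))) > 0)
 	m = lejp_parse(&ctx, buf, n);

 if (m < 0 && m != LEJP_CONTINUE)
 	lwsl_err("parse failed: %d\n", m);

 lejp_destruct(&ctx);
</code></pre>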
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<p><img src="./dohedo-work.png" alt="we got work" /></p>
<ul>
<li>You can have a full-featured JSON parser in a couple of KB suitable for a microcontroller</li>
<li>It handles subobjects, arrays of objects, huge strings, floats-as-strings, etc</li>
<li>It doesn’t use any heap, and doesn’t recurse</li>
<li>It’s stateful and can process the JSON in arbitrary chunks as it comes in</li>
</ul>
Libwebsockets reaches v4https://warmcat.com/2020/03/07/libwebsockets-v4.html2020-03-07T07:53:23+00:00<p><img src="/2020/03/07/libwebsockets.org-logo.svg" alt="libwebsockets.org-logo.svg" /></p>
<a name="Major-update-for-lws--28-and-new-logo-29-"></a>
<h2>Major update for lws (and new logo)</h2>
<p>Libwebsockets has been going for ten years now, and it’s slowly growing in popularity
in an organic way. Since it’s FOSS and freely available, it’s difficult to know
how many users it has: they typically just clone it and start using it, and
only talk to me if there’s a problem.</p>
<p>The stats that are available are a bit indirect: it has 2.5K stars on github,
consistently > 200 subscribers to activity notifications there, and > 200 people
on its mailing list for many years now; there are somewhere between 1,000 and 2,000
clones a week. Occasionally I find out, via contributions, by accident, via <code>@xxx</code>
emails, emails from IP lawyers, or affiliations, that it is in use at many major
names and is part of at least tens of millions of devices working well in the field;
I can guess the real number is many times the ones I know of.</p>
<a name="License-change-to-MIT"></a>
<h2>License change to MIT</h2>
<p>Earlier in 2019 I changed master’s license to MIT after announcing it on the ml and
waiting some weeks for objections; at that time I made a v3.2 release which was the
last LGPLv2.1 version. The new v4 release is the first stable release using MIT.</p>
<p>The license change has its risks but I am guessing it will increase uptake, although
at the same time make usage of it even more opaque, since MIT can be silently used in
proprietary firmware perfectly well. And generally, the more relevant lws is, the more
chance I can continue to get paid to work on FOSS.</p>
<p>Since the change, contributions have continued to appear at about the same rate as
when it was LGPLv2.1, which is a relief; it also suggests that, in my situation
anyway, trying to mandate contribution via the license is maybe no more effective
than having strong forward progress in the project and constant cleaning / refactoring.
Those factors also tend to make carrying out-of-tree (OOT) patches very expensive
and scary, which reduces the risk of the non-sharing forks MIT makes possible; by
contrast, giving fixes upstream so they are already there and tested by others when
you update is highly preferable for users, whatever the license. Typically these
contributions are not some huge IP grant, they are just cleaning up my mistakes, so
there’s no downside for contributors. And companies in regions that tend not to care
much about license details may be more responsive to practical considerations.</p>
<p>Anyway we will see over time if that was a good decision.</p>
<a name="New-and-not-well-2d-known-lws-features"></a>
<h2>New and not well-known lws features</h2>
<p>There are a huge number of features in lws that are just in there and hidden away,
many off by default in CMake, and people are unaware of the cool things that are
available just a CMake config setting away. So I hope to use this post as a
placeholder for pointers to a series of blog posts exploring individual features and
philosophical approaches in lws.</p>
<ul>
<li><a href="/2020/03/08/libwebsockets-lejp.html">lws JSON Stream parser LEJP</a></li>
<li><a href="/2020/03/09/libwebsockets-tokenizer-strexp.html">lws String handling helper apis</a></li>
<li><a href="/2020/03/25/libwebsockets-lws_dll2.html">lws dll2 linked-list</a></li>
<li><a href="/2020/03/26/libwebsockets-lwsac.html">lwsac chained allocator</a></li>
<li><a href="/2020/03/15/libwebsockets-secure-streams.html">lws Secure Streams</a></li>
</ul>
RISC-V and Microsemi Polarfire on Fedora 27https://warmcat.com/2018/02/01/Microsemi-PolarFire.html2018-02-01T15:51:45+08:00<p><img src="/2018/02/01/japanese-sky.jpg" alt="Japanese Sky" /></p>
<a name="Introduction"></a>
<h2>Introduction</h2>
<p>Microsemi is one of many vendors that are cozying up to RISC-V… SiFive’s next
dev board for their 64-bit silicon, which is eagerly awaited, marries an
engineering sample 64-bit physical chip with a Microsemi Polarfire FPGA.</p>
<p><a href="https://www.microsemi.com/products/fpga-soc/fpga/polarfire-fpga">Polarfire web page</a></p>
<p>Interestingly - tellingly - you can find mention of RISC-V soft-core right on
the landing page above, but you have to download the overview PDF</p>
<p><a href="https://www.microsemi.com/document-portal/doc_download/136518-po0137-polarfire-fpga-product-overview-datasheet">Polarfire product overview PDF</a></p>
<p>…before there is mention of an Arm Cortex M3 “system controller” on the
FPGA die. Having some kind of “system controller” in there makes a lot of
sense; it’s not as central a presence as on Zynq or Altera / Intel variants,
which have more powerful Arm cores on the die, integrated with the DDR controller
and other buses, but then it’s not so expensive either. Still, the fact it is left
off the headline features altogether in favour of soft RISC-V is an interesting
straw in the wind.</p>
<a name="SiFive--22-Unleashed-22--platform"></a>
<h2>SiFive “Unleashed” platform</h2>
<p>I attended a Japanese “RISC-V day” in December, which a friend was helping to
run. There Jack Kang, the SiFive CEO, showed slides about this forthcoming
“unleashed” board… it’s not secret; here is the PR</p>
<p><a href="https://www.prnewswire.com/news-releases/sifive-and-microsemi-expand-relationship-with-strategic-roadmap-alignment-and-a-linux-capable-risc-v-development-board-300562439.html">Unleashed board PR</a></p>
<p>… and here is a little more information via Microsemi…</p>
<p><a href="https://www.microsemi.com/products/fpga-soc/mi-v-embedded-ecosystem/boards-solutions#partner-boards">Unleashed board via Microsemi</a></p>
<p>The concept is hard silicon 64-bit RISC-V with a PCIe type link to the FPGA;
the Polarfire is blessed with very high speed differential IO. So with this
dev board, it will become possible to implement FPGA HDL peripherals that appear
direct on the CPU bus.</p>
<p><img src="/2018/02/01/unleashed.png" alt="Unleashed FPGA architecture" /></p>
<p>Although the RISC-V core, and some acceleration infrastructure, is standardized,
the project’s direct interest ends at the ISA. So unlike Arm’s strategy with many
generic “Primecell” IPs you can also rent from them, which are standardized and
already have Linux drivers, RISC-V does not provide “house versions” of anything
other than the core, not even a serial port IP.
This will inevitably lead to greater fragmentation than is found even in the Arm
ecosystem, for no benefit. Unfunded FOSS hackers who
want to throw together soft IPs inside an FPGA will also hit a brick wall, as
discussed later.</p>
<p>Update 2018-02-04: there’s a <a href="https://www.sifive.com/products/hifive-unleashed/">public release</a> of the board now</p>
<p><img src="/2018/02/01/FU540-board-vert.jpg" alt="Unleashed board" /></p>
<p>But no mention of the FPGA / Companion board.</p>
<a name="RISC-2d-V-seems-to-have-made-its-case"></a>
<h2>RISC-V seems to have made its case</h2>
<p>There have been many news articles about RISC-V in the last year or so that I won’t
bother regurgitating. From a FOSS perspective though, there have not been any
significant contributions beyond the core itself being permissively licensed.
News that companies like WDC are, wisely, dropping Arm and rolling their own
is clearly important (especially to Arm), but these chips are entirely
proprietary and the change only helps WDC save money. It has no visible
implication for FOSS users, since WDC use all their chips internally.</p>
<p>Likewise there are presumably going to be customers for SiFive’s “supported
and characterized” 64-bit HDL, but existing locked-down devices simply moving
from Arm to RISC-V to save the vendor money also does nothing for the FOSS
commons.</p>
<a name="Celerity"></a>
<h3>Celerity</h3>
<p>Celerity:</p>
<p><a href="https://cseweb.ucsd.edu/~mbtaylor/papers/Celerity_CARRV_2017_paper.pdf">Celerity paper</a></p>
<p>have a cool use for RISC-V basically making a GPU-type architecture with vast
numbers (496 in the first iteration) of generic 32-bit RISV-V nodes inside and
five 64-bit RISC-V nodes to act as controllers.</p>
<p><img src="/2018/02/01/celerity1.png" alt="Celerity Architecture" /></p>
<p>This is a contribution to the FOSS commons</p>
<p><a href="http://opencelerity.org">Celerity.org</a></p>
<p>and the design is FOSS.</p>
<a name="RISC-2d-V-with-OoO-and-competitive-performance"></a>
<h3>RISC-V with OoO and competitive performance</h3>
<p>The canonical RISC-V designs have a 4-stage fixed pipeline scheme, ie, no
OoO / speculation, putting the 64-bit design on a par with an RPi3 performance-
wise (of course it depends on process). However Esperanto (an IP vendor) showed a
slide on an OoO 64-bit RISC-V design for use “at the top end”, ie, as a modern
Armv8 competitor:</p>
<pre><code>ET-Maxion RISC-V Processor
- ET-Maxion will be the highest single thread performance 64-bit RISC-V
processor
- Allow RISC-V to be positioned alongside highest performance processors.
- Enable companies to go RISC-V from top to bottom.
- Reduces threat of retaliation by eliminating need to go to another
architecture at high end.
- Provide a viable high-end alternative for companies wanting to make the
transition to RISC-V.
Performance goals:
- Single thread integer performance comparable to the best IP cores
available from market leaders.
- Great Linux performance to run OS and applications.
Technical features:
- 64-bit RISC-V RV64GC instruction set
- Starting from BOOM v2, but expect substantial changes
- Out-of-order pipeline
- Multiple levels of cache
- Multiprocessor support
- Optimized for 7nm CMOS
- Will be used in Esperanto's products and made available as a licensable core.
</code></pre>
<p><a href="http://riscv.tokyo/2017/download/154/">Esperanto ET-Maxion</a>, see p8</p>
<p>“Reduces threat of retaliation by eliminating need to go to another
architecture at high end” eh… whatever can they mean :-)</p>
<p>These are proprietary implementations of the permissively-licensed RISC-V
design, so they are not for free. However, since they are based on free stuff,
they are going to be a lot cheaper than Arm’s proprietary-only designs.</p>
<p>This OoO design is perhaps the reason for the slightly careful phrasing during RISC-V’s
<a href="https://riscv.org/2018/01/more-secure-world-risc-v-isa/">ritual humiliation</a> of
Arm and Intel over Spectre: they could only say “no announced RISC-V silicon” suffers
from Spectre. I guess Esperanto were all over their design when they learned
about the generic vulnerability.</p>
<p>The general feeling at the meeting was that RISC-V was here and had made its case.
There were two Westerners sitting together way at the back, arms folded,
looking miserable.</p>
<p>Certainly it’s an uphill battle for Arm to continue asking for money (per-chip
money as well as upfront) when there is a permissively-licensed solution in
production by their customers' competitors. Again, the main problem for RISC-V is
that you can go to Arm and rent almost all the related IPs, guaranteed to
play well together, including GPU, but RISC-V just has the CPU part of the puzzle.</p>
<a name="Libero-for-PolarFire-on-Fedora-27"></a>
<h2>Libero for PolarFire on Fedora 27</h2>
<p>I wrote to SiFive while sitting in the audience to register to get a preview
Unleashed board, and was added to the list. These are supposed to be coming
around March, so after completing a consultancy job I thought it was time to
check out the Microsemi FPGA toolchain, since installing these binary-only
crapfests is usually several nightmares chained together.</p>
<a name="Effect-of-free-but-non-2d-Free-IP-on-FOSS-projects"></a>
<h3>Effect of free but non-Free IP on FOSS projects</h3>
<p>I talked to Ted Speers from Microsemi during the lunch Q&A there
about a problem specific to FOSS FPGA designs: the IPs provided with the
FPGA toolchain are restrictively licensed and nonredistributable. He gave a
generic “I don’t know what you are talking about but I will look into it”
answer and I heard no more.</p>
<p>This is a serious problem when you want to provide full sources because, eg,
the HDMI analyzer project I have written about before uses a source-level
instantiation of the Xilinx DMAC IP as part of itself… this is free to
instantiate, but under a nonredistributable license.
You don’t care if you are producing a proprietary product, because the source
is never distributed. But it actually stops you making FOSS projects using
Zynq.</p>
<p>I don’t hold out much hope for this with Microsemi’s toolchain because the
Synopsys stuff wants to download its IPs into a “vault”. But I did not get far
enough yet to find out the licensing situation for FOSS for their pieces or
even what pieces they have.</p>
<a name="Download-and-install"></a>
<h3>Download and install</h3>
<p>I registered and downloaded a 6.3GB (!) toolchain tarball from their website; it
unpacked into two zipfiles (2 and 4 GB respectively), a Java installer and a
README.</p>
<p><a href="http://soc.microsemi.com/download/reg/download.aspx?p=f=LiberoSoC_PolarFire_v2_LIN">Download Page</a></p>
<p>The Java app insisted I scroll down to the bottom of the license, but that was
not entirely successful under openjdk…</p>
<p><img src="/2018/02/01/libero-license.png" alt="Libero Java installer license" /></p>
<p>After that, the install worked OK… it took about 15 minutes, sitting at 4% before jumping to 100% and done. It didn’t install any DE / GUI launcher that Gnome knew about to start the thing, and the README that was unpacked didn’t give any clue either.</p>
<a name="Motif-library-problem-1"></a>
<h3>Motif library problem 1</h3>
<p>Trying <code>./Libero_PolarFire_v2.0/Libero/bin/libero</code> got me</p>
<pre><code>Error: Could not locate the Motif library in LD_LIBRARY_PATH
</code></pre>
<p>and exit. Googling the error message found this… guys, why not point to it
in the README dropped by the installer?</p>
<p><a href="https://www.microsemi.com/document-portal/doc_view/132361-how-to-set-up-your-linux-environment-for-libero">132361-how-to-set-up-your-linux-environment-for-libero pdf</a></p>
<a name="Licensing-problem-1"></a>
<h3>Licensing problem 1</h3>
<p>So before looking at the immediate error, this informs me
“<strong>Step 1—Download License Daemons, License
File, and Set Up Licensing on License Server</strong>”… this again. It’s again (like the Lattice tools) a network-MAC-locked
license, free as in beer for the $0 toolchain, but not free as in blood pressure. Of
course with Linux you can force your MAC to whatever you like, so it’s pointless. Clearly Synopsys are to blame, since they are forcing all the vendors into that madness.</p>
<p>The setup is really messy… you have to download a “license server daemon” for your platform. In the download are 8 binaries, no docs</p>
<pre><code>$ ls -l Linux_Licensing_Daemon/
total 13944
-rwxrwxr-x. 1 agreen agreen 1190636 Sep 17 2016 actlmgrd
-rwxrwxr-x. 1 agreen agreen 1221728 Sep 17 2016 lmgrd
-rwxrwxr-x. 1 agreen agreen 1065316 Sep 17 2016 lmhostid
-rwxrwxr-x. 1 agreen agreen 1065316 Sep 17 2016 lmutil
-rwxrwxr-x. 1 agreen agreen 1203448 Apr 23 2016 mgcld
-rwxrwxr-x. 1 agreen agreen 6662892 Apr 23 2016 snpslmd
-rwxrwxr-x. 1 agreen agreen 402432 Apr 23 2016 syncad
-rwxrwxr-x. 1 agreen agreen 1448880 Apr 23 2016 synplctyd
</code></pre>
<p>It doesn’t say how to run them. The next step is to get a license code for your MAC… which one? This laptop has both USB ethernet and WLAN… it turned out to be better than the Lattice license stuff though, since it worked with a network device named something other than eth%d.</p>
<a name="Broken-survey-on-Microsemi-site"></a>
<h3>Broken survey on Microsemi site</h3>
<p>But the site felt it was all going too smoothly, and wanted me to fill out a mandatory survey.</p>
<pre><code>Please take a moment to take a very brief 3 question survey. Upon completion, you can obtain or register for a license. Thank You. Click the button below.
</code></pre>
<p>An empty popup appeared when I clicked the button, Firefox reports</p>
<pre><code>Attempt to set a forbidden header was denied: Connection prototype.js:683
</code></pre>
<p>Okay… break out Chrome, go to microsemi.com and try to login… not known. soc.microsemi.com… not known. Exact same URL as in firefox:</p>
<p><img src="/2018/02/01/web-error.png" alt="Microsemi web error" /></p>
<p>Back up to http://soc.microsemi.com/Portal… it will let me log in.</p>
<p>In Chrome the same error in their JS for the survey is seen</p>
<pre><code> Refused to set unsafe header "Connection"
</code></pre>
<p>but it doesn’t regard it as fatal, so the survey appears. Lesson here, you can only get a license with Chrome due to bugs on their site.</p>
<p>In the survey, they ask what FPGA family you will use, but do not list PolarFire.</p>
<a name="Unclear-license-choices-2c--max-1-year"></a>
<h3>Unclear license choices, max 1 year</h3>
<p>After the buggy survey, the following mysterious choices appear</p>
<pre><code>851 - PolarFire Seminar Node Locked License for Windows PC
796 - Libero 60 days Evaluation Floating License for Windows or Linux Server
760 - Libero 60 days Evaluation Node Locked License for Windows
Free 1 Year Licenses
798 - Libero Silver 1 Year Floating License for Windows/Linux
799 - Libero Silver 1 Year Node-lock License for Windows
644 - Synopsys Synphony Model Compiler ME. Requires MATLAB/Simulink from Mathworks.
</code></pre>
<p>So… I could choose a 60-day evaluation license, or a 1-year license? Mercifully there is only one 1-year option for Linux. Then it wanted the MAC address… I pasted it in but it was truncated: the web designer had limited the input field to 12 chars, no matter that all OSes report MACs with : delimiters.</p>
<p>Then it told me my license would arrive by email within 45 minutes. Well… the PDF says “Download the license file to the HOME directory of the user who will be installing and administering licensing for Libero”, so I will try that when it comes.</p>
<a name="Motif-library-problem-2"></a>
<h3>Motif library problem 2</h3>
<p>Back to the motif thing… it turns out the 6.3GB of stuff is all 32-bit, so I assumed the problem was simply that I did not have the i686 motif package installed. However… from the PDF…</p>
<pre><code>Libero Linux tools expect to see the libXm.so.3 package
of the Motif Library. Different versions of OPEN
Motif could potentially install libXm.so.4 or others
that are not compatible with Libero.
</code></pre>
<p>…so they only support outdated libXm… on Fedora 27…</p>
<pre><code>$ rpm -ql motif.i686 | grep Xm
/usr/lib/libXm.so.4
/usr/lib/libXm.so.4.0.4
</code></pre>
<a name="Motif-library-workaround-and-libz-problem"></a>
<h3>Motif library workaround and libz problem</h3>
<p>Well, sometimes it doesn’t actually matter, if the app doesn’t use any API that changed with the SONAME bump. So, to see what happened next:</p>
<pre><code> $ sudo ln -sf /usr/lib/libXm.so.4 /usr/lib/libXm.so.3
</code></pre>
<pre><code> $ Libero_PolarFire_v2.0/Libero/bin/libero
/projects/polarfire/Libero_PolarFire_v2.0/Libero/bin/libero_bin: /projects/polarfire/Libero_PolarFire_v2.0/Libero/lib/libz.so.1: no version information available (required by /usr/lib/libpng16.so.16)
/projects/polarfire/Libero_PolarFire_v2.0/Libero/bin/libero_bin: /projects/polarfire/Libero_PolarFire_v2.0/Libero/lib/libz.so.1: no version information available (required by /usr/lib/libpng16.so.16)
/projects/polarfire/Libero_PolarFire_v2.0/Libero/bin/libero_bin: relocation error: /usr/lib/libpng16.so.16: symbol inflateReset2, version ZLIB_1.2.3.4 not defined in file libz.so.1 with link time reference
</code></pre>
<p>So in other words, Fedora libpng wants to confirm that the version of libz it is being bound to is the one it expects, and there’s no version info in the one shipped with this app.</p>
<a name="libz-workaround"></a>
<h3>libz workaround</h3>
<p>I replaced the stock libz in the app with a symlink to F27 i686 libz, after backing it up</p>
<pre><code>$ cp Libero_PolarFire_v2.0/Libero/lib/libz.so.1 .
$ ln -sf /lib/libz.so.1 Libero_PolarFire_v2.0/Libero/lib/libz.so.1
</code></pre>
<p>That gets me</p>
<pre><code> $ Libero_PolarFire_v2.0/Libero/bin/libero
License Checkout Error: Microsemi License Error [-1,359]: Cannot locate license file.
Use LM_LICENSE_FILE to specify a different license file.
</code></pre>
<p>Well… that is expected. In the meantime, the license file arrived.</p>
<a name="License-file-clash"></a>
<h3>License file clash</h3>
<p>So the license is an attachment in the email called “License.dat” that should go
in my home dir. But thanks to every FPGA vendor having been infected with this license madness by their upstream tools vendor, I already have a license.dat there for the Lattice tools.</p>
<p>I renamed it and saved the Microsemi one there. Just to make sure, I tried the libero app again to see if it would pick it up, but it gave the same error. It turned out later that it should actually go in ~/flexlm according to the docs.</p>
<a name="Problems-with-documenation-outdated-for-license-process"></a>
<h3>Problems with documentation outdated for license process</h3>
<p>The PDF says “unzip License.dat”… but it is not a zip file, it is actually a plain text file as it arrived in the email. I guess at one point they sent them zipped (without a .zip suffix?)… let’s pretend that never happened.</p>
<p>Then you must edit this text file to put what seem to be license server coordinates at the top of it, localhost port 1702 it seems, and then, really crazily, must give paths to these “linux license daemon” binaries downloaded before, not one path but three, for “actlmgrd”, “mgcld”, and “snpslmd”.</p>
<p>Then it says “Replace the <XXXXXXXXXXXX> in the first line with the MAC-ID you have obtained from the ifconfig command”…. the License.dat file was delivered with my MAC already in there…</p>
<a name="Problems-with-binary-license-daemon-loader"></a>
<h3>Problems with binary license daemon loader</h3>
<p>Next is “/home/<caeadmin>/Linux_Licensing_Daemon/lmgrd -c /home/<caeadmin>/flexlm/
License.dat -log /tmp/lmgrd.log ”… okay well I did not save those to my home dir but never mind. When I try to run lmgrd, I get</p>
<pre><code>bash: ./Linux_Licensing_Daemon/lmgrd: No such file or directory
</code></pre>
<p>But it is really there.</p>
<pre><code> $ file Linux_Licensing_Daemon/lmgrd
Linux_Licensing_Daemon/lmgrd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-lsb.so.3, for GNU/Linux 2.6.9, stripped
</code></pre>
<p>The ELF file names a nonstandard loader… I forced a symlink to the normal 32-bit loader</p>
<pre><code> $ sudo ln -sf /lib/ld-linux.so.2 /lib/ld-lsb.so.3
</code></pre>
<a name="Misleading-error"></a>
<h3>Misleading error</h3>
<p>Now it starts, and spews a load of things related to its purpose in life, licensing things (which is another way of saying: stopping things from working). Amongst them:</p>
<pre><code>02/01/2018 13:51:50 (snpslmd) Error: Incompatible vendor daemon found.The vendor daemon <actlmgrd> is not supported in <SCL_11.11.1> version.
Error: Please upgrade to the latest SCL version. Go to http://www.synopsys.com/licensing for more information.
02/01/2018 13:51:50 (snpslmd) Error: Incompatible vendor daemon found.The vendor daemon <mgcld> is not supported in <SCL_11.11.1> version.
Error: Please upgrade to the latest SCL version. Go to http://www.synopsys.com/licensing for more information.
</code></pre>
<p>What this error actually means is “you did not hack these env vars in the shell you are trying to start libero from”:</p>
<pre><code> $ export LM_LICENSE_FILE=1702@localhost:$LM_LICENSE_FILE
$ export SNPSLMD_LICENSE_FILE=1702@localhost:$SNPSLMD_LICENSE_FILE
</code></pre>
<a name="Success"></a>
<h3>Success</h3>
<p>Finally it can start up on Fedora 27, at least as far as its “hello” page</p>
<p><img src="/2018/02/01/libero.png" alt="Microsemi web error" /></p>
<a name="What-did-we-learn-this-time-3f-"></a>
<h2>What did we learn this time?</h2>
<p><img src="/2018/02/01/migimaru.png" alt="Migimaru" /></p>
<table>
<thead>
<tr>
<th>Cause</th>
<th>Hack</th>
</tr>
</thead>
<tbody>
<tr>
<td>openjdk incompatibility</td>
<td>Scroll installer license page with scrollbar, not mouse scrollwheel</td>
</tr>
<tr>
<td>outdated motif library</td>
<td>$ sudo ln -sf /usr/lib/libXm.so.4 /usr/lib/libXm.so.3</td>
</tr>
<tr>
<td>weirdo libz in app incompatible<br>with Fedora libpng</td>
<td>$ cp Libero_PolarFire_v2.0/Libero/lib/libz.so.1 .<br>$ ln -sf /lib/libz.so.1 Libero_PolarFire_v2.0/Libero/lib/libz.so.1</td>
</tr>
<tr>
<td>License daemons linked <br>against weirdo loader</td>
<td>$ sudo ln -sf /lib/ld-linux.so.2 /lib/ld-lsb.so.3</td>
</tr>
<tr>
<td>outdated docs</td>
<td>Ignore License.dat supposedly being a zip file</td>
</tr>
<tr>
<td>outdated docs</td>
<td>Ignore need to put MAC in License.dat</td>
</tr>
<tr>
<td>Buggy website script</td>
<td>Use Chrome to fetch license</td>
</tr>
</tbody>
</table>
MIPI I3Chttps://warmcat.com/2017/09/05/mipi-i3c.html2017-09-05T09:56:25+08:00<a name="MIPI-I3C"></a>
<h1>MIPI I3C</h1>
<p><img src="/2017/09/05/i23c.gif" alt="i23c" /></p>
<p>I3C is a new MIPI standard aimed at replacing I2C and SPI in low-cost devices. But right now, the actual standard is only available for MIPI members, not mere mortals. This article is a summary of what I could find out about it.</p>
<a name="I2C-and-SPI"></a>
<h2>I2C and SPI</h2>
<p>I won’t go into I2C or SPI since interested readers will already know those well, except to note:</p>
<ul>
<li><p>I2C is a slow 2-wire + GND, multidrop, Open Drain interface, and SPI a faster point-to-point 3 / 4 wire + GND interface with separate select signals per peer.</p></li>
<li><p>I2C starts at a 100kHz clock and extends to 3.4MHz; SPI can be clocked much faster, with, eg, serial flashes that can be clocked at 80MHz and above.</p></li>
</ul>
<p><img src="/2017/09/05/i2c.png" alt="i2c" /></p>
<a name="I3C-base-signalling"></a>
<h2>I3C base signalling</h2>
<ul>
<li><p>I3C does not have anything in common with other MIPI low-voltage signalling standards like PCIe, DSI, CSI, UniPRO, which use low-voltage differential signalling. The signalling is single-ended with HCMOS style levels.</p></li>
<li><p>I3C is similar to, and is mostly backwards-compatible with, I2C with some extra “modes”, the same way SD Card interface can negotiate faster modes.</p></li>
<li><p>Like I2C there is an explicit clock signal and a data line, but the clock signal is defined to be <strong>push-pull</strong> (< 0.3Vdd = 0, > 0.7Vdd = 1) in all new modes</p></li>
<li><p>Therefore I3C does not support the intrinsic level-shifting capability of old I2C (circuitry at different Vdd could communicate because the two signal levels were “nearly 0V” or “something else”)</p></li>
<li><p>Pullups are no longer needed if you are not dealing with legacy I2C-only slave devices on the bus</p></li>
<li><p>I2C “Clock stretching”, where a slave could delay the clock until it was ready, has been eliminated</p></li>
<li><p>Otherwise I3C is close enough to I2C if you reduce the clock accordingly, you can use it with most existing legacy I2C devices.</p></li>
<li><p>The “Base” mode, I3C SDR, allows up to 12.5MHz with either OC or Push-Pull data drive, it’s very close to “fast I2C”.</p></li>
<li><p>Slaves may detect an idle bus and issue a synchronous interrupt on the bus with contention resolution. The slave transitions Data to do it, only the master provides the clock.</p></li>
</ul>
<a name="I3C-backwards-compatibility-with-I2C"></a>
<h2>I3C backwards compatibility with I2C</h2>
<p>I2C devices have been around since the early 1980s, and many mature slave devices are simply not going to be updated for I3C. So “enough” backwards compatibility in the master was critical. Few I2C devices use clock stretching and the loss of it in I3C can be worked around by slowing the clock from the master side if needed.</p>
<p>The level shifting is more of a loss for some cases, but even then I2C’s Open Drain scheme was always difficult to buffer, needing specialized chips, eg <a href="http://www.ti.com/product/PCA9306">PCA9306</a>. If both clock and data are push-pull now, generic level-shifting is possible, although the latency will increase, potentially impacting the max clock rate.</p>
<p>Even old I2C’s “faster clock modes” rely on the I2C devices that can’t handle them rendering them invisible using their “glitch protection”… if a signal doesn’t maintain its state for, eg, 50ns, it is ignored by the slave.</p>
<p>But what about newer slave devices that come out with I3C interfaces, when all the masters available today are I2C-only? It’s unclear if I3C slaves can be used by I2C masters… the signalling is compatible, but there is not enough info about it to understand if, eg, I3C slaves can present an old-style I2C address by default.</p>
<p>10-bit address mode slaves are not supported. But these all have 7-bit fallback modes anyway.</p>
<a name="I3C-addressing"></a>
<h2>I3C addressing</h2>
<ul>
<li><p>I3C devices get standardized “characteristics registers” and a 48-bit “mac address”</p></li>
<li><p>Devices are “discovered” dynamically</p></li>
<li><p>Devices may join and leave at any time, and are discovered and dynamically addressed accordingly</p></li>
<li><p>There’s still a 7-bit address, but this operates dynamically as devices are discovered like the USB device address; fixed I2C 7-bit addresses are eliminated unless it’s a legacy device.</p></li>
</ul>
<a name="I3C-defined-commands-and-error-management"></a>
<h2>I3C defined commands and error management</h2>
<ul>
<li><p>DSI-style “command mode” with standardized commands and payloads, may be broadcast to all peers or addressed to a single peer - this is the basis of the discovery protocol</p></li>
<li><p>Both master and slave can report standardized “errors”, including parity and CRC</p></li>
</ul>
<a name="I3C-SDR"></a>
<h2>I3C SDR</h2>
<ul>
<li><p>SDR / Single Data Rate signalling is I2C signalling with push-pull clock and optional push-pull or Open Drain (for backwards compatibility with I2C) data</p></li>
<li><p>it’s specified for up to a 12.5MHz clock, although how achievable that is depends on the capacitance of the loads, how many loads, wiring length and allowable error rate.</p></li>
</ul>
<a name="I3C-HDR"></a>
<h2>I3C HDR</h2>
<p>The I3C bus can enter and exit higher speed modes using a special SDA-only sequence of repeated STOP / STARTs (modern I2C devices have something similar where you can send them a special command that transitions to 3.4MHz, but the I3C scheme is baked into the protocol). I3C devices get a broadcast message saying we’re going to transition to high speed (EnterHDRx) and then the SDA sequence. I2C slaves see STARTs and STOPs and then ignore the fast clocking on SCK.</p>
<p>The EnterHDRx message can tell the I3C devices to enter either “DDR” mode or “TSL/TSP” modes… this is extensible since new modes can be sent on EnterHDRx and devices that don’t understand them just ignore until they see the SDA-only exit pattern.</p>
<a name="I3C-HDR-DDR"></a>
<h3>I3C HDR DDR</h3>
<p><img src="/2017/09/05/i3c-ddr.png" alt="i3c-ddr" /></p>
<p>In DDR mode, SDA is sampled on each SCL clock edge, and since START and STOP are then no longer possible, a structured packet, which includes parity, is used to communicate.</p>
<p>When finished using the high speed mode, the SDA-only exit sequence is sent and the bus reverts to SDR mode.</p>
<p>Normally, DDR requires a PLL, but I2C has traditionally been parseable using asynchronous logic alone. It’s not clear from the available data whether something about the sequencing mandates a PLL to recover multiples of the clock rate, which would imply a predictable, phase-locked clock on the wire, making bit-banging impossible.</p>
<a name="I3C-HDR-TSL--2f--TSP--28-ternary-29-"></a>
<h3>I3C HDR TSL / TSP (ternary)</h3>
<p>As well as DDR above, where SCL is still basically controlling sampling of SDA, I3C supports ternary coding modes… in this mode, SDA and SCL are treated as two data bits which are sampled together. However <strong>between</strong> each ternary symbol, at least one bit must change state.</p>
<p><img src="/2017/09/05/i3c-tsx.png" alt="i3c-tsx" /></p>
<p>Another way of looking at it: from each 2-bit symbol there are three possible next symbols… 00 may become 01, 10 or 11 and still meet the requirement for at least one bit to change state.</p>
<p>We have to wait for more info from the standard to describe exactly how, but two 2-bit symbols together can encode three bits, with the overhead used for the clocking.</p>
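<p>The arithmetic behind that is easy to check even without the standard: each new 2-bit symbol must differ from the one before it, so each transition can take 3 values, and two transitions give 3 x 3 = 9 possible patterns, more than the 8 needed for 3 bits. A purely hypothetical encoder to illustrate the idea (the real I3C coding is not public, so this is certainly not how the standard does it):</p>
<pre><code>#include <stdint.h>

/*
 * Hypothetical ternary transition coding, for illustration only: the real
 * I3C TSL / TSP coding is not public. Treat { SDA, SCL } as one 2-bit
 * symbol; the rule is that each symbol must differ from the previous one,
 * so each step has 3 choices, and two steps carry 3 x 3 = 9 patterns,
 * enough for 3 data bits.
 */
static void
ternary_encode(uint8_t prev, uint8_t data3, uint8_t sym[2])
{
	uint8_t t1 = data3 / 3, t2 = data3 % 3; /* split 0..7 into two trits */

	sym[0] = (uint8_t)((prev + 1 + t1) & 3);   /* can never equal prev */
	sym[1] = (uint8_t)((sym[0] + 1 + t2) & 3); /* can never equal sym[0] */
}
</code></pre>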
<p>TSL stands for Ternary Symbol Legacy, as opposed to TSP for “Pure”, so I guess “Legacy” is less efficient in an attempt to avoid bus states that old I2C might misinterpret. But it’s also not explained yet, that I could find. This efficiency loss is likely the reason the max bit rate is “over 30Mbps” in the marketing claims, not “37.5Mbps”.</p>
<table>
<thead>
<tr>
<th>mode</th>
<th>rate</th>
<th>max throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>SDR</td>
<td>1 bit / CLK</td>
<td>12.5Mbps</td>
</tr>
<tr>
<td>DDR</td>
<td>2 bit / CLK</td>
<td>25Mbps</td>
</tr>
<tr>
<td>TSx</td>
<td>3 bit / “CLK”</td>
<td>37.5Mbps (“over 30Mbps”)</td>
</tr>
</tbody>
</table>
<p>(There is no “clock” in ternary mode, but it transfers 3 bits in 2 “edges” on either / both SCK + SDA).</p>
<a name="I3C-Synchronous-bus-reservation"></a>
<h2>I3C Synchronous bus reservation</h2>
<p>The specification also defines USB-type bus reservation, so you can define that you will have bandwidth at a certain time to talk to a certain device.</p>
<a name="What-did-we-learn-this-time-3f-"></a>
<h1>What did we learn this time?</h1>
<p><img src="/2017/09/05/kaiman.png" alt="kaiman" /></p>
<ul>
<li><p>I3C is a more-or-less backwards-compatible enhancement to I2C</p></li>
<li><p>It offers DDR and Ternary modes to get >30Mbps from a ~12.5MHz clock, without using LVDS</p></li>
<li><p>Because it dispenses with I2C pullups, simply by mandating push-pull for newer modes, it can claim huge power savings over I2C. But those savings don’t exist compared to, eg, SPI.</p></li>
<li><p>It claims to replace SPI, but it provides no backwards-compatibility and the max clock rate is slower than SPI provides. SPI remains a competitor to I3C. Still, it will likely push SPI out of any usage where I3C will do.</p></li>
<li><p>I3C takes ideas from DSI (command packets) and USB (bandwidth reservation and discoverability / hotplug), but the bandwidth reservation is just a controller feature not baked in.</p></li>
<li><p>AFAICT you can still bitbang it, although we have to see about DDR and if it gets implemented with a PLL in the slave devices</p></li>
<li><p>I2C backwards-compatibility lacks support for 10-bit addresses, inherent level-shifting, and the slave device holding the master clock down. Otherwise it encompasses I2C compatibility, although the whole bus will slow back down to 100 / 400kHz when you talk to legacy devices.</p></li>
<li><p>There aren’t any I3C devices you can buy off the shelf at digikey or mouser yet. But it’s a safe bet I3C will replace I2C-only interface units in SoCs shortly.</p></li>
<li><p>Slave device vendors will stick with I2C where I3C features are not needed; I3C masters can integrate them on an I3C bus. Slave devices that suffered from the slow max clock rate of I2C will likely move to I3C. Slave devices that only used SPI for the better clock rate, but the speed they needed is now available with I3C, will also likely switch to I3C.</p></li>
</ul>
SLA 3D printinghttps://warmcat.com/2017/08/18/sla-3d-printing.html2017-08-21T15:14:25+08:00<p><img src="./sws.jpg" alt="SLA enclosure" /></p>
<a name="Why-SLA-3D-printing-3f-"></a>
<h1>Why SLA 3D printing?</h1>
<p>Making enclosures for electronic devices has always been a big problem. Back in the late 90s I made and sold nonvolatile memory emulators; I ended up having custom mild steel cases fabricated, stove enamelled, and then silk screened. It was successful, mainly due to having very good partners to do the metalwork and finishing, but this was a huge layer of difficulty in designing, fabricating, managing and stocking the physical cases, when my main focus was the electronics inside.</p>
<p>One way or another most electronics will need some kind of enclosure: if you don’t want to go down the metalwork road, for low volume the choices used to be grim… you can buy standard injected moulded plastic cases and machine them, but no matter what you do, they have “that look” that is the opposite of looking like a product.</p>
<p>So earlier in the year I started to study 3D printing. This article will discuss the limitations, advantages and gotchas that I found and my results using SLA.</p>
<a name="Basics-of-extrusion-and-SLA-printers"></a>
<h1>Basics of extrusion and SLA printers</h1>
<a name="Extrusion"></a>
<h2>Extrusion</h2>
<p>Most of the 3D printers on sale use filament extrusion: basically they move a heated head around in three axes, and deposit 0.4mm balls of molten plastic as they go to make the structure.</p>
<p>That sums them up: <strong>they place 0.4mm balls of molten plastic</strong>. You can’t have features smaller than 0.4mm, no matter that the specs may show 25 micron (0.025mm) stepper resolution (usually only in some axes… beware). If you are making a continuous surface, you can precisely make some ramps or curves by placing your 0.4mm blobs at sub-0.4mm resolution, and for some cases that works well. But it cannot define fine detail, repeatability in terms of layer noise is not always good, and the commonly-used PLA material is about as brittle as spun sugar for thin walls. The material is hot and somewhat viscous and tacky when it is laid.</p>
<p>Blobs cannot be placed closer than the blob pitch of 0.4mm. So you can see the high resolutions quoted are pretty much unrelated to the ability to place material to form the print and are largely meaningless. Although the head accuracy may be high, the placed hot plastic blobs do not always act in a repeatable way depending on what other blobs are around them or not.</p>
<p>Heat as part of the process makes repeatability difficult unless the environment is completely controlled: eg, an air conditioner starting partway through, or drafts, will change the temperature at the heater head and the filament over the hours the print is being done.</p>
<p>They are also slow, complex prints covering large volumes of plastic may take days and increasing the placement precision makes them even slower.</p>
<a name="SLA"></a>
<h2>SLA</h2>
<p>SLA (originally, <a href="https://en.wikipedia.org/wiki/Stereolithography">StereoLithography</a>) is not a new idea, it dates back long enough for the original patents (from 1984) to have expired. So there is a new breed of SLA printers around from a few manufacturers. Their scheme is completely different to the extrusion printers… there is no extrusion, no heating and the platform moves only in one (Z) axis. Resolution is high, and repeatability / layer noise is excellent.</p>
<p>The trick is there is a vat of opaque photosensitive resin with a transparent film at the bottom. The resin has a consistency somewhat more viscous than olive oil, and strong surface tension.</p>
<p>Underneath the vat, pointing up into the transparent film and the resin reservoir above it, is a light source at a wavelength which causes the resin to solidify.</p>
<p><img src="./sla.png" alt="SLA printer structure" /></p>
<p>In my SLA printer, that light source is ultraviolet and goes via a 1080p resolution DLP (micromirror) projector on to the transparent vat base. So it effectively projects a 1920 x 1080 px black-and-white image into the build area for each layer.</p>
<p><img src="./seq.png" alt="SLA process" /></p>
<p>The DLP projects one global image, with pixels illuminated with ultraviolet light where the plastic is meant to be and pixels with no light where there should be no plastic. Then the platform moves up, peeling the solidified resin off the special film on the base and letting the remaining resin flood in to cover the base again, before moving back down to a position 50 microns higher than before. The process repeats with the DLP showing the right places for solid resin for each layer, each layer solidifying on the last, until the print is completed.</p>
<p>The original SLA printers moved a single laser around to modulate which parts of the resin solidified. Printers like my SLA printer which uses a DLP to illuminate a whole layer at once are also often called “DLP printers”.</p>
<a name="Ancilliaries-for-extrustion-and-SLA"></a>
<h1>Ancillaries for extrusion and SLA</h1>
<a name="Considerations-for-extruder-bases"></a>
<h2>Considerations for extruder bases</h2>
<p>An issue common to all 3D printers is the print must be based on one plane; there is a base somewhere and the print adheres to that during fabrication so it stays in one place. Extrusion printers will “lose” the print if the platform the print is based on is not level with regards to where the print head is trying to lay the next layer of gloop. This failure mode is known as “spaghetti”, since the extruder ends up spewing plastic into thin air on the non-level side, making exotic spun sugar creations without any contact to the base and unrelated to what you were trying to fabricate.</p>
<p>For that reason, extrusion printers increasingly offer base levelling or auto-levelling features.</p>
<p>Another extrusion issue is making the extruded plastic adhere to the base in the first layer. Otherwise the print will move around and lead to spaghetti again. This has led to “heated bases” which keep the plastic a little tacky and more willing to adhere.</p>
<a name="SLA-bases"></a>
<h2>SLA bases</h2>
<p>By contrast SLA printers do not need a completely level base or any heating. Instead they print the first few layers on to the base as a “raft” of solid resin, on which the actual print attaches. For printing the raft, they use a longer exposure time which penetrates further to solidify more resin. This evens out any non-levelness in the base by reaching further through the resin to the base.</p>
<a name="Printing-supports"></a>
<h2>Printing supports</h2>
<p>Neither SLA nor extrusion printers can print stuff in midair. The design being printed must make allowance for gravity and add in thin support rods coming up (or down for SLA) from the base the “midair” stuff can attach to. Software can automate much of the support placement, but as we will discuss, not all of it, and manual touchup adding supports bearing in mind the weight of parts of the structure is necessary.</p>
<p>In the SLA case, the base + any partial print is constantly being asked to peel off newly exposed layers from the tank film, without distorting itself. So supports must be added back to the base with this repeated force in mind.</p>
<p>There are also considerations about how to separate the print from the base after it’s done that mean supports are effectively always required all over the area of the print facing the base, as separators enabling print detach.</p>
<p>These printing supports on the exterior of the print affect the print finish unfortunately.</p>
<p>Dual-head extrusion printers can put a water-soluble filament on one head, and use it as mass support structure. After the print completes, you can wash out the supports. The results from this are good, but printing with a dual head is unbearably slow for nontrivial designs: prints can take days.</p>
<p>SLA can print a certain amount of overhang without needing support; the default is 45 degrees. However there are other realities about the mass being printed: the already-printed parts of the design sit between the newest layer, somewhat adhered to the tank film, and the base trying to lift it away, a characteristic repeated stress that means supports will be required back to the base wherever there is significant mass.</p>
<a name="Slicing"></a>
<h2>Slicing</h2>
<p>After the 3D design has had the supports added, it is postprocessed into “slices”: these are the layers that will actually be printed. At 50 micron Z resolution there are 20 layers per mm, so, eg, a design that is 5cm high will be looking at 1000 layers.</p>
<p>With extrusion printing, how long a layer takes depends on how much there is to be printed on it; it has to spend time both laying the extruded plastic and moving between areas with plastic. But with SLA, the time taken to print a layer is constant (roughly 2.5cm of height / hr with my printer): it’s just how long the resin needs to be exposed to the curing light, plus how long the base takes to move up to peel off the layer and back down, positioned for the next one. That is interesting because it means that with SLA it costs nothing timewise to pack the print area with copies of the print.</p>
<a name="Software-stack-for-design-and-postprocess"></a>
<h1>Software stack for design and postprocess</h1>
<p>There are FOSS pieces available for the whole software stack except the printer-driving part; at least, I used the (free as in beer) Linux printer driver part and didn’t try to replace it yet. You might find or know better solutions; I am just listing what works for me at the moment.</p>
<a name="OpenSCAD"></a>
<h3>OpenSCAD</h3>
<p><a href="http://www.openscad.org/">OpenSCAD</a> is basically CAD from a programmer’s perspective. Your CAD design is textual source code which is rendered interactively. It has quirks, and some operations cause it to take minutes or hours to render small designs, but as a coder I was immediately at home with it.</p>
<p>It’s in most distros and is easy to get started with. The basic idea is that you place, and perform operations on, extruded 2D or natively 3D shapes using source code.</p>
<p><img src="./openscad.png" alt="x" /></p>
<p>After things look good in the dynamic preview render (which is sometimes very slow), you can export to STL, a standardized 3D format everything else can eat. However this export process can literally take hours.</p>
<p>In summary OpenSCAD is very cool for coders to use, but once you get past a certain point, it is simply too slow.</p>
<a name="FreeCAD"></a>
<h3>FreeCAD</h3>
<p><a href="https://www.freecadweb.org/">FreeCAD</a> looks like the next best bet. This is more of a traditional CAD system. It was possible to import 3D models from OpenSCAD, and it has an OpenSCAD “mode” that might be workable.</p>
<a name="Meshlab-and-Fast-STL-Viewer"></a>
<h3>Meshlab and Fast STL Viewer</h3>
<p>These are not necessary, but allow you to combine and view STL; if you are discussing the design with someone, it’s very convenient to be able to rapidly point things out before it exists. OpenSCAD itself is only really able to do this for simpler designs. But you must sit through the STL export in OpenSCAD before these can do anything. Meshlab is FOSS available in most distros, and Fast STL Viewer is in the Android “Play” Store.</p>
<p><img src="./meshlab.png" alt="x" /></p>
<a name="Flashprint"></a>
<h3>Flashprint</h3>
<p>Once you have exported to STL, <a href="http://www.flashforge.com/support-center/flashprint-support/">flashprint</a>, the “free as in beer”, binary-only Linux version of the printer manufacturer’s app / driver, handles everything else.</p>
<p>It imports one or more STL from your CAD application, and lets you arrange them on the build plate volume. Once arranged, you can automate support addition and manually add supports (which you will need to do).</p>
<p>Once the supports are in a good shape, you can slice the print…</p>
<p><img src="./flashprint-slicer.png" alt="x" /></p>
<p>…and send it over WIFI to the printer (it’s not documented, but after running nmap against the printer, it listens on port 3333). It also tracks print progress while printing.</p>
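<p>If you want to confirm your own printer exposes the same service, a minimal sketch (Python; the printer address is obviously an assumption for your network) that just checks whether anything is accepting connections on port 3333:</p>
<pre><code class="python"># Probe the (undocumented) Flashprint control port found with nmap.
# PRINTER_IP is a placeholder: substitute your printer's address.
import socket

PRINTER_IP = "192.168.0.50"
PORT = 3333

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(2.0)
try:
    s.connect((PRINTER_IP, PORT))
    print("port 3333 open: printer control service is listening")
except OSError as e:
    print("no service:", e)
finally:
    s.close()
</code></pre>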
<p>The site provides a .deb; however you can unpack this and copy the directory structure inside the archive to / (after checking it won’t overwrite anything unexpected…). The guts of it are in /usr/share with one file in /etc. Currently it has one compatibility problem with modern Linux: a required icon overlay (that has no equivalent shortcuts) only appears in Xorg, not Wayland.</p>
<p>Although it isn’t ideal having a non-FOSS (if native Linux) bit at the end, the interesting thing about this stack is <strong>you can get all the way to the end of your design action for $0</strong>, without having a printer. So if you are curious about any of this, I encourage you to install the related tools and give it a try.</p>
<a name="Part-2:-Flashforge-DLP-printer"></a>
<h1>Part 2: Flashforge DLP printer</h1>
<a name="Unsuccessful-experiment-with-extrusion"></a>
<h1>Unsuccessful experiment with extrusion</h1>
<p>While trying to calibrate my understanding of the technology options, I had an early version of my enclosure printed in PLA using a service here in Taipei, via <a href="https://3dhubs.com">3D Hubs</a>.</p>
<p>The print took a couple of days (longer than promised) and the quality was pretty bad.</p>
<p><img src="./pla3.jpg" alt="x" /></p>
<p>There were many support structures inside the cavity of the enclosure, and they took hours to clean out. Even after taking care, the enclosure snapped in several places when it met the force needed to remove the supports. There was visible Z axis noise in the layering that also weakened it.</p>
<p><img src="./pla2.jpg" alt="x" /></p>
<p>The design was in two halves that should meet up, but the quality of the PLA print made that impossible. And clearly it was a systemic, structural problem with the process.</p>
<p><img src="./pla1.jpg" alt="x" /></p>
<p>I expected to have to go through a learning curve, but looking at the results I had already learned that neither that technology nor the “3D as a service” model was going to produce anything useful; I needed better technology and needed it on hand.</p>
<p>I discussed the situation with the owner of the local “maker” place who did the print via 3D Hubs; he suggested that SLA might make more sense for my small and light enclosure. He showed me a sample used for microfluidics which was much more detailed and delicate.</p>
<a name="Looking-for-a-Flashforge--22-hunter-22-"></a>
<h1>Looking for a Flashforge “hunter”</h1>
<p>After looking around, only <a href="http://www.flashforge.com/">Flashforge</a> offered native Linux support, so I went to see a retailer in Guanghua, Taipei’s electronics district.</p>
<p>The lady there had no English, but she informed me by typing on a calculator that it would cost NTD170,000, which is around GBP4300. The US RRP was USD3450 / GBP2700… I don’t know if that was just “foreigner tax” or some general feeling that imported goods should be expensive. Googling, I could buy the same thing in the UK for GBP2900; even accounting for the 15% import duty and GBP100 shipping, buying it from the guys in Guanghua would have meant throwing away GBP1000. So I had it imported, which took a week. Parsimony FTW.</p>
<a name="What-is-the-printer-like-3f-"></a>
<h2>What is the printer like?</h2>
<p>It’s about the same size as a laser printer, perhaps a little taller. There is a smoked perspex hood to reduce the amount of UV light escaping from it and to reduce the amount of ambient light solidifying the resin accidentally.</p>
<p><img src="./hunter.jpg" alt="x" /></p>
<p><img src="./hunter-open.jpg" alt="x" /></p>
<a name="What-is-the-resin-like-3f-"></a>
<h2>What is the resin like?</h2>
<p>The resin, and the printer with resin in it, are not suitable for being placed in your office or work-room for long-term cohabitation. The resin smells and will give you a headache. Mine ended up living in a bathroom, which to my relief turned out to be a “wife-compatible” solution.</p>
<p>Printed works are not so objectionable after a day or two and just feel like plastic parts.</p>
<p>Since the vat is exposed at the top, I put some flat cardboard with a weight on top of it when not in use, to stop the resin gradually hardening.</p>
<a name="What-are-the-prints-like-3f-"></a>
<h2>What are the prints like?</h2>
<p>The prints are a little bit soft when they come out of the printer. That’s not a bad thing because it is easy to snap off the supports, since they also have a conical taper where they meet the model. Over the course of a day or two in normal light the prints become more solid and brittle.</p>
<a name="How-messy-is-the-process-3f-"></a>
<h2>How messy is the process?</h2>
<p>It can be messy. You can clean off resin that’s not completely solidified using ethanol.</p>
<a name="SLA-printing-consumables"></a>
<h2>SLA printing consumables</h2>
<p>The Flashforge SLA printer comes with 500ml of grey resin included… but starting from scratch this isn’t going to last you long for reasons we will get to.</p>
<p>Although this is one of two regular consumables the SLA printer needs, obviously the resin is the major consumable since it actually becomes the end product. There is some proportion of wasted resin, which goes on the supports and the base raft each time, but the proportion wasted depends on exactly what you are printing. For my design it seems to be broadly around 30%. If you are making individual prototypes, that is not too painful.</p>
<p>However if you want to buy official resin from the Flashforge retailer at Guanghua, your pain will be magnified, since the lady with the calculator wanted NTD7000 (USD230) for another 500ml official resin.</p>
<p>Luckily there are aftermarket generic resin suppliers in the form of <a href="http://www.funtodo.net/">Funtodo</a>. These are represented in Taiwan by <a href="http://3dmart.com.tw">3D Mart</a> and have an office with stocks a few miles from me elsewhere in New Taipei City. Their pricing is NTD2200 (USD72) for 1000ml of white resin, ie, 1/6th of the price of the lady at Guanghua per ml. Their stuff works fine, although be aware there are settings in the print dialogue to select which resin you are using and adapt exposure time etc.</p>
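<p>As a sanity check on that “1/6th” figure, the per-ml arithmetic from the prices quoted above:</p>
<pre><code class="python"># Per-ml resin cost comparison, using the prices quoted above
official_ntd_per_ml = 7000 / 500    # 14.0 NTD/ml at Guanghua
funtodo_ntd_per_ml = 2200 / 1000    # 2.2 NTD/ml from 3D Mart
print(official_ntd_per_ml / funtodo_ntd_per_ml)  # ~6.4x cheaper
</code></pre>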
<a name="The-other-SLA-consumable"></a>
<h2>The other SLA consumable</h2>
<p>As a nod to the fact you will probably need it quicker than you expect, Flashforge also ship a spare of a second consumable, the vat film. The vat film has a special composition (FEP) that is highly transparent but that the solidified resin can easily peel off from.</p>
<p>I didn’t ask the Guanghua lady how much that was, since it was before I had a printer and I didn’t understand I would need it. Since you can <a href="https://flexvat.com/products/fep-film">buy it online</a> for USD7.50 a sheet, there’s no point asking them since it won’t be competitive. I bought 5 feet of it, which should see me through the foreseeable future.</p>
<p>The vat film is held tensioned in place by a dozen bolts, and changing the film is quite fiddly; the process takes an hour or two. The USD7.50 sheets do not have holes cut for the bolts to go through, but you can use the old sheet as a pattern and cut them carefully with a scalpel. The FEP film has a thin plastic film adhered to both sides which you peel off right before fitting it, since it must not be contaminated by fingerprints.</p>
<a name="Dangers-of-failed-prints"></a>
<h1>Dangers of failed prints</h1>
<p>Unless you are printing some canned ready-to-print model, failures are likely in your first few prints. The automatic placement of support structures is not smart enough to realize when it needs to deviate from just dropping supports on a regular grid driven by where parts of your design exceed 45 degree overhang.</p>
<p>Failed prints are potentially very bad news with SLA.</p>
<a name="Typical-failure-modes"></a>
<h2>Typical failure modes</h2>
<p>The usual, direct reason prints fail is the support structures are inadequate for the force of the repeated layer peeling and the weight of parts of the design. But even adequate support structures may fail for some other root cause, causing considerable headscratching.</p>
<p>When the support structures fail, it’s not pretty. At a minimum, the unanchored part of the print “flaps around” as the rest of the layers merge into it.</p>
<p><img src="./out2.jpg" alt="x" /></p>
<p>More typically, bits of the print literally fall off, and therein lies the danger.</p>
<p>The build platform will be continuously coming within 50 microns of the vat film by design; if there are solid pieces in the resin solution they will be pressed hard into the film by the metal build plate, and damage it. When damaged, the resin will leak through the hole and cause a region of the transparent plate to become somewhat opaque. The DLP light can then no longer get through to the resin on top of the hole, creating a “weak region” for printing.</p>
<p>This damage is not easy to guess at or see through the opaque resin. Here is a damaged FEP film I eventually removed when I ran out of alternative ideas. The glass bottom of the tank had a whitish halo of solidified resin that had leaked under the film and become sandwiched between the film and glass, ruining any printing in that area.</p>
<p><img src="./broken-film.jpg" alt="x" /></p>
<p>What makes it more problematic is you cannot even see that a print is failing until the print is more than an inch high, since the work is hidden by the vat and opaque resin until then.</p>
<p>I disassembled the tank and cleaned the glass with ethanol, and it recovered very well. With the new FEP film in, and adding many more supports in the areas that tended to fail (edges, areas with a lot of mass, corners), the success rate is now pretty reasonable.</p>
<a name="Resin-separation"></a>
<h2>Resin separation</h2>
<p>If left still for a day or two, the dye and the (clear-ish) resin will separate like oil and water. The dye will sink to the bottom and the resin float on top like the oil. You can’t print like that since the light will only reach the dye at the bottom.</p>
<p><img src="./vat-sep.jpg" alt="x" /></p>
<p>It’s necessary to remove the resin tank and hold it at an angle so the surface tension will clear the FEP film; after doing that for each side, rock it gently from side to side until the dye mixes again.</p>
<p><img src="./vat-mix.jpg" alt="x" /></p>
<p>You can also get an idea of the strong surface tension of the resin from the above.</p>
<a name="Safety-tips"></a>
<h2>Safety tips</h2>
<ul>
<li><p>Wear disposable gloves when postprocessing the print</p></li>
<li><p>Wear safety glasses when washing the print in alcohol and removing the supports… otherwise it will splash in your eyes</p></li>
</ul>
<a name="Apparently-unresolvable-issues-with-SLA"></a>
<h1>Apparently unresolvable issues with SLA</h1>
<a name="Not-commutative-in-the-Z-axis"></a>
<h2>Not commutative in the Z axis</h2>
<p>Printing the two halves of my concave model at 0 degrees and at 180 degrees gets different results in terms of linearity of the edges and how orthogonal the resulting form is.</p>
<p>If I print it with the convex side facing the base, with support structures against the outside of the model, results are much better than if I print it with the concave side facing the base and support structures entering the concave side.</p>
<p>This is a serious problem for me, because it means I can only get reasonably straight mating edges when I print in the orientation that leaves dents from the supports in the external faces of the enclosure.</p>
<a name="Resin-shrinkage"></a>
<h2>Resin shrinkage</h2>
<p>The resin hardens over a few days and shrinks a few percent in volume while becoming more brittle. Basically the results from SLA printing cannot bear any mechanical stress, the material will crack and tear.</p>
<p>That is OK for some purposes but even a low stress mounting hole will just crack off. It puts a limit on what you can use the models for.</p>
<a name="Conclusion"></a>
<h1>Conclusion</h1>
<p>My SLA printer is able to turn out models matching the CAD with good detail, but only under conditions like picking a side that will look good. The variation in things like mating edges is quite low but again you may have to orient the design on the base in one direction or another to make sure about that.</p>
<p>It’s certainly true that being able to fit the enclosures to PCBs and hold them in your hand is something SLA can do, and if the goal is to prove the CAD or to show it to others for manufacture with a different process, that is a huge advantage. However the resin material is not hardy enough to ship at all. So it is effectively only useful for reducing risk before processing the design another way.</p>
<p>I hope this has been interesting and maybe helpful for people considering 3D printing. If you have any comments or suggestions for improvements please drop me a line at <a href="mailto:andy@warmcat.com">andy@warmcat.com</a>.</p>
Implementing ssh and scp serving with libwebsocketshttps://warmcat.com/2017/08/13/impementing-ssh-and-scp-with-lws.html2017-08-13T17:40:41+08:00<a name="Implementing-ssh-and-scp-serving-with-libwebsockets"></a>
<h1>Implementing ssh and scp serving with libwebsockets</h1>
<a name="The-many-layers-of-ssh"></a>
<h2>The many layers of ssh</h2>
<p><img src="./flow.png" style="float:left"> Recently I wrote a protocol plugin for libwebsockets that implemented an ssh
server: this is cross-platform but in the first case runs on ESP32. I wasn’t
expecting it to be simple, but since I only planned to implement the best
crypto rather than all options, it seemed like it should be manageable.</p>
<p>It did prove manageable, but getting something able to come up on a virtual pty and act like a
normal ssh session required a pretty hairy amount of implementation, even
though I could rely on BSD-licensed bits of mbedtls and OpenSSH for crypto primitive pieces.</p>
<p>Although I generally could have described how SSH works before embarking on
this, the gritty details are quite interesting and involve a lot of stuff I had no idea about. And as a special bonus I’ll describe the scp protocol, which, it turns out, I really had no idea how it actually works.</p>
<a name="SSH-Formal-Definition"></a>
<h2>SSH Formal Definition</h2>
<p>SSH is described in a bunch of RFCs; these are the main ones:</p>
<table>
<thead>
<tr>
<th>RFC</th>
<th>Scope</th>
<th>URL</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFC4250</td>
<td>SSH Assigned Numbers</td>
<td><a href="https://www.ietf.org/rfc/rfc4250.txt">https://www.ietf.org/rfc/rfc4250.txt</a></td>
</tr>
<tr>
<td>RFC4251</td>
<td>Architecture</td>
<td><a href="https://www.ietf.org/rfc/rfc4251.txt">https://www.ietf.org/rfc/rfc4251.txt</a></td>
</tr>
<tr>
<td>RFC4252</td>
<td>Authentication</td>
<td><a href="https://www.ietf.org/rfc/rfc4252.txt">https://www.ietf.org/rfc/rfc4252.txt</a></td>
</tr>
<tr>
<td>RFC4253</td>
<td>Transport Layer</td>
<td><a href="https://www.ietf.org/rfc/rfc4253.txt">https://www.ietf.org/rfc/rfc4253.txt</a></td>
</tr>
<tr>
<td>RFC4254</td>
<td>Connection Protocol</td>
<td><a href="https://www.ietf.org/rfc/rfc4254.txt">https://www.ietf.org/rfc/rfc4254.txt</a></td>
</tr>
<tr>
<td>curve25519-sha256@libssh.org</td>
<td> Key exchange protocol</td>
<td><a href="https://git.libssh.org/projects/libssh.git/tree/doc/curve25519-sha256@libssh.org.txt">https://git.libssh.org/projects/libssh.git/tree/doc/curve25519-sha256@libssh.org.txt</a> – references <a href="https://tools.ietf.org/html/rfc5656">https://tools.ietf.org/html/rfc5656</a></td>
</tr>
</tbody>
</table>
<p>The protocol seems to me very well designed, and it was interesting that
features like transmit windows and connection muxing, found in SSH, much later
appeared in HTTP/2.</p>
<p>Although it is largely well-documented, for some ambiguities I had to study the openssh sources and / or watch what the openssh client wanted to do to figure out the whole flow.</p>
<a name="Overview"></a>
<h2>Overview</h2>
<p>The negotiation proceeds through specific stages</p>
<ul>
<li>1: Version exchange (unencrypted)</li>
<li>2: Crypto suite negotiation (unencrypted)</li>
<li>3: Key exchange</li>
<li>4: User authentication</li>
<li>5: Channel requests</li>
</ul>
<a name="Step-1:-Version-exchange"></a>
<h2>Step 1: Version exchange</h2>
<p>The first move on each side is to send a short string confirming that each side
can talk a version of SSH that the peer can communicate with. The string must
begin with <code>SSH-2.0</code>, afterwards is an opaque application / version string with
no special format. For example on the OpenSSH server on my machine, it’s</p>
<pre><code>SSH-2.0-OpenSSH_7.5
</code></pre>
<p>These strings are kept by each side along with a lot of other information sent and received later in the negotiation for use in a ‘shared secret’ hash used later.</p>
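<p>A minimal sketch of the client side of this exchange (Python; host and port are placeholders, and the possibility of extra pre-version banner lines from the server is ignored):</p>
<pre><code class="python"># Read the server's SSH identification string and send ours; both
# strings must be retained verbatim for the later exchange hash.
import socket

HOST, PORT = "localhost", 22    # placeholders for illustration

s = socket.create_connection((HOST, PORT))
v_s = b""
while not v_s.endswith(b"\n"):  # server line ends CR LF
    c = s.recv(1)
    if not c:
        break
    v_s += c
v_c = b"SSH-2.0-toy_0.1"
s.sendall(v_c + b"\r\n")
print("server:", v_s.strip().decode())  # eg SSH-2.0-OpenSSH_7.5
</code></pre>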
<a name="Step-2:-Crypto-suite-Negotiation"></a>
<h2>Step 2: Crypto suite Negotiation</h2>
<p>The next move is both sides issue lists of what crypto they support and are
willing to use. The packet is like this, <strong>unencrypted</strong>:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>byte</td>
<td>SSH_MSG_KEXINIT</td>
</tr>
<tr>
<td>byte[16]</td>
<td>cookie (random bytes)</td>
</tr>
<tr>
<td>name-list</td>
<td>kex_algorithms</td>
</tr>
<tr>
<td>name-list</td>
<td>server_host_key_algorithms</td>
</tr>
<tr>
<td>name-list</td>
<td>encryption_algorithms_client_to_server</td>
</tr>
<tr>
<td>name-list</td>
<td>encryption_algorithms_server_to_client</td>
</tr>
<tr>
<td>name-list</td>
<td>mac_algorithms_client_to_server</td>
</tr>
<tr>
<td>name-list</td>
<td>mac_algorithms_server_to_client</td>
</tr>
<tr>
<td>name-list</td>
<td>compression_algorithms_client_to_server</td>
</tr>
<tr>
<td>name-list</td>
<td>compression_algorithms_server_to_client</td>
</tr>
<tr>
<td>name-list</td>
<td>languages_client_to_server</td>
</tr>
<tr>
<td>name-list</td>
<td>languages_server_to_client</td>
</tr>
<tr>
<td>boolean</td>
<td>first_kex_packet_follows</td>
</tr>
<tr>
<td>uint32</td>
<td>0 (reserved for future extension)</td>
</tr>
</tbody>
</table>
<p>The crypto algorithms are well-known strings, like
<code>curve25519-sha256@libssh.org</code>. They are defined to be listed in order of
preference by each side.</p>
<p>Because it’s unencrypted, <strong>it’s possible for an intermediary to mess with this part of the negotiation</strong>. The “man-in-the-middle” can’t downgrade the negotiation to crypto that both sides are not already willing to use, but it can downgrade the negotiation to the crappiest crypto each side is willing to use, by removing or corrupting the better options from this packet.</p>
<blockquote><p>So there is a lesson here already, <strong>disable crappy crypto in all your ssh servers</strong>. For openssh, you can specify which KEX, ciphers and MACs are allowed, by editing <code>/etc/ssh/sshd_config</code> to include this:</p></blockquote>
<pre><code> KexAlgorithms curve25519-sha256@libssh.org
Ciphers chacha20-poly1305@openssh.com
MACs hmac-sha2-512
</code></pre>
<blockquote><p>For safety, when doing this to a remote server, leave a second logged-in ssh session to the server active when you edit the config file and restart the ssh server, so you can recover if there are problems. Existing ssh sessions do not get closed when sshd restarts or dies.</p></blockquote>
<p>In my ssh server implementation, only one set of crypto is supported:</p>
<table>
<thead>
<tr>
<th>Function</th>
<th>Crypto</th>
</tr>
</thead>
<tbody>
<tr>
<td>KEX</td>
<td><code>curve25519-sha256@libssh.org</code></td>
</tr>
<tr>
<td>Server host key</td>
<td><code>ssh-rsa</code></td>
</tr>
<tr>
<td>Encryption</td>
<td><code>chacha20-poly1305@openssh.com</code></td>
</tr>
<tr>
<td>MAC</td>
<td><code>(implicit in chacha20)</code></td>
</tr>
<tr>
<td>Compression</td>
<td><code>none</code></td>
</tr>
</tbody>
</table>
<p>These are all currently considered safe choices, with suitable key sizes (I support 4Kbit RSA keys).</p>
<p>Both sides issue their lists, and each side chooses the first matching crypto
string from both sides (or fails the negotiation if any category has no
match).</p>
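<p>The matching rule itself is simple enough to sketch (Python; the example lists are made up for illustration): the chosen algorithm is the first entry in the client’s preference list that the server also offers.</p>
<pre><code class="python"># RFC4253 negotiation: pick the first algorithm on the client's
# name-list that also appears on the server's name-list.
def negotiate(client_list, server_list):
    for alg in client_list:
        if alg in server_list:
            return alg
    return None  # no match: the negotiation fails

# Example lists, illustrative only
client = ["curve25519-sha256@libssh.org", "ecdh-sha2-nistp256"]
server = ["ecdh-sha2-nistp256", "curve25519-sha256@libssh.org"]
print(negotiate(client, server))  # curve25519-sha256@libssh.org
</code></pre>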
<p>Assuming there is some common ground for each part, we can move on to the Key Exchange part. Later, once keys are established, each side sends an <code>SSH_MSG_NEWKEYS</code> packet to mark the point that communication in that direction is switching to the selected cipher; from that point its communication is encrypted.</p>
<a name="Step-3:-Key-Exchange--28-KEX-29-"></a>
<h2>Step 3: Key Exchange (KEX)</h2>
<p>Once the sides have explained their capabilities and arrived at a mutually
usable suite of crypto, the next move is to set up some “ephemeral keys”
with which to perform the rest of the crypto key exchange.</p>
<p>The choice of KEX is intimately connected to historic doubts about the “NIST curves” required for use with RFC5656, the “official” Elliptic Curve Crypto KEX method. The affected curves are any with “nist” in their name, and the affected KEX method names begin “ecdh-sha2-nist”. These are widely considered to be unsafe.</p>
<p>In response to what became generally assumed about parts of RFC5656 being unsafe (unexplained magic in the ECC computation and curve selection effectively backdooring ssh communication using it), an alternative ECC standard was very rapidly produced: the <code>curve25519-sha256@libssh.org</code> KEX protocol, which roughly follows RFC5656 but uses a different curve, eliminating the unexplained magic and slightly streamlining the implementation. It is widely considered a safe choice.</p>
<p>It’s this KEX method my implementation supports. The flow is:</p>
<ul>
<li><p>Both sides generate their own ephemeral 256-bit public and private curve25519 key.</p></li>
<li><p>The client sends <code>SSH_MSG_KEX_ECDH_INIT</code> along with its ephemeral public key.</p></li>
<li><p>The server computes a “shared secret” using ECC</p></li>
<li><p>The server generates a hash from the concatenation of various elements available to both sides from the earlier negotiation, and signs the hash with the “shared secret”</p></li>
<li><p>The server returns <code>SSH_MSG_KEX_ECDH_REPLY</code> along with the server’s ephemeral public key and its non-ephemeral ‘server key’</p></li>
<li><p>The client also computes the “shared secret”, generates the same concatenated set of elements as the server did and hashes it: this is used to validate the server’s signature on the hash. If all is well the client accepts the connection.</p></li>
</ul>
<p>The actual information in the data hashed by both sides to form the “exchange hash” consists of:</p>
<pre><code> string V_C, client's identification string (CR and LF excluded)
string V_S, server's identification string (CR and LF excluded)
string I_C, payload of the client's SSH_MSG_KEXINIT
string I_S, payload of the server's SSH_MSG_KEXINIT
string K_S, server's public host key
string Q_C, client's ephemeral public key octet string
string Q_S, server's ephemeral public key octet string
mpint K, shared secret
</code></pre>
<p>After both sides accept the KEX, both sides:</p>
<ul>
<li>have the peer’s public key</li>
<li>know the peer has the private key matching the public key they sent</li>
<li>have the exchange hash (which hashed the “shared secret” that was never explicitly sent)</li>
</ul>
<p>The client is also able to apply checks to the server’s public key, eg, to see if it matches the key it was given last time it connected to the same hostname.</p>
<p>Further hashes, computed over concatenations that include the exchange hash, are then used by both sides to initialize the actual crypto algorithm, which is different from the <code>curve25519-sha256@libssh.org</code> KEX used to get us this far. In our case, we only support <code>chacha20-poly1305@openssh.com</code>. The list of initializations using hashes on the exchange hash is as follows (a minimal sketch follows the list):</p>
<ul>
<li><p><strong>Initial IV client to server</strong>: HASH(K || H || “A” || session_id)
<em>(Here K is encoded as mpint and “A” as byte and session_id as raw
data. “A” means the single character A, ASCII 65)</em>.</p></li>
<li><p><strong>Initial IV server to client</strong>: HASH(K || H || “B” || session_id)</p></li>
<li><p><strong>Encryption key client to server</strong>: HASH(K || H || “C” || session_id)</p></li>
<li><p><strong>Encryption key server to client</strong>: HASH(K || H || “D” || session_id)</p></li>
<li><p><strong>Integrity key client to server</strong>: HASH(K || H || “E” || session_id)</p></li>
<li><p><strong>Integrity key server to client</strong>: HASH(K || H || “F” || session_id)</p></li>
</ul>
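<p>A minimal sketch of that derivation (Python; HASH is SHA-256 for this KEX, and K, H and session_id are placeholders for the values already held from the KEX):</p>
<pre><code class="python"># Derive IVs / keys from the shared secret K, exchange hash H and
# session_id, per the list above. HASH is SHA-256 for this KEX.
# If more key material than one hash is needed, it is extended per
# RFC4253 section 7.2 (not shown).
import hashlib

def mpint(n):
    # ssh mpint: 4-byte length + big-endian bytes, with a leading
    # zero byte if the top bit would otherwise be set
    b = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if b and b[0] & 0x80:
        b = b"\x00" + b
    return len(b).to_bytes(4, "big") + b

def derive(K, H, letter, session_id):
    return hashlib.sha256(mpint(K) + H + letter + session_id).digest()

# iv_c2s  = derive(K, H, b"A", session_id)
# iv_s2c  = derive(K, H, b"B", session_id)
# key_c2s = derive(K, H, b"C", session_id)  ... and so on to "F"
</code></pre>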
<p>At this point, the negotiated crypto algorithm is initialized, the KEX algorithm is done and the KEX instantiation can be destroyed.</p>
<p>Finally, after all this effort, each side sends a <code>SSH_MSG_NEWKEYS</code> indicating to the peer that the sender is implementing the crypto algorithm and keys from now on, ie, is transitioning to an encrypted channel.</p>
<a name="Step-4:-User-authentication"></a>
<h2>Step 4: User authentication</h2>
<p>The KEX got us to the point we can talk in an encrypted channel. But it did nothing about authenticating the client to the server. A malicious client can get this far, same as any browser will set up a TLS channel before authentication with the website.</p>
<p>The next step is the client sends <code>SSH_MSG_USERAUTH_REQUEST</code>… this contains a <code>method name</code> field which may be <code>publickey</code>, <code>password</code>, <code>hostbased</code> or <code>none</code>. In my implementation only <code>publickey</code> is supported, and only the key algorithm <code>ssh-rsa</code>… these are the most common keys in use today and key size may be 4096 bits. It also indicates the user name on the server it is trying to authenticate with the client key, and which service the client wants from the server.</p>
<p>“ssh-rsa” and the client’s public key are sent along with the packet. If the server sees nothing wrong so far, it will respond with SSH_MSG_USERAUTH_PK_OK and echo back the public key type and the public key blob itself… it does this to make it unambiguous which SSH_MSG_USERAUTH_REQUEST it is responding to, since the client may pipeline several.</p>
<p>The client then collates a bunch of concatenated data which both sides have access to</p>
<pre><code> string session identifier
byte SSH_MSG_USERAUTH_REQUEST
string user name
string service name
string "publickey"
boolean TRUE
string public key algorithm name
string public key to be used for authentication
</code></pre>
<p>and signs the hash of it with its private RSA key. Lastly it sends the <code>SSH_MSG_USERAUTH_REQUEST</code> again, this time with the computed signature attached.</p>
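<p>A minimal sketch (Python) of collating that blob, using the ssh wire “string” encoding (a 4-byte big-endian length followed by the bytes); session_id, user and the key blob are placeholders:</p>
<pre><code class="python"># Build the data the client signs for publickey auth, per the list
# above. SSH_MSG_USERAUTH_REQUEST is 50 (RFC4250).
SSH_MSG_USERAUTH_REQUEST = 50

def sshstr(b):
    return len(b).to_bytes(4, "big") + b

def userauth_signed_blob(session_id, user, service, pubkey_blob):
    return (sshstr(session_id) +
            bytes([SSH_MSG_USERAUTH_REQUEST]) +
            sshstr(user) +
            sshstr(service) +
            sshstr(b"publickey") +
            b"\x01" +               # boolean TRUE
            sshstr(b"ssh-rsa") +
            sshstr(pubkey_blob))
</code></pre>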
<p>The server can use the client’s public RSA key to confirm it has the matching private key and the signature checks out. If so, it responds with <code>SSH_MSG_USERAUTH_SUCCESS</code> and the authentication is completed.</p>
<p>At this point the server may send <code>SSH_MSG_USERAUTH_BANNER</code> with some “motd” type text. Logging into my ESP32 device over ssh gives this banner:</p>
<pre><code>|\---/| Secure Wireless Serial Interface: ID 05D769
| o_o | SSH Terminal Server
\_^_/ Copyright (C) 2017 Crash Barrier Ltd
</code></pre>
<a name="Step-5:-Channel-requests"></a>
<h2>Step 5: Channel requests</h2>
<p>Now the link is encrypted and the client using the link has been authenticated, the client is allowed to ask for a wider range of things from the server.</p>
<p>ssh is a very flexible protocol, but the most typical request is for a “terminal” via an ssh client. First the client must acquire a “channel”, using <code>SSH_MSG_CHANNEL_OPEN</code>. In ssh, one authenticated link may have multiple channels of different types operating within it with unambiguous multiplexing due to each channel having a channel index number assigned at open time. The channels also have a “tx window” budget associated with them, they are given a certain amount they can send when they are opened, and the remote peer must allow them more using an explicit <code>SSH_MSG_CHANNEL_WINDOW_ADJUST</code> message telling them how much more they may transmit.</p>
<blockquote><p>Both the multiplexing and tx window concept turned up many years later in the definition of HTTP/2. This is notable because in a not very alternate universe we would not have a web based on TLS + HTTP but we could have had HTTP/2 features many years earlier with a web built on ssh protocol.</p></blockquote>
<p>The “type” of the channel decides on the meaning of the data sent on the channel; different types of channel send completely different protocol data inside. Defined channel requests are:</p>
<ul>
<li><strong>pty-req</strong>: pseudo-tty</li>
<li><strong>x11-req</strong>: x11 tunnel</li>
<li><strong>env</strong>: environment variables</li>
<li><strong>shell</strong>: spawn a server shell with stdin/out/err wired to ssh</li>
<li><strong>exec</strong>: execute server process with stdin/out/err wired to ssh</li>
<li><strong>subsystem</strong>: run a defined subsystem, eg, sftp</li>
<li><strong>window-change</strong>: size of the client window has changed</li>
<li><strong>xon-xoff</strong>: soft flow control</li>
<li><strong>signal</strong>: send a signal to server, eg, SIGINT</li>
<li><strong>exit-status</strong>: retrieve exit status of previous “exec” command</li>
<li><strong>exit-signal</strong>: find out if previous “exec” command died on a signal</li>
</ul>
<p>For ssh being used as a terminal, the client must ask for a <code>pty-req</code> type of channel, where pty is a pseudo-TTY, or logical terminal emulation channel. When established, this channel passes a complex terminal emulation protocol.</p>
<p>The ssh client also then passes <code>env</code> requests to configure a few environment variables, and then a <code>shell</code> request to wire the ssh channel up to a server shell.</p>
<p>In my case I handle these commands but the ssh connection is actually backed by a UART. So there is no actual shell spawned, and the environment vars are ignored. Instead the UART ringbuffers are wired up to the ssh channel and the remote ssh client sends and receives on that instead.</p>
<a name="scp"></a>
<h2>scp</h2>
<p>After all this was working for ssh client connections, I also wanted to support simple file transfers over scp, since that is the most “natural” way to communicate with the remote side for sending files.</p>
<p>There’s very little documentation of how that is supposed to work.</p>
<p>Running <code>scp abc root@mydevice:/def</code> opens a channel and requests to <code>exec</code> on it <code>scp -t /def</code>.</p>
<p>On a real server, it would run <code>scp</code>, but the <code>-t</code> flag is not documented. On ESP32, there is no shell or scp process that can run. After accepting the request and setting a flag on the channel to say it is in “scp mode”, scp sends us some textual “headers” down the channel to set up the transfer; looking at the openssh scp sources I found the format is (mmmm is an octal file mode like 0755):</p>
<ul>
<li>“Dmmmm 0 dirname” - start of copy directory level</li>
<li>“E” - end of copy directory level</li>
<li>“Cmmmm length filename” - start copy file</li>
<li>“Tmtime 0 atime 0” - modification and access times for file</li>
</ul>
<p>For a simple <code>scp abc root@mydevice:/def</code>, scp sends only the C command, a terminating <code>\x0a</code> and then the payload of the file <code>abc</code>. Then it sends <code>SSH_MSG_CHANNEL_EOF</code> to which we respond with <code>SSH_MSG_CHANNEL_CLOSE</code> to end the connection cleanly.</p>
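<p>A minimal sketch (Python) of parsing those textual headers as described above; note real scp also exchanges single zero “ACK” bytes around each header and payload, which I omit here:</p>
<pre><code class="python"># Parse the textual scp header lines described above, eg
#   "C0644 1234 abc\n" -> start file, mode 0644, 1234 bytes, 'abc'
def parse_scp_header(line):
    t = line[0]
    if t == "E":                   # end of copy directory level
        return ("E",)
    if t in "CD":                  # start of file / directory
        mode, size, name = line[1:].rstrip("\n").split(" ", 2)
        return (t, int(mode, 8), int(size), name)
    if t == "T":                   # modification and access times
        mtime, _, atime, _ = line[1:].split()
        return ("T", int(mtime), int(atime))
    raise ValueError("unknown scp header: " + line)

print(parse_scp_header("C0755 42 abc\n"))  # ('C', 493, 42, 'abc')
</code></pre>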
<p>The implementation is complicated a bit by having to deal with RX flow control due to the small UART ringbuffers, but lws helps a lot here.</p>
<a name="What-did-we-learn-this-time"></a>
<h2>What did we learn this time</h2>
<p><img src="./nikaido.png" align="center"></p>
<ul>
<li><p>SSH protocol was way ahead of its time</p></li>
<li><p>SSH crypto and functionality instead of http + ssl tunnel would have gotten us http/2 from the start</p></li>
<li><p>It’s possible to implement selected “best of breed” crypto suite elements in a very constrained device</p></li>
<li><p>Libwebsockets + bytewise state machines can implement everything needed (in my case this also includes in-browser JS terminal backed by wss)</p></li>
<li><p>Implementing this as a lws protocol handler means it can easily coexist in a single event loop; on very small targets like ESP32 this means it can be implemented painlessly.</p></li>
<li><p>Although lws already supports “natural” (for developers and users) protocols like TLS + https and wss (secure websockets), this is the first time to my knowledge something as “natural” as <code>ssh</code> has been implemented on a constrained target like ESP32. Using a wireless device via <code>ssh</code> and <code>scp</code> from your terminal using normal ssh keys and with the same level of security expected from a server ssh connection is very convenient.</p></li>
</ul>
Mailman and captchahttps://warmcat.com/2017/08/12/mailman-captcha.html2017-08-12T07:01:05+02:00<a name="Mailman-and-captcha"></a>
<h1>Mailman and captcha</h1>
<p>The libwebsockets.org mailing list signup page (at <a href="https://libwebsockets.org/mailman/listinfo/libwebsockets">https://libwebsockets.org/mailman/listinfo/libwebsockets</a> ) has been targeted by a botnet trying to use automated signups via google.</p>
<p>Nothing made it to the list, but the mail server is filled with doomed attempts to verify against the generated emails, eg</p>
<pre><code>S'sqoonart+yjjzxqku@outlook.com'
S'sqoonart+wziudy@outlook.com'
S'sqoonart+clkqaj@outlook.com'
S'sqoonart+xxohsyye@outlook.com'
S'sqoonart+uvubjuwp@outlook.com'
S'sqoonart+ezpikxd@outlook.com'
S'sqoonart+eaiqnjs@outlook.com'
S'sqoonart+mvpboqse@outlook.com'
S'sqoonart+tiai@outlook.com'
S'sqoonart+ieamtf@outlook.com'
S'sqoonart+prdpbmp@outlook.com'
</code></pre>
<p>Looking closer, they’re being generated by signups from a wide range of IPs POSTing the mailman signup form with nonsense names and passwords.</p>
<p>This causes our server to make a lot of bad requests to the mail hosts (in good faith). So it seems we should enable a captcha on the signup page.</p>
<a name="No-captcha-support-in-mailman"></a>
<h2>No captcha support in mailman</h2>
<p>Mailman does not support captcha, even though botnets are scanning the net looking for mailman signup pages to spam. I guess Hyperkitty has taken over dev interest, but I am okay with mailman. Googling around found this page:</p>
<p><a href="https://www.dragonsreach.it/2014/05/03/adding-recaptcha-support-to-mailman/">https://www.dragonsreach.it/2014/05/03/adding-recaptcha-support-to-mailman/</a></p>
<p>where the author had already suffered this problem in 2014; he provides a somewhat corrupted patch and info on how to patch mailman… this is a bit painful since we are patching distro python that is subject to being overwritten by package upgrades. But since mailman itself doesn’t want to support captcha it is the only choice.</p>
<p>The rest of this post is about how to actually do that successfully, based on Andrea Veri’s original blog post.</p>
<a name="Broken-package-for-python-2d-recaptcha-2d-client"></a>
<h2>Broken package for python-recaptcha-client</h2>
<p>The first problem following those instructions is that the dependent package python-recaptcha-client it relies on cannot be recognized as something you can include from Python. In fact, as pointed out at <a href="http://mailman.9.n7.nabble.com/Mailman-2-1-23-and-reCAPTCHA-td46468.html#a46474">http://mailman.9.n7.nabble.com/Mailman-2-1-23-and-reCAPTCHA-td46468.html#a46474</a>, you must perform:</p>
<pre><code> $ sudo touch /usr/lib/python2.7/site-packages/recaptcha/__init__.py
</code></pre>
<p>to provide the missing indication that the content is actually a python package at all; the Fedora package has the <code>__init__.py</code> in a subdir, which causes python to ignore it.</p>
<a name="Broken-patch"></a>
<h2>Broken patch</h2>
<p>The next problem is the patch has been mangled: quoted items in angle-brackets have been snipped. This isn’t just an html rendering issue; they are missing in the page source on Andrea’s site. The fixed patch is here:</p>
<pre><code>--- listinfo.py 2017-08-12 04:12:29.953487616 +0200
+++ listinfo.py 2017-08-12 04:53:00.071483277 +0200
@@ -23,6 +23,7 @@
import os
import cgi
import time
+import sys
from Mailman import mm_cfg
from Mailman import Utils
@@ -32,6 +33,8 @@
from Mailman.htmlformat import *
from Mailman.Logging.Syslog import syslog
+from recaptcha.client import captcha
+
# Set up i18n
_ = i18n._
i18n.set_language(mm_cfg.DEFAULT_SERVER_LANGUAGE)
@@ -227,6 +230,7 @@
replacements['<mm-displang-box>'] = displang
replacements['<mm-lang-form-start>'] = mlist.FormatFormStart('listinfo')
replacements['<mm-fullname-box>'] = mlist.FormatBox('fullname', size=30)
+ replacements['<mm-recaptcha-javascript>'] = captcha.displayhtml(mm_cfg.RECAPTCHA_PUBLIC_KEY, use_ssl=True)
# Do the expansion.
doc.AddItem(mlist.ParseTags('listinfo.html', replacements, lang))
--- subscribe.py 2017-08-12 04:14:44.143487376 +0200
+++ subscribe.py 2017-08-12 04:45:08.608484119 +0200
@@ -32,6 +32,9 @@
from Mailman.UserDesc import UserDesc
from Mailman.htmlformat import *
from Mailman.Logging.Syslog import syslog
+from recaptcha.client import captcha
+
+
SLASH = '/'
ERRORSEP = '\n\n<p>'
@@ -122,6 +125,16 @@
os.environ.get('HTTP_X_FORWARDED_FOR',
os.environ.get('REMOTE_ADDR',
'unidentified origin')))
+
+ captcha_response = captcha.submit(
+ cgidata.getvalue('recaptcha_challenge_field', ""),
+ cgidata.getvalue('recaptcha_response_field', ""),
+ mm_cfg.RECAPTCHA_PRIVATE_KEY,
+ remote,
+ )
+ if not captcha_response.is_valid:
+ results.append(_('Invalid captcha'))
+
# Are we checking the hidden data?
if mm_cfg.SUBSCRIBE_FORM_SECRET:
now = int(time.time())
</code></pre>
<p>Change dir to <code>/usr/lib/mailman/Mailman/Cgi</code> (for Fedora) before applying the patch.</p>
<p>This patch is correct against mailman-2.1.21.</p>
<p>You also need to modify the html and add your captcha keys as configuration variables in <code>mm_cfg.py</code>, as pointed out in the original article.</p>
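<p>For reference, the two variables the patch reads from <code>mm_cfg.py</code> look like this (the key values are placeholders; use the ones issued for your site):</p>
<pre><code class="python"># Added to /usr/lib/mailman/Mailman/mm_cfg.py (Fedora path); the
# patched listinfo.py / subscribe.py read these two names.
RECAPTCHA_PUBLIC_KEY = 'your-recaptcha-site-key'       # placeholder
RECAPTCHA_PRIVATE_KEY = 'your-recaptcha-secret-key'    # placeholder
</code></pre>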
<a name="Troubleshooting"></a>
<h2>Troubleshooting</h2>
<p>If problems come up, at least on Fedora, although mailman puts out some scary “low level error” html, it also puts the details / backtrace in <code>/var/log/mailman/error/</code>.</p>
<a name="Maintaining"></a>
<h2>Maintaining</h2>
<p>Once it’s working, this is actually very fragile against updated mailman packages from Fedora. In the absence of a better idea I disabled updating mailman by creating an <code>/etc/yum.conf</code> containing</p>
<pre><code>exclude=mailman
</code></pre>
<p>and keep an eye out when updating for mailman getting listed as excluded.</p>
Let's play, "What's my ESD rating"2016-11-21T00:00:00+08:00https://warmcat.com/2016/11/21/Let's-play-guess-my-ESD-protection<h2 id="let-39-s-play-quot-what-39-s-my-esd-rating-quot">Let's play, "What's my ESD rating"</h2>
<p>So, differential transceivers are widely used in harsh environments, over long distances, to carry data with good integrity, and, because the distances imply outside cable runs, with extremely good ESD protection.</p>
<p>For example, MCP2542 (<a href="http://ww1.microchip.com/downloads/en/DeviceDoc/20005514A.pdf">http://ww1.microchip.com/downloads/en/DeviceDoc/20005514A.pdf</a>) features 13kV protection on the business end of it, and Linear Technology has LTC 2875 (<a href="http://cds.linear.com/docs/en/datasheet/2875f.pdf">http://cds.linear.com/docs/en/datasheet/2875f.pdf</a>) with no less than 25kV ESD protection.</p>
<p>By contrast yer average chip has at least 2kV or more commonly 4kV protection, just so it can make it through ESD susceptibility testing without failing.</p>
<p>Chips involved in long external wiring especially need to take care about ESD, since the chance of them being near extreme events like lightning strikes increases.</p>
<h2 id="enter-isl83490">Enter ISL83490</h2>
<p>The cheapest differential transceivers you can get on Digikey are from Intersil, a semiconductor company with a long history I otherwise have a lot of respect for.</p>
<p>I was surprised to see no ESD rating for that chip in the datasheet: that is HIGHLY unusual...</p>
<p><a href="http://www.intersil.com/content/dam/Intersil/documents/isl8/isl83483-85-88-90-91.pdf">http://www.intersil.com/content/dam/Intersil/documents/isl8/isl83483-85-88-90-91.pdf</a></p>
<p>...since everyone has to concern themselves with ESD protection on external cable runs. But look, if you go to the product page, it completely fails to discuss the ESD protection level provided by the chip</p>
<p><a href="http://www.intersil.com/en/products/interface/serial-interface/rs-485-rs-422/ISL83490.html#overview">www.intersil.com/en/products/interface/serial-interface/rs-485-rs-422/ISL83490.html#overview</a></p>
<p><img src="https://warmcat.com/isl83490-key-features.png" alt="https://warmcat.com/isl83490-key-features.png"></p>
<p>Nor is it mentioned in the "description"</p>
<p><img src="https://warmcat.com/isl83490-parametrics.png" alt="https://warmcat.com/isl83490-parametrics.png"></p>
<p>but has a ton of generic ESD protection app notes!</p>
<p><img src="https://warmcat.com/isl83490-app-notes.png" alt="https://warmcat.com/isl83490-app-notes.png"></p>
<p>In fact EVERY app note except arguably one is related to ESD protection.</p>
<p>So, I registered with Intersil support and asked them: what is the ESD protection level of that chip? I got an answer quickly, but it was</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">ESD In Volt HBM 1000
ESD In Volt CDM 1000
ESD In Volt MM 50
</code></pre></div>
<p>These are barely (possibly, "not even") in the 'sufficient self defense against guy on a carpet' league, let alone "run RS485 between buildings on a dark and stormy night" or even "pass immunity testing" leagues.</p>
<h2 id="crappier-than-74hc">Crappier than 74HC</h2>
<p>I mean if you buy a venerable 74HC chip from TI you get double that</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">ESD occurs when a buildup of static charges on one surface arcs through a dielectric to another surface that has the opposite
charge. If this discharge current flows through an integrated circuit, the high currents can damage delicate devices on the chi
p.
The protection circuits designed by Texas Instruments (TI) operate by shunting any excessive current safely around the
sensitive circuitry on the chip. This provides ESD immunity on inputs and outputs that exceeds MIL-STD-883B, Method 3015
requirements for ESD protection (2000 V, 1500
Ω, and 100 pF).
</code></pre></div>
<p><a href="http://www.ti.com/lit/an/scla007a/scla007a.pdf">www.ti.com/lit/an/scla007a/scla007a.pdf</a></p>
<p>Those chips are not even designed for dangerous long cable runs as this one is.</p>
<p>What's the point of hiding that figure in the datasheet and making me ask, and then this passive-aggressive "ESD ESD ESD" in the App Notes? This is a terrible way to deal with it.</p>
<h2 id="trade-offs">Trade-offs</h2>
<p>It's not in itself illegitimate to have a very cheap, but relatively unprotected chip that needs external transzorbs to go out by itself at night. If the designer is properly informed, and given golden external protection circuit examples, he can reasonably and safely decide that is a good way to spend his money.</p>
<p>But if you hide the miserable truth in this fog of ambiguous app notes, well, THAT SUCKS DONKEY BALLS.</p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Datasheets are magical documents where what is not said can be more important than what is said</p></li>
<li><p>You must use your skill and judgment to assess what was not said in the datasheet, or you will quite quickly be in shit creek</p></li>
</ul>
ICE5 FPGA Where did all the LUTs go?2016-11-06T00:00:00+08:00https://warmcat.com/2016/11/06/ICE5-FPGA-where-did-all-the-LUTS-go<h2 id="ice5-fpga-utilization">ICE5 FPGA utilization</h2>
<p>The biggest ICE5UL is "4K" LUTs - but actually that is already highly misleading since</p>
<table><thead>
<tr>
<th>Naming</th>
<th>Actual LUTS</th>
</tr>
</thead><tbody>
<tr>
<td>"1K"</td>
<td>1100</td>
</tr>
<tr>
<td>"2K"</td>
<td>2048</td>
</tr>
<tr>
<td>"4K"</td>
<td><strong>3520</strong></td>
</tr>
</tbody></table>
<p>eh where did the other 500 LUTs go?</p>
<p>So far my HDL going in it has a headline count of 2269 LUT after the Synopsys synthesis tools ran on it.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Input Design Statistics
Number of LUTs : 2269
</code></pre></div>
<p>So there's plenty of room yet, right?</p>
<h3 id="design-legalization">Design Legalization</h3>
<p>Structurally the ICE5 groups its LUTs in 8s in "PLB"s, Programmable Logic Blocks. Intra-PLB LUTs have privileged access to each other, especially in terms of fast carry generation.</p>
<p><img src="https://warmcat.com/ice5-plb.png" alt="https://warmcat.com/ice5-plb.png"></p>
<p>So this structure creates a discontiguity between LUTs that are inside the same PLB and outside. If you instantiate a counter for example, the LUTs doing the bits for the counter need to go in the same PLB to get access to the hardware carry acceleration.</p>
<p>After the generic synthesis from the Synopsys tools - which gives the 2269 LUT figure, there is a "Design Legalization" step. The generic synthesis does not know about PLBs and other special device restrictions, and the "Legalization" step does, so it modifies the synthesized RTL to meet the restrictions by placing the LUTs in PLBs and adding "feedthru LUTs".</p>
<p>No documentation I could find either in the package or in Google discusses the exact set of rules being done during legalization. But the results are really expensive, for my design anyway</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Design Legalization Statistics
Number of feedthru LUTs inserted to legalize input of DFFs : 508
Number of feedthru LUTs inserted for LUTs driving multiple DFFs : 0
Number of LUTs replicated for LUTs driving multiple DFFs : 5
Number of feedthru LUTs inserted to legalize output of CARRYs : 93
Number of feedthru LUTs inserted to legalize global signals : 1
Number of feedthru CARRYs inserted to legalize input of CARRYs : 3
Number of inserted LUTs to Legalize IOs with PIN_TYPE= 01xxxx : 16
Number of inserted LUTs to Legalize IOs with PIN_TYPE= 10xxxx : 20
Number of inserted LUTs to Legalize IOs with PIN_TYPE= 11xxxx : 0
Total LUTs inserted : 643
Total CARRYs inserted : 3
</code></pre></div>
<p>What exactly made the "input of DFFs" 'illegal'? ("Number of feedthru LUTs inserted to legalize output of CARRYs" I can maybe understand, it looks like if you generate any CARRYs, you have to ripple it to the end of the PLB where it has an external carry output. But that's only 93 LUTs.)</p>
<p>Basically it added 643 LUTs (plus 3 carries) to the design, bloating it by 28%. The key post-legalization headline figures become</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Device Utilization Summary
LogicCells : 2945/3520
PLBs : 411/440
</code></pre></div>
<p>If more PLBs are initially required in the design than are available, the tools try to reallocate LUTs to reduce the number of PLBs. There seem to be placement choices that make the PLB count somewhat fluid, even if it was initially overcommitted; presumably this is about merging two partially used PLBs into just one more completely used one. Eg initially (it calls itself an error, but it is not fatal yet):</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">E2070: Unable to fit the design into the selected device. Number of PLBs in design = 451, available in device = 440
</code></pre></div>
<p>Then after the placer step runs</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Device Utilization Summary
LogicCells : 3258/3520
PLBs : 437/440
</code></pre></div>
<p>and the Place and Route seems to complete OK. Even so, this leaves the "new normal" living on borrowed time... and the timing of the routed design takes a hit from not being able to place the PLBs as desired.</p>
<p>But if that doesn't work out, your design simply cannot be implemented...</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">E2070: Unable to fit the design into the selected device. Number of PLBs in design = 462, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 447, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2070: Unable to fit the design into the selected device. Number of PLBs in design = 446, available in device = 440
E2071: Placement feasibility check failed
E2055: Error while doing placement of the design
</code></pre></div>
<p>... and the synthesis stops dead.</p>
<p>So before you need to worry about running out of LUTs, you need to worry about running out of PLBs.</p>
<h2 id="nondeterministic-packing">Nondeterministic packing</h2>
<p>Running out of resources is never going to be pleasant, but it's particularly difficult to deal with on ICE5. Although there is a rough relationship between synthesis LUT usage and post-legalization usage (more LUTs means more PLBs), it is unpredictable.</p>
<p>Here is a plot of my VHDL synthesis results showing LUT consumption vs PLB consumption, from the last few days</p>
<p><img src="https://warmcat.com/plot-lut-plb.png" alt="https://warmcat.com/plot-lut-plb.png"></p>
<p>You can see even if you are at around 2800 post-legalization LUT usage (80% occupied) you may be using 93% of the PLBs already. Conversely once you are right up against the PLB limit, you can stay there over a spread of 250 LUTs being used or not. So it is very difficult to know when you will actually fail.</p>
<p>Even more confusing, there were two times after I started monitoring this that removing logic from the HDL, reducing the LUT usage, <strong>increased</strong> the corresponding PLB usage, presumably due to some quirk of the packer step.</p>
<p>Basically the PLB usage figure is telling you if even one of the LUTs in the 8-LUT PLB is in use. There's no way to see if you can use whatever spare LUTs are left in the PLB except try it out. Considering this state of affairs starts at a synthesis LUT usage of 2600, on a device that notionally has ~1000 additional free LUTs, you should <strong>significantly derate your expectation of how many LUTs are going to be usable</strong> before you enter a kind of twilight scavenging world every time you place and route your design.</p>
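<p>A back-of-envelope sketch (Python) of why ~2600 synthesis LUTs is already the danger zone; the 28% legalization bloat is the figure observed on my design, so treat it as an assumption for yours:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Estimate PLB pressure from a synthesis LUT count, using the ~28%
# legalization bloat observed above; the factor is an assumption.
import math

BLOAT = 1.28      # feedthru-LUT inflation seen on my design
PLB_LUTS = 8      # LUTs per PLB on ICE5
AVAIL_PLBS = 440  # "4K" part

def plb_pressure(synth_luts):
    legal_luts = synth_luts * BLOAT
    min_plbs = math.ceil(legal_luts / PLB_LUTS)  # best-case packing
    return legal_luts, min_plbs, AVAIL_PLBS

# (3328.0, 416, 440): ~95% of the PLBs even with perfect packing
print(plb_pressure(2600))
</code></pre></div>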
Visualizing bulk samples with a statistical summarizer2016-11-01T00:00:00+08:00https://warmcat.com/2016/11/01/Statistical-summarizer<h2 id="visualizing-bulk-samples">Visualizing bulk samples...</h2>
<p>Often there is more data available than is convenient to see at once. For example on a 'scope you can zoom in and out (a bit) of the capture buffer using the horizontal timebase, and move around in the capture buffer within its bounds.</p>
<p>In our case we will usually have a 160 x 128 window to view our data through, but we are generating 8 channels of samples per ms, and storing them in hyperram. Trying to see every sample as 1px, at 6fps we could show each sample once for one frame, and at our actual 30fps each sample would be on-screen for 5 frames, proceeding leftwards at 52px / frame. That should be fine if it's what you want to see.</p>
<p>We have room for 4 minutes of capture buffer, and we might care more about an overview of what is happening on a slower scale than about every sample flashing by. So we have the same kind of problem of how to render as much useful detail as possible.</p>
<p>Oscilloscopes have had this problem for a while and there are some well-known ways of handling it...</p>
<h3 id="just-undersampling-the-buffer">Just undersampling the buffer</h3>
<p>Cheap and midrange digital scopes deal with this by just undersampling the signal, ie, to "zoom out" by a factor of ten they just show every tenth sample.</p>
<p><img src="https://warmcat.com/dalias.png" alt="https://warmcat.com/dalias.png"></p>
<p>This works, but there is a lot of messy aliasing and lost information; for example some subsamples show as 1px gaps in the blocks of data, but that is misleading: there are no gaps, the subsampling just happened to hit a '0'. It's made worse because the scope either sets a pixel to full intensity or not.</p>
<h3 id="averaging">Averaging</h3>
<p>Even cheap digital scopes offer multi-trigger averaging, which can be useful if your signal capture is synchronized. However it only suits some kinds of problem and requires synchronous capture. For example the mean of an unlocked sine wave is zero: if this was the method for zooming in and out your signal would appear to attenuate and then completely disappear into a flat line as you zoomed out.</p>
<p>In short it's a good way to reduce unpredictable noise (by attenuating low probability data), but it doesn't do the representation of the time series well. It reduces the data set by throwing out deviations from the mean.</p>
<h3 id="digital-persistence">Digital Persistence</h3>
<p>You can use a digital 'scope's 'persistence' setting to retain some number of old samples in the display buffer, this helps to estimate the union of where the signal goes but again pixels are on or off, there's no information about how often a pixel is set, just the extent of what was ever set during the persistence period.</p>
<h3 id="analogue-persistence">Analogue Persistence</h3>
<p>Analogue scopes did a better job in respect of representing the available data as the phosphor on the back of the CRT glass automatically averaged the "hit rate" as intensity levels</p>
<p><img src="https://warmcat.com/ascope.jpg" alt="https://warmcat.com/ascope.jpg">
(From <a href="http://www.tapeheads.net/showthread.php?t=30477">http://www.tapeheads.net/showthread.php?t=30477</a>)</p>
<p>Expensive digital scopes simulate this digitally</p>
<p><img src="https://warmcat.com/dscope.jpg" alt="https://warmcat.com/dscope.jpg">
(From <a href="http://www.tek.com/datasheet/mixed-domain-oscilloscopes-3">http://www.tek.com/datasheet/mixed-domain-oscilloscopes-3</a>)</p>
<h3 id="statistical-intensity">Statistical intensity</h3>
<p>In short what is needed is when "compressing" multiple samples into one column of pixels for a time-compressed "zoom out" type view, each pixel in the column should represent the count of that row being lit in the raw samples.</p>
<p><img src="https://warmcat.com/summarizer1.png" alt="https://warmcat.com/summarizer1.png"></p>
<p>Notice that this is not related to reducing the input to a single number like a mean or even a median; the output may be discontiguous even with multiple regions active if that was what the incoming data set showed.</p>
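<p>A minimal sketch (Python) of that per-column summarization; samples are assumed to be already scaled to integer row indices, and the intensity scaling follows the dynamic range discussion in the next section:</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Summarize a block of raw samples into one column of intensities:
# each display row's brightness reflects how many samples hit it.
# Samples are assumed pre-scaled to integer row indices 0..ROWS-1.
ROWS = 128
MAX_INTENSITY = 31   # 32 possible pixel intensity levels

def summarize_column(row_indices):
    n = len(row_indices)        # eg 4096 samples at 4s : 1px
    counts = [0] * ROWS
    for r in row_indices:
        counts[r] += 1
    # scale counts into the 32 intensity levels; at 4096:1 this is
    # where counts below 128 vanish into intensity 0 (see below)
    return [min(MAX_INTENSITY, c * 32 // n) for c in counts]
</code></pre></div>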
<h3 id="dynamic-range-of-accumulator-vs-display">Dynamic range of accumulator vs display</h3>
<p>Every row with samples in the original sample set is represented by accumulating it in a row accumulator array. Once that is done though, the count in the row accumulator, which can exceed 4096 in the integer part, is scaled to be represented by effectively only 32 possible pixel intensities. That means we can't display probabilities below 3% due to the display hardware.</p>
<p>In the worst case, 4096:1 summarization (4s:1px) on 12-bit counts, counts 0 to 127 all map on to pixel intensity level 0, and are indistinguishable from no counts.</p>
<p>For summarization at 32:1 (30ms:1px) and below though, the pixel intensities have enough dynamic range to represent all 32 possible counts.</p>
<h3 id="vertical-antialiasing">Vertical antialiasing</h3>
<p>A different but related issue is around vertical antialiasing: there are only 128 vertical pixel rows in the display, and at full vertical scale, 0 - 30V, common smaller voltages < 5V map into just the low 25 pixels (the scale can be changed dynamically to improve this, but still it won't be uncommon to see it on full scale, especially if a second channel has higher voltages displayed simultaneously).</p>
<p>It means that under those worst-case conditions, each row on the display represents around 250mV, which is very coarse. Another problem is that when the signal sits near the boundary of the mapping to one row, eg, 1.248V, small noise will cause the neighbouring row to get more or fewer hits as well.</p>
<p>The overall quality gets a big improvement from using fractional accumulation across both the current row and the "next" row.</p>
<p><img src="https://warmcat.com/dlm.png" alt="https://warmcat.com/dlm.png"></p>
<p>Instead of accumulating a '0' or '1' for a 'hit' on a row in the sample being compressed, the accumulator is changed to hold a 12.4 fixed-point value. 15 "fractional points" are shared between the current and next row each time.</p>
<p>If the sample is exactly on the row value, all 15 points go on the current row's accumulator and none on the next row's. If it's halfway, 8 and 7 are shared between the rows. In this way, more information about the fractional voltage is retained as part of the summary accumulation and a more accurate result obtained. A minimal sketch of this accumulation is shown below.</p>
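<p>A C sketch of the fractional accumulation, assuming a 12.4 fixed-point scaled row position; the variable names are illustrative and the direction of the split is a guess at the convention, but the 15-point sharing is as described.</p>
<pre><code class="c">#include <stdint.h>

#define ROWS 128

static uint16_t row_acc[ROWS]; /* per-row hit accumulators */

static void accumulate(uint16_t row_12_4) /* 12.4 fixed-point row */
{
    uint16_t row  = row_12_4 >> 4;  /* integer row             */
    uint16_t frac = row_12_4 & 0xf; /* 0..15 between the rows  */

    /* 15 "fractional points" shared between current and next row:
     * exactly on the row -> 15 + 0, halfway -> 7 + 8 */
    row_acc[row] += 15 - frac;
    if (row + 1 < ROWS)
        row_acc[row + 1] += frac;
}
</code></pre>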
<h2 id="with-a-statistical-summarizer">... with a Statistical Summarizer</h2>
<p>The statistical summarizer I implemented to do this is a mode of the generic blitter I described before. This allows reuse of the Hyperram FIFOs and blitter descriptor queue conveniently and economically in terms of FPGA space.</p>
<p><img src="https://warmcat.com/sum-block.png" alt="https://warmcat.com/sum-block.png"></p>
<p>The summarizer mode, the range of sample addresses, and where to draw the column of pixels are all fetched from the blitter descriptor.</p>
<p>The summarizer "compresses" sample information for up to 8 channels x 4K samples into a single column of pixels, using the techniques mentioned above.</p>
<p>All of the summarizer operations take place on contiguous samples taken from Hyperram, not at acquisition time. That means the same, fully detailed sample buffer can be rendered at different levels of detail arbitrarily after capture.</p>
<h4 id="step-1-acquire-raw-samples">Step 1: Acquire raw samples</h4>
<p>A dedicated hardware sequencer acquires the set of channel samples at the right time and DMAs them into a per-channel ringbuffer in the Hyperram. Altogether 10 channels are read and 8 are stored in the Hyperram.</p>
<h4 id="step-2-for-each-channel-iterate-through-the-samples-to-be-summarized">Step 2: For each channel, iterate through the samples to be summarized</h4>
<p>A sequencer in the blitter zeros down the row accumulator and composer SRAM, and then, for each channel, iterates through the stored sample data from Hyperram, scaling it with a dedicated multiplier according to the current vertical display scale and accumulating it in bins corresponding to the vertical output rows.</p>
<p>This step does the fractional accumulation (12.4 bits) and spreads the result between two adjacent lines according to the fractional part.</p>
<h4 id="step-3-translate-the-hits-to-the-rgb-trace-colour-intensity">Step 3: Translate the hits to the RGB trace colour intensity</h4>
<p>The trace colour for the channel is then scaled according to the relative amount of hits accumulated in the bins. This also uses a dedicated multiplier.</p>
<h4 id="step-4-compose-all-the-trace-rgb-data">Step 4: Compose all the trace RGB data</h4>
<p>For each channel, the scaled RGB for the trace is composed into a 1-column SRAM until all the channels have been accounted for.</p>
<h4 id="step-5-blit-the-composed-rgb-pixels">Step 5: Blit the composed RGB pixels</h4>
<p>Then the composed pixels representing all the channel trace renderings are blitted into the framebuffer in Hyperram.</p>
<p>A 32-bit sum of all the samples is also kept, used both for computing a headline mean to be shown numerically and for computing the "area under the curve". Dividing the sum by the number of samples added gives the mean.</p>
<p>The ICE5 provides four dedicated 16x16 hardware multipliers; three are used here to</p>
<ul>
<li><p>scale the sample data (usually 16-bit) to a pixel row. This encapsulates the equivalent of a 'scope's "vertical scale" (V/div) control setting. Actually the row mapping is fractional: 4 fractional bits are used to modulate the value accumulated into two adjacent rows</p></li>
<li><p>scale the row accumulator totals to a 6-bit "intensity level", considering the number of samples accumulated</p></li>
<li><p>scale the 5-6-5 channel trace colour intensity according to the "intensity level"</p></li>
</ul>
<p>The blitter can also render the summary pixels into Hyperram outside the overlay area. This allows accelerated remote rendering of the compressed data if the CPU picks it up and forwards it. Controlling the offset and sample scaling coefficients should allow arbitrary Y-resolution rendering in chunks of 128 Y px.</p>
<p>The end result is quite rich in terms of visualizing the data... it's a cross between mean averaging and unsorted median averaging. Anywhere the signal spent a lot of time during the "compressed" period shows clearly, and if it spent time in other places, that also shows, in proportion to the relative time spent there.</p>
<p>Here is a closeup of a yellow trace at 128 samples (1/8th sec) per column "zoom"</p>
<p><img src="https://warmcat.com/iv-alias.png" alt="https://warmcat.com/iv-alias.png"></p>
<p>You can see the antialias working both in the stable period at the left (the lower row of pixels is a less intense yellow, indicating the relative position of the sample between the two rows), and in the "stepping" in the curved part.</p>
<h3 id="side-scrolling">Side scrolling</h3>
<p>For a 'scope-style right <- left scrolling display, the overlay layer that holds the rendered summarized pixels is side-scrolled in hardware, with new columns of rendered pixels being placed just beyond the visible area while the previous frame is being scanned out to the LCD. At the start of each new frame update, the horizontal offset of the overlay is updated in hardware to include the latest rendered columns at the right.</p>
<p>Considering we may produce 1000 new columns/sec, but can only update the display at 30fps, we may produce 33 new columns of pixels in each 33ms while the previous frame is being sent to the LCD.</p>
<p>That means we need to allow an overlay logical size of at least (160 + 33 = 193) x 128... for practical reasons that was extended to 256 x 128. So we display part of the logical overlay: a 160 x 128 window which can be offset horizontally inside a 256 x 128 overlay framebuffer.</p>
<p><img src="https://warmcat.com/sum-hscroll.png" alt="https://warmcat.com/sum-hscroll.png"></p>
<p>In that way, we can draw new pixels at +160, +161, etc from the offset, while the LCD is being updated with +0..+159 from the offset. Each new frame, we update the offset to be the last-written pixel column - 159. A small sketch of this bookkeeping follows.</p>
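<p>Here is the bookkeeping as a C sketch; the names are mine, and the wrap at 256 falls out of 8-bit arithmetic for free, which is presumably part of why 256 is the practical choice.</p>
<pre><code class="c">#include <stdint.h>

#define VIEW_W 160u /* visible window width; virtual overlay width is
                     * 256, so uint8_t column indices wrap for free */

static uint8_t draw_col; /* next virtual column to render into       */
static uint8_t view_off; /* virtual column at the left of the LCD    */

static void rendered_new_column(void)
{
    draw_col++; /* wraps 255 -> 0 automatically */
}

static void frame_start(void)
{
    /* show the most recent 160 columns: last written column - 159 */
    view_off = (uint8_t)(draw_col - VIEW_W);
}
</code></pre>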
<h3 id="hardware-wrapping-support">Hardware wrapping support</h3>
<p>Implicit in that scheme though, is that the overlay framebuffer must wrap when we scan it out, and we must wrap when we draw ahead of it, from +0xff -> +0 relative to the current line start.</p>
<p>Consider when we scroll the viewport to actually start scanout at (start of virtual fb line) + 0xb0...</p>
<p><img src="https://warmcat.com/sum-hscroll2.png" alt="https://warmcat.com/sum-hscroll2.png"></p>
<p>If we just continue linearly reading from the framebuffer after +0xff, we will start to show the pixels from the NEXT line for the remainder of the current line, because +0x100 is the next line.</p>
<p>Therefore in this case, halfway along the scanline, whenever we hit +0xff we must force the FIFOs to restart reading from +0x0 of the current line in the virtual framebuffer.</p>
<p>This is more complex than just reducing the number of address bits incremented, because the data is coming via a rate-adaptation FIFO with its own SRAM buffering, which normally wants to read a whole line ahead quickly in a few bursts from the Hyperram. So for this overlay channel, the FIFO transaction size is manipulated to be the two's complement of the low 8 bits of the start address: this forces a burst to end after an address ending in 0xFF, while allowing long bursts subsequently.</p>
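<p>The burst-sizing trick is simple enough to show directly; a C model of it, with the function name being mine:</p>
<pre><code class="c">#include <stdint.h>

/* Size the first Hyperram burst of a line so that it ends exactly at
 * an address ending 0xFF; subsequent bursts can then be full length. */
static uint32_t first_burst_len(uint32_t start_addr)
{
    /* two's complement of the low 8 bits:
     * 0x..b0 -> 0x50 (80 px to reach 0x..ff)
     * 0x..00 -> 0x100 (a whole 256-px line) */
    return 0x100u - (start_addr & 0xffu);
}
</code></pre>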
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>The best ways to "compress" a lot of data work by retaining as much of the input data as possible and finding a way to render it</p></li>
<li><p>If we decouple the summarizing + rendering action from the acquisition, we get a lot of flexibility to rerender with full sample detail arbitrarily and look at the same data in different ways. That also lets us rerender at full quality into offscreen buffers and display over the network.</p></li>
<li><p>Using fractional row mapping, and row accumulators to generate trace intensity information, gives us an extra dimension to drive with sample data. This makes a visible difference in how much information is being rendered and the perceived quality.</p></li>
<li><p>Although it's a significant effort, implementing the summarizer in hardware in the FPGA directly connected to the Hyperram allows us to provide both highly zoomed-out (ie, iterating through a large number of samples) and 1:1 (ie, bulk display updates on every frame) summaries for all channels in realtime responsively.</p></li>
</ul>
Generic MOSFET Power Switching2016-10-17T00:00:00+08:00https://warmcat.com/2016/10/17/Generic-mosfet-power-switching<h2 id="generic-mosfet-power-switching">Generic Mosfet Power Switching</h2>
<p>Digital developers usually deal with low DC voltages around 5V, but there are also higher voltages (12V, 19V, 24V) in less common use directly, and as the input voltage to be regulated down to the lower voltages. Therefore typical bench PSUs are able to provide voltages at least between 0 - 30V.</p>
<p>If you want to be able to control that arbitrary range of voltages, for test or as part of a developer write - burn - crash testing loop, you will need generic DC power switching - turning DC on and off - over the range 0 - 30VDC and up to say 3A. (3A is conveniently the rated current for 0.1" box connectors).</p>
<p>And in our case, we want to do it isolated from the on-off controller, which operates at digital voltage levels unrelated to 30V and indeed an unrelated ground reference.</p>
<p>After the isolated DC-DC and digital isolator (optoisolator, capacitive isolator or whatever) for the control signal, we can just throw a big MOSFET at it, right?</p>
<p>Well, let's compare a humble relay to that kind of solution...</p>
<h2 id="relay-race">Relay Race</h2>
<table><thead>
<tr>
<th>Feature</th>
<th>Relay</th>
<th>MOSFET</th>
</tr>
</thead><tbody>
<tr>
<td>Isolation of switched power from control signal</td>
<td><strong>Yes</strong></td>
<td>No</td>
</tr>
<tr>
<td>Tolerant of spikes and transients</td>
<td><strong>Yes</strong></td>
<td>No</td>
</tr>
<tr>
<td>Four-quadrant power switching (current +/-, voltage +/-)</td>
<td><strong>Yes</strong></td>
<td>No (blows up at high currents)</td>
</tr>
<tr>
<td>Low insertion loss to switch</td>
<td><strong>Yes</strong></td>
<td>Usually</td>
</tr>
<tr>
<td>Requires 0V reference of switched voltage</td>
<td><strong>No</strong></td>
<td>Yes</td>
</tr>
<tr>
<td>Fragile against short transients without precautions</td>
<td><strong>No</strong></td>
<td>Yes</td>
</tr>
<tr>
<td>Automatically OFF when unpowered</td>
<td><strong>Yes</strong></td>
<td>Not in one direction</td>
</tr>
<tr>
<td>Switching speed</td>
<td>ms</td>
<td><strong>us</strong> or ns</td>
</tr>
<tr>
<td>Switching bounce</td>
<td>Yes</td>
<td><strong>No</strong></td>
</tr>
<tr>
<td>Marketing value</td>
<td>low</td>
<td><strong>higher</strong></td>
</tr>
</tbody></table>
<p>Ehhhhh....</p>
<p>Relays have their own problems; there is a gap between ones capable of switching 2A and those capable of switching more. The 2A ones are available in convenient sealed SMT packages, but above that they are packaged in larger automotive-style chunky rectangles.</p>
<p>They have contact bounce, and as electromechanical devices they eventually suffer from various failure modes like the contacts corroding. Switching DC accelerates some of them; for that reason a relay rated to switch 277VAC may only be rated for 28VDC. In fact DC is a big problem: typical relay datasheets specify a "minimum recommended contact load" of ~100mA at 5VDC, and we do not want to burden our users with that kind of restriction for the generic case. Both limits are related to electrochemical effects at the contact points of the relay switch.</p>
<p>Relays don't sound good for marketing purposes either.</p>
<p>There are optoisolated "solid state" relays, which are integrated MOSFET solutions, but these are VERY pricey for low on-resistance. And without the low on-resistance, they get hot. US$35 will buy you one that tops out at 2.5A and 24V, which doesn't meet our goals.</p>
<p>So what are the underlying problems a MOSFET faces with this generic switching task?</p>
<h2 id="problem-1-requirements-of-gate-drive">Problem 1: Requirements of Gate drive</h2>
<p>MOSFETs are used everywhere in electronic equipment for power control. Can't we just use an off-the-shelf IC for gate drive? This looks good at first sight</p>
<p><a href="http://www.ti.com/lit/ds/symlink/lm9061.pdf">http://www.ti.com/lit/ds/symlink/lm9061.pdf</a></p>
<p>There are many nice gate driver ICs which can be used for both N and P MOSFETs, and include their own charge pump to provide the necessary gate voltage.</p>
<p>Well, these solutions don't work for switching input power down to 0V relative to system 0V. They assume that</p>
<ul>
<li><p>the job they are doing is a buck regulator, that is regulating a higher voltage down to a lower one</p></li>
<li><p>the switching is rapid and continuous as it would be in a buck regulator, perhaps up to 2MHz, which implies PWM type gate modulation, which is coupleable by transformer... not what we are doing</p></li>
<li><p>the gate driver is powered from the rail being switched (it must at least be connected to it).</p></li>
<li><p>they disable themselves if the input voltage to be switched is what they consider to be "undervoltage", usually >5V</p></li>
</ul>
<p>This last 'feature' is highly desirable in the case where we're a regulator switching from a low battery, or from a decaying source power supply that has been switched off, and enabling the switch should be forbidden. But for the 0 - 30V target it renders the solution useless: in some cases, like the TI chip linked above, the undervoltage threshold is as high as 7V; we can't even switch 5V with that. Plus the charge pump is powered from the switched power; we just want to turn it on or off, not consume it.</p>
<p>In our case we have our own isolated power we need to run the switch from, and we need the power switch side to work independent of the voltage being switched (in the valid range 0... 30V).</p>
<p>Whereas relays need no special bias related to what they are switching, P MOSFETs require the gate switched below the source (typically -10V relative to it for full enhancement) and N MOSFETs require the gate switched above the switched rail (again typically +10V above it). The integrated solutions provide a charge pump in the IC; without that, some other way is needed.</p>
<p>Discrete solutions for this kind of non-mainstream MOSFET usage are therefore common; there's a really excellent article exploring the various permutations here</p>
<p><a href="http://www.radio-sensors.se/download/gate-driver2.pdf">http://www.radio-sensors.se/download/gate-driver2.pdf</a></p>
<p>As the author says</p>
<p><em>''All these difficulties make the high side driver design a challenging task.''</em></p>
<h3 id="problem-2-importance-of-enough-gate-margin-to-quot-enhance-quot-the-mosfet">Problem 2: Importance of enough gate margin to "enhance" the MOSFET</h3>
<p>Switching low voltages at high currents really requires these 'difficult' gate voltages, either 10V below 0V or above the switched voltage. If you try to switch using 0V or the rail voltage at the gate, then as the switched voltage decreases, the voltage between the Source and Gate will likewise decrease, until the MOSFET finds itself in the linear region, where it is not fully "on" and enhanced any more, but is restricting the current that may flow proportional to the gate voltage. In the worst case, where the resistance of the partially-on MOSFET equals that of the load, half the input voltage is dropped across the MOSFET, and it dissipates Vds x I; with 2V dropped across the MOSFET and I = 3A, that means 6W as heat. So we need to stay out of the linear region of gate voltages.</p>
<p>If the MOSFET is instead always fully enhanced by the "difficult" ~10V gate-source voltage, the heat dissipation at low voltages is negligible, because the on-resistance is around 10mR for P or as low as 75uR for N. The contrast is stark, as the numbers below show.</p>
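<p>A back-of-envelope check in C of the two regimes, using the 2V linear-region drop and 3A figures from above and the 10mR enhanced P-channel Rdson:</p>
<pre><code class="c">#include <stdio.h>

int main(void)
{
    double i = 3.0;        /* load current, A                  */
    double vds_lin = 2.0;  /* drop when stuck in linear region */
    double rdson = 0.010;  /* fully enhanced P-channel, ohms   */

    printf("linear region: %.1f W\n", vds_lin * i);   /* 6.0 W   */
    printf("enhanced:      %.3f W\n", i * i * rdson); /* 0.090 W */

    return 0;
}
</code></pre>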
<h3 id="problem-3-driving-gate-capacitive-load">Problem 3: Driving gate capacitive load</h3>
<p>Increasing the danger of spending too much time in the dangerous linear region of gate potentials is the capacitive nature of the gate itself. You basically have to modify the gate charge, which is an RC-type activity whose speed depends as much on the impedance (the R part) of the drive voltage you want to change it to, as on the gate's intrinsic capacitance (the C part).</p>
<p>The larger (more power-capable) the MOSFET, the higher the gate capacitance.</p>
<p>To change the gate charge quickly then, so we slew through the linear region in ns rather than ms, the gate drive must have access to a low-impedance power supply at the right level, where the "right level" is "Source - 10V" for a P-channel MOSFET, and it must actively switch the gate drive... pullups are too slow, since they form an RC network with the gate capacitance. Some illustrative numbers are sketched below.</p>
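<p>Some rough RC numbers make the point; the 10nF effective gate capacitance and the two drive impedances below are assumptions for illustration, not taken from any datasheet.</p>
<pre><code class="c">#include <stdio.h>

int main(void)
{
    double c_gate = 10e-9;    /* assumed effective gate C, F   */
    double r_pullup = 1000.0; /* passive pullup, ohms          */
    double r_totem = 10.0;    /* active totem-pole drive, ohms */

    /* one RC time constant each way */
    printf("pullup tau: %.1f us\n", r_pullup * c_gate * 1e6); /* 10 us  */
    printf("active tau: %.0f ns\n", r_totem * c_gate * 1e9);  /* 100 ns */

    return 0;
}
</code></pre>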
<p>That is not so simple to provide, when the MOSFET Source the gate drive is referenced to may itself be anywhere between 0V and 30V compared to the system 0V (and the system 0V may be floating compared to the digital control side).</p>
<h3 id="p-and-n-channel-mosfets-as-high-side-switch">P and N channel MOSFETs as High Side Switch</h3>
<p>P- and N-channel MOSFETs are analogous to PNP and NPN bipolar transistors. N-channel MOSFETs are cheaper than P-channel ones, and have lower Rdson (on-resistance when enhanced). You can use either as the switch; the main impact is that you must come up with either switched voltage - 10V (P) or switched voltage + 10V (N).</p>
<p>As you can see, these voltages are <em>not referenced to the system 0V</em> but to the switched voltage, ie, the MOSFET source. Any MOSFET will immediately and permanently die if the gate voltage should ever exceed the maximum permitted gate-source voltage, usually around +/- 20V compared to the source; this is scary seeing as our input range encompasses 30V. It's also scary because violations as short as 1ns will kill the MOSFET, and at 30V, hotplug transients can easily manage this if no steps are taken to protect it.</p>
<p>Normally the voltage being switched is already decided inside some equipment and this can be figured out. But in this case the voltage being switched is something random in the range 0 - 30V. And we do not want to load or impact the voltage we are switching any more than necessary, meaning we want to generate the gate voltage from our own isolated power.</p>
<h2 id="problem-4-mosfet-switch-only-works-in-one-direction">Problem 4: MOSFET Switch only works in one direction</h2>
<p>With the Humble Relay, there is no intrinsic polarity to the two sides of the switch, both sides are interchangeable.</p>
<p><img src="https://warmcat.com/mp1.png" alt="https://warmcat.com/mp1.png"></p>
<p>But a MOSFET has a giant parasitic diode between D and S. That means the two sides of the switch are quite different; one must be more positive than the other for normal operation. What happens if they are the "wrong" way around is that even when the gate is not driven, the parasitic diode conducts and the switch is effectively "always on".</p>
<p>If there is a high load (3A in our case), the diode voltage drop translates to a large heat dissipation at the MOSFET, and we can't turn it off, since the diode conducts unconditionally under those conditions. So the MOSFET will just burn.</p>
<h2 id="problem-5-negative-voltage-at-input-unprotected">Problem 5: Negative Voltage at input unprotected</h2>
<p>Again, the Humble Relay doesn't care if you are switching voltages below 0V, or indeed what the voltages are referenced to, so long as it doesn't make the relay insulation break down. In our case we only support positive voltages compared to system 0V, but as it stands, with one MOSFET doing the switching, the protection diode will again conduct and we lose control of the conduction and heat generation at the MOSFET.</p>
<p>The most common case for this in the real world is someone reversing the input power polarity. We will continuously pass it (minus a diode drop) and blow up the attached equipment and maybe our MOSFET.</p>
<h2 id="problem-6-mosfet-solution-needs-0v-for-switched-side">Problem 6: MOSFET solution needs 0V for switched side</h2>
<p>If we just have a simple MOSFET doing the switching, all normal solutions require the 0V the switched voltage is referenced to, as part of the gate voltage generation. In contrast the relay doesn't care: contacts are either shorted or not, and what the voltage on the contacts is referenced to is not relevant, assuming it's within the voltage and current profile the relay contacts and isolation are rated for.</p>
<h2 id="urgh">Urgh...</h2>
<p>That's a whole bunch of problems. Giving up and throwing a relay at it creates its own problems.</p>
<p>So is there a way through each issue?</p>
<h2 id="solutions-1-2-amp-6-generating-the-gate-voltage-level-0v-reference">Solutions 1, 2 & 6: Generating the gate voltage level + 0V reference</h2>
<p>We can simply use an isolated DC-DC converter, with one output side (- for N, + for P) connected to the MOSFET source and the other side being the gate drive level.</p>
<p><img src="https://warmcat.com/igd.png" alt="https://warmcat.com/igd.png"></p>
<ul>
<li><p>Output impedance is very low allowing fast gate drive.</p></li>
<li><p>It's floating and directly referenced to the MOSFET source</p></li>
<li><p>If we connect the +10V output to the MOSFET Source, the 0V output will always be at "Source - 10V", ie, exactly the right gate drive voltage</p></li>
<li><p>Nothing about it requires system 0V connected, everything on the isolated side of this DC-DC converter is referenced to the switching voltage</p></li>
<li><p>Gate voltage cannot get away from the Source - 10V, since it is referenced to Source, so exceeding the Vgs limit should never happen</p></li>
</ul>
<p>The solution is expensive in physical size and not that cheap. But it nails several critical problems.</p>
<h2 id="solution-2-switching-the-floating-gate-voltage">Solution 2: Switching the floating gate voltage</h2>
<p>We can use an optoisolator channel to decouple the switching signal from the isolated gate drive level provided by the DC-DC converter. That gets us out of various difficulties around the gate drive level being below ground, or up to 40V above it, referenced to the switching signal's 0V.</p>
<p>Now that we have our own isolated domain for gate drive, we can use a discrete optoisolator with its output directly referenced to that domain.</p>
<h2 id="solution-6-removing-the-need-for-0v-reference-on-switched-side">Solution 6: Removing the need for 0V reference on switched side</h2>
<p>You might still want it so you can measure the voltage of the switched side, but with the DC-DC converter and optoisolator scheme, there is no longer any need for the switching 0V reference in the switching circuit. Everything there is floating and referenced to the MOSFET Source, which as we will see becomes the highest of the two potentials on either side of the switch, minus a diode drop if the switch is off.</p>
<h2 id="solution-4-making-the-switch-commutative">Solution 4: Making the switch commutative</h2>
<p>We cannot eliminate the parasitic diode on the MOSFET. But we can add a second identical MOSFET in series with its parasitic diode in the opposite direction, effectively blocking current flow between the drains via the diodes.</p>
<p><img src="https://warmcat.com/mp2.png" alt="https://warmcat.com/mp2.png"></p>
<p>Now whatever the situation at the two drains, the diodes cannot conduct on both sides simultaneously and pass current from drain-drain without the MOSFET actually being on.</p>
<p>Instead the <strong>drain with the highest potential</strong> will set the bias on the combined source pins of the two MOSFETs, minus a diode drop when the MOSFETs are OFF. We don't care what that is, but since we will use the source pins as our reference for the floating gate bias, we need it.</p>
<p>Now whichever way you connect the voltage source and switched output, it meets the same situation, and no current passes without the MOSFET being on. We can monitor the combined source pins and estimate the highest drain voltage even when the switch is off.</p>
<p>This solves the "uncontrolled parasitic diode conduction" problem, but it creates a two new, smaller problems:</p>
<p>1) the parasitic diode was actually useful for snubbing negative flyback currents created when the load was somewhat inductive. We therefore need to add back a diode on either side of our switch (to system 0V this time) to snub negative excursions.</p>
<p>2) We doubled our Rdson by making the switched current pass through two MOSFETs instead of one. We can mitigate that by using slightly more expensive P-channel MOSFETs with 10mR Rdson, giving a manageable 20mR total. (Originally I was using cheaper P-channel MOSFETs of 20mR each.)</p>
<h2 id="solution-1-amp-3-isolated-active-gate-drive-for-on-and-off">Solution 1 & 3: Isolated, active gate drive for on and off</h2>
<p>We can use a discrete bipolar totem pole circuit to actively force the gate between source and the gate drive voltage rapidly, without relying on pullups.</p>
<p><img src="https://warmcat.com/iso-gd.png" alt="https://warmcat.com/iso-gd.png"></p>
<p>The fact we control the gate from a discrete optoisolator simplifies controlling it, as we can directly reference the transistor output of the isolator to the floating "Source - 10V".</p>
<h2 id="solution-5-protecting-against-reversed-negative-voltages">Solution 5: Protecting against reversed / negative voltages</h2>
<p>We can detect that either side of the switch is lower than a diode drop below system 0V (if they bothered to connect that), and turn off the MOSFETs. Now that we have two MOSFETs with back-to-back parasitic diodes, no fault current can flow through the diodes with the MOSFETs off, so we just need to turn the MOSFETs off when we see negative voltages.</p>
<p>However that means that if the power polarity was reversed, we have left the system 0V connected at up to +30V. If we have digital connections using that system 0V reference, a low level on them appears to be -3.3V compared to the effective system 0V, or +26.7V compared to the intended system 0V.</p>
<p>The solution to this is a comparator to monitor if either side of the switch is more than a diode drop below ground, and an N-channel MOSFET that can disconnect the external ground, as well as the high-side switch.</p>
<h2 id="rerun-relay-race">Rerun Relay Race</h2>
<table><thead>
<tr>
<th>Feature</th>
<th>Relay</th>
<th>Single MOSFET</th>
<th>Dual isolated MOSFET scheme</th>
</tr>
</thead><tbody>
<tr>
<td>Isolation of switched power from control signal</td>
<td><strong>Yes</strong></td>
<td>No</td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td>Tolerant of spikes and transients</td>
<td><strong>Yes</strong></td>
<td>No</td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td>Four-quadrant power switching (current +/-, voltage +/-)</td>
<td><strong>Yes</strong></td>
<td>No (blows up at high currents)</td>
<td>2 quadrant but protected</td>
</tr>
<tr>
<td>Low insertion loss to switch</td>
<td><strong>Yes</strong></td>
<td>Usually</td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td>Requires 0V reference of switched voltage</td>
<td><strong>No</strong></td>
<td>Yes</td>
<td><strong>No</strong></td>
</tr>
<tr>
<td>Fragile against short transients without precautions</td>
<td><strong>No</strong></td>
<td>Yes</td>
<td><strong>No</strong></td>
</tr>
<tr>
<td>Automatically OFF when unpowered</td>
<td><strong>Yes</strong></td>
<td>Not in one direction</td>
<td><strong>Yes</strong></td>
</tr>
<tr>
<td>Switching speed</td>
<td>ms</td>
<td><strong>us</strong> or ns</td>
<td><strong>us</strong></td>
</tr>
<tr>
<td>Switching bounce</td>
<td>Yes</td>
<td><strong>No</strong></td>
<td><strong>No</strong></td>
</tr>
<tr>
<td>Marketing value</td>
<td>low</td>
<td><strong>higher</strong></td>
<td><strong>higher</strong></td>
</tr>
</tbody></table>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>MOSFETs need quite a bit of support and adaptation to "look like a relay" generically.</p></li>
<li><p>For switching, the MOSFET wants to be either "enhanced", ~10V below the switched voltage for P or 10V above for N, or off (gate = source).</p></li>
<li><p>Integrated MOSFET gate drive solutions don't apply for a generic switch application, because they target the much more common buck regulator switching application and contain features that break generic switching</p></li>
<li><p>In between OFF and Enhanced there is a linear region, which can lead to high (fatal) dissipation in the MOSFET</p></li>
<li><p>The parasitic body diode of the MOSFET defeats operation of the MOSFET as a switch in one direction: the diode is "always on". We can solve it by using two MOSFETs back-to-back</p></li>
<li><p>Switching isolated voltages is more challenging, but we can avoid most of the problems by declaring the combined sources of the two MOSFETs its own isolated domain and interacting with it via a DC-DC to generate the enhanced gate level, and an optoisolator to control it.</p></li>
<li><p>Bipolar transistors are still useful in a totem-pole configuration for active gate control in both on and off switching</p></li>
</ul>
Advantages, Limitations and Costs of Galvanic Isolation2016-10-07T00:00:00+08:00https://warmcat.com/2016/10/07/Advantages-limitations-galvanic-isolation<h2 id="galvanic-isolation">Galvanic Isolation</h2>
<p>Galvanic isolation means there is no path for electrons to flow directly between two points.</p>
<p>That doesn't mean current can't pass using "indirect" methods; for example the two sides of a transformer are galvanically isolated, but current on the primary side can be exchanged for a magnetic field, and secondary side may exchange that magnetic field for current that has "no galvanic link" to the primary side; electrons did not pass directly. The magnetic field created on one side efficiently induced a proportionate current on the other side.</p>
<p><img src="https://warmcat.com/800px-Faradays_transformer.png" alt="https://warmcat.com/800px-Faradays_transformer.png">
Faraday's original toroidal transformer</p>
<p>This is why you can plug your phone charger into the wall, which is at a voltage that can hurt or kill you, yet touch the shield of the USB connector coming out of it without problems: the energy passed through a magnetic intermediary that removed all direct connection to its origin. (This is not 100% true in a real power adapter, as we will discuss in another article.)</p>
<p>In Faraday's version, the two sides are not physically connected to each other; the wire is isolated from the metal ring with "cotton and calico". The official term for this is that the two sides are "galvanically isolated", after Luigi Galvani, another European, this time from the 18th century, who discovered the effect of electricity on dead frog muscles, "sparking off" the whole Frankenstein thing.</p>
<h2 id="advantages-of-galvanic-isolation">Advantages of Galvanic Isolation</h2>
<h3 id="floating-reference">Floating reference</h3>
<p>We say that the isolated part of the circuit is "floating" with respect to the power source. Since there is no direct path for current to flow back to the power source, there is no way to keep it aligned to any reference relative to the power source. It's similar to a balloon that has nothing tethering it.</p>
<p>Instead it weakly finds its own potential according to whatever tiny leakage currents are available - if you connect it to something, it will immediately align itself to the potential of whatever you connected it to. In itself, it won't put up any resistance to adopting that reference; it is not connected directly to anything that could provide resistance.</p>
<p>A familiar case of galvanic isolation is a battery-powered instrument like a multimeter, since it has nothing fixing its reference potential it will float to whatever reference it is connected to; for example in itself a battery powered multimeter can be directly connected to the mains for measurements. It still works because it is floating at the mains potential, so its "0V" is at the mains potential, and its battery is "mains potential + 9V". Local to the multimeter nothing changed.</p>
<h3 id="break-grounding-loops">Break grounding loops</h3>
<p>In a building (especially between floors), across a building, and sometimes even within a room, the earth reference at the mains sockets is not at the same potential everywhere.</p>
<p>That means that if they become connected together, for example by plugging different pieces of equipment together, even if the connection is indirect, current will flow through the connection.</p>
<p>This can cause mysterious 50/60Hz hums, unexpected sparks, crashes and shocks when connecting cables etc.</p>
<p>If the connection is via galvanic isolation, this kind of issue is eliminated.</p>
<h2 id="gotchas-of-galvanic-isolation">Gotchas of Galvanic Isolation</h2>
<h3 id="this-isolator-isolates-in-itself-but">This isolator isolates in itself, but...</h3>
<p><img src="https://warmcat.com/gi1.png" alt="https://warmcat.com/gi1.png"></p>
<p>"in itself" may not be the whole story, there may be other external connections and relationships through those connections that can pass current directly, bypassing the isolation.</p>
<p><img src="https://warmcat.com/gi2.png" alt="https://warmcat.com/gi2.png"></p>
<p>For example your battery powered instrument may become completely un-isolated if you plug in a USB connection to your PC, or plug in a charger, or you accidentally left a 'scope ground probe connected somewhere.</p>
<p><img src="https://warmcat.com/gi3.png" alt="https://warmcat.com/gi3.png"></p>
<p>It's the user's responsibility to make sure the isolation can be effective in the larger system safely.</p>
<h3 id="the-two-sides-are-isolated-via-insulation-but">The two sides are isolated via insulation, but...</h3>
<p>Nothing is a perfect insulator, but many things have such high impedance they can usually be counted as infinite for some set of purposes.</p>
<p>However...</p>
<h3 id="no-direct-path-for-electrons-to-flow-but">No direct path for electrons to flow, but...</h3>
<p>There will be a breakdown voltage at which the insulation will break down and conduct, arcing over. This is a particular consideration trying to connect to mains with isolated equipment, since the mains is nominally 110VAC or 230VAC, but can experience excursions many times higher if you, eg, plug or unplug the equipment or other mains equipment on the same ring.</p>
<p>So the degree of isolation must be qualified with the maximum voltage below which the isolation is guaranteed not to break down. Often this is in the hundreds of volts or kV range, although that also must be qualified by whether it's talking about DC or AC, and for how long it can withstand that potential difference. If there may be excursions, you will have to do something to snub them.</p>
<h3 id="dc-carbonization-tracking">DC carbonization / tracking</h3>
<p>There is also a breakdown mode related to the formation of conductive carbon deposits between conductors that have a high DC potential between them for long periods... the "tracking" that forms is like a kind of soot. I'll discuss that further in a related article.</p>
<p>If you will use isolated equipment that has dangerous potentials between the two sides, you must concern yourself with these and other details and safety arrangements.</p>
<h2 id="isolated-system-architecture">Isolated system architecture</h2>
<p>A system involving galvanic isolation usually has two things that must cross the barrier, power and data.</p>
<p><img src="https://warmcat.com/genisoblock.png" alt="https://warmcat.com/genisoblock.png"></p>
<p>And in the overall system, there will be two "0V" references, one on the nonisolated side and the other we can call "System 0V" on the isolated side.</p>
<h3 id="passing-power-between-isolated-domains">Passing power between isolated domains</h3>
<p>Isolated DC-DC converters are available quite cheaply and quite compactly for up to 1W. Inside the insulated package they contain a tiny toroidal transformer, very much like Faraday's. They are usually rated for isolation voltages from 1kV to 3kV.</p>
<h3 id="communication-between-isolated-domains">Communication between isolated domains</h3>
<p>There are several ways to communicate across the isolated domains without a galvanic connection.</p>
<table><thead>
<tr>
<th>Method</th>
<th>Description</th>
<th>Advantages</th>
<th>Disadvantages</th>
</tr>
</thead><tbody>
<tr>
<td>Optocouplers</td>
<td>LED and optosensor sealed in same package</td>
<td>Offers isolated bipolar transistor output</td>
<td>Expensive for many channels and may consume power like an LED</td>
</tr>
<tr>
<td>Capacitive couplers</td>
<td>Modulates high frequency RF carrier with the incoming signal, passes it over a capacitive barrier, demodulates</td>
<td>Digital (usually 3.3V) semantics, relatively cheap for many channels</td>
<td>Only digital semantics (no bipolar option)</td>
</tr>
<tr>
<td>Transformer</td>
<td>Passes data in the magnetic domain</td>
<td>cheap for one channel, can include voltage conversion</td>
<td>requires AC or modulated data</td>
</tr>
<tr>
<td>Explicit RF</td>
<td>WLAN or BT etc</td>
<td>no physical connection</td>
<td>not suitable for intra-device connections, expensive for the whole stack</td>
</tr>
</tbody></table>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Normally equipment must share a common reference voltage ("0V" or "Ground") to be powered and to communicate</p></li>
<li><p>Isolated DCDC converters and various techniques for communication across isolated domains let you get around that</p></li>
<li><p>The isolation is rated for a particular voltage beyond which it will "break down". This is often in the 1 - 5kV range.</p></li>
<li><p>If the galvanic isolation supports your worst case voltage differential, it allows you to work on circuits with radically different potentials, including circuits directly connected to live mains. It doesn't make any other danger magically disappear though, you must take responsibility for all necessary precautions regarding hazardous voltages.</p></li>
<li><p>Having no galvanic connection is useful for eliminating "ground loops" that may otherwise cause intractable problems</p></li>
<li><p>You have to ensure your isolated side really is isolated, considering any connections it may have in parallel to the galvanic isolation, eg, scope probe grounds. Otherwise the isolation is bypassed.</p></li>
</ul>
Driving Piezo Sounders2016-10-04T00:00:00+08:00https://warmcat.com/2016/10/04/Piezo-sounding<h2 id="piezo-sounding">Piezo Sounding</h2>
<p><img src="https://warmcat.com/sounder1.jpg" alt="https://warmcat.com/sounder1.jpg"></p>
<p>Since we need some audio feedback and alerts, we need some kind of audio transducer on a budget. The cheapest way to do that is a piezo transducer.</p>
<p>Unfortunately that turns out to be a long and interesting story, creating a very long article from what I initially assumed was already a well trodden path.</p>
<p>There are no DAC or other audio arrangements on the CPU (except a heavily pinmuxed I2S we don't have access to), we could fall back to</p>
<ul>
<li><p>simple PWM on a gpio</p></li>
<li><p>a fixed external oscillator making a beep</p></li>
<li><p>a canned buzzer with the oscillator integrated</p></li>
</ul>
<p>These will all work but suffer from being kind of basic and crappy.</p>
<p>Like with the video, we have enough FPGA LUTs available that we can do something better. However we don't want to over-engineer it... the optimum is to do much more than expected with just a little. There is no requirement for high fidelity, we just want to take ourselves out of the shameful world of the fixed beep.</p>
<p>Piezo sounders have some constraints</p>
<ul>
<li><p>No useful low-frequency response (really anything below 1.5kHz)</p></li>
<li><p>To make 1.5kHz, PWM needs to operate at quite a high rate if all you have is soft-pwm</p></li>
<li><p>Many cryptic claims on the internet about how to drive them, and complaints that they are not loud enough</p></li>
<li><p>Care needed: dropping or knocking the piezo can generate some dozens of volts in both positive and negative directions killing your driver hardware</p></li>
</ul>
<h3 id="piezo-the-bidirectional-transducer">Piezo: the bidirectional transducer</h3>
<p>Ehh, what was that last one? Yes, piezo works both ways: you can flex it with a voltage, and it generates a corresponding voltage when externally flexed the same amount.</p>
<p>Here is the 40mm sounder in the first pic being tapped gently with a screwdriver</p>
<p><img src="https://warmcat.com/piezo-hit.png" alt="https://warmcat.com/piezo-hit.jpg"></p>
<p>You can see it happily generated +20V / -7.5V spikes just from that; dropping your finished device on the floor may create much more. So the circuitry around the piezo must limit these, effectively, fault voltages. Advice elsewhere on the Internet suggests just selecting 60V-rated transistors, but there is no evidence the piezo stops at 60V under stress, nor any guarantee the next batch you buy won't have different characteristics. So it requires zener / diode clamping to properly control it.</p>
<h3 id="piezo-the-fragile-transducer">Piezo: the fragile transducer</h3>
<p>The active material is sandwiched between a thin brass circle and an <strong>extremely</strong> thin coating of silver.</p>
<p><img src="https://warmcat.com/piezo-silver-weld.png" alt="https://warmcat.com/piezo-silver-weld.jpg"></p>
<p>Silver is very attractive to solder, in fact the good 30AWG patch wire you can buy for breadboarding is coated with it for that reason.</p>
<p>However the plating is around 1/30th the thickness of aluminum foil, and like aluminum foil it tears easily; at this thickness even a little bit of heat from your soldering iron will melt it. Unless you want to dig a furrow in it and ruin the transducer, you need the following approach</p>
<ul>
<li><p>reduce the heat on your iron so your solder still melts readily but no more</p></li>
<li><p>lay solder on the silver and tin a spot near the middle of the silver, with the iron there for less than half a second</p></li>
<li><p>tin your wire</p></li>
<li><p>bond the tinned wire to the tinned spot again with the iron there for a fraction of a second just to melt solder to solder</p></li>
<li><p>don't let the wires pull or twist off the transducer subsequently</p></li>
</ul>
<p>Another kind of fragility is related to overstress on the silver from excessive flexing (eg, at too high a voltage): the silver under the solder will simply debond from the underlying piezo substrate en bloc over time, leaving the wire poised uselessly slightly above the transducer, permanently silent.</p>
<p>You can also buy slightly more expensive sounders with an enclosure and bonding wires already sorted out, which is recommended for these and other reasons.</p>
<h3 id="piezo-the-19th-century-european-transducer">Piezo: the 19th century European transducer</h3>
<p>Piezoelectricity was discovered in France in 1880.</p>
<p>But actually it turns out it doesn't work very usefully for audio without the work of a German polymath from thirty years before, Hermann von Helmholtz (he was destined for greatness, since his name is the 19th century equivalent of "German McGermanface").</p>
<p><a href="https://en.wikipedia.org/wiki/Helmholtz_resonance">https://en.wikipedia.org/wiki/Helmholtz_resonance</a></p>
<p>In a nutshell this is what happens when we blow across a bottle and get a satisfying, unreasonably strong response at a particular frequency, even though we just blew white noise across it.</p>
<p>By themselves, with no enclosure, piezo transducers do not make a huge amount of noise even when driven properly. This caused me a lot of headscratching when I first started looking at them, because upping the drive voltage even to 24V did not really make any difference. It was only when I found a type pre-enclosed in a Helmholtz resonator</p>
<p><img src="https://warmcat.com/sounder2.jpg" alt="https://warmcat.com/sounder2.jpg"></p>
<p>that suddenly I was able to make sufficiently loud sounds even with 5V switching.</p>
<p>Basically the difference is that between the large sound from blowing across a bottle opening, and the sound of just blowing with no bottle there. Anyway the takeaway is that you can only usefully use the combination of transducer + Helmholtz resonator.</p>
<h3 id="just-use-the-combined-transducer-resonator">Just use the combined transducer + resonator</h3>
<p>There are other problems trying to use the bare piezo "bender" (as they are officially called, no sniggering at the back there): they must be anchored to something, and exactly how and where that is done affects the vibration mode and the amount of point stress experienced by the transducer where it is anchored.</p>
<p>The simplest way is to just eliminate</p>
<ul>
<li>wire bonding problems</li>
<li>fabrication problems creating an effective resonator</li>
<li>unexpected impact of strange vibration modes</li>
<li>reliability issues around flexing and around the wire bonding</li>
<li>questions of how to anchor</li>
</ul>
<p>by forgetting trying to use the raw piezo bender, and buying a resonator along with the transducer.</p>
<h2 id="the-thin-spactetime-of-guanghua">The Thin Spactetime of Guanghua</h2>
<p>In Taipei there's a district with a long and sometimes dubious history called Guanghua, this was originally a seedy rundown market-under-tunnels for electronics goods and other, sometimes illegal, things (a garrulous modern-day vendor there pining for the Good Old Days confided to me: guns) back into the 1970s. It's the Taiwanese analogue of the historical Akihabara before it lost a dimension.</p>
<p><img src="https://warmcat.com/old-guanghua.jpg" alt="https://warmcat.com/old-guanghua.jpg">
(Photo from Richy at the Chinese Wikipedia, licensed under GFDL1.2)</p>
<p>It was forcibly cleaned up before my family and I moved here, and the previous family market stalls were shunted into concrete retail spaces in a large multistory building dedicated to them.</p>
<p>However in Taiwan markets bleed into one another without clear boundaries; there is another section of electronics stores alongside it, between it and the Zhongxiao Xingsheng MRT, with aboveground and subterranean stores serving electronics geeks such as myself.</p>
<p>My "KPEG156HP" bender + resonator came from there... like many things it holds the promise of being able to source them inexpensively locally if you can roll 6 on enough dice. But Akihabara, Guanghua... the spacetime they are built on is thin and sometimes things leak across from other worlds. The KPEG156 I have in my hand does not officially exist, the closest the real world dares acknowledge is the KPEG159</p>
<p><a href="http://www.kingstate.com.tw/index.php/ja/component/k2/item/408">http://www.kingstate.com.tw/index.php/ja/component/k2/item/408</a></p>
<p>That device is specified for 4kHz operation, the overall frequency response is like this (note the log frequency scale)</p>
<p><img src="https://warmcat.com/kpeg159-freq.png" alt="https://warmcat.com/kpeg159-freq.png"></p>
<p>Well, you can see from the messy response that it's not reasonable to expect any kind of general fidelity from this. And it is specified for 6Vp-p operation; we will be using it with 5V, which is almost there. (This of course is not the same product or exactly the same size, but we can buy these from Kingstate in the normal, non-Guanghua reality.)</p>
<p>Clearly though, within its abilities (ie no low frequency content), you can make more than just buzzing sounds with it, even if you have to give up on classical music.</p>
<h2 id="driving-the-element">Driving the element</h2>
<p>Piezo elements act like a capacitor; in the case of the KPEG159, one of 14nF.</p>
<p>There are many funky schemes out there for driving them, including inductors to make larger spikes; that's fine if all you want are uncontrolled spikes at pulse edges.</p>
<p>However if you want to make more complex sounds and you don't have a DAC, you are going to have to make a high-resolution PWM that uses time to express the level as the ratio between two states. That can be done by basically making a boost regulator with the inductor, but maintaining the boost ratio to create the voltage hike eats into your time resolution.</p>
<p>Between a higher drive voltage and having a resonator around the transducer, the higher voltage is more or less worthless: 5V and the resonator are enough to make loud, controlled noises with the KPEG156 (and presumably the KPEG159).</p>
<p>That suits us, since we then effectively need a 1-bit DAC clocked at 48MHz, which can be made from very little logic in the FPGA.</p>
<p>There are several choices about how to go about it:</p>
<ul>
<li>1) Single-ended pulled-up drive</li>
</ul>
<p><img src="https://warmcat.com/pd1.png" alt="https://warmcat.com/pd1.png"></p>
<ul>
<li>2) Single-ended push-pull (aka Half-bridge)</li>
</ul>
<p>(Notice you have to manage the base drive so the top and bottom transistors are never on simultaneously)</p>
<p><img src="https://warmcat.com/pd2.png" alt="https://warmcat.com/pd2.png"></p>
<ul>
<li>3) Bipolar pulled-up drive</li>
</ul>
<p><img src="https://warmcat.com/pd3.png" alt="https://warmcat.com/pd3.png"></p>
<ul>
<li>4) Bipolar push-pull (aka Full or "H" Bridge)</li>
</ul>
<p>(Notice you have to manage the base drive so the top and bottom transistors are never on simultaneously)</p>
<p><img src="https://warmcat.com/pd4.png" alt="https://warmcat.com/pd4.png"></p>
<p>It boils down to</p>
<ul>
<li><p>"drive one side or both sides" (solutions 1 & 2 vs solutions 3 & 4)</p></li>
<li><p>"use a pullup or active transistor" (solutions 1 and 3 vs solutions 2 and 4)</p></li>
</ul>
<p>The ideal solution is # 4, to use an H bridge. There are two reasons:</p>
<ul>
<li>the pullup acts relatively slowly and with an exponential waveform, adding distortion because the PWM ratio loses some time in the pulled-up state. If we actively drive it, that is eliminated.</li>
</ul>
<p>For example here, with a 1K pullup on both sides, in the centre of the waveform, when we turn on the transistor we immediately start the "low" part of the PWM in a clean way</p>
<p><img src="https://warmcat.com/piezeo-pullup.png" alt="https://warmcat.com/piezeo-pullup.png"></p>
<p>But at the left of the capture, when we turn off the transistor and start the "high" part of the PWM waveform, it lazily ambles high, since a passive resistor is fighting a 14nF capacitive load. The H bridge would have provided the necessary current immediately to force it straight to 5V, the same way the existing transistor forced it low immediately.</p>
<p>Reducing the pullup to 470R noticeably improves the risetime and the loudness.</p>
<p><img src="https://warmcat.com/piezo-pullup-470r.png" alt="https://warmcat.com/piezo-pullup-470r.png"></p>
<p>The pullup can be reduced further at the cost of power while the audio is running, and maybe noise introduced on the 5V rail. The RC arithmetic below shows why the pullup value matters so much.</p>
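<p>Using the 14nF element capacitance from above, the RC time constants for the two pullup values tried are easy to check; a quick C snippet:</p>
<pre><code class="c">#include <stdio.h>

int main(void)
{
    double c = 14e-9; /* piezo element capacitance, F */

    printf("1K pullup: tau = %.1f us\n", 1000.0 * c * 1e6); /* 14.0 us */
    printf("470R:      tau = %.1f us\n", 470.0 * c * 1e6);  /*  6.6 us */

    return 0;
}
</code></pre>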
<ul>
<li>the piezo element can flex in two directions depending on polarity of the charge across it. When we use a single-ended solution, we give up half of the potential flex travel (which means, half the noise generating capability) because we can't reverse the polarity of the charge across the transducer in that configuration.</li>
</ul>
<p>Indeed when you look at catalogue IC solutions for this, they are implementing an H bridge because the incremental cost inside an IC is almost nothing and the advantages are nice.</p>
<p>Although the H bridge is "ideal", it isn't ideal in terms of cost or board space if you are making a discrete solution. In the real world, the third solution is the best compromise considering the other constraints we will discuss next.</p>
<h2 id="designing-the-quot-sound-card-quot-in-the-fpga">Designing the "sound card" in the FPGA</h2>
<p>Although I was hoping to experiment with a stochastic solution, where you gate noise from an LFSR using the PWM sample to select or reject it, that will have to wait until we have a GHz design to try it out on.</p>
<p>With this design, the fastest clock we can rely on is 48MHz. So you can do some quick back-of-the-envelope numbers and see what is possible.</p>
<p>Eg, if we wanted 16-bit PWM resolution, the sample rate would be 48MHz / 65536 = 732Hz, which is useless.</p>
<p>At 12-bit resolution, we can get 11.718kHz, which is workable. It can resolve a 5kHz sine fine, which is above the ideal resonant frequency of the transducer.</p>
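<p>The back-of-the-envelope numbers, as code; only the 48MHz clock and the bit depths come from the text.</p>
<pre><code class="c">#include <stdio.h>

int main(void)
{
    double clk = 48e6; /* fastest clock we can rely on */

    printf("16-bit PWM: %.0f Hz\n", clk / 65536); /* ~732 Hz, useless    */
    printf("12-bit PWM: %.0f Hz\n", clk / 4096);  /* ~11719 Hz, workable */

    return 0;
}
</code></pre>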
<p>Trying this, driven using a Hyperram FIFO, it makes sounds well, but the sample rate creates a high-pitched, perceptible 11kHz whine even when the samples are all 0.</p>
<p>Therefore I kept the sample rate at 11kHz / 12 bits per sample, but split each sample into 4 PWM actions, taking care to dither the two LSBs back into the PWM quadrant timing; the sample rate remains at 11kHz and the effective resolution remains at 12 bits, but the PWM rate is increased beyond the audible range, to ~45kHz. This comes at some cost in amplitude though... as shown earlier, the pullup scheme is not very responsive, and multiplying the number of pullup transitions by 4 means a corresponding loss of efficiency. It's still louder than required, so no problem. A sketch of the quadrant dithering follows.</p>
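<p>A C sketch of the quadrant split; the names are mine, but the arithmetic is the dithering described above: a 10-bit base threshold is shared by all four quadrants, and the two LSBs hand out one extra count to the first zero-to-three of them.</p>
<pre><code class="c">#include <stdint.h>

/* Fill thr[0..3] with the PWM threshold for each of the 4 quadrants
 * of one 12-bit sample period. */
static void quadrant_thresholds(uint16_t sample12, uint16_t thr[4])
{
    uint16_t base = sample12 >> 2; /* 10-bit base per quadrant */
    uint16_t lsb2 = sample12 & 3;  /* 0..3 leftover points     */
    int i;

    for (i = 0; i < 4; i++)
        thr[i] = base + (i < lsb2 ? 1 : 0);

    /* thr[0] + thr[1] + thr[2] + thr[3] == sample12, so no
     * resolution is lost while the PWM rate quadruples */
}
</code></pre>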
<h2 id="clicks-and-pops">Clicks and pops</h2>
<p>Using the "third solution" of "Bipolar pulled-up drive" also means that the transducer idles with 5V on each side, ie, both transistors are off and no current flows; there is no DC voltage across the transducer since both sides are at 5V.</p>
<p>When we want to play samples though, the initial and final states of the samples will be signed "0" from the WAV files that are coming from a ROMFS. Since our scheme is fundamentally unsigned, this is corrected in the logic to effectively unsigned 0x800 in the unsigned range 0x000 to 0xfff.</p>
<p>Therefore when you start or stop playing, there is a discontinuity between the resting state, which corresponds to PWM level 0xfff, and the first sample, which corresponds to PCM level 0x800, and you will get "clicks and pops".</p>
<p>To avoid that, the FPGA logic performs two slow linear ramps in hardware: from 0xfff to the initial state 0x800 before starting playback, and from whatever the last played sample was to 0xfff when stopping. This reduces any discontinuity to the point it is inaudible (doubly so since the transducer is unresponsive at low frequencies).</p>
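<p>A sketch of the ramp logic in C; in the real design this is hardware advancing one step per PWM period, and the register name and step size here are illustrative assumptions.</p>
<pre><code class="c">#include <stdint.h>

static volatile uint16_t pwm_level = 0xfff; /* idle: no DC across piezo */

static void ramp_to(uint16_t target)
{
    /* one small step per PWM period keeps the slew inaudibly slow */
    while (pwm_level != target)
        pwm_level += (pwm_level < target) ? 1 : -1;
}

/* ramp_to(0x800); ...play samples...; ramp_to(0xfff); */
</code></pre>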
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Forget unenclosed piezo elements unless you're going to put them in some headphones. Only piezo elements with a Helmholtz resonator built in are worth considering.</p></li>
<li><p>Voltage across the piezo element affects loudness to some extent. But to far less an extent than the resonator, and there are diminishing returns as the voltage increases. 5V is enough to hear it across a room.</p></li>
<li><p>The piezo element generates large voltages. Clamp it to within what the drive circuitry can handle.</p></li>
<li><p>Using a transistor on both sides driven at 180 degree phase offset, with pullups, is recommended as a good compromise with good results.</p></li>
<li><p>You can make a nice "sound card" in an FPGA with PWM. But you must crank the PWM rate beyond audibility to avoid whines, even if the sample rate remains in the audible range.</p></li>
<li><p>Signed 0 samples are disjoint from the piezo resting state of 5V on both sides and will generate clicks and pops when starting and stopping playback. Having the FPGA automatically perform a linear ramp down and up when starting and stopping suppresses them nicely.</p></li>
</ul>
SPI as video and alpha compositor2016-09-26T00:00:00+08:00https://warmcat.com/2016/09/26/SPI-as-video-and-alpha-compositor<h2 id="spi-lcd-extreme-sports">SPI LCD extreme sports</h2>
<p>As I <a href="https://warmcat.com/embedded/lcd/tft/st7735/2016/08/26/st7735-tdt-lcd-goodness.html">wrote up a month ago</a>, there are some nice LCD panels available with an SPI interface and a frame buffer in the controller chip. You can see all kinds of projects on the Internet using them, many contaminated by Arduino since you can buy a shield with the panel on; SPI is easy to wire up and even bitbang if necessary.</p>
<p>But all these projects simply use a CPU to scribble in the on-panel RAM. That means none of them look very much as people expect LCDs to look in a post-Android age. Redraws are not locked to the panel's video update, because none of these panels provide both SPI and TE (Tearing Effect, aka VSYNC), and there is no support for pageflipping at the panel, so you can see the CPU doing updates on the display. This reflects an assumption that "nobody will do video on SPI".</p>
<p>Another issue at the typical usage level of these panels is that people using Arduino as a crutch are screwed for reading back the framebuffer data, due to a design flaw in the shield implementation discussed in the other article. Even if you can read back, it is relatively slow to read the background and then apply the new foreground, and after you write it back once, the original background is lost. So effects like alpha must be composed in software, holding the related planes in CPU memory simultaneously, and in many cases the CPU is too weak or resource-constrained to do this well. In fact, given the very limited scope of the overall projects using this kind of panel, driving the panel well tends not to be the focus of the activity: it usually gets some blocky fixed-width numbers on it and that's it. Overall, there's a mismatch between what these nice panels can do and what people are using them for.</p>
<p>We can't fix the lack of TE, but the limited approaches to leveraging these nice panels just reflect a prejudice not borne out by any technical restriction: that "SPI is not real video".</p>
<h3 id="spi-as-real-video">SPI as real video</h3>
<p>The ST7735 (which has nearly identical second-sources with other names, eg, ILI9163V) has an efficient SPI command RAMWR which says all following SPI data will be copied into a user-definable 2D region until the transaction ends. So nothing stops us setting the 2D region to the whole display and using the FPGA to spam 565 video at <= 20MHz SPI rate. For 160x128, we can easily deliver 30fps this way: SPI is truly acting as a digital video streaming protocol then.</p>
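<p>To make that concrete, here is a sketch of the per-frame command preamble; the opcodes are from the ST7735 datasheet, but the array layout is only illustrative, and remember that D/C must be low for each opcode byte and high for its parameters.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Per-frame preamble for whole-display streaming over SPI.
library ieee;
use ieee.std_logic_1164.all;

package st7735_stream is
  type byte_vec is array (natural range <>) of std_logic_vector(7 downto 0);
  constant FRAME_PREAMBLE : byte_vec := (
    x"2a", x"00", x"00", x"00", x"9f", -- CASET: columns 0..159
    x"2b", x"00", x"00", x"00", x"7f", -- RASET: rows 0..127
    x"2c"                              -- RAMWR: pixel data follows
  );
  -- After RAMWR, clock out 160 x 128 RGB565 words back to back; at a
  -- 20MHz SPI clock that sustains 30fps with headroom to spare.
end package;
</code></pre></div>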
<p>The impact of no TE signal (or no VSYNC lock in other words) depends on what and how we update in the frame.</p>
<ul>
<li><p>For updates where large contiguous areas change their data completely (in intensity or chroma, although which colour changes to which matters), we will noticeably tear.</p></li>
<li><p>For regions that "scrolled by one pixel" though, any tearing is restricted to part of one line of pixels. That may be completely unnoticeable.</p></li>
<li><p>For regions where characters update-in-place, like a counter type display, the tearing can impact any amount of the character update area, although only in the characters that changed anyway and only for the one frame they changed in.</p></li>
</ul>
<p>In cases where restricted areas of the display update, or things move in one pixel increments then, tearing is not necessarily an issue.</p>
<h2 id="display-subsystem-architecture">Display Subsystem Architecture</h2>
<h3 id="block-diagram">Block Diagram</h3>
<p>This is an overview of the video subsystem implemented in the FPGA. It's designed to be paired with a very resource-constrained CPU.</p>
<p><img src="https://warmcat.com/disp-block.png" alt="https://warmcat.com/disp-block.png"></p>
<h3 id="video-compositor">Video compositor</h3>
<p>Since we have hardware FIFO-buffered access to hyperram, a huge bandwidth to the hyperram, and no lack of FIFOs (20 x 256 x 16 on the FPGA), we also have the ability to compose (combine) the video stream in realtime from multiple sources; since we have all the data at the time we issue it, alpha composition or other effects between the layers are relatively simple to implement.</p>
<p>In my case the FPGA will collect data autonomously and render it into its own separate hyperram buffer, and this can also be a hardware overlay plane dynamically composed into the output video stream without needing CPU intervention.</p>
<p>Overall there are three hardware planes composed together.</p>
<p>The CPU retains its own interleaved access to hyperram, so we don't lose any flexibility.</p>
<h3 id="blitter">Blitter</h3>
<p>Even though the display resolution isn't huge, a big problem is the amount of CPU time needed for copying font glyphs into the display plane. CPU access to the hyperram, like to the LCD panel, is via SPI, so although interleaved access from the CPU to the hyperram is easy, it's still relatively slow. Drawing large characters in a display plane will block the CPU for a relatively long time, and in my case some text is updated very often, potentially inside an IRQ.</p>
<p>Since only a small proportion of the text is updated at a high rate, I considered for a while implementing a complex sprite unit which dynamically composed 2D areas at the video rate... this can work, but it quickly becomes more complicated than we will ever get any benefit from, once you consider overlapping or different-height sprites on the same scanline. In an FPGA, anything that takes a lot of LUT real estate (and / or design time...) has to pay for itself reasonably directly in functionality: since this isn't much like PacMan, sprites don't fit.</p>
<p>Instead I implemented a blitter engine: basically an autonomous "2D memcpy" that copies a given width x height from a source to a destination. This works fine, but considering it will be doing copies as finegrained as individual font glyphs at, say, 8x8px, it's difficult to synchronize the blitter itself to the CPU activity without excessive overhead.</p>
<p><img src="https://warmcat.com/disp-blit-op.png" alt="https://warmcat.com/disp-blit-op.png"></p>
<p>While working with Fujitsu in Taiwan, I wrote a Linux kernel driver for a much more complex but basically the same kind of blitter hardware on their silicon: although it's very efficient for medium - large bitmaps, for many small actions like glyph blitting, the driver CPU overhead of</p>
<ul>
<li>formatting the blit in userspace</li>
<li>transferring it by IOCTL</li>
<li>queuing it in a driver (soft) ringbuffer</li>
<li>receiving the idle interrupt, and</li>
<li>setting up the transfer repeatedly</li>
</ul>
<p>massively overwhelmed the benefit of getting hardware to blit 8x8 or 16x16, compared to the CPU just poking it onto the plane memory. So not only is the blitter's local efficiency important, but, if it's going to be useful at the system level, also how it cooperates with the CPU.</p>
<h3 id="blitter-descriptor-ring">Blitter descriptor ring</h3>
<p><img src="https://warmcat.com/disp-blit-desc.png" alt="https://warmcat.com/disp-blit-desc.png"></p>
<p>Cognizant of that, I added a hardware blitter descriptor engine in front of the blitter, controlled by a large descriptor ringbuffer held in Hyperram... big enough for 1K descriptors. The CPU can then "fire and forget" by appending 8-word descriptors per-blit here, and kick the blitter unit when one or more of the descriptors is updated. It will autonomously read the blit descriptors in order, and perform them synchronously until it runs out of active descriptors, at which point it will wait in idle to be kicked again to service more appended descriptors in the ringbuffer. This lets multiple different processes share the descriptor ringbuffer, it's able to interleave blits between completely different source and / or destination planes without problems, and it allows the CPU to append whole strings of font blits to hyperram quickly.</p>
<p>Because the descriptor is only 8 words, even for a tiny font like 8x8 px that's definitely < 1/8th of the CPU load compared to drawing it into memory from the CPU, and it has the advantage that the font can live in hyperram (on platforms like the ESP8266, there is no RAM for it to live in, since it's > 64KB). For the average 10 x 21 px font I plan to use, the CPU saving is 95% from writing the 8-word descriptor instead of drawing the font directly; considering we are usually updating a string instead of isolated characters, the benefit of being able to treat the whole string as "fire and forget" while it's drawn asynchronously to the CPU adds up quickly.</p>
<p>The last descriptor word being zero indicates that the descriptor is invalid, and causes the descriptor engine to stop when it fetches it, until "kicked" over SPI to look again and attempt to restart. Making the last word hold the validity attribute simplifies protecting against the hardware seeing half-written descriptors, since when the last word is written, the descriptor is both valid and completely written.</p>
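<p>The check itself is trivial; a minimal sketch, with names that are assumptions rather than the real RTL:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Descriptor validity rule: the engine has burst-read the 8 words of
-- the next descriptor; it only performs the blit if the last word is
-- nonzero.  Because the CPU writes that word last, nonzero implies
-- the other seven words are complete: no half-written descriptors.
library ieee;
use ieee.std_logic_1164.all;

entity blit_desc_check is
  port (
    last_word : in  std_logic_vector(15 downto 0); -- word 7 of descriptor
    valid     : out std_logic  -- '1' = perform blit, '0' = stop and idle
  );
end entity;

architecture rtl of blit_desc_check is
begin
  valid <= '0' when last_word = x"0000" else '1';
end architecture;
</code></pre></div>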
<h3 id="font">Font</h3>
<p>There's no problem using a proportional font with this scheme, although since the font glyph is simply copied into place the font cannot do dynamic pair kerning. However for many Sans type fonts, there are no overlapping serifs to make trouble.</p>
<p>The font is arranged into a bitmap with 16 characters per line, each character starting at the left of a 16-pixel space, so finding the start of a character is simple. The CPU has a small table of character widths it uses when advancing the X part of the descriptor, providing the proportional behaviour.</p>
<p><img src="https://warmcat.com/font-detail.png" alt="https://warmcat.com/font-detail.png"></p>
<p>The overlays are additive per primary, with clipping per primary if they overflow; there is no space in the FPGA for the multipliers needed for real alpha. The background largely being black to facilitate this additive scheme naturally leads to black being transparent in the upper overlays, since it's basically "adding 0" to each colour channel.</p>
<p>For this reason the font bitmap uses black as its background; the font bitmap itself is 565 though, so there is some "additive alpha blending" in the anti-aliased parts of the glyph. In addition coloured characters are possible in the font for status symbols, etc.</p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>The 160x128 LCD panels are unexpectedly capable, even with SPI and no TE / VSYNC, if given some hardware support</p></li>
<li><p>By adding the right hardware, various kinds of assumed limitations on inexpensive SPI panel performance can be transcended.</p></li>
<li><p>We can once again leverage the hyperram to get a quite sophisticated video unit running in about 20% of the FPGA. That includes 3 x overlay mixing, associated FIFOs and a blitter</p></li>
<li><p>Three overlay layers with additive alpha blending simplifies object update by removing the need to care about background handling from software. You can choose the effective Z order when you choose which overlay plane to blit to</p></li>
<li><p>The blitter is perfectly matched to the CPU via a descriptor ringbuffer, so it can handle 1024-deep font glyph rendering in one step, at the cost of writing an 8-word descriptor per transaction, independent of the bitmap dimensions</p></li>
<li><p>We can do antialiased, proportional fonts simply and rapidly, in multiple sizes</p></li>
<li><p>We can stream updated video to the panel in 565 format, over SPI, at 30fps</p></li>
</ul>
Hyperram Bus Interface Unit and Arbitrator2016-09-20T00:00:00+08:00https://warmcat.com/2016/09/20/hyperram-biu-and-arbitrator<h2 id="hyperram-bandwidth-in-context">Hyperram bandwidth in context</h2>
<p>Now that the hyperram Bus Interface Unit is working, the next problem is sharing its bandwidth between the other functions in the FPGA confidently and efficiently.</p>
<p>On the other side of the Hyperram pipe, under the pseudo-static veneer, it's nothing more than traditional SDRAM. That means it's expensive in latency to select a column, but after that it's cheap to spam contiguous row data.</p>
<p>In other words, since we have to spend 3 SDR clocks + latency (at least 2 more, and as many as 5 more at our low speed) to select a random address, plus dead time around CS up and down, it's relatively expensive to move to a new address unless it's the old address plus one word. But once you have started the packet to set a random address, bursting is very cheap, with 16 bits coming every SDR clock (15.6ns at 64MHz SDR).</p>
<p>(CPU access to Hyperram over SPI is a special case in this system; it's handled with a burst size of 1 every time, since it's relatively uncommon.)</p>
<h3 id="interfacing-to-a-bursty-bus">Interfacing to a bursty bus</h3>
<p>You could design the system with peripherals that only make individual 1-word "bursts" in sequence, ie, ignore bursting. (In fact I tested this once the BIU was working, with two masters doing wordwise accesses, and it works fine; each access blocks the bus for around 250ns overall.) But as explained above, that goes directly against the main characteristic of SDRAM, that bursting is cheap.</p>
<p>Therefore the solution involves:</p>
<p><img src="https://warmcat.com/biu.png" alt="https://warmcat.com/biu.png"></p>
<ul>
<li><p>Designing the peripherals to favour sequential accesses. As we will see, that may mean that instead of storing a "struct" or "descriptor" as one sequential unit, it may be advantageous to store the members separately, each in their own sequential storage.</p></li>
<li><p>The peripherals must be able to wait until ALL the data they need is available.... another peripheral may be bursting on the bus and only one peripheral can do that at a time</p></li>
<li><p>Independent FIFOs that leverage the high bandwidth burst action so we can get in and grab a chunk of sequential data, and then hand it out to the peripheral on demand with usually no latency. The RAMs are acting like a rate adapter, between the constrained Hyperram burst and the unconstrained FIFO consumer.</p></li>
<li><p>An arbitration scheme between possibly many clients who may need to share the bus at reasonably low latency</p></li>
</ul>
<h3 id="ice5-block-ram">ICE5 block RAM</h3>
<p>The big ICE5 chip has 20 x 256x16 dualport RAMs on the die for exactly this kind of task. These can be converted to Hyperram compatible FIFOs with a modest amount of glue logic, giving us a primitive that can read from a range of Hyperram addresses into the 256x16 SRAM (aka "block ram"), but present a dumb FIFO interface on the other side.</p>
<p>This is the interface for a VHDL component that buffers burst reads from the Hyperram:</p>
<p><img src="https://warmcat.com/cbl_brfifo_read.png" alt="https://warmcat.com/cbl_brfifo_read.png"></p>
<p>The dumb FIFO side signals if something is there to read, and gets a 1-clock strobe from the consumer to say it has been used. That's all the downstream consumer needs to concern itself with; it might have to wait for "some reason" before the next FIFO data is available, but as to why (that it only has serialized access to a single SDRAM, or the number of steps needed to initialize a burst there), it doesn't have to understand or deal with any of it.</p>
<p>Fundamentally these abstract everything about the hyperram into a "stream" which the consumer can draw down as it becomes available and as it wants to use it, without negatively impacting the bandwidth characteristics needed to interoperate with Hyperram.</p>
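<p>For illustration, this is roughly all the consumer side amounts to; a sketch, with port names assumed in the spirit of the component shown above:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Dumb-FIFO consumer: wait for the valid flag, take the word, pulse
-- the "used it" strobe for one clock.  Why a word was or wasn't
-- available (arbitration, burst setup, refresh) is invisible here.
library ieee;
use ieee.std_logic_1164.all;

entity fifo_consumer is
  port (
    clk        : in  std_logic;
    fifo_valid : in  std_logic;                     -- data available
    fifo_data  : in  std_logic_vector(15 downto 0);
    fifo_taken : out std_logic;                     -- 1-clock strobe
    pixel      : out std_logic_vector(15 downto 0)
  );
end entity;

architecture rtl of fifo_consumer is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      fifo_taken <= '0';
      if fifo_valid = '1' then
        pixel      <= fifo_data; -- consume one word
        fifo_taken <= '1';       -- tell the FIFO it has been used
      end if;
    end if;
  end process;
end architecture;
</code></pre></div>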
<p>The probability of having to wait because the FIFO is empty is related to</p>
<ul>
<li><p>the latency gaining access to the SDRAM: itself related to</p>
<ul>
<li>the number of competing peers on the bus, and</li>
<li>the restriction placed on burst length</li>
</ul></li>
<li><p>the depth of the FIFO and</p></li>
<li><p>the rate that it is drawn down from the peripheral side.</p></li>
</ul>
<h3 id="bus-arbitration">Bus Arbitration</h3>
<p>Since this model involves many competing FIFOs that may simultaneously need access to the SDRAM, there has to be some arbitrator to decide who will get that access next.</p>
<p>Today I use the simplest form, a sequential round-robin arbitrator. If you have 16 FIFOs that may want to access the hyperbus, after each allocation completes it checks each FIFO in turn, wasting a clock even when the FIFO being checked doesn't need the SDRAM right now. So FIFO 0 has a go, then next clock it checks FIFO 1, and so on. It's "correct" but it's not optimized, and it becomes more expensive the more bus masters appear. (Currently, with 4 masters, this isn't an issue, but it will grow to a dozen or so.)</p>
<p>There are more efficient ways of handling this, where the priority, dependent on who went last time, is burned into a single logic expression for each state. So if FIFO 0 went last time, the expression to choose who goes next prioritizes FIFO 1, then 2, etc., with FIFO 0 last. That way, in one clock it always chooses someone to use the bus, if anyone at all was waiting. But the number of client FIFOs is still fluid at the moment, so optimizing this is for later.</p>
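<p>A sketch of that optimized form for four masters follows; the synthesizer unrolls the loop into one priority expression per "last grant" state, so a new grant is issued in a single clock. The names and the request / done handshake are my own assumptions.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Rotating-priority round-robin arbiter, 4 masters, one-hot grant.
library ieee;
use ieee.std_logic_1164.all;

entity rr_arbiter4 is
  port (
    clk   : in  std_logic;
    req   : in  std_logic_vector(3 downto 0); -- FIFO burst requests
    done  : in  std_logic;                    -- current burst finished
    grant : out std_logic_vector(3 downto 0)  -- one-hot bus master
  );
end entity;

architecture rtl of rr_arbiter4 is
  signal last : integer range 0 to 3 := 3;
  signal g    : std_logic_vector(3 downto 0) := (others => '0');
begin
  grant <= g;
  process (clk)
    variable idx : integer range 0 to 3;
  begin
    if rising_edge(clk) then
      if g = "0000" or done = '1' then
        g <= (others => '0');
        -- check the three peers after "last", then "last" itself;
        -- the first requester found wins: no wasted clocks
        for off in 1 to 4 loop
          idx := (last + off) mod 4;
          if req(idx) = '1' then
            g(idx) <= '1';
            last   <= idx;
            exit;
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture;
</code></pre></div>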
<p>In more complex scenarios the individual FIFOs may need to be tagged with a general latency limit: for example, FIFOs related to video generally have a higher priority since they cannot defer issuing the next pixel. But in my design it is possible to defer issuing the next video pixel, so this additional prioritization or deadline scheduling is not necessary.</p>
<p>The most expensive part of the arbitrator is the mux required to allow every FIFO to control the hyperram address, write data, and various strobes. The logical addresses for everything are 32-bit to allow expansion, so there are a lot of signals being muxed. As the number of FIFOs needing access grows, the mux complexity also grows accordingly, putting a brake on the max system clock rate.</p>
<h3 id="system-burst-limit">System burst limit</h3>
<p>The Hyperram itself will just burst forever (although if you want to do that, you may need to observe individual wait states signalled back to the master using RWDS for read). But although that's nice for empty FIFOs who are getting their turn as bus master, it's a disaster for max latency for the FIFOs who need topping up but are stuck waiting their turn to talk to the Hyperram.</p>
<p>For that reason, it's necessary that the Hyperram BIU FPGA RTL component itself reserves the right to end burst transactions unilaterally, in the name of putting a limit on how long anybody else can get stuck waiting.</p>
<h3 id="dependency-on-multiple-fifos">Dependency on multiple FIFOs</h3>
<p>A common pattern is that a peripheral requires multiple streams, from various places in the SDRAM map, to be processed together. That's no problem with this scheme since the peripheral can understand it should stall until every FIFO required reports it has the next data to draw down, ie it stalls until all the dependencies ANDed together are met.</p>
<p>If you consider this structure starting up: initially all the FIFOs are reset and empty, then each in turn gets an opportunity from the arbitrator to burst until it is at least semi-full (the policy for how empty to get before triggering a burst, and how much to burst at one sitting, is up to the FIFO implementation, plus or minus the BIU vetoing continuing a burst because it considers it too long). When the last dependent FIFO gets its turn to burst, as it starts to fill and finally signals it has valid data, the peripheral, which has been ANDing together all of the dependent FIFOs' data-valid signals, starts to process data from all the FIFOs it cares about.</p>
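<p>The dependency logic itself is a one-liner; a sketch with three dependent FIFOs, names assumed:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Stall until every dependent FIFO has data, then draw one word from
-- each of them in the same clock.
library ieee;
use ieee.std_logic_1164.all;

entity dep_and is
  port (
    clk        : in  std_logic;
    v0, v1, v2 : in  std_logic;  -- valid flags from dependent FIFOs
    t0, t1, t2 : out std_logic   -- "used it" strobes back to them
  );
end entity;

architecture rtl of dep_and is
  signal all_ready : std_logic;
begin
  all_ready <= v0 and v1 and v2;
  process (clk)
  begin
    if rising_edge(clk) then
      t0 <= all_ready; t1 <= all_ready; t2 <= all_ready;
    end if;
  end process;
end architecture;
</code></pre></div>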
<h3 id="pipeline-friendly-io-primitive">Pipeline-friendly IO primitive</h3>
<p>In order not to introduce latency in using the streaming data or in deciding when to stop, there must be two critical pipelined signals between the BIU and each master. The basic semantics of these signals can be summed up as:</p>
<ul>
<li><p>"continue" - from master to BIU: when high, it means we want to continue the burst one more cycle</p></li>
<li><p>"word" - from BIU to master: when high, it means the word of data to / from the Hyperram is valid</p></li>
</ul>
<p>These two are enough to encapsulate the BIU streaming functionality while being able to stream at the rate of one word per SDR clock.</p>
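<p>A sketch of the master side of that handshake, with the burst setup details omitted; only "continue" and "word" are taken from the text, the rest is assumption:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- Hold "continue" high while more words are wanted; count a word
-- each clock that "word" is high.  One word can stream per SDR clock.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity biu_master is
  port (
    clk      : in  std_logic;
    word     : in  std_logic;  -- from BIU: hyperram word is valid
    start    : in  std_logic;  -- begin a burst of "len" words
    len      : in  unsigned(7 downto 0);
    continue : out std_logic   -- to BIU: keep the burst going
  );
end entity;

architecture rtl of biu_master is
  signal remaining : unsigned(7 downto 0) := (others => '0');
begin
  continue <= '1' when remaining /= 0 else '0';
  process (clk)
  begin
    if rising_edge(clk) then
      if start = '1' then
        remaining <= len;
      elsif word = '1' and remaining /= 0 then
        remaining <= remaining - 1; -- one word streamed this clock
      end if;
    end if;
  end process;
end architecture;
</code></pre></div>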
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Peripherals must take advantage of Hyperram burst characteristics to get any kind of efficiency, that means using a block SRAM FIFO per bus master stream</p></li>
<li><p>Just having a BIU for the Hyperram is a critical part of the puzzle, but it's just a part of the puzzle</p></li>
<li><p>The BIU must be preceded by an Arbitrator, who can make sure all of the masters (there may be dozens) can get access in turn below some latency limit.</p></li>
<li><p>The Arbitrator comprises a round-robin scheme so everybody gets a turn within a limited latency, and a complex mux, so each master can individually control the address (to 32 bits) and other critical hyperram signals.</p></li>
<li><p>The master should define its own burst length (and start address), but the BIU must be able to truncate it, with the master dealing with any remainder next time, in the interests of achieving reasonable unconditional system latency.</p></li>
<li><p>Peripherals needing to use the data from the FIFO bus masters must AND together the validity from the FIFOs they are dependent on before being able to process anything</p></li>
<li><p>"continue" and "word" are the two key semantics for no-latency stream processing</p></li>
</ul>
ICE5 Hyperbus DDR, PLLs Implementation tips2016-09-19T00:00:00+08:00https://warmcat.com/2016/09/19/hyperbus-implementation-tips-on-ice5<h2 id="hyperram-ddr-on-ice5-fpgas">Hyperram DDR on ICE5 FPGAs</h2>
<p>I was able to successfully implement a set of VHDL components for hyperbus on ICE5, including the DDR bus interface with quadrature clocks and a round-robin memory arbitrator. It's able to operate at 64MHz / 128MBytes/sec on a normal inexpensive ICE5 FPGA using inexpensive 1.8V 64Mbit Hyperram.</p>
<p>However it wasn't straightforward. There are no DDR sample VHDL components and no documented "vital" VHDL platform components either. So here are some tips for fellow strugglers.</p>
<h2 id="tip-1-register-the-hell-out-of-everything">TIP 1: Register the hell out of everything</h2>
<p>It's probably already obvious you must use DDR / registered IO cells for the 8-bit data bus, but you also need to do it for the aux signals like the output clock differential pair and nCS. I didn't do it for RWDS but I probably should have; it can work without it so far but there may be extra constraints needed to keep it like that.</p>
<p>In particular, Hyperbus needs to gate the external differential clock; this needs special care, using two DDR output cells to provide the correct phases at low skew. The Hyperbus clock is required to be gated and, from the + pair member's point of view, low when idle. This requires additional clock sync management, because the output clock for the DDR clock IO cell is 90 degrees retarded compared to the logic clock. It can be made to work right, but it's a struggle. Here is what a 1 wait block / 3 waitstate / 5 word burst looks like for the + and - clock pair members.</p>
<p><img src="https://warmcat.com/ddr-pclk.png" alt="https://warmcat.com/ddr-pclk.png">
<img src="https://warmcat.com/ddr-nclk.png" alt="https://warmcat.com/ddr-nclk.png"></p>
<h3 id="ddr-aware-io-cells">DDR-aware IO cells</h3>
<p>So what goes on in the DDR-aware IO cells?</p>
<p><img src="https://warmcat.com/sb_io.png" alt="https://warmcat.com/sb_io.png"></p>
<p>The cells provide four flipflops each, two for read and two for write. For DDR output you actually provide it two bits at half the DDR rate: the cell itself deals with sampling the input data and driving the FF to the output on alternate phases. Because the IO cell FFs are the last thing in the signal chain, and they are clocked by global clock lines, the result has very low skew over the whole chip. As we saw though, the DDR output FF clock must be 90 degrees behind the external clock used to sample the data.</p>
<p>Similarly for input, two input bits are provided and each latch the data state on the alternate edge of the input global clock, so it generates two bits at the rate of the global clock, from one pin.</p>
<p>These features are aimed at bringing the global clock right to the very end of the signal chain at the pin / ball, so there is an absolute minimum of skew. This was always the focus of placing and routing for FPGAs, but with DDR it becomes an issue even inside the clock phases, dealing with it like this in the IO cell is necessary for reliable operation at high speeds.</p>
<p>Registering everything implies pipelining: you must arrange for the data to change a clock early, and the IO cell updates on the next clock. That requires some care to understand when things changing in the VHDL actually take effect outside the chip.</p>
<h2 id="tip-2-pieces-needed-for-ddr-operation-on-hyperram">TIP 2: Pieces needed for DDR operation on Hyperram</h2>
<ul>
<li><p>Generate and output tightly skew-controlled differential clock, allowing for gating cleanly without runts</p></li>
<li><p>Quadrature clock generation (using the PLL)</p></li>
<li><p>Use DDR FFs built into IO cells for input and output</p></li>
<li><p>Clock other critical signals like CS also in its IO cell to control intra-FPGA skew</p></li>
<li><p>Distribute output clocks between 0 and 90 degree clock sources</p></li>
</ul>
<h2 id="tip-3-differential-outputs-on-ice5">TIP 3: Differential outputs on ICE5</h2>
<p>ICE5 can issue differential output clocks; it uses a nice trick, using the global clock routing to bring the clock to the IO cells, which may be anywhere in the bank, and setting the cell to DDR mode. The two DDR output phases are set according to the desired output inversion, per pin. The results are nicely locked in phase.</p>
<p><img src="https://warmcat.com/diff-clock.png" alt="https://warmcat.com/diff-clock.png"></p>
<p>That's quite a smart trick, since the global clocks have low-skew routing to all the IO cells already, and it gets away from making special fixed pin relationships for the differential pair.</p>
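<p>In VHDL it comes down to two SB_IO instantiations fed constant but opposite phase bits; a sketch of the + side follows. The PIN_TYPE value for a plain DDR output is my assumption (check it against the vital file the prototypes live in), and the gating logic is omitted.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">clkp: SB_IO
  generic map (
    NEG_TRIGGER => '0',
    PIN_TYPE    => "010000",  -- assumed: DDR output, input unused
    PULLUP      => '0',
    IO_STANDARD => "SB_LVCMOS"
  )
  port map (
    PACKAGE_PIN => HR_CKP,    -- + member of the pair
    LATCH_INPUT_VALUE => '0',
    CLOCK_ENABLE => '1',
    INPUT_CLK  => clk90deg,
    OUTPUT_CLK => clk90deg,
    OUTPUT_ENABLE => '1',
    D_OUT_0 => '1',           -- high phase, then...
    D_OUT_1 => '0',           -- ...low phase: the clock itself
    D_IN_0 => open,
    D_IN_1 => open
  );
-- the - member is an identical instantiation on its own pin with
-- D_OUT_0 => '0', D_OUT_1 => '1', giving the inverted phase with
-- matched skew
</code></pre></div>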
<h2 id="tip-4-quadrature-clocks-on-ice5-pll">TIP 4: Quadrature clocks on ICE5 PLL</h2>
<p>Clocks</p>
<ul>
<li><p>"in quadrature"</p></li>
<li><p>+90 degrees in phase</p></li>
<li><p>"center aligned" (the term used in hyperbus)</p></li>
</ul>
<p>mean that there are two clocks of one frequency, locked in phase, and that one lags the other by 1/4 cycle.</p>
<p>Assuming a square wave of equal proportions for both parts, that means that one clock's edges will occur in the middle of each of the other clock's stable parts. This is very useful for changing data as far away as possible from when the receiver will sample it.</p>
<p>This +90 degree phase pair relationship is found in many areas of electronics, especially RF, but also things like optical shaft encoders.</p>
<h3 id="quadrature-clock-generation-via-pll">Quadrature clock generation via PLL</h3>
<p>DDR clocks have a special requirement if they are to operate efficiently. Since they pass data on both edges of the external clock, it's not enough to simply have the clock with those edges: they also require a locked quadrature (90 degree phase offset) clock to trigger the updates and captures in the IO cell in the middle of each external clock phase. This is not something you typically have lying around, and it's hard to reliably generate it out of nothing; you'd have to halve or quarter your actual clock and do it by hand.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">DDR TX Data =X===X===X==
DDR TX Clock (internal) _/^^^\___/^^
DDR TX clock (external) ___/^^^\___/
</code></pre></div>
<p>Single data-rate clocks can readily provide a second set of clock edges at a fixed 180 degree phase offset, simply by using upgoing edges to update data out and downgoing edges to sample it at the receiver, or vice versa.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">SDR TX Data =X=====X=====X===
SDR TX Clock (int + ext) _/^^\__/^^\__/^^\
</code></pre></div>
<p>Then the send and receive actions are locked in phase, the setup and hold time is consistent and data integrity can be assured. But DDR uses both 0 and 180 degree edges in the name of throughput, so this trick is not available and the 90 degree clock becomes necessary.</p>
<p>The usual way to solve this is to pass the clock through a PLL and, when it is resynthesized, also output a quadrature (+90 degree phase) version of the same clock. The 90 degree clock will have edges in the centre of the 0 degree clock phases. Even though it's a relatively low-end FPGA, the ICE5 PLL IP can generate quadrature clocks, so you can just feed it your normal clock and set it for x1 operation with 0 and 90 degree outputs. (The PLL wizard crashes, but it dumps the register settings first, so you can fill them in on a VHDL component copied from the internal "vital" VHDL file mentioned above.)</p>
<p><a href="http://www.latticesemi.com/%7E/media/LatticeSemi/Documents/ApplicationNotes/IK/iCE40sysCLOCKPLLDesignandUsageGuide.pdf?document_id=47778">http://www.latticesemi.com/~/media/LatticeSemi/Documents/ApplicationNotes/IK/iCE40sysCLOCKPLLDesignandUsageGuide.pdf?document_id=47778</a></p>
<p>The IOs that the PLL binds to have various restrictions related to what kind of other signals may be using the IO cell with the PLL and nearby, and the restrictions differ with the selected PLL mode. That forced several hours of experimentation with the PCF since it was unexpected... in the end it was possible to find a way to enable the PLL with only three IOs having to be rearranged / reworked. (That's quite annoying since if the Lattice "breakout board" had not broken its FPGA PLL, I would have tested this beforehand and avoided difficult surgery.)</p>
<p>At any rate, write the VHDL first and make sure the placer has no "nonfatal fatal warnings" announcing that it's ignoring your pin constraints and continuing to place the design (!) There are many constraints about clock inversion and neighbouring IO cell usage that cost little if the placer has a free hand to pack the IO.</p>
<p><img src="https://warmcat.com/ddr-data-quad-sample.png" alt="https://warmcat.com/ddr-data-quad-sample.png"></p>
<p>In this picture of hyperbus operating at 64MHz / 128MHz DDR, the yellow trace is nCS to the HyperRam, the white trace is the noninverted differential side of the 90 degrees phase offset PLL clock, and the blue trace RD7 during a 5-word burst.</p>
<p>The 0 degrees phase offset PLL clock (not shown, but 1/4 phase earlier than the white clock) is used to generate and update the blue data; the 90 degree (white) clock tells the receiver when to sample the data.</p>
<p>You can see how the edges of the blue data are offset by 1/4 clock phase so the clock occurs in the middle of a bit.</p>
<p>Since it's DDR, each edge of the white clock represents one bit, so that data going out on RD7 for the 5 words / 10 bytes burst is 1111110101.</p>
<h3 id="hyperram-eye-diagram">Hyperram eye diagram</h3>
<p>Hyperbus doesn't seem to specify any particular eye diagram; I guess the reason is that the bus mandates that it should be Hi-Z for periods during the transaction, which makes it impossible to provide standard time / voltage pass-fail mask regions, since the Hi-Z periods wander through them (pullups are enabled to stop the bus floating for long).</p>
<p>Here is my eye diagram for RD7 with the bus working heavily at 64MHz single-rate clock (128MHz / 128MBytes/sec DDR rate), using the ICE5 IO pieces discussed above.</p>
<p><img src="https://warmcat.com/ddr-eye.png" alt="https://warmcat.com/ddr-eye.png"></p>
<p>The two lines above the bottom section are related to the tristate periods: due to different sized bursts, sometimes at the bit we're measuring at +200ns the transaction has already completed and the bus is Hi-Z, being pulled up gently, so these can be ignored. This is on a two-layer PTH prototype with the HR running at 1.8V via an LDO.... the eye diagram is extremely clean considering the clock came from a ring osc and went through a PLL.</p>
<h2 id="tip-5-bus-protocol-optimization">TIP 5 : Bus protocol optimization</h2>
<p>There are several ways you can squeeze bandwidth out of Hyperbus compared to the default.</p>
<p>You must get your basic bus read and write functionality working before you can do these optimizations, including for zero-wait writes.</p>
<h3 id="optimization-1-wait-states">Optimization 1: Wait states</h3>
<p>The Tacc Hyperram parameter is specified in absolute time: 40ns for read and 36ns for write for my device. And the chip defaults to expecting a count of wait states that matches its fastest clock, which comes out to 6.</p>
<p>If you're not operating it at near its fastest specified clock rate (166MHz for my chip, when I am constrained to 64MHz by the FPGA), you can inform the Hyperram to use a smaller number of your slow clocks that still meets Tacc. In my case, 40ns is only 2.6 of my 15.6ns clocks, so it only needs 3, the smallest settable number. So we save 3 or 6 clocks, depending on whether the chip asked for double wait or not.</p>
<p>You can set this in b7..b4 of configuration register 0.</p>
<h3 id="optimization-2-allow-single-waits">Optimization 2: Allow single waits</h3>
<p>By default, Hyperbus comes up always telling you that it requires two sets of wait state waits for each transaction, that's 12 clocks by default. And these are SDR clocks, it's 24 DDR clock edges.</p>
<p>This allows you to make a simple BIU that doesn't have to take care of negotiating it inside the transaction. However these waits are quite expensive in latency; you should instead monitor RWDS and dynamically figure out how many wait clocks are needed. As often as the peer chip will allow you (which is very often), you can lose a set of waits that way, somewhere between 3-6 clocks on most transactions.</p>
<p>You enable this by clearing b3 on configuration register 0.</p>
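<p>As a sketch, the low byte of configuration register 0 for both optimizations assembles like this. The "1110" encoding for 3 clocks is from the 64Mbit Hyperram datasheet generation of parts, so verify it against your device, and note that the rest of the 16-bit register must keep its datasheet defaults.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">-- b7..b4 = initial latency count, b3 = 0 to allow single waits.
library ieee;
use ieee.std_logic_1164.all;

package hr_config is
  constant LAT_3_CLOCKS : std_logic_vector(3 downto 0) := "1110";
  constant VARIABLE_LAT : std_logic := '0'; -- b3 clear: single waits
  -- b2..b0 left at the assumed datasheet default burst setup
  constant CR0_LOW : std_logic_vector(7 downto 0) :=
    LAT_3_CLOCKS & VARIABLE_LAT & "111";
end package;
</code></pre></div>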
<h3 id="optimization-3-crank-up-the-clock">Optimization 3: Crank up the clock</h3>
<p>Since you are forced to use a PLL to generate the quadrature clocks, as discussed above, you might as well change up the clock ratio to a bit below whatever the FPGA timing will allow.</p>
<p>My base clock is 48MHz, but I can convert it at 4:3 using the PLL and end up with 64MHz; the FPGA timing has slack up to 70MHz. This can probably be pushed higher with more pipelining, but I have more bandwidth than I need already.</p>
<h2 id="tip-6-hyperbus-nreset">TIP 6 : Hyperbus nReset</h2>
<p>Hyperbus mandates a power-on reset, so I decided we didn't need to deal with the nReset bus pin and tied it to 1.8V under the BGA. This was a little bit brave, because it turns out that if you do resets without a powercycle, which is common during development, state is held across sessions in the hyperram configuration registers.</p>
<p>For example if you optimize the waitstates to 3, when you reset you now have two possible numbers set up in the waitstate register: 6 from POR or 3 from warm reset. Although configuration space <strong>writes</strong> are specified to operate with no wait states at all times and so work unconditionally, <strong>reads</strong> use the chip's currently configured number of waitstates, so reading even the ID registers is not possible unless your side is set to use the expected number of waitstates at the hyperram.</p>
<p>After some experimentation it turns out you can loop through the possible wait states of 3, 4, 5, and 6, checking if you get a reasonable result from the ID register. After that you can use the "detected" current wait state number to confirm the ID registers and force the wait optimizations into the configuration. So it seems you can do without nReset.</p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>If you do the digging to uncover the undocumented VHDL prototypes, there are enough pieces there to get a proper DDR bus interface inside the FPGA in VHDL, and operate Hyperram at 64/128MHz DDR reliably</p></li>
<li><p>Again the quality of the actual FPGA, signal quality, PLL lock and jitter behaviour, is very good</p></li>
</ul>
In Praise of Kicad2016-09-08T00:00:00+08:00https://warmcat.com/2016/09/08/in-praise-of-kicad<h2 id="schematics-and-pcb-design-in-the-dawn-of-time">Schematics and PCB design In the Dawn Of Time</h2>
<p><img src="https://warmcat.com/pcb-transfers.jpg" alt="Original from https://static.rapidonline.com/catalogueimages/Module/M061299P01WL.jpg"></p>
<p>My first commercial PCB designs in the 80s were physically laid out on acetate using a lightbox, letraset and various thicknesses of red sticky tapes, at 2:1 physical size. They were then photographed, reducing them to the correct size, and the negative was used to create the boards via UV photoresist. This was the normal way of designing PCBs at the time. Any problems with consistency of your drawing or flaked-off letraset chunks (easily created by flexing the substrate) directly translated to open tracks in the finished product. Doing it at a larger size and reducing it was aimed at shrinking most breaks in the tracking areas to below what the photolithography could resolve.</p>
<p>This was obviously quite challenging... it did have a kind of arty, crossword-puzzle aspect to it where you would sit crouched over the lightbox for hours like a rat solving a maze, but overall it was just miserable labour requiring a lot of obsessive planning, care and luck. The substrates were typically an unwieldy A3 size since you worked at 2:1 or even 4:1, so you would jump in your car and take it personally to the company that optically processed it before passing it to the PCB house.</p>
<p>Naturally then with the rise of reliable, inexpensive PCs there was a huge market desperate to come out of the caves and use Ctrl-Z and gerber on their nightmares.</p>
<p>These packages were not priced reasonably though; the first one I bought was Orcad</p>
<p><img src="https://warmcat.com/dos-orcad.jpg" alt="Original from http://relaysbc.sourceforge.net/laptop.jpg"></p>
<p>which at that time was a headless DOS app that drew directly into 16-colour CGA. Still, it was miraculous coming from manual drawing, it could take your logical schematic and maintain an unbroken relationship from that to the gerbers. After that came Cadstar, also at that time a DOS app as I recall, which was even more expensive and not that good, then the first commercial shove router whose name I forget (but not its price, it was GBP11000), and finally Protel, which I stuck with for a decade.</p>
<p>The shove router was interesting, many PCB designers even now retain the ethos of the "one-shot" layout from the acetate and scalpel days. However using the shove router was so fast, you could treat it like a "compiler" in contrast to hand routing being "writing in assembly". It was possible to try several layout variations quickly and accept that some of the routing was sub-optimal, trading it off against the ability to find routable radical placement solutions that nobody would otherwise try.</p>
<h2 id="market-in-amber">Market in Amber</h2>
<p>The market for these expensive vertical CAD suites has barely changed since I first started throwing money at it with Orcad. There are high end ones like Cadence that are too expensive for normal mortals, and midrange ones like Protel which operate using a vampire marketing strategy: you buy the software but you only get a year of updates; after that you must pay some considerable fraction of the original cost per year to keep getting updates.</p>
<p>If all you do is make PCBs, this can be sensible, but if you make a few PCBs a year, this is a really big burden. And paying that money feels like an extortion. All of these "affordable" platforms require windows, which is another form of extortion... in my life otherwise windows has been gone for 15 years.</p>
<p>In other markets that used to be like this, FOSS has wiped out those practices, and companies with other business models like Google have turned the "precious" into a freebie to bring people into their system. (Eagle CAD had this kind of strategy, basically free CAD software tied to a PCB fabrication house, but it did not make a big dent.)</p>
<h2 id="kicad">Kicad</h2>
<p>The first time I looked at FOSS schematics + PCB software a few years ago, gEDA was the thing <a href="http://wiki.geda-project.org/geda:screenshots">http://wiki.geda-project.org/geda:screenshots</a></p>
<p>But at that time it wasn't very usable, and I turned back to Protel, running in vmware (a third form of extortion requiring constant 'subscription' / repurchasing to keep it working on Linux; and now it meets a huge problem with Fedora stock kernels not accepting modules that are not signed with a key created when the kernel was built).</p>
<p>However this time I looked at Kicad, which has basically crossed the threshold into a CAD system usable in the real world. It has one big problem currently on my Fedora system: the autorouter doesn't work at all. But to balance that, it has manual shove routing, which is very good.</p>
<h2 id="kicad-rough-edges">Kicad rough edges</h2>
<ul>
<li><p>creating new libraries is a nightmare; I had to manually copy an old library to get started. Google is full of complaints about it. But this is a very narrow problem; once you get past it, all is well.</p></li>
<li><p>Router integration is screwed on Fedora 24, it cannot open the exported netlist and gives no errors about it. For small boards, manual routing is not hard, but, you know...</p></li>
<li><p>I found on some of my created SMT components, the paste mask layer was cut wrong on my metal stencil, the gap around the pads is excessive. But when I looked at the pads to see what I had done wrong, I couldn't see anything representing the larger region. This was a small problem for a prototype but I need to understand it before doing any production.</p></li>
<li><p>Scrollwheel vs zoom (Ctrl modifier) is inverted compared to most software like inkscape. I can understand why, since when you get used to it zoom as navigation is very efficient and effective. But it'd be good to have a preference to invert Ctrl meaning to match other software.</p></li>
</ul>
<h2 id="kicad-brilliance">Kicad brilliance</h2>
<ul>
<li><p>VERY stable</p></li>
<li><p>VERY fast</p></li>
<li><p>manual routing and also the schematic layout allow you to enter "cyborg trance mode", where the software is not getting in the way of your design activities at all; your brain is not interrupted trying to figure out how to do things.</p></li>
<li><p>copper pours are immediately updated</p></li>
<li><p>3D stuff just works, renders just like a real board</p></li>
<li><p>print and plot / drill file generation "just works"</p></li>
</ul>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Kicad is really more than "good enough" for most tasks, from start to finish. This is serious firepower for real work.</p></li>
<li><p>Routing integration is broken for it in Fedora</p></li>
<li><p>It's not perfect, but most people can stop thinking about paying for schematics + PCB CAD now. Protel is in the bin.</p></li>
</ul>
Lattice's Unintended Darwinism in Tech Support2016-09-07T00:00:00+08:00https://warmcat.com/2016/09/07/negative-results-of-darwinism-in-support-triage<h2 id="roadblocks-using-vhdl-on-ice5-with-lattice-tools">Roadblocks using VHDL on ICE5 with Lattice Tools</h2>
<p>VHDL itself is completely generic, but the target FPGAs often have highly specific special functionality. For example you may use the multiply operator in an expression, and the compiler may produce a discrete logic multiplier; but the FPGA itself may have much more efficient hardware multiplier instances. Generic VHDL internally has no way to know or choose about that, or even between the many different kinds of discrete multiplier design that it could generate with different tradeoffs.</p>
<p>To get around it, the vendor provides canned component names + generics / ports representing IPs that are hard macros on the FPGA; to make the compiler wire up what you want, you explicitly instantiate these "magic name" components. (That sometimes gets you a very, very long way away from VHDL's "write once, synthesize anywhere" promise, as more and more platform-specific FPGA features get dragged in.)</p>
<p>For that, you need the equivalent of C's function prototype, giving the correct name of the canned macro and the correct argument names and meanings.</p>
<h2 id="lattice-and-vhdl">Lattice and VHDL</h2>
<p>I already have some of these canned macros in my design, but for many in the canonical library documentation for ICE5</p>
<p><a href="http://www.latticesemi.com/%7E/media/LatticeSemi/Documents/TechnicalBriefs/SBTICETechnologyLibrary201504.pdf">http://www.latticesemi.com/~/media/LatticeSemi/Documents/TechnicalBriefs/SBTICETechnologyLibrary201504.pdf</a></p>
<p>There are no details for VHDL, only Verilog. You can see from the filename the latest version of those docs are from April 2015, ie, 1.5 years ago.</p>
<p>For example for SB_IO, the IO cell, on p71 it gives Verilog only and omits VHDL.</p>
<p>The only other pdf reference I could find to IO cell instantiation was TN1253</p>
<p><a href="http://www.latticesemi.com/%7E/media/LatticeSemi/Documents/ApplicationNotes/UZ/UsingDifferentialIOLVDSSubLVDSiniCE40Devices.pdf?document_id=47960">http://www.latticesemi.com/~/media/LatticeSemi/Documents/ApplicationNotes/UZ/UsingDifferentialIOLVDSSubLVDSiniCE40Devices.pdf?document_id=47960</a></p>
<p>about differential IO, again in there just Verilog, for VHDL it says</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">VHDL
Under development.
</code></pre></div>
<p><img src="https://warmcat.com/vhdl-under-dev.png" alt="https://warmcat.com/vhdl-under-dev.png"></p>
<p>This current document is 20 months old, from Jan 2015.</p>
<p>Despite how it looks, VHDL is otherwise a first-class citizen with Lattice, and I was able to find other components documented and use them with VHDL. And my tools from Lattice are more modern than the docs, from Feb 2016 which is encouraging.</p>
<p>At the bottom of it, it says</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Technical Support Assistance
e-mail:
techsupport@latticesemi.com
Internet: www.latticesemi.com
</code></pre></div>
<h2 id="glimpse-of-a-frozen-hell">Glimpse of a Frozen Hell</h2>
<p>I sent them an email referring to the documents, explaining the problem, and asking for the missing VHDL component information. After a while I received the reply</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Thank you for your email. If you have sent the email for technical support, we request you to submit your request through our Technical Support Portal on the Lattice Semiconductor Website (www.latticesemi.com\techsupport). The Technical Support Portal allows users to enter their cases directly in the Lattice Technical Support tool, check status of their cases, update their cases, and view their old cases.
PLEASE NOTE THAT THIS MAILBOX IS NOT MONITORED SO YOUR QUERY WILL NOT BE ADDRESSED.
</code></pre></div>
<p>So the first sign all is not well was the old docs not being updated with the info in the first place.</p>
<p>The second sign is their current docs provide a support email alias they don't support any more (and have not updated their docs).</p>
<p>I rewrote the question in their web-based system, which says this as the first thing</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">
Please be advised that highest priority for Lattice Technical Support will be given to:
1. Software which is not working as expected or suspected bugs.
-We recommend that customers use the latest software tools and follow the system requirements specified in the tool’s Software User Guide.
-When submitting software issues you will be asked to provide the software version, operating system details, exact steps for how to recreate the issue, provide an example project to recreate the issue if possible, and send us the log file as applicable.
2. Questions and problems related to datasheet, timing, functionality, device compatibility, and general application support requests like schematic review, programming cable issue, etc.
3. Any modifications or changes to existing Reference Designs, IP, demos, and solutions beyond the guidelines specified in the User Guide are responsibility of the user. Lattice reserves the right to select the appropriate updates that apply to a generic solution.
</code></pre></div>
<p>This is where the Darwinism comes in... this list is a function describing the "fitness" of your question to be dealt with by Lattice. I think some kind of triage about support questions is normal, but there is no understanding in that list that the reason people are forced to ask questions may be because your documentation is inadequate and must be improved.</p>
<p>As such, Lattice have made questions that should provoke documentation updates extinct. That's why their docs are still incomplete and not useful for the VHDL case after getting on for two years.</p>
<p><strong>Every user who wants to use ICE5 with the IP assets on the FPGA without documentation for VHDL faces the same problem, over and over.</strong> As such, multiplying the issue by the number of people affected, and by the number of future support questions that dealing with it properly would eliminate, correcting failures in Lattice documentation should be #1 on the list.</p>
<p>Looking at the headers, the messageid is "<a href="mailto:...@agiloft.com">...@agiloft.com</a>"; that seems to be a customer support outsourcing company. The Lattice tools come from Synopsys and Silicon Blue, except "Ice Cube 2", which is a flow manager on top of the other tools plus a very basic integrated editor (and no simulator if you're on Linux, only Windows... that makes zero sense). Basically it seems everything is outsourced with a little glue.</p>
<h2 id="work-around">Work around</h2>
<p>I found by digging around that you can find the component prototypes in the compiler tree here</p>
<p><code>./lscc/iCEcube2.2016.02/vhdl/sb_ice_syn_vital.vhd</code></p>
<p>The actual DDR SB_IO instantiation needs to look like this kind of structure, per IO</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">drd0008: SB_IO
generic map (
NEG_TRIGGER => '0',
PIN_TYPE => "100000",
PULLUP => '0',
IO_STANDARD => "SB_LVCMOS"
)
port map (
PACKAGE_PIN => RD0,
LATCH_INPUT_VALUE => '0',
CLOCK_ENABLE => '1',
INPUT_CLK => clk0deg,
OUTPUT_CLK => clk0deg,
OUTPUT_ENABLE => r_data_oe,
D_OUT_1 => r_data_write(0),
D_OUT_0 => r_data_write(8),
D_IN_1 => r_data_read(0),
D_IN_0 => r_data_read(8)
);
</code></pre></div>
<p><code>"PACKAGE_PIN"</code> is the signal with the toplevel port you want to bind the SB_IO definition to.</p>
<p>See my article "Hyperbus implementation tips" <a href="https://warmcat.com/embedded/hardware/lattice/hyperbus/hyperram/2016/09/19/hyperbus-implementation-tips-on-ice5.html">https://warmcat.com/embedded/hardware/lattice/hyperbus/hyperram/2016/09/19/hyperbus-implementation-tips-on-ice5.html</a> for other information about implementing hyperbus / hyperram on ICE5.</p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Lattice IP library docs always seem to contain Verilog component information, but don't always take care about VHDL component prototypes so the IPs can be used in VHDL. That includes critical IPs like IO cells.</p></li>
<li><p>Lattice support triage doesn't lead to the root cause - the documentation - being fixed; the same problem has been going on for ~ two years, creating continuous difficulty for everyone using VHDL with ICE4/5 with Lattice tools.</p></li>
<li><p>You can find the necessary prototypes in the tools tree by digging it out yourself, and the info here will get it to compile and place and eventually work.</p></li>
<li><p>If you do the digging, there are enough pieces there to get a proper DDR bus interface inside the FPGA in VHDL, and operate Hyperram at 64/128MHz DDR reliably</p></li>
<li><p>Again the quality of the actual FPGA is very good</p></li>
</ul>
Hyperbus and Hyperram2016-09-02T00:00:00+08:00https://warmcat.com/2016/09/02/hyperram<p><img src="https://warmcat.com/hyperram2.png" alt="https://warmcat.com/hyperram2.png"></p>
<h2 id="ram-missing-link">RAM missing link</h2>
<p>The various types of RAM commonly available have been in three basic types for decades now,</p>
<ul>
<li>tiny SPI RAM</li>
<li>Async SRAM needing many pins (upwards of 30)</li>
<li>SDRAM / DDR for much more dense storage but with a big hike in interface complexity</li>
</ul>
<p>It's certainly possible to interface DDR to an FPGA, but it has implications for the cost of an FPGA with the necessary IO standards to do it at speed, and for the real estate needed.</p>
<p>Well now there's a brand new RAM in town using Spansion's hyperbus, called "hyperram", that fills a big gap between SRAM and DDR.</p>
<p><img src="https://warmcat.com/hyperram1.png" alt="https://warmcat.com/hyperram1.png"></p>
<p>Hyperbus detailed description:</p>
<p><a href="http://www.cypress.com/file/213356/download">http://www.cypress.com/file/213356/download</a></p>
<p>Today since it's hot off the fab, only the 64Mbit (8MByte, or more accurately 4Mx16) and 128Mbit variants are in production. However 256Mbit is coming soon in the same footprint.</p>
<p>The footprints are defined in the licensed standard, so the vendor chips are interchangeable. And because the address is passed in sequentially using the data bus, the footprint is immutable against changes in storage array size; unlike Async SRAM, there are no extra pins for larger addresses.</p>
<p>This new 12-signal bus has been widely licensed and many manufacturers are coming out with PSRAM ("hyperram") and Flash chips (yes... "hyperflash" amazingly enough) using it.</p>
<h2 id="trend-bucking-bus">Trend-bucking bus</h2>
<p>The trend in digital busses for a while now has been to serialize them with some kind of LVDS: USB, PCI express, and the next gen SD cards all use this technique for example. (As did the now defunct Ara).</p>
<p>However hyperbus rejects this, and blends DDR - the data rate is twice the clock - with a parallel, 8-bit interface, without differential signalling (except for the clock). In fact the signalling on the bus itself is old-style NRZ signalling at the ancient 3.3V and 1.8V standards; there is separate silicon for each voltage standard.</p>
<table><thead>
<tr>
<th>Hyperram voltage</th>
<th>Clock format</th>
<th>Max clock rate</th>
<th>Throughput</th>
</tr>
</thead><tbody>
<tr>
<td>1.8V</td>
<td>Differential</td>
<td>166MHz</td>
<td>333MBytes/s</td>
</tr>
<tr>
<td>3.3V</td>
<td>Single-ended</td>
<td>100MHz</td>
<td>200MBytes/s</td>
</tr>
</tbody></table>
<p>The bus is much simplified compared to LVDS or DDR DRAM due to the relatively low max clock rate, there is no training / retraining and the need for matching bytelane routing length is correspondingly relaxed, in fact if you don't intend to clock it at near the max rate, it's very relaxed. The data bus drive strength is also configurable by internal registers, for EMI and signal quality control.</p>
<p>The fact it uses traditional IO standards, if you accept DDR clocking in that category (ICE40 IO cells support it natively), also means very low-cost FPGAs can talk to it, even CPLDs if the clock rate is low. That's a brand new capability introduced with hyperram, low-cost FPGA mated with low cost high density memory.</p>
<h2 id="pseudo-static-array">Pseudo-static array</h2>
<p>Underneath the shiny new bus, Hyperram is an old memory technology known as "Pseudostatic" RAM, it's actually DRAM but with an interface that hides the periodic refresh activity that the DRAM array requires internally.</p>
<p>That gets you into DDR DRAM level of density and cost, without the host controller having to take any care about refresh details.</p>
<p>However the refresh still goes on internally and blocks access externally for the duration: the interface adds dynamic wait states when the RAM array needs some "me time". So the host controller doesn't entirely escape having to deal with it adaptively: however hyperram lets you set a config register to select to always insert the refresh waitstate whether it's refreshing or not, to trade off determinism and host controller complexity against latency. If you take that route and don't care about the wasted extra clocks you really have escaped all sign it's DRAM under the covers, and the latencies are completely deterministic like SRAM.</p>
<p>The underlying DRAM reality and its row / column architecture still need consideration with hyperram: they are reflected in the address map used by the chip, and the row and column sizes for a chip can be read out from its configuration registers to help with that.</p>
<h2 id="bus-signals">Bus signals</h2>
<table><thead>
<tr>
<th>Signal</th>
<th>Function</th>
</tr>
</thead><tbody>
<tr>
<td>CK / CK#</td>
<td>Bus clock (both edges significant)</td>
</tr>
<tr>
<td>CS#</td>
<td>Active low chip select</td>
</tr>
<tr>
<td>RWDS</td>
<td>1) Wait state signal from RAM to hold off transaction, 2) Bytewise write enable</td>
</tr>
<tr>
<td>D7..0</td>
<td>Data Bus, also used for issuing address and cycle type information</td>
</tr>
</tbody></table>
<h2 id="alignments">Alignments</h2>
<p>It's important to notice that although the physical external data bus is 8 bits wide...</p>
<ul>
<li><p>with DDR there are naturally two transfers on that 8-bit bus per clock, ie, 16-bits</p></li>
<li><p>the addressable unit of the device is a 16-bit word. Ie address 0 points to one 16-bit word and address 1 points to the next 16-bit word.</p></li>
</ul>
<p>Therefore the natural unit for addressability and for transfer is 16-bits.</p>
<h2 id="address-map">Address map</h2>
<p>Each transaction begins with the master writing 48 bits (it's DDR, so those 6 bytes transfer in 3 clocks) that defines the transaction type and address information.</p>
<p>One of the nice things about hyperbus is that even after the largest device on the initial roadmaps comes out (32MByte), there are still 21 reserved bits in this packet, meaning it could address up to 64TBytes per chip (!) without needing extra pins or a protocol change. So in the coming years, we can expect to see GByte hyperram chips in compatible packages and interfaces.</p>
<p>The addressing scheme is a bit convoluted... the underlying row and column addresses are separated in the map and some top bits are reserved for the type of transaction.</p>
<div><table><tr><td><b>Bit</b></td><td><b>4Mx16</b></td><td><b>8Mx16</b></td><td><b>16Mx16</b></td></tr><tr><td>47</td><td colspan=3>0 = Write, 1 = Read</td></tr><tr><td>46</td><td colspan=3>0 = memory, 1 = configuration registers</td></tr><tr><td>45</td><td colspan=3>0 = wrapped burst, 1 = linear</td></tr><tr><td>44 - 37</td><td colspan=3>Reserved</td></tr><tr><td>36</td><td colspan=2>Reserved</td><td>Row A23</td></tr><tr><td>35</td><td colspan=1>Reserved</td><td colspan=2>Row A22</td></tr><tr><td>34 - 22</td><td colspan=3>Row A21 - A9</td></tr><tr><td>21 - 16</td><td colspan=3>Upper Col A8 - A3</td></tr><tr><td>15 - 3</td><td colspan=3>Reserved</td></tr><tr><td>2 - 0</td><td colspan=3>Lower Col A2 - A0</td></tr></table></div>
<p>Again, notice the address bus addresses 16-bit words.</p>
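<p>As a concreteness check, here is a sketch in Python of packing that CA word per the table above (my own illustration, not vendor code):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Pack the 48-bit CA word from a 16-bit word address per the map above.
def hyperbus_ca(word_addr, read=True, register_space=False, linear=True):
    ca = (1 if read else 0) << 47                # R / W#
    ca |= (1 if register_space else 0) << 46     # memory vs config registers
    ca |= (1 if linear else 0) << 45             # wrapped vs linear burst
    ca |= ((word_addr >> 3) & 0x1fffffff) << 16  # row + upper column
    ca |= word_addr & 0x7                        # lower column, bits 2 - 0
    return ca.to_bytes(6, 'big')                 # issued MSByte-first, 3 DDR clocks

# linear memory read starting at word address 0x1234
print(hyperbus_ca(0x1234).hex())                 # -> a00002460004
</code></pre></div>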
<h2 id="what-came-from-where">What came from where</h2>
<p>Hyperbus blends a lot of existing technologies to make something new.</p>
<table><thead>
<tr>
<th>Feature</th>
<th>Original RAM using it</th>
</tr>
</thead><tbody>
<tr>
<td>3.3V / 1.8V simple IO standard</td>
<td>ASYNC SRAM</td>
</tr>
<tr>
<td>DDR "both edge" clocking</td>
<td>DDR DRAM</td>
</tr>
<tr>
<td>Config registers on die</td>
<td>SDRAM</td>
</tr>
<tr>
<td>8-bit bus</td>
<td>ASYNC SRAM / DDR DRAM</td>
</tr>
<tr>
<td>Hidden refresh</td>
<td>Pseudostatic SRAM</td>
</tr>
<tr>
<td>High speed differential clock</td>
<td>DDR DRAM</td>
</tr>
<tr>
<td>Burst mode</td>
<td>SDRAM</td>
</tr>
<tr>
<td>Bytewise write masking</td>
<td>SDRAM</td>
</tr>
<tr>
<td>Provide address via data bus</td>
<td>SPI RAM</td>
</tr>
</tbody></table>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Hyperram gives a new way to marry cheap FPGAs with dense memory</p></li>
<li><p>different vendor chips should be interchangeable due to specified footprints</p></li>
<li><p>64Mbit is available now, with 128Mbit and 256Mbit coming soon in the same footprint with the same interface. And there is plenty of room for expansion.</p></li>
<li><p>It's really pseudostatic SRAM (ie, DRAM) with hidden refresh: can be completely hidden if you are OK burning some cycles</p></li>
<li><p>1.8V and 3.3V silicon with 1.8V having differential clocks and 333MB/sec performance</p></li>
<li><p>12-pin interface with 8-bit bus, DDR clocking but supports simple old IO standards otherwise</p></li>
<li><p>Addressable unit is a 16-bit word (1 clock / 2 edges to transfer on the 8-bit physical bus)</p></li>
<li><p>48-bit packet defines the bus transaction and attributes, followed by the transaction data until CS# deasserted</p></li>
<li><p>Much simpler to interface to than DDR DRAM</p></li>
</ul>
ST7735 TFT LCD Goodness2016-08-26T00:00:00+08:00https://warmcat.com/2016/08/26/st7735-tdt-lcd-goodness<h2 id="the-quot-widely-known-st7735-lcd-quot">The "widely known ST7735 LCD"</h2>
<p><img src="https://warmcat.com/20160826_193244_002.jpg" alt="https://warmcat.com/20160826_193244_002.jpg"></p>
<p>Here in Taiwan there are many indigenous LCD and OLED panel manufacturers, but none of them sell anything casually competitive with a relatively
ancient (2011) Mainland Chinese panel. It goes by "HTF0177SN-01", from Huanan Electronic Technology Co in Guangdong.</p>
<p><a href="https://www.arduino.cc/documents/datasheets/A000096_Datasheet_HTF0177SN-01-SPEC.pdf">https://www.arduino.cc/documents/datasheets/A000096_Datasheet_HTF0177SN-01-SPEC.pdf</a></p>
<p>It has 18-bit colour, 128 x 160, and SPI, and its power consumption and 1.77" size is right in the sweet spot.</p>
<p>These have been used for years by Arduino types, and indeed even now you can go to Guanghua, Taipei's electronic otaku district, and buy one off the shelf in Arduino clothing... although it's on an unfortunate crutch board to hide the Arduino's shameful 5V heritage (the panel is natively 3.3V), and unfortunately the crutch board breaks panel readback. The FPC on the module solders direct to the PCB, and it's not hard to remove it and use the panel directly.</p>
<p>Adafruit will sell you one, and you can find them for sale at keen prices with or without the Arduino adaptation junk on Chinese ebay type sites.</p>
<p>The display quality is very good (much better than the camera captures), <del>although it has quite narrow viewing angle constraints, outside of 20 degrees or so it shows mostly white, without colour or generally inverted</del>.</p>
<h2 id="adafruit-init-code">Adafruit init code</h2>
<p>The canonical init code is from Adafruit</p>
<p><a href="https://github.com/adafruit/Adafruit-ST7735-Library/blob/master/Adafruit_ST7735.cpp">https://github.com/adafruit/Adafruit-ST7735-Library/blob/master/Adafruit_ST7735.cpp</a></p>
<p>however this is pretty much abandoned (last real update a year ago; 7 issues + 9 PRs ignored) and tied to Arduino.</p>
<p>The various versions of the panel over the years can supposedly be discriminated between by the colour of the tab on the cover of the polarizer sheet: red, green, or black. In the Adafruit sources, some are associated with different chip versions:</p>
<table><thead>
<tr>
<th>Variant</th>
<th>Vintage</th>
<th>Datasheet Link</th>
</tr>
</thead><tbody>
<tr>
<td>ST7735</td>
<td>2008</td>
<td><a href="http://www.displayfuture.com/Display/datasheet/controller/ST7735.pdf">http://www.displayfuture.com/Display/datasheet/controller/ST7735.pdf</a></td>
</tr>
<tr>
<td>ST7735R</td>
<td>2009</td>
<td><a href="https://cdn-shop.adafruit.com/datasheets/ST7735R_V0.2.pdf">https://cdn-shop.adafruit.com/datasheets/ST7735R_V0.2.pdf</a></td>
</tr>
</tbody></table>
<p>The only difference I noticed is that PWCTR1 takes 3 arguments on the -R and 2 on the non-R.</p>
<p>So your panel</p>
<ul>
<li><p>has one of two chip variants on it</p></li>
<li><p>has chosen how to configure various option pins on the chip by the wiring on the FPC. These we can call 'integration choices'.</p></li>
</ul>
<h2 id="integration-choice-1">Integration choice #1</h2>
<p>If you google around, you will read how the panel supports both 3-wire and 4-wire SPI interfaces; the controller chip does (and 8-bit 8080/6800 style async memory interfaces too), but this is not true of the panel you can buy with the controller chip already integrated.</p>
<p>In fact the controller chip gets various config pins tied when it's mounted on its FPC, by decision of the module vendor, and that controls whether it will operate in 3-wire or 4-wire mode.</p>
<p>(What do "3-wire" and "4-wire" mean, anyway? Basically an extra bit is required that indicates whether the serial byte you're sending is a command (0) or argument data (1).</p>
<p>You could deliver that bit as a 9th (first) serial bit on every SPI transfer: that's known as "3-wire". Or you could send 8 serial bits as usual and drive a separate pin, known as RS, A0 or C/D, to tell the panel the meaning of the 8 serial bits you're sending, which is "4-wire".)</p>
<p>Since the wiring of the panel FPC decides it, it's not that there is still a choice, <strong>the panel FPC specifically configures the controller chip to use 4-wire mode, so that's all you can use, 8-bit serial data and an extra pin to indicate if it's command or argument data.</strong></p>
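<p>To make "4-wire" concrete, here is a minimal sketch of the two write primitives, assuming a Raspberry Pi style master using the spidev and RPi.GPIO packages; the D/C pin number is a placeholder for your own wiring.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Minimal 4-wire mode write helpers: 8-bit SPI transfers, with a separate
# GPIO driven as the D/C (RS / A0) pin to mark command vs argument bytes.
import spidev
import RPi.GPIO as GPIO

DC = 24                        # D/C pin: placeholder, match your wiring

GPIO.setmode(GPIO.BCM)
GPIO.setup(DC, GPIO.OUT)

spi = spidev.SpiDev()
spi.open(0, 0)                 # SPI bus 0, CE0
spi.max_speed_hz = 4000000

def write_cmd(cmd):
    GPIO.output(DC, 0)         # D/C low: this byte is a command
    spi.xfer2([cmd])

def write_data(data):
    GPIO.output(DC, 1)         # D/C high: these bytes are arguments
    spi.xfer2(list(data))

write_cmd(0x11)                # SLPOUT, for example
</code></pre></div>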
<h2 id="integration-choice-2">Integration choice #2</h2>
<p>Another choice that got made by the panel manufacturer is that there is no separate MISO available at the user FPC connector, ie, no discrete data path back from the panel to the master, even though the controller chip provides it.</p>
<p>Instead there is only one serial data pin on the FPC confusingly called "SDA". (Since "SCK" is next to it, the unwary would think they're looking at I2C, not SPI, since these are I2C naming conventions).</p>
<p>In fact MISO + MOSI at the controller chip are both tied to "SDA": this is legal because the controller defines that only one of them may be driven at a time; with this chip you are either shifting a byte in or shifting a byte out, never both at once.</p>
<p>The nasty Arduino shield it came with fails to deal with this... it buffers MOSI through a non-3state-capable level-shifter, and does not provide a MISO from the LCD either; together that means the shield deliberately provides no way to read data back from the LCD. In turn that wilfully removes the ability to read the controller's configuration back dynamically, which created the wishful thinking that panel tab colours relate in any reliable way to the chip configuration.</p>
<p>If you discard the shield and use the panel directly you can read back using SDA for MISO as well as MOSI.</p>
<h2 id="note-about-readback-commands">Note about readback commands</h2>
<p>However take careful note: in the 4-wire mode we are forced to use, the commands RDDID and RDDST insert an explicit extra bus turnaround clock between the end of the command and the data coming back. If you don't take care of it, you will find the responses from these are all shifted right by one bit. Curiously the other RD* commands do not add the turnaround bit and just work.</p>
<p>My "Green tab" panel reports RDDID numbers of <code>0x54, 0x80, 0x66</code> respectively... the non-R datasheet says to expect the first one to be <code>0x5c</code>; the -R datasheet just says "-", so maybe from the -R version onwards the panel factory programs in its own vendor code. At any rate, most of the mystery about which canned init sequence applies to a particular panel can be cleared away by querying the controller for its situation at runtime.</p>
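<p>Here is a sketch of handling that turnaround bit, reusing the helpers from above; the shift-and-mask is the point. Note that CS must stay asserted across the command and the read, so with spidev you may need to manage CS from a GPIO yourself (not shown).</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Read RDDID (0x04): one extra turnaround clock means the 3 reply bytes
# arrive shifted right by one bit; clock in 4 bytes and shift left to fix.
def read_rddid():
    write_cmd(0x04)                        # RDDID
    GPIO.output(DC, 1)
    raw = spi.readbytes(4)                 # 32 clocks: 1 dummy bit + 24 data bits
    v = int.from_bytes(bytes(raw), 'big') << 1
    return [(v >> 24) & 0xff, (v >> 16) & 0xff, (v >> 8) & 0xff]

print([hex(b) for b in read_rddid()])      # e.g. ['0x54', '0x80', '0x66']
</code></pre></div>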
<h2 id="integration-choice-3">Integration choice #3</h2>
<p>The nRESET signal on the FPC that holds the panel in reset has no pullup. So you must provide your own pullup, or drive it. Otherwise the panel just sits there in hard reset ignoring your efforts to talk to it.</p>
<p>The comment in the canonical init code that seems to claim nCS must be low during the reset is not true on my panels.</p>
<h2 id="integration-choice-4">Integration choice #4</h2>
<p>Another major configuration option on the controller chip is the layout of the panel memory vs the pixels on the panel, this is also decided when the panel FPC is wired in the factory.</p>
<p>The controller actually provides 132 x 162 x 18bpp RAM frame buffer internally: the pin-settable options relate to how the physical 128 x 160 pixels map to the slightly larger memory.</p>
<p>According to the canonical init code, the various tab colours of the panels have different mappings: the green tab is supposed to be offset by (2, 1) but I found that is not true for my panel. So the tab colour cannot be trusted to identify the way the panel configures the controller.</p>
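<p>For illustration, here is how such offsets enter the drawing-window commands, again reusing the helpers sketched earlier; zero offsets are what my panel wants, adjust for yours.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Set the drawing window, applying the panel's SRAM offsets.
COL_OFS, ROW_OFS = 0, 0        # zero on my panel; the canonical code claims otherwise

def set_window(x0, y0, x1, y1):
    write_cmd(0x2a)            # CASET: column address range, 16-bit big-endian
    write_data([0, x0 + COL_OFS, 0, x1 + COL_OFS])
    write_cmd(0x2b)            # RASET: row address range
    write_data([0, y0 + ROW_OFS, 0, y1 + ROW_OFS])
    write_cmd(0x2c)            # RAMWR: pixel data follows

set_window(0, 0, 127, 159)     # full 128 x 160 panel
</code></pre></div>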
<h2 id="integration-choice-5">Integration choice #5</h2>
<p>Finally although there are 2 x NC pins going spare on the 14-pin user FPC, the panel does not provide the controller chip's TE "Tearing Effect" signal, which is basically an indication of when the panel is performing VSYNC. That means you can't synchronize your logical update of the framebuffer with the framebuffer scanout, by wiring up what is basically a "VSYNC interrupt" to your master.</p>
<p><del>However, despite the ambiguous entry in the datasheet, it turns out after some experimentation that you can poll the dynamic vsync state using command 0xe, RDDSM. b7 of the first result byte is normally low, and set during vsync. That's a bit awkward, but with some effort it means synchronized updates are possible.</del> Edit: I was not able to reproduce this; b7 simply reflects the state of the TE signal enable bit set by TEON / TEOFF. It seems there is no way to probe the dynamic vsync state without the physical TE signal.</p>
<h2 id="deviations-in-the-canonical-init-code">Deviations in the canonical init code</h2>
<p>The init code supposed to bring up a "Green tab" panel does not work on my Green tab panels.</p>
<p>The display comes out of reset with vertical stripes, it's the line shift register in the controller just repeated on every line, plus some scary regular black columns that look like dead column multiplexing drivers.</p>
<p><img src="https://warmcat.com/tft-bad.jpg" alt="https://warmcat.com/tft-bad.jpg"></p>
<p>But we don't let little things like that put us off...</p>
<h3 id="voodoo-delays">Voodoo delays</h3>
<p>The canonical init code also suffers from "voodoo delay inflation", the datasheet linked above for the controller specifies 120ms delays for the soft reset and SLEEPOUT commands. But the code</p>
<ul>
<li><p>has reset delays of 50ms (ie, too short...) for red tab panels, 150ms for green tab panels on soft reset, and 500ms for SLEEPOUT on both</p></li>
<li><p>does a hard reset on the panel via a GPIO first and wastes a further 1500ms toggling that, with no reasoning given</p></li>
<li><p>adds a random smattering of 10ms delays during the "red tab" init commands that are not justified by the datasheet (or needed on my panel)</p></li>
<li><p>adds 10ms and 100ms delays to NORON and DISPON: the NORON one has no requirement in the datasheet and the DISPON one just says you have to wait 120ms before doing a DISPOFF subsequently.</p></li>
</ul>
<p>All in all <strong>sweeping away the voodoo delays decreases the panel startup time from 2270ms to just 240ms</strong>.</p>
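<p>In sketch form, the lean init is just the following, using the helpers from earlier; hedged, since your panel may want more register setup (FRMCTR1 in particular, see the next section).</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Minimal init honouring only the delays the datasheet actually requires.
import time

write_cmd(0x01)                # SWRESET
time.sleep(0.12)               # 120ms, per datasheet
write_cmd(0x11)                # SLPOUT
time.sleep(0.12)               # 120ms, per datasheet
write_cmd(0x3a)                # COLMOD: pixel format
write_data([0x05])             # 16bpp
write_cmd(0x13)                # NORON: no delay required
write_cmd(0x29)                # DISPON
</code></pre></div>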
<p>When you've been wandering around hopelessly for a while, reaching for voodoo isn't necessarily a bad idea, to see if it affects anything. But once the problem starts to shift, it's a good plan to go back and check which changes really affected it (often a bitter process, when whole hours of investigation and changes are revealed as having contributed nothing useful to the end solution...).</p>
<h3 id="weird-porch-padding">Weird porch padding</h3>
<p>Another difference between the red and green tab inits involved the video scanout timing on the panel... the red tab added a 0 per-frame delay and 3 and 6 pixel-time front and back porches.</p>
<p>But the green tab init changed that to 1 per-frame delay and 44 and 45 pixel-time front and back porches; that's very different, and much slower.</p>
<p>On using the red tab init for the timing register, for the first time the display did something different involving all the pixels, which was very encouraging. After more prodding and poking I discovered that additionally changing the per-frame delay to 10 made the raster come out right, showing the video data in the panel framebuffer.</p>
<p>Investigating that further, per-frame delays of 0, 1 and 2 all gave different "broken lcd" displays, but numbers of 3 or above give a working raster.</p>
<p>EDIT: That is true, but the panel suffered from temporary slow "whiteout" type bleaching with the VSYNC number at 3, and it degraded the viewing angle as well. This number is an offset on a fixed starting point of 20 lines for the VSYNC; on my panel 6 seems to be a good compromise.</p>
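<p>Expressed with the earlier helpers, the setting that worked on my panel looks like this (a sketch; argument order per the controller datasheet's FRMCTR1 description):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># FRMCTR1 (0xb1): normal mode frame rate control. Arguments: per-frame
# (VSYNC offset) delay, then front and back porch line counts.
write_cmd(0xb1)
write_data([0x06, 0x03, 0x06])   # VSYNC offset 6, porches 3 / 6
</code></pre></div>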
<h3 id="wrong-sram-offsets">Wrong SRAM offsets</h3>
<p>The init code implies that the green tab panels have offsets of 1 column and 2 rows into the backing store respectively. However on my green tab panel, the correct offset is zero for both rows and columns.</p>
<h2 id="lack-of-hardware-scroll-window-support">Lack of Hardware scroll window support</h2>
<p>In other vaguely similar controller variants there is a line indirection scrolling scheme built into the display controller</p>
<p><a href="http://www.displayfuture.com/Display/datasheet/controller/ST7687S.pdf">http://www.displayfuture.com/Display/datasheet/controller/ST7687S.pdf</a></p>
<p>however trying those registers on ST7735 didn't seem to do anything. So I think this is absent on the controller the panel actually has.</p>
<h2 id="what-did-we-learn-this-time">What did we learn this time?</h2>
<ul>
<li><p>Panels differ by which controller chip they have on them and the various controller chip integration configurations decided by the panel manufacturer</p></li>
<li><p>The Adafruit init code does not work on all panels</p></li>
<li><p>Only delays mentioned in the datasheet need to be honoured: the panel can be initialized in 240ms, not over 2s.</p></li>
<li><p>PWCTR1 differs between the two chip variants in how many arguments it takes and the argument bitfield layout</p></li>
<li><p>your panel may not follow the offset assumptions in the Adafruit code (they are option pins set by the panel FPC)</p></li>
<li><p>When trying to adapt the init code for your panel, be suspicious about FRMCTR1. Wrong numbers here (they can be set wrongly by the controller's reset defaults) get you what looks like a broken panel with vertical stripes</p></li>
<li><p>Controller MISO + MOSI are tied together and presented at the FPC as "SDA". It's possible to read back data from the panel.</p></li>
<li><p>The Adafruit Arduino shield version just gets in the way and defeats MISO appearing at SDA, it breaks reading from the panel... just bin it and use the panel directly</p></li>
<li><p>There are visually identical panel variations that configure the controller chip differently. The best way to identify them is read back RDDID</p></li>
<li><p>There is no external Tearing Effect (VSYNC) signal available<del>, but you can read the dynamic state via an SPI command</del> Edit: that is not the case; I could not find any manufacturer offering the panel with both SPI and TE available.</p></li>
<li><p><del>There's only a fairly narrow "6 O'Clock" viewing angle for correct visual results</del> EDIT: if you suffer from this, look at the VSYNC offset number on FRMCTR1. This needs to be above 3, 6 seems to give good results on my panel.</p></li>
<li><p>Inside the viewing angle though, the panel is pretty good once it's working. The moire in the pictures only appears in camera shots of the panel.</p></li>
</ul>
Getting started with ICE40 Ultra FPGAs2016-08-15T00:00:00+08:00https://warmcat.com/2016/08/15/ice40-ultra<h2 id="ice40-ultra-introduction">ICE40 Ultra Introduction</h2>
<p>In terms of price, the cheapest way to get usable amounts of programmable logic is still the ancient CPLD.</p>
<p>However the cheapest ones are very resource-starved and there is little room for tricks with their crossbar structure.</p>
<p>The cheapest FPGAs are now becoming price-competitive with the old CPLDs; this article discusses a new variant of Lattice's ICE40: ICE5, or "ICE40 Ultra".</p>
<p><img src="https://warmcat.com/ice5-dev.png" alt="https://warmcat.com/ice5-dev.png"></p>
<p>I decided to write an article about it because there are numerous puzzling things about their dev kit and even after spending a day on it, what I found should certainly help other victims. To be fair some of it is the usual problems about shipping Linux binaries but many of the issues are completely self-inflicted by Lattice.</p>
<h2 id="ice40-ultra-dev-board">ICE40 Ultra dev board</h2>
<p><a href="http://www.latticesemi.com/Products/DevelopmentBoardsAndKits/iCE40UltraBreakoutBoard.aspx">http://www.latticesemi.com/Products/DevelopmentBoardsAndKits/iCE40UltraBreakoutBoard.aspx</a></p>
<p>The dev board is nice and cheap at USD50 from Digikey. But there is quite a long list of confusions and puzzlements about it and its software.</p>
<h3 id="strange-choice-1">Strange Choice #1</h3>
<p>The dev board features the "4K" CLB ICE40 Ultra silicon, the largest version offered (actually, it's like 3.8K CLB but never mind).</p>
<p>However despite this being the largest chip in terms of internal resources, they provided it on the dev board only in a tiny 36-pin BGA variant. That means the chip on the official dev board has less FPGA IO than even the modest 48-pin package I intend to use. Packages available for that chip go up to hundreds of pins.</p>
<p>This is actually a major issue for me, since I need 12 pins to communicate with a peripheral at high speed, but the largest number of uncommitted IO on one IO bank of the dev board is 10. The other banks have 6 and 1 uncommitted IO respectively.</p>
<h3 id="strange-choice-2">Strange Choice #2</h3>
<p>The FPGA IO banks have individual Vcc, so they can bank-by-bank be assigned different IO voltages, which is nice and flexible. Indeed the peripheral I want to prototype with this board is 1.8V but the host device it will connect to is 3.3V, so I rely on it.</p>
<p>On this dev board, all IO banks are hardwired to 3.3V. There's no jumper select. This makes the board of very limited use.</p>
<h3 id="annoyance-1">Annoyance #1</h3>
<p>The dev tools are downloadable for free, which is normal now in the low-end market. And they're available for Linux as well as Windows.</p>
<p>However they require you to install a Macrovision license file, tied to a MAC address. This file is also free, but the licensing machinery just makes trouble for no gain to anyone, not even Lattice.</p>
<p>I registered the license to my wlan device's MAC, which is the main network interface on this machine. However, starting the GUI failed with "Licensing Error 19" (this is on Linux, with $0 downloaded vendor software using a pointless $0 downloaded vendor license file). Googling around revealed that the Macrovision license library is used by a bunch of other proprietary software, and the issue is that it only looks for the MAC on "eth%d", ignoring network interface naming schemes that weren't popular in 1995.</p>
<p>The "solution" is just rename the interface to its expectation</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">sudo ip link set wlp3s0 down
sudo ip link set wlp3s0 name eth0
sudo ip link set eth0 up
sudo systemctl restart NetworkManager
</code></pre></div>
<h3 id="strange-choice-3">Strange Choice #3</h3>
<p>The dev board (the official name is "breakout board") ships with a test image that lights the red part of an RGB LED and sits there.</p>
<p>They have made a Windows + Mac (but no Linux!) GUI app that lets you communicate with the FPGA and change the PWM state of the RGB elements. And they give a URL on a piece of paper where to download that to "get started".</p>
<p><img src="https://warmcat.com/pwm-app.png" alt="https://warmcat.com/pwm-app.png"></p>
<p><strong>But they don't ship that HDL as an example with their GUI and HDL tools.</strong> There is an ancient example called "blinky" alright, but it has no PCF (pin layout mapping) files for the SWG36 (aka WLCSP36) package variant that is actually on the dev board, nor for any other package used by ICE5. It looks like some ancient test app inherited from older variants that has fallen out of favour.</p>
<p>Incredibly, they provide the BITSTREAM for the demo for download, but not the HDL.</p>
<p><a href="http://www.latticesemi.com/Products/DevelopmentBoardsAndKits/iCE40UltraBreakoutBoard.aspx">http://www.latticesemi.com/Products/DevelopmentBoardsAndKits/iCE40UltraBreakoutBoard.aspx</a></p>
<p>But they do spend three pages in the dev board pdf describing the SPI protocol to control the FPGA-logic PWM they're not giving the HDL for.</p>
<p>This makes no sense: nobody is buying an FPGA dev board to "get started" with some crappy windows app doing PWM on a binary-only FPGA bitstream.</p>
<p>People need all the sources of some simple known-good project to build and adapt, to get through learning the flow quickly. It's fine if it's trivial and just blinks the LED, with no dependency on Windows, a working USB / UART, or SPI: what actually needs "getting started" on is yet another variation of the build flow and a different set of tools.</p>
<p>It seems necessary to port blinky and use the schematics in the short PDF that came with the dev board to get started from scratch!</p>
<h3 id="annoyance-2">Annoyance #2</h3>
<p>Lattice have separate tools for "programming", which are also available for Windows and Linux. The Linux version comes in a choice of 32-bit or 64-bit RPM, which is nice.</p>
<p>Why is it an "annoyance" then? Read on...</p>
<h4 id="annoyance-2-1">Annoyance #2.1</h4>
<p>The 64-bit RPM installed easily enough and I had a look in the package to see how to start it, because there is no information in the dev board PDF.</p>
<h4 id="annoyance-2-2">Annoyance #2.2</h4>
<p>It installed itself in <code>/usr/local</code> in defiance of packaging standards, and thus can't integrate with your Desktop Environment. Not that it tries.</p>
<h4 id="annoyance-2-3">Annoyance #2.3</h4>
<p>I found what turned out to be a script <code>/usr/local/programmer/3.7_x64/bin/lin64/programmer</code>, which is pretty strange but OK.</p>
<h4 id="annoyance-2-4">Annoyance #2.4</h4>
<p>I started the script, it blew a segfault with no feedback.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Segmentation fault (core dumped)
</code></pre></div>
<p>Later I ran its component part by hand so the console feedback was still enabled, and learned</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">get one help index file /usr/local/programmer/3.7_x64/docs/webhelp/eng/tool_programmer.xml for tool PROGRAMMER.
get one help index file /usr/local/programmer/3.7_x64/docs/webhelp/eng/tool_programmer.xml for tool PROGRAMMER.
libusb couldn't open USB device /dev/bus/usb/001/008: Permission denied.
libusb requires write access to USB device nodes.
libusb couldn't open USB device /dev/bus/usb/001/013: Permission denied.
libusb requires write access to USB device nodes.
libusb couldn't open USB device /dev/bus/usb/001/008: Permission denied.
libusb requires write access to USB device nodes.
Segmentation fault (core dumped)
</code></pre></div>
<p>So it basically craps itself if it can't open (any?) usb devices, at least if it can't open the one it likes after the scan. I guessed this was the problem and re-ran it with sudo.</p>
<h4 id="annoyance-2-5">Annoyance #2.5</h4>
<p>On starting this from the console now with sudo, I still get no feedback, but it doesn't crash... just shows an empty grey dialog with a titlebar.</p>
<p><img src="https://warmcat.com/lattice-grey.png" alt="https://warmcat.com/lattice-grey.png"></p>
<p>This is quite rough to google for; at least, nobody else seemed to have met it with Lattice. Strace didn't show anything and there was no console or other logging. Eventually I went into the "programmer" startup script and checked with ldd where the libs were coming from: they ship their own Qt4 libs. I moved these out of the way so it could use the system ones, but this didn't help.</p>
<p>Finally I started applying the startup script's env changes one by one by hand on the command line, and executing their main app binary directly. This got me console logging; the startup script had been quenching it</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">X Error: BadAccess (attempt to access private resource denied) 10
Extension: 130 (MIT-SHM)
Minor opcode: 1 (X_ShmAttach)
Resource id: 0x15d
X Error: BadShmSeg (invalid shared segment parameter) 128
Extension: 130 (MIT-SHM)
Minor opcode: 5 (X_ShmCreatePixmap)
Resource id: 0xde
...
</code></pre></div>
<p>After checking it wasn't selinux, finally those keywords got me some google company in my misery, what finally fixed it was feeding it <code>QT_X11_NO_MITSHM=1</code> in the environment.</p>
<h4 id="annoyance-2-6">Annoyance #2.6</h4>
<p>After all that, the "programmer" app starts up wanting to use JTAG / Boundary Scan semantics, which are not supported by ICE5 (its configuration is predicated around SPI), although it's possible to chain SPI so it's unclear if that would work even so. But the scan chain autodetect fails.</p>
<p>Googling around this seems to be related to ftdi_sio Linux driver binding to their onboard dual usb - serial chip used to do SPI bitbanging.</p>
<p>However with the lack of balls on the dev board, there is no point wrestling with those tools, since I can't test the things that I need to with it. Instead I hacked the dev board to be configured from my own host directly using SPI and can forget their "programming" tools.</p>
<h3 id="strange-choice-4">Strange Choice 4</h3>
<p>The official Lattice "breakout board" doesn't follow Lattice's own recommendations about power supply sequencing, from document EB87 "ICE40 Ultra Breakout Board"</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">The power supply sequencing does not conform to the NVCM boot requirements as specified in DS1048,
iCE40 Ultra Family Data Sheet. The user may encounter intermittent boot success and/or higher than specified
startup currents when attempting to boot from NVCM.
</code></pre></div>
<p>Still, it is interesting to look at which shortcuts passed muster at Lattice compared to the sermons in their datasheets:</p>
<ul>
<li><p>the complex power sequencing can be violated if you don't care about boot from the internal flash and briefly higher current at startup</p></li>
<li><p>the unwelcome 2.5V rail (it means this chip demands 1.2V, 2.5V and 3.3V) can be hacked up from 3.3V via a diode drop (with no bypassing) if you don't care about internal flash.</p></li>
<li><p>if you don't care about the PLL then you can just short the PLLVCC to 1.2V instead of providing the "required" filter network</p></li>
</ul>
<p>and the world keeps turning.</p>
<p>Shortcuts are fine, but on a "breakout board" carrying a chip with a PLL, as this one does, you reasonably expect to be able to use the PLL. This board breaks the PLL by omitting the filter. It's unreasonable.</p>
<h2 id="minimal-toggle-example">Minimal toggle example</h2>
<p>The board gives us a 12MHz clock on C2. The test VHDL just toggles a flip-flop with it and sends the result back out on F4. Both of these are on Bank 1 (not that it matters, since everything is forced to 3.3V).</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity icetest is
    Port (
        clk    : in  STD_LOGIC;  -- C2 BANK1 IOB 25B_G3
        toggle : out STD_LOGIC   -- F4 BANK1 IOB 11B_G5
    );
end icetest;

architecture Behavioral of icetest is
    -- initialized so simulation toggles too; ICE40 FFs come up 0 anyway
    signal t : STD_LOGIC := '0';
begin
    toggle <= t;

    process (clk)
    begin
        if (rising_edge(clk)) then
            t <= not t;          -- divide the 12MHz clock by 2: 6MHz out
        end if;
    end process;
end Behavioral;
</code></pre></div>
<p>You need a "Place and Route constraint file" (as opposed to design constraint file) to map the logical ports to actual balls on the right chip, create this in a .pcf file in your project dir and add by rightclick on <code>P&R flow | Add P&R files | Constraint files</code></p>
<div class="highlight"><pre><code class="language-text" data-lang="text">set_io clk C2
set_io toggle F4
</code></pre></div>
<p>Careful, the ball names are case-sensitive.</p>
<p>You'll find a 6MHz square wave on F4 (J7.8)</p>
<h3 id="annoyance-3">Annoyance #3</h3>
<p>Attempting to synthesize some test VHDL gives</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">****************************************************************
Error: platform linux_a_64 4.6.4-301.fc24.x86_64 is not supported
****************************************************************
</code></pre></div>
<p>We must inform <code>./lscc/iCEcube2.2016.02/synpbase/bin/config/platform_check</code> that 4.x Linux kernels exist.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">case $VERSION in
4.* | 3.* | 2.4.* | 2.6.* )
</code></pre></div>
<p>Afterwards it can synthesize the bitmap, it puts it in <code>lscc/iCEcube2.2016.02/sbt_backend/Projects/projname/projname_Implmnt/sbt/outputs/bitmap/icetest_bitmap.bin</code>. As the config app note says, it's ~72KBytes for the 4K Ultra.</p>
<h3 id="oversight-1">Oversight #1</h3>
<p>There is no way to stop the FTDI chip on the "breakout board" from driving the FPGA config SPI CS (aka "SS_B") and SPI CLK. The other signals, MOSI and MISO, can be disconnected via J10, and you can stop the onboard SPI flash from seeing the config CS using J9.</p>
<p>The SPI config signals appear on (unpopulated...) connector J6, so it's not unexpected we might use them.</p>
<p>The FTDI chip drives CLK and SS high by default, so they can be overridden externally if you leave the FTDI interface alone on your PC. However if your CS line has plans around a pulldown, this will wreck them.</p>
<h2 id="what-did-we-learn-this-time">What did we learn this time?</h2>
<ul>
<li><p>Lattice ICE40 Ultra "breakout board" does not have enough uncommitted FPGA IO (just 17) to allow development for anything but the smallest-package variants</p></li>
<li><p>Only the 3.3V IO standard is supported on the board, despite the chip supporting other standards</p></li>
<li><p>The board violates Lattice's own design requirements in at least 3 ways; I guess these violations are also somewhat blessed, if you don't care about the things they break.</p></li>
<li><p>The chip has a PLL but the board design deliberately breaks it, too bad if you wanted to prototype with it</p></li>
<li><p>No worked HDL + pin mapped examples, just a useless binary bitstream. You are on your own.</p></li>
<li><p>Brain the size of a planet, the board comes up with a red LED and sits there</p></li>
<li><p>pointless free ($0) license file for free ($0) tools that just exists to make trouble</p></li>
<li><p>JTAG programmer tools doesn't work OOTB on Linux</p></li>
<li><p>Nobody at Lattice has a box with a 4.x kernel (out for a year now); generally it's much more of a struggle than it should be to get the tools working on Linux</p></li>
<li><p>Somebody at Lattice needs to have a vision quest about why people want an FPGA "breakout board", it's not because they have a mac and want a GUI app to control an LED.</p></li>
<li><p>the chip is easy to configure over SPI from your own host</p></li>
<li><p>the FPGA itself seems very good to me. But the breakout board is a waste of time and money.</p></li>
</ul>
ESP8266 Wifi module on Linux2016-07-22T00:00:00+08:00https://warmcat.com/2016/07/22/esp8266<h2 id="esp8266-basics">ESP8266 basics</h2>
<p>I have had a bunch of Espressif ESP8266 modules since last year, getting started with them seemed very difficult. Yesterday I looked at them again and got somewhere with them.</p>
<p><img src="https://warmcat.com/esp7.jpg" alt="https://warmcat.com/esp7.jpg"></p>
<p>There are many pages already with pieces of information about ESP8266, and some great guys have been all over it reversing the bootloader a while ago. However it took me a while to get enough info for using my modules on Linux, and to understand what the modules are and aren't inside.</p>
<p>So this page will try to document the basics in one place for people wanting to use the modules on Linux.</p>
<h3 id="architecture">Architecture</h3>
<ul>
<li><p>The chip uses an RTL core from Tensilica; these have been around since forever (at least before 2000, when I was at Emosyn). It's a cheap and cheerful RISC core. gcc toolchains exist for it, and it's broadly a similar deal to working with Arm Cortex, but a little more customizable, in that the RTL can be configured with or without optional instructions. The hard work of matching the toolchain to that has already been done.</p></li>
<li><p>The chip expects an SPI flash companion chip, in the "07" version of the modules I have to get started with, the SPI flash is 8Mbit (1MByte).</p></li>
<li><p>There is a ROM bootloader inside the chip that allows reflashing SPI over serial. So you can't brick it.</p></li>
<li><p>For normal boot, a second bootloader is pulled from SPI flash (at offset 0) by the ROM bootloader and executed. If you GND GPIO0 though, the ROM bootloader enters the "reflash over serial" path and waits to be connected to.</p></li>
<li><p>The second level bootloader pulls in the main code from flash</p></li>
<li><p>The main code is flashed in two pieces, at offset 0x1000 and 0x81000. Images from Espressif come in two pieces accordingly</p></li>
</ul>
<p>The firmware is not all open, but it is actively developed. Since old versions of it ship on the module, the first job is to set yourself up to be able to reflash it.</p>
<h3 id="required-module-connections">Required module connections</h3>
<ul>
<li><p>The module requires 0V (GND) and 3.3V power. There are many dire notes about needing a dedicated LDO because the module can pull a lot of current; however, here the module works fine with 30cm of wire for 3.3V power and ground coming from the expansion connector of a Raspberry Pi 3, with a 10uF tantalum at the module. YMMV, but it doesn't seem as critical as told elsewhere.</p></li>
<li><p>RST may be left open, the module has it pulling up a cap inside already. You can force it to 0V to make a reset happen.</p></li>
<li><p>EN must be pulled up or tied to 3.3V</p></li>
<li><p>GPIO10 should be tied to 0V</p></li>
<li><p>GPIO0 can be left unconnected for normal boot, or forced to 0V to select flashing from the ROM bootloader</p></li>
<li><p>There's a two-wire UART interface on the module using 3.3V signalling. Modern USB-serial converters are typically 3.3V already, although older ones using 5V signalling are unsuitable.</p></li>
</ul>
<h3 id="baud-rate-nonsense">Baud rate nonsense</h3>
<p>From boot, in the ROM bootloader and the second stage bootloader, the chip's serial baud rate is a completely nonstandard 74880bps. Most normal serial ports can't do that; however, serial - USB adapters usually have a very flexible baud rate generator and can handle it. Even so, most serial terminal applications cannot express such random baud rates, eg, minicom can't do it. gtkterm did it once (?) and never again; it stuck at 9600 even while displaying 74880 in the UI.</p>
<p>The pyserial package has a Python terminal emulator that can do it, if your USB adapter can do it.</p>
<p><code>miniterm.py /dev/ttyUSB0 -b 74880</code></p>
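<p>If you'd rather script it than fight terminal emulators, a few lines of pyserial do the job (assuming the usual /dev/ttyUSB0):</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># pyserial happily requests the nonstandard rate; whether you actually get
# it depends on the USB adapter's baud rate generator.
import serial

port = serial.Serial('/dev/ttyUSB0', baudrate=74880, timeout=1)
while True:
    line = port.readline()
    if line:
        print(line.decode('ascii', errors='replace'), end='')
</code></pre></div>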
<p>If you're debugging what is going wrong from boot, you need the funny baud rate. But modern firmwares that run after the second stage bootloader change the baud rate to 115200, so if you avoid problems in early boot using the info following, you can stick at 115200 and just see some junk coming at module reset.</p>
<p>What it's actually saying (at 74880bps) during a normal boot looks like this</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">ets Jan 8 2013,rst cause:2, boot mode:(3,6)
load 0x40100000, len 2408, room 16
tail 8
chksum 0xe5
load 0x3ffe8000, len 776, room 0
tail 8
chksum 0x84
load 0x3ffe8310, len 632, room 0
tail 8
chksum 0xd8
csum 0xd8
2nd boot version : 1.6
SPI Speed : 40MHz
SPI Mode : DIO
SPI Flash Size & Map: 8Mbit(512KB+512KB)
jump to run user1 @ 1000
rf cal sector: 251
rf[112] : 00
rf[113] : 00
rf[114] : 01
SDK ver: 2.0.0(656edbf) compiled @ Jul 19 2016 17:58:40
phy ver: 1055, pp ver: 10.2
</code></pre></div>
<p>At the end it switches to 115200 and says</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">ready
</code></pre></div>
<p>and will accept AT commands. Note some terminal emulators don't send suitable CRLF line endings; in that case try Ctrl-J to end the line.</p>
<p>From a terminal emulator in 115200bps mode only, the end result is a spew of junk and then the word "ready".</p>
<p>In text from the ROM bootloader <code>boot mode:(3,6)</code>, the 3 indicates GPIO0 is pulled high for normal boot. If you hold GPIO0 to 0V, this changes to <code>boot mode:(1,6)</code> and the ROM bootloader enters the path for flashing the SPI instead of normal boot.</p>
<h3 id="areas-in-the-spi-flash">Areas in the SPI flash</h3>
<ul>
<li>0x0000 second stage bootloader</li>
<li>0x1000 first part of application</li>
<li>0x81000 second part of application</li>
</ul>
<p>Then there are two areas that go at the "end" of the SPI flash; the addresses shown are for the 8Mbit SPI flash on my module, for 16Mbit they are +0x100000</p>
<ul>
<li>0xfc000 wifi calibration data</li>
<li>0xfe000 dunno but it needs setting to 0xff</li>
</ul>
<h3 id="using-linux-commandline-tools-to-flash-the-module">Using Linux commandline tools to flash the module</h3>
<p>There is a python flasher here that works well.</p>
<p><a href="https://github.com/themadinventor/esptool">https://github.com/themadinventor/esptool</a></p>
<p>You can install it using pip as mentioned in its readme, but since pyserial is packaged in Fedora I installed that dependency from the package.</p>
<p>I got the current-at-the-time-of-writing binary "NONOS" SDK image from here <a href="http://bbs.espressif.com/viewtopic.php?f=46&t=2451">http://bbs.espressif.com/viewtopic.php?f=46&t=2451</a> unpacked it and navigated to ./bin/at/512+512/</p>
<p>You must exit your terminal emulator while doing this or it will compete for reading the serial data esptool.py wants to read.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">esptool.py --port /dev/ttyUSB0 --baud 115200 write_flash --verify -fs 8m \
0x00000 ../../boot_v1.6.bin \
0x01000 user1.1024.new.2.bin \
0x81000 user2.1024.new.2.bin \
0xfc000 ../../esp_init_data_default.bin \
0xfe000 ../../blank.bin
</code></pre></div>
<p>At the end, it should say</p>
<p><code>-- verify OK (digest matched)</code></p>
<p>Afterwards stop forcing GPIO0 to 0V and either cycle the power or take RST to 0V briefly to come up in the application firmware at 115200 (after the inevitable junk from the ROM + second stage bootloader at 74880 baud first), and say "ready".</p>
<h3 id="at-commands-to-connect-to-ap">AT commands to connect to AP</h3>
<div class="highlight"><pre><code class="language-text" data-lang="text">ready
AT+CWMODE=1
OK
AT+CWLAP
+CWLAP:(3,"myap",-77,"4c:e6:76:c4:e7:b8",11,-29,0)
+CWLAP:(3,"happycat",-78,"5c:f4:ab:70:52:18",11,-7,0)
OK
AT+CWJAP="myap","mypassword"
WIFI CONNECTED
WIFI GOT IP
OK
AT+CIPSTART="TCP","192.168.2.253",22
CONNECT
OK
+IPD,21:SSH-2.0-OpenSSH_7.2
</code></pre></div>
<p>The AT-based firmware is the default, but it turns out it's not at all the best way to use the chip; for that you need to build an application firmware from scratch. You must build the toolchain and the actual application, but it turns out that is surprisingly simple, since all the main grunt work has been done.</p>
<h3 id="homebrew-firmware-toolchain">Homebrew firmware - Toolchain</h3>
<p>Clone this
<a href="https://github.com/pfalcon/esp-open-sdk">https://github.com/pfalcon/esp-open-sdk</a></p>
<p>and follow the build instructions. For Fedora, as mentioned in <a href="https://github.com/pfalcon/esp-open-sdk/pull/56/commits/464e275e6a18ef31a8381839d87abdd69f4878b4">https://github.com/pfalcon/esp-open-sdk/pull/56/commits/464e275e6a18ef31a8381839d87abdd69f4878b4</a> the following packages will be needed.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">sudo dnf install make unrar autoconf automake libtool gcc gcc-c++ gperf \
flex bison texinfo gawk ncurses-devel expat-devel python sed \
help2man python-devel pyserial
</code></pre></div>
<p>I chose the "include the SDK" build option and after some minutes it completed the build OK on Fedora24.</p>
<p>Set PATH to point to wherever you cloned the Toolchain</p>
<p><code>export PATH=/projects/esp-open-sdk/xtensa-lx106-elf/bin/:$PATH</code></p>
<p>That's it to get started with the toolchain.</p>
<h3 id="homebrew-firmware-application">Homebrew Firmware - application</h3>
<p>I was really surprised to find there is already a FOSS (MIT licensed) application in < 400KB with basically all the features needed (in a bare way, but still, this is an amazing boon). The author seems to have put together pieces from other projects, but it all seems to be FOSS.</p>
<p>Clone this
<a href="https://github.com/israellot/esp-ginx">https://github.com/israellot/esp-ginx</a></p>
<p>Edit line 8 of ./Makefile to point to your toolchain bin dir from earlier.</p>
<p><code>XTENSA_TOOLS_ROOT ?= /projects/esp-open-sdk/xtensa-lx106-elf/bin/</code></p>
<p>After that just <code>make</code>. It built without errors here.</p>
<p>Using the esptool from earlier, we can blow everything in one step</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">esptool.py --port /dev/ttyUSB0 --baud 115200 write_flash --verify -fs 8m \
0x00000 bin/0x00000.bin \
0x10000 bin/0x10000.bin \
0xfe000 bin/blank.bin \
0xfc000 bin/esp_init_data_default.bin
</code></pre></div>
<p>The core application is taken from some smart relay project. It runs the ESP8266 in both AP and station modes, and allows you to select and authenticate to another AP. It also has some kind of websocket serving support, but something about it is broken since it logs</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">http_ws_cgi_execute c =0x3ffea500
websocket frame size 520
Invalid frame type 07
received invalid frame
</code></pre></div>
<p>Opcode 7 is reserved in RFC6455... anyway that's a great start, I thought doing stuff like making the station connection and keeping it nailed up would have to be done by hand. I guess the next step is study the pile of code.</p>
<h3 id="special-constraints">Special constraints</h3>
<p>ESP8266 has one big advantage on the constraint front, a relatively huge SPI NOR flash used to store code and data. Even on my old modules from 2015 that is 1MiByte, and on recent modules it's 4MiBytes.</p>
<p>Allowing that half of it will need reserving for processing updates, 2MiBytes is still a lot of space for this kind of device. The whole image for example from the esp-ginx image is only 350KiB.</p>
<p>However there are problems. Of the main two issues, one can be imagined, there is only 80KiB of SRAM on the whole device. Typically there is around 30KiB of heap left after init.</p>
<p>The much less expected problem is that the ROM copies the first 0x9000 (36KiB) of flash to SRAM and jumps into it, for unknown reasons it seems the global .rodata section is required to be in there (!)</p>
<p>Any code that operates the SPI flash has to be in there as well, because it can't mess with it while executing from it.</p>
<h3 id="update">Update</h3>
<p>Edit: the bad opcode thing is a bug, the websocket "cgi" code receives a first packet which is actually the GET / HTTP headers, 0x47 for G of GET is parsed in ws protocol to get opcode 7. However the code (illegally) ignores it and continues on. With small packet sizes (32 bytes) the lwip stack falls over with OOM, like this <a href="https://github.com/micropython/micropython/issues/1971">https://github.com/micropython/micropython/issues/1971</a> It seems related to how many connections are coming at lwip, which is a bit worrying for robustness (the http / websocket parser stuff also looks a bit shaky).</p>
<p>I threw out their http arrangements and replaced it with lws, which I will discuss in another post.</p>
Silego GreenPAK crosses Analogue, CPLD and FPGA2016-03-29T00:00:00+08:00https://warmcat.com/2016/03/29/silego-digital-analogue-cpld-fpga<h2 id="silego-greenpak">Silego GreenPAK</h2>
<p>Silego <a href="http://www.silego.com/">http://www.silego.com/</a> is a small semiconductor company with some very interesting products.</p>
<p>Their GreenPak devices basically combine an OTP CPLD-type crossbar with both digital assets like latches and FPGA-style Lookup Tables (LUTs), and analogue assets like 8-bit ADCs and DACs, and analogue comparators. The routing for all the assets is controlled from a 2Kbit OTP array, loaded into volatile configuration FFs at power-on.</p>
<p><a href="http://www.silego.com/uploads/Products/product_365/SLG46621r111_03252016.pdf">http://www.silego.com/uploads/Products/product_365/SLG46621r111_03252016.pdf</a></p>
<p>Basically they are like a cheap microcontroller with crossbars + random logic instead of the cpu to move things between peripherals.</p>
<p>They are very small and very cheap, at quantity down to US$0.20, and the development tools are also inexpensive.</p>
<p>The overview of the die assets is like this</p>
<p><img src="https://warmcat.com/silego-assets.png" alt="microscope-silego-bot"></p>
<p>That price is so low, it means these can be considered to replace discretes, and these have advantages like onchip oscillators and sequencing logic, low power and being smaller than the equivalent discrete real-estate.</p>
<h3 id="otp-but-also-reconfigurable">OTP but also reconfigurable</h3>
<p>But potentially the killer thing about them is that although the initial state can only be programmed one time into the OTP, the active 2Kbit configuration state may be changed dynamically over I2C. Their dev board uses this to emulate your design in realtime, "programming" the configuration state into the volatile registers only; but if the design the device lives in has enough additional smarts to drive I2C, this dramatically increases the value of the assets in the chip.</p>
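<p>For a flavour of what that enables, here is a sketch of poking the volatile configuration from Linux with the smbus2 package; the I2C address and register offset are placeholders, not values from the datasheet.</p>
<div class="highlight"><pre><code class="language-python" data-lang="python"># Rewrite one byte of the volatile configuration array over I2C.
from smbus2 import SMBus

DEV_ADDR = 0x08                # placeholder device address

with SMBus(1) as bus:          # I2C bus 1
    bus.write_byte_data(DEV_ADDR, 0x7a, 0x01)   # hypothetical config offset
    print(hex(bus.read_byte_data(DEV_ADDR, 0x7a)))
</code></pre></div>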
<p>Back in the 1990s when I was using the early Xilinx FPGAs, XC2064, I reverse-engineered the bitstream so I could similarly set "constants" like initial FF states dynamically instead of setting them in the development tools, or having to use assets to hold them in explicit registers: this radically improved what could be done with these small devices. So it definitely adds a whole new dimension to these chips: and the configuration bit mappings for GreenPak are already in the public documentation.</p>
<h3 id="silicon-breadboard">Silicon breadboard</h3>
<p>In effect the crossbar means that these tiny chips are populated "breadboards" with digitally programmable interconnect.</p>
<p>Because of the fixed set of assets on the die, and other restrictions and constraints, the key to leveraging the crazy price:performance ratio is properly understanding what's in there and how it can be used.</p>
<p>For example although the digital assets seem a scatty mix of LUTs and counters, in fact there is enough in there to make complex state machines that may be enough to replace small microcontroller tasks.</p>
<h2 id="these-chips-are-tiny">These chips are tiny...</h2>
<p>Silego provide two types of package and samples with the dev board, and two types of test socket you can mount the chips on the dev board with. I am only interested in the "larger" SLG46621 one (2mm x 3mm!), but the chip dimensions are incredibly tiny, there is no way to mount them for prototyping without a paste mask and hot air rework tool. The pin pitch is 0.4mm.</p>
<p><img src="https://warmcat.com/silego-top.png" alt="microscope-silego-top">
<img src="https://warmcat.com/silego-bot.png" alt="microscope-silego-bot"></p>
<p>The dimensions of the test socket are correspondingly insane as well</p>
<p><img src="https://warmcat.com/silego-test-socket.png" alt="microscope-silego-test-socket"></p>
<p>However due to the prototyping flexibility, you can be pretty sure your device is going to work before committing to a pcb and paying to get the chips mounted.</p>
<h2 id="greenpak-dev-tools">GreenPAK dev tools</h2>
<h3 id="installing-on-fedora">Installing on Fedora</h3>
<p>There's good news and bad news... the good news is Linux is a supported platform. The bad news is that it comes in the form of .deb packages only. Well, it's good news for you if you are on Debian or Ubuntu.</p>
<p>I was able to get it to install on Fedora by</p>
<ul>
<li><code>dnf install dpkg</code></li>
<li>unzip the distribution zipfile to find the two .debs</li>
<li><code>dpkg -x <32-bit deb> .</code></li>
<li><code>rsync -a lib/* /lib</code></li>
<li><code>rsync -a usr/* /usr</code></li>
</ul>
<p>Afterwards, some i686 pieces are needed</p>
<ul>
<li><code>dnf install libusb.i686 qtwebkit.i686 glibc.i686</code></li>
</ul>
<p>Then you can run <code>GPLauncher</code> as a normal user.</p>
<h3 id="graphical-tools">Graphical tools</h3>
<p><img src="https://warmcat.com/silego-tools-1.png" alt="microscope-silego-tools-1"></p>
<p>Silego have done a nice job with the tools, but they are all working from a hardware engineer's point of view. That is, you operate in a "schematic editor", "wiring up" assets on the die. In particular it's permanently in your face due to the UI design that the silicon architecture has two disjoint crossbars with a restricted number of interconnect between them.</p>
<h3 id="no-hdl-support">No HDL support</h3>
<p>While this is true, for decades CPLDs with the same basic architecture have offered HDL (Hardware Description Language) support for expressing the design. In the case that you want a complex state machine from the available pieces, it's going to be time-consuming to "wire it up", and then inflexible and painful if you want to change it. Similarly, if you want to port your design to another GreenPak part, you will have to start over. Abstracting the design into an HDL gets you away from those kinds of detail, at the cost of losing exact control over what goes where; but often that's "control" you are very glad to cede.</p>
<p>If you think about publishing work in git, the flow of editing source text in a distributed way is very well established and contributions are easy to integrate. With the graphical tools, a binary is in the repo and there is no concept of a diff.</p>
<p>So in many ways graphical hardware design is something with a past and no future.</p>
<h3 id="update-2016-05-06-nascent-foss-hdl-support">Update 2016-05-06: Nascent FOSS HDL support</h3>
<p>I was contacted by Andrew Zonenberg to tell me about <a href="https://github.com/azonenberg/openfpga">his project on Github</a> providing a Verilog compiler for the Silego GreenPak 46620.</p>
<p>Let's hope Silego consider supporting that, because being forced to work on a floorplan to implement what is basically code is now extremely unfriendly, even when there are only a couple of dozen assets to wire up.</p>
<h3 id="hello-world-experience">Hello World experience</h3>
<p>From spending a few hours with it, I learned</p>
<ul>
<li><p>it works out of the box quickly</p></li>
<li><p>the flow is edit the circuit, check for design rule errors, then click the 'emulate' button on the hardware window to update the dev board</p></li>
<li><p>routing a 2-input LUT to itself as an inverter, and driving a FF clock with the result so it toggles, gets a 38MHz square wave. So the internal self-oscillation rate of the inverter is about 76MHz, which is pretty nice.</p></li>
<li><p>trying to use the hard counters, <strike>the choice of clock source does not include any internal node. The closest is an "external clock" input: you'd have to route the internal node out the chip and back in the clock input it seems.</strike> it turns out the OSC macrocell has an "external clock" input that can be routed to from inside the chip.</p></li>
</ul>
Hall-effect current sensing2015-12-21T00:00:00+08:00https://warmcat.com/2015/12/21/hall-effect-current-sensing<h2 id="dynamic-power-measurements">Dynamic power measurements</h2>
<p>There continues to be interest about dynamic power measurements, after some work I did on top of Arm's "Energy Probe" back in 2012. You can read about the various issues I found with that here</p>
<p><a href="https://git.linaro.org/people/andy.green/arm-probe.git/blob/HEAD:/arm-energy-probe-101.pdf">https://git.linaro.org/people/andy.green/arm-probe.git/blob/HEAD:/arm-energy-probe-101.pdf</a></p>
<p>Basically the current measuring amplifier becomes blind below 20mA or so with typical shunts, and there are several other issues around temperature sensitivity and inter-channel matching. The problems largely come from the shunt amplifier chosen, which becomes badly nonlinear when the voltage across the shunt is near 0.</p>
<p>Another major problem related to the choice of shunt amplifier is the relatively large full-scale voltage drop it expects to measure across the shunt: 165mV. If your voltage rail was only 1V in the first place, this is a completely unacceptable drop.</p>
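<p>To make the numbers concrete, a quick sketch of the arithmetic (the 2A full-scale here is a hypothetical channel rating, not a quoted AEP figure):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdio.h>

/* With a fixed 165mV full-scale drop expected across the shunt, the
 * shunt value and its losses follow directly from the chosen
 * full-scale current. */
int main(void)
{
    const float vfs = 0.165f;   /* amplifier full-scale drop, V */
    const float ifs = 2.0f;     /* hypothetical full-scale current, A */

    printf("shunt: %.1f mOhm\n", vfs / ifs * 1000.0f);        /* 82.5 mOhm */
    printf("dissipation at full scale: %.2f W\n", vfs * ifs); /* 0.33 W */
    printf("drop as %% of a 1V rail: %.1f%%\n", vfs * 100.0f); /* 16.5% */

    return 0;
}
</code></pre></div>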
<h2 id="hall-effect-current-sensing">Hall effect current sensing</h2>
<p>Allegro have a very interesting alternative current measurement technology which inserts almost no losses, less than one mR plus the inductance of the SO-8 leadframe. It works by passing the conductor through a Hall effect sensor integrated in the chip, and amplifying the sensor voltage (which is linearly dependent on the current flowing through the conductor) as an analogue output.</p>
<p><img src="https://warmcat.com/acs722-unipolar.png" alt="acs722 unipolar"></p>
<p><a href="http://www.allegromicro.com/%7E/media/Files/Datasheets/ACS722-Datasheet.ashx">http://www.allegromicro.com/~/media/Files/Datasheets/ACS722-Datasheet.ashx</a></p>
<p>This has some pretty impressive advantages:</p>
<h3 id="almost-no-voltage-drop-or-power-loss">Almost no voltage drop or power loss</h3>
<p>There is no resistive shunt; the current only passes through a few mm of chip leadframe, quoted as 650 <strong>micro</strong>Ohm. Even at the full 5A that is only about 3.25mV of drop and 16mW of loss, so there is no meaningful voltage drop or significant inductance increase in the path: it's possible to insert it on the output side of low voltage / high current regulators.</p>
<h3 id="isolation">Isolation</h3>
<p>The analogue output side of the chip is galvanically isolated from the power rail being measured. 1V, 1.8V, 3.3V, 5V, 30V, 110V, 230V are all OK. (Never work on stuff directly connected to the mains without taking appropriate safety precautions and using an isolating transformer... this article is about measuring low voltage DC currents)</p>
<h3 id="bipolar-response">Bipolar response</h3>
<p>The ACS722 used here has a bipolar response, sensing current moving in either direction. With no current passing, the output sits nominally at 0.5 Vcc (1.65V for a 3.3V supply) and moves higher or lower symmetrically depending on the direction of the current flow.</p>
<h3 id="high-bandwidth">High bandwidth</h3>
<p>Output bandwidth is pin-selectable as rolling off at 20kHz or 80kHz.</p>
<h3 id="fairly-cheap">Fairly cheap</h3>
<p>They are around US$4 in low quantity.</p>
<h3 id="one-new-issue-magnetic-disturbance">One new issue, magnetic disturbance</h3>
<p>Hall effect sensors work by sensing a magnetic field, so external magnetic fields are detected indistinguishably from the field caused by current passing through the sensor conductor.</p>
<h2 id="kicking-the-tyres">Kicking the tyres</h2>
<p>Analogue semiconductor datasheets often play a little game with the engineer: they have something about the chip response they don't want to come out and say, but they have to indemnify themselves against complaints later, so they must find a way to touch on the subject such that they can claim subsequently, "of course that is what we meant by that", or, "obviously a reasonable engineer would have understood from that...". Therefore it's wise to look with a very sour eye for what is not said, or not given a graph or specific numbers, and go confirm it yourself before falling in love with the thing.</p>
<p>Basically these kind of solutions that output an analogue voltage for a zero-point have their work cut out reproducing that voltage over production spread of devices, temperature, age and other enemies of analogue determinism. I read in the datasheet that there is a considerable signal conditioning chain including temperature compensation after the Hall sensor and the devices are factory-trimmed to make them give identical response within a few percent.</p>
<p>Even if that is so, the ADC you would use to sample it will have its own problems acting exactly the same across process and temperature, so exactly how that whole thing acts for repeatability needs to be thoroughly understood.</p>
<p>The related graphs for ACS722 are like this:</p>
<p><img src="https://warmcat.com/acs722-performance.png" alt="acs722 performance"></p>
<p>"3 sigma" lines are telling us about the process spread around the average, 99.7% of chips will perform in the area inside the red and green lines. If you are not familiar with how crunchy the Analogue world is, you might find this surprising both that we don't exactly know how the individual chips will perform and that we are not informed about how the 0.3% of chips might perform, presumably Allegro will throw them out so we don't need to concern ourselves.</p>
<h2 id="what-is-zero">What is zero?</h2>
<p>We already mentioned that with no current flowing, the ACS722 output should sit in the middle between 0V and Vcc. But even if the ACS722 was perfect, the power regulator for 3.3V won't be; it will differ a little each time.</p>
<p>You can see from the top left graph that, even if the 3.3V regulator is otherwise perfectly the same every time, the ACS722 process spread at 25 degrees is around +/-22mV for the zero point. Well, the output sensitivity for the ACS722 is 264mV per A detected, so +/-22mV corresponds to a per-device offset of up to +/-83mA in detection.</p>
<p>So there is going to be very significant disagreement between chips, and at different temperatures, about what "zero" looks like. But that's not actually too bad: since it's a static offset, if we measure it and store it somewhere along with the device, we can normalize each sensor for its own personal offset. It varies over temperature, but there's a cunning trick up our sleeves for that, which we discuss next.</p>
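<p>To make the offset handling concrete, here is a minimal sketch of the conversion, assuming a 3.3V supply, the 264mV/A sensitivity quoted above, and a per-device zero point measured once and stored with the device (the function name is hypothetical):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">/* Output sensitivity quoted above for this ACS722 variant */
#define ACS722_MV_PER_A 264.0f

/* Convert an ACS722 output voltage (mV) to current (mA), removing the
 * per-device zero offset (mV) measured at calibration time. */
static float acs722_ma(float vout_mv, float zero_mv)
{
    return (vout_mv - zero_mv) / ACS722_MV_PER_A * 1000.0f;
}

/*
 * A device whose real zero point sits 22mV above the ideal 1650mV mid-rail:
 *   acs722_ma(1672.0f, 1672.0f) ->   0.0   (calibrated)
 *   acs722_ma(1672.0f, 1650.0f) -> ~83.3   (naive, assuming the ideal zero)
 */
</code></pre></div>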
<h2 id="temperature-linearization">Temperature linearization</h2>
<p>All of the graphs above have temperature as their X axis; that tells you the vendor knows the device acts quite differently depending on the ambient temperature. There should not be much self-heating: it takes very little current to operate, and there is almost no impedance in the path of the current being measured.</p>
<p>You might think that to linearize the response across temperature, we would have to measure the temperature and apply a correction function. However, there is a very cool trick we can perform to linearize it with almost no effort (although, a little bit of cost).</p>
<p>You can see that in 3 of the 6 graphs ("Zero current output", "offset" and "Total error"), the process spread difference is largely a matter of an offset: the three process spread lines actually describe the same shape.</p>
<p>For the other three graphs, if we restrict the operational temperature to between 0 and 50 degrees, which is reasonable for development / bench operation, the graphs then also conform quite well to the idea the process spread simply introduced another fixed offset.</p>
<p>So... so what?</p>
<h2 id="differential-measurement">Differential measurement</h2>
<p>Normally we would use an ACS722 like this</p>
<p><img src="https://warmcat.com/acs722-unipolar.png" alt="acs722 unipolar"></p>
<p>... he outputs a monotonically increasing voltage proportional to the increasing current he senses.</p>
<p>However if we use two, where the second sees the current flow inverted...</p>
<p><img src="https://warmcat.com/acs722-bipolar.png" alt="acs722 bipolar"></p>
<p>We can effectively make a differential measurement. Temperature, and other problems like sensitivity to power supply voltage and aging drift, affect both sensors about the same, and in the same direction, as shown in the bottom right image. If we only measure the difference between the two sensors, we can cancel those annoying effects and become fairly immune to zero drift.</p>
<p>The problem of external magnetic fields being detected as current can also be helped by the differential technique: if the second device occupies nearly the same volume as the first, with the same orientation, it should be affected similarly, so the disturbance is also treated as common-mode noise and removed at the differential receiver.</p>
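<p>As a sketch of the idea, assuming a hypothetical per-channel read_adc_mv() helper, the common-mode drift subtracts out while the wanted signal doubles:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">/* Hypothetical ADC channels and read helper */
enum adc_chan { CHAN_A, CHAN_B };
extern float read_adc_mv(enum adc_chan chan);

/* Sensor B carries the current inverted, so anything that moves both
 * outputs the same way (temperature, supply, aging, a shared external
 * field) is common-mode and cancels in the subtraction, while the
 * current signal adds, giving 2 x 264mV per A. */
static float acs722_diff_ma(void)
{
    float a_mv = read_adc_mv(CHAN_A);   /* rises with current */
    float b_mv = read_adc_mv(CHAN_B);   /* falls with current */

    return (a_mv - b_mv) / (2.0f * 264.0f) * 1000.0f;
}
</code></pre></div>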
<p>So after thinking of this differential technique I ordered some ACS722 and hooked them up to a test jig, using a Freescale K64F dev board and the 16-bit differential ADCs on that.</p>
<h2 id="testing-acs722-performance">Testing acs722 performance</h2>
<p>I hooked the acs722 between a linear bench PSU and a BK Precision 8540 Programmable electronic load.</p>
<p>Since I was using the 5A-rated part, I first checked the gross results in both directions: at the 5A scale it performs very well and as expected.</p>
<p>However most interesting current measurements on digital devices tend to involve lower currents, at least some of the time. And we always want an analogue measurement system to reliably detect "zero", which is usually harder than it sounds. By definition, detecting "zero" involves system performance at low currents.</p>
<p>The acs722 performs very poorly here. If you zoom out, he can indicate the current very well, but if you watch him with no load, or any consistent load, he wanders about all over the place. Here is one ACS722 (ie, "single-ended") with a consistent load over 110s... this data has already been averaged over 32 ADC samples; the raw data at ~50kHz is much worse.</p>
<p><img src="https://warmcat.com/plot-acs722-singlended-drift-ring-2.png" alt="plot-acs722-singlended-drift-ring-2"></p>
<p>Because of the duration of the excursions, it seems there is signal processing inside the acs722 that periodically "autozeroes" the device to counter thermal drift. Thermal drift or noise by itself we might be able to do something about, but this semi-processed version that rebases itself at arbitrary times cannot be distinguished from actual signal. Here's the same situation with a 128-length averaging ringbuffer on top of the 32-sample ADC averaging (notice the Y axis is tighter than in the first plot)</p>
<p><img src="https://warmcat.com/plot-acs722-singlended-drift-ring.png" alt="plot-acs722-singlended-drift-ring"></p>
<p>This makes the "drifty" nature of the noise clearer and shows how averaging isn't going to help.</p>
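<p>For reference, the averaging used for these plots is just a moving average over a ring of recent readings; a minimal sketch (the types are assumptions) looks like this:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdint.h>

#define RING_LEN 128    /* matches the 128-length averaging above */

struct avg_ring {
    int32_t samples[RING_LEN];
    int64_t sum;        /* running sum of the ring contents */
    unsigned int next;  /* next slot to overwrite */
};

/* Push one (already 32-sample ADC-averaged) reading and return the
 * moving average over the last RING_LEN readings. */
static int32_t avg_push(struct avg_ring *r, int32_t sample)
{
    r->sum -= r->samples[r->next];
    r->samples[r->next] = sample;
    r->sum += sample;
    r->next = (r->next + 1) % RING_LEN;

    return (int32_t)(r->sum / RING_LEN);
}
</code></pre></div>
<p>This flattens white noise nicely, but the step-like rebasing just turns into slower ramps, which is exactly what the second plot shows.</p>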
<p>I was surprised by that, so I checked with a 5W power resistor as the load instead of the programmable DC load, in case there was some interaction with that, but the results are the same. I also checked the "differential" configuration with two ACS722 back-to-back and the ADC set for differential input mode, but because each ACS722 does his own recalibration continuously, alone and unsynchronized (ie, it's not a common-mode issue), it's no better.</p>
<p><img src="https://warmcat.com/plot-acs722-diff-drift-ring-2.png" alt="plot-acs722-diff-drift-ring-2"></p>
<h2 id="conclusion">Conclusion</h2>
<p>After these results I went back to the datasheet; they basically tell you straight up that the noise performance is really bad for small currents.</p>
<p><img src="https://warmcat.com/acs722-noise-numbers.png" alt="acs722-noise-numbers"></p>
<p>It means that a measurement of "42mA" as told by the ACS722 may be completely fictitious: no current may actually be flowing, or 82mA may really be flowing. Even if it tells you instantaneously that 420mA is flowing, you can only trust that number to within +/-42mA. And that 42mA noise number is in the "Typical" column; there is no max figure.</p>
<p>Even with heavy averaging (reducing the measurement bandwidth to a few Hz, not the 10kHz needed for parity with the AEP) the zero level uncertainty is > 3.5mA, meaning you cannot tell signal from noise at all at this low level.</p>
<p>So despite its many good qualities for general current monitoring, at least by itself the ACS722 doesn't seem able to give useful results as an AEP-style, generic dynamic current monitoring instrument.</p>
mbed3 libwebsockets port2015-11-29T00:00:00+08:00https://warmcat.com/2015/11/29/mbed3-libwebsockets-port<h2 id="fedora-mbed3-support-patches">Fedora mbed3 support patches</h2>
<p>When I started looking at mbed, the first problem I met was that Fedora's arm-none-eabi packages do not generate the "nano" library versions necessary for C++ on mbed: mbed itself is C++.</p>
<p>I fixed the rpm specfile to take care of this and sent patches to solve it on an existing Bugzilla entry about the problem</p>
<p><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1260439">https://bugzilla.redhat.com/show_bug.cgi?id=1260439</a></p>
<p>These were taken up by the maintainer of the Fedora packages and pushed to testing a couple of weeks ago</p>
<p><a href="https://bodhi.fedoraproject.org/updates/FEDORA-2015-9f0bfdec67">https://bodhi.fedoraproject.org/updates/FEDORA-2015-9f0bfdec67</a></p>
<p>And the fixed packages went into Fedora Stable 5 days ago.</p>
<p>So now, F22 / F23 / Rawhide can be used with mbed out of the box.</p>
<p>Actually that was quite interesting: the toolchain has to be built twice with different CFLAGS, and selected libs from the second build merged into the first. But the installed packages are aware of the DESTDIR they are installed into, so some swapping around is necessary to make both builds believe they were installed into the final DESTDIR, even though that can't be done directly; otherwise rpmbuild will throw QA errors.</p>
<h2 id="mbed3-broken-listen-socket">mbed3 broken listen socket</h2>
<p>As I wrote before, mbed3 listen socket behaviour was badly broken and subject to races. I opened an issue on it in github</p>
<p><a href="https://github.com/ARMmbed/sockets/issues/35">https://github.com/ARMmbed/sockets/issues/35</a></p>
<p>and last week the bug was fixed and pushed to the public mbed repository in sal-stack-lwip 1.0.4.</p>
<p>That seems to completely solve the variety of race-type issues around accepting connections from a listening socket, which is great.</p>
<h2 id="mbed3-500ms-polled-onsent-issue">mbed3 500ms polled OnSent issue</h2>
<p>There's now one outstanding problem with mbed3 sockets I know about, again I filed a github issue</p>
<p><a href="https://github.com/ARMmbed/sockets/issues/38">https://github.com/ARMmbed/sockets/issues/38</a></p>
<p>This problem is less critical than the first one: everything acts correctly, but OnSent() notifications are delayed and occur at 500ms intervals if no incoming packets are appearing. Since we use OnSent() to regulate sending packets in the nonblocking manner required by both mbed3 and libwebsockets, this slows network traffic to a crawl.</p>
<p>But the actual traffic and notifications are correct and stable, just artificially delayed.</p>
<h2 id="mbed3-libwebsockets-port">mbed3 libwebsockets port</h2>
<p>Last week I updated libwebsockets</p>
<p><a href="http://github.com/warmcat/libwebsockets">http://github.com/warmcat/libwebsockets</a></p>
<p>to also support mbed3 properly: it's able to be built as a yotta module, and when run with the mbed3 test app</p>
<p><a href="https://github.com/warmcat/lws-test-server">https://github.com/warmcat/lws-test-server</a></p>
<p>it's able to perform all the normal lws test server functions, if slowly due to OnSent() being delayed by 500ms at the moment.</p>
<p>The combination of the mbed3 OS / socket stack, lws itself, and the test server assets like PNG, ICO and HTML is only 118KB... the Cortex M4 in the K64F has 1MB of flash. So it's possible to consider meaningful HTML5 networking devices in as little as 128KB flash... this is two orders of magnitude less than required by Linux...</p>
HDMI Audio on Hikey2015-11-23T00:00:00+08:00https://warmcat.com/2015/11/23/hdmi-hikey-audio<h2 id="audio-on-96boards">Audio on 96boards</h2>
<p>96boards designs differ from the usual Raspberry Pi and suchlike in a few
ways... one obvious one is there is no Ethernet, instead WLAN is provided.</p>
<p>For audio, the default audio device is HDMI audio, and only raw i2s is provided at the expansion connectors.</p>
<p>Since I'm the only Linaro person with an HDMI analyzer, the job of getting HDMI Audio working ended up with me.</p>
<h2 id="signal-chain-for-hdmi-audio-on-hikey">Signal chain for HDMI Audio on Hikey</h2>
<p>Basically the audio is DMA'd to a FIFO + mixer + I2S IP, where it comes out on a physical 48kHz "BT" I2S interface and is taken by the ADV7533.</p>
<p><img src="https://warmcat.com/hikey-hdmi-status-0.png" alt="Hikey HDMI data flow"></p>
<p>There's no HDMI IP onchip and so for the SoC, "HDMI Audio" is just a generic 2ch I2S stream.</p>
<p>One complication is the "BT" I2S interface wiring is not probeable on the PCB, so any work has to be done blind. That's a bit scary if it just doesn't produce any output, since you could have a problem with, eg, pinctrl, or clock, or reset, and have no way to know if anything was even coming out.</p>
<h2 id="getting-started">Getting started</h2>
<p>The first part was to get the logical alsa "card" coming up in Linux.</p>
<p>To make Alsa happy, it requires several pieces in place</p>
<ul>
<li>A driver for the HDMI "Card"</li>
<li>A driver for the I2S hardware... in our case this also does the job of forcing the mixer that's combined with the I2S hardware to be configured for the output we want</li>
<li>A PCM driver to arrange the DMA to move buffers of data to the I2S hardware</li>
<li>A (fake) codec driver representing the I2S -> HDMI connection</li>
</ul>
<p>I ported the I2S driver I wrote for another project along with Jassi Brar's dmaengine PCM driver, and enough fake hdmi "codec" driver that it could all register and provide the alsa card</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">root@linaro-alip:~# cat /proc/asound/cards
0 [hi6210hdmi ]: hi6210-hdmi - hi6210-hdmi
hi6210-hdmi
</code></pre></div>
<p>Next, I gutted the I2S driver and customized it for the hisilicon IP. A large part of that is making the register and bitfield definitions, then initializing more and more of the chip until it showed some signs of life.</p>
<h2 id="extreme-hdmi-monitoring">Extreme HDMI monitoring</h2>
<p>Normally when you do this kind of thing the first place you look to understand the status is the i2s bus coming out of the SoC, because there are too many things that can stop the signal before it makes it to the TV after that. So you at first want to at least be certain that the format and rate of the data coming out of the SoC is right.</p>
<p>Again normally, after the data makes it to the HDMI cable you are again blind, you have to debug it by listening to the TV.</p>
<p>However the situation this time is reversed... there is no convenient way to access the I2S bus between the SoC and the ADV7533... we are actually blind there. When it comes to the HDMI cable, because of the HDMICAP analyzer I described last month, actually we have perfect insight into what came out of the SoC (up to 720p anyway) on HDMI.</p>
<p>The HDMI monitoring scheme is like this:</p>
<p><img src="https://warmcat.com/hikey-hdmi-monitoring.png" alt="HDMICAP monitoring scheme"></p>
<p>Because I didn't finish implementing EDID support in HDMICAP, to give the rest of the stack a valid EDID an active HDMI splitter is used, and the capture analyzer works from a digital copy of the HDMI stream from the splitter. The TV provides his EDID back to the Hikey.</p>
<h2 id="first-trial">First trial</h2>
<p>I tried DMA transfer at that point since the PCM driver already supported dmaengine as does the hikey DMA driver. However, I couldn't get any result from making the DMA write to the I2S FIFO.</p>
<p>He would enqueue two DMA actions but not transfer anything and never complete until Alsa's 10s timeout closed it as an "IO Error". Well, at least it closed cleanly.</p>
<p>I checked his dma request channel (14) was correct, his IRQ was correct and that he was setting it up inside the hisilicon DMA driver... but no data could move.</p>
<p>Somewhat suspiciously I noticed nobody used the dmac driver in the DT, although it was upstream. I described the situation to Guodong at the Hisilicon Landing Team in Linaro and some contacts at Hisilicon, and tried some other things in the meanwhile.</p>
<h2 id="operating-the-fifo-by-hand">Operating the FIFO by hand</h2>
<p>After spending a while fiddling with non-working DMA, I enabled the I2S IRQ. Normally you can avoid dealing with that, since exhausting the DMA buffer regulates the data flow, but with DMA not passing samples into I2S the question was whether the problem belonged on the DMA or the I2S side. If the CPU can fill the I2S FIFO, the problem would be on the DMA side.</p>
<p>I attached the IRQ to an ISR that dumped samples into the FIFO from the CPU side, basically by writing to a 32-bit register that's normally the destination for the DMA. I also intended to study the rate of IRQs, to infer whether the sample rate was correct or not.</p>
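<p>The ISR amounted to little more than stuffing a recognizable pattern into that FIFO data register; a minimal sketch (the register offset and refill burst length are hypothetical) looks like:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <linux/io.h>
#include <linux/interrupt.h>

static void __iomem *i2s_base;  /* mapped I2S register block */

#define I2S_TXD_OFS 0x20        /* hypothetical: 32-bit TX data register
                                 * normally fed by DMA */

static u32 pattern = 0x11112222;

/* Refill the TX FIFO from the CPU when the low-water IRQ fires.  The
 * incrementing pattern lets HDMICAP confirm that the data seen on the
 * wire really originated here. */
static irqreturn_t i2s_fifo_isr(int irq, void *dev)
{
    int i;

    for (i = 0; i < 16; i++) {      /* hypothetical burst size */
        writel(pattern, i2s_base + I2S_TXD_OFS);
        pattern += 0x22222222;      /* 0x11112222, 0x33334444, ... */
    }

    return IRQ_HANDLED;
}
</code></pre></div>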
<p>Since I can't look at the BT I2S signals due to lack of access, I also hacked in a fixed 48kHz Stereo Audio enable in the ADV7533 driver... that involved adding packet mode support (which I tested with SPD: the hikey with these patches has an HDMI SPD of "Linaro" and "96boards:hikey" captured by HDMICAP.)</p>
<h2 id="noises-off">Noises off</h2>
<p>That got me some noise from my TV. However the rate of interrupts from using the i2s fifo IRQ in this mode was completely wrong... the IRQ should only come when I2S has drained the FIFO content below a low-water mark, and the ISR should stuff samples into the FIFO then.</p>
<p>But the IRQ was continuously asserted, meaning the ISR spammed new samples in there endlessly without any regulation of what it was doing. The samples were being taken and sent to adv7511 at some slower rate, and I could capture the audio packets and confirm the content was coming from the ISR (who was writing 0x11112222, 0x33334444, etc). But because the ordering was essentially random, we heard basically white noise.</p>
<p>Although that is not quite what we wanted, in terms of which bits must be working / wired up correctly / configured to pass data, it means almost everything was correct already: we could pass data from the FIFO to the TV correctly.</p>
<p><img src="https://warmcat.com//hikey-hdmi-status-1.png" alt="HDMICAP status broken DMA"></p>
<p>In other words the only thing killing us was broken DMA: the rest of it was already at least half-working.</p>
<h2 id="secure-or-convenient-pick-one">Secure or Convenient: pick one</h2>
<p>That evening I heard from Hisilicon they had figured out the DMA issue: the DMA controller had been configured for Secure access only. Hikey has a proper "Secure World" bootloader implementation - it's open source, as is the whole Hikey boot stack - and the other bootloaders and the kernel run in nonsecure mode.</p>
<p>That means the DMA couldn't work in Linux without a bootloader update. I worked with Socionext guys last year to implement a secure / nonsecure bootloader on a big.LITTLE chip, so this is nothing new actually.</p>
<p>The next morning I received a patch and binary versions of the secure pieces, so I blew them in the Hikey partitions... and bricked my Hikey.</p>
<h2 id="painkillers">Painkillers</h2>
<p>Right now there's no real pressure on Linaro not to break boot on 96boards. So changes are repeatedly made, for very good reasons, that are incompatible with the latest and greatest that went before. And if you tell people they should make the changes backwards-compatible, they look at you like you are crazy and feel free to ignore such wild ideas. My boot pieces were only a few weeks old, but things had moved on and using the newer binaries broke boot. This has been happening repeatedly since Feb 2015 when I got my Hikey... I'm used to things blowing up, but since this is now a hardware product, like the other 96boards, I fear not everybody that buys them is going to be pleased that it gets bricked regularly unless they "year zero" it every time they update anything.</p>
<p>It didn't crash, to its credit, but the non-core EFI pieces had been moved to a new path. Since this stuff goes on an eMMC, there's no access to repair it... among the pieces that moved was the fastboot EFI module.</p>
<p>My hikey has a 1.8V UART hacked on UART0, which used to be the UART the fashionable people used. But from some time ago, everything switched to UART3, with no mercy for people using UART0. So I shrugged and rewired it to UART3.</p>
<p>I managed to work around the EFI breakage and blew a newer boot partition. But there too, different people had decided to make other incompatible changes - again, no doubt, for the better - and moved the kernel + dtb from the boot partition to /boot in the rootfs. It doesn't, you know, fall back to the kernel in the boot partition. It just drops dead.</p>
<p>So it necessitated flashing a newer rootfs... I guess at some point, when there are hundreds of thousands of these out there, real users will have real things in their rootfs that they do not want to lose by overwriting the whole partition just to get a boot. But for now nobody is taking care of those guys.</p>
<p>At any rate I recovered after some hours lost to this, and found the DMA bootloader fix worked great. But now that it was working, we could hear the next problem: the audio playback on the TV was too slow and corrupted by regular noise.</p>
<h2 id="information-from-the-capture-side">Information from the capture side</h2>
<p>When passing audio, the HDMI frame looks like this</p>
<p><img src="https://warmcat.com//hikey-720p-with-audio.png" alt="HDMICAP screenshot"></p>
<p>The blanking time that is normally spent sending control period coding (just passing HSYNC and VSYNC information) gets some extra data islands that carry the audio data. HDMI leaves it up to the transmitter to decide how many samples he will send in a frame, and he can place a 36-byte raw data island anywhere in the blanking where there is enough space.</p>
<p>The PCM packet itself always has space for 8 x 24-bit audio samples: a bitfield comes with the packet describing which ones contain valid sample data. 16-bit data is simply shifted to become the MS 16 bits of the 24-bit sample. In this way, HDMI is natively 8ch, 24-bit and able to accept a range of sample rates.</p>
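<p>The 16-bit case is just a shift; a one-liner sketch:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdint.h>

/* Place a signed 16-bit PCM sample in the most significant 16 bits of
 * HDMI's 24-bit sample field; the low 8 bits stay zero. */
static uint32_t pcm16_to_hdmi24(int16_t s)
{
    return (uint32_t)(uint16_t)s << 8;
}
</code></pre></div>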
<p>How the 8 possible samples map onto the >8 possible audio outputs is not in the PCM packet, but is defined in a separate Audio Infoframe packet sent once per frame.</p>
<p>We are sending these telling it there are 2ch and they should map onto "Front Left" and "Front Right".</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">Audio Infoframe (v1, len 10)
hdr 0x84 0x01 0x0a (pol 0x4a)
70 01 00 00 00 00 00 7d
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
</code></pre></div>
<p>Also part of the summary information HDMICAP reports is the number of these PCM audio packets received in one frame. Doing the arithmetic, 48000 audio samples per second spread across ~60 video frames per second, we expect to see ~800 audio packets (audio sample sets) per video frame like this:</p>
<p><img src="https://warmcat.com/audio-packets.png" alt="HDMICAP screenshot"></p>
<p>However what I was seeing was a number around 665, which is too low by 16%.</p>
<h2 id="reconstructing-the-audio">Reconstructing the audio</h2>
<p>Because HDMICAP captures everything on the HDMI wire, it's possible to capture all the audio packets and 100% reconstruct the audio samples. In fact to allow larger captures, HDMICAP has a mode where only data island content is captured: it's then possible to fill the 8MB DMA capture region with typically >1000 frames (16-20s at 50/60Hz) of audio content.</p>
<p>When I did this and saved the result as a 48kHz wav, there was no problem playing it back on a PC: it sounded normal.</p>
<p>Looking at it closer though, the amount of captured audio was too short by 16% for the 200 video frames (3.333s) I had run the capture for. So the problem is not corruption but simply providing the correct samples too slowly, at about 40kHz: the TV replayed them at 48kHz and played junk for the missing samples, which made the corruption.</p>
<h2 id="inferring-the-clock">Inferring the clock</h2>
<p>Normally you would just put a 'scope on the I2S clock and confirm the overall clock rate was correct, but since we can't touch that signal without reworking the Hikey, we have to look at it differently.</p>
<p>Clearly the samples are coming from the SoC at a rate that is -16% of what it should be.</p>
<p>The I2S unit runs at 49.152MHz from a 245.76MHz PLL, which can be had by dividing it by 5... and there was some code I added to the driver while trying to get it working that forced a value of 5 in the divider. But with the other evidence, I wondered if, to divide by n, you should put n - 1 in the divider register, so my 5 was really dividing by 6, giving 40.96MHz: -16.7%, matching the observation... when I poked 4 in there, the problem was solved.</p>
<p>Afterwards I removed the whole forcing because the clock driver had already correctly set it by default, making it somewhat of a self-inflicted wound.</p>
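<p>The n - 1 convention is common in divider hardware, since a register value of 0 then means divide-by-1; a sketch of the arithmetic:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">/* A divider register holding (n - 1) divides by n.  For
 * 245.76MHz -> 49.152MHz we want n = 5, so the register gets 4.
 * Writing 5 divides by 6, yielding 40.96MHz: the -16% sample rate
 * observed on the HDMI capture. */
static unsigned int div_reg_val(unsigned long parent_hz,
                                unsigned long target_hz)
{
    return (unsigned int)(parent_hz / target_hz) - 1;
}

/* div_reg_val(245760000, 49152000) == 4 */
</code></pre></div>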
<h2 id="current-status">Current Status</h2>
<p>As of writing this, you can get the current HDMI stuff here</p>
<p><a href="https://git.linaro.org/people/andy.green/linux.git/shortlog/refs/heads/hikey-audio">https://git.linaro.org/people/andy.green/linux.git/shortlog/refs/heads/hikey-audio</a></p>
<p>You will also need the very newest boot pieces; they were fixed a couple of days ago, but I don't know which build will have them.</p>
<p>There seem to be two problems left</p>
<ul>
<li><p>At some point the audio becomes mono, just the first channel repeated on the second channel. I have asked Hisi how that's possible... actually, from the (proprietary) datasheet for the I2S unit, I can't see how to do that even if I wanted to. This could also be introduced at the adv7533, but again from that (proprietary) datasheet, the only way I could see to do it (the i2s -> HDMI channel mapping bitfields) doesn't do it.</p></li>
<li><p>there is a small sound at the start of playback, it seems the wrong data is sent initially somehow</p></li>
</ul>
<p><img src="https://warmcat.com/audio-startup.png" alt="HDMICAP screenshot"></p>
<p>Anyway these are relatively small problems: it's possible to just aplay whatever.wav, if it's 48kHz 2ch 16-bit, and audio will come.</p>
Mbed3 starting libwebsockets port2015-11-03T00:00:00+08:00https://warmcat.com/2015/11/03/mbed3-starting-libwebsockets-port<h2 id="mbed3-as-a-foss-project">Mbed3 as a FOSS project</h2>
<p>Right now mbed3 is in a bit of a strange place as a FOSS project. I guess this is a temporary situation but today a couple of things should be understood.</p>
<h2 id="choice-of-apache-for-core-code">Choice of Apache for core code</h2>
<p>First, it's surprising to me that the core stuff is Apache-licensed. For those pieces, it seems to me it shouldn't be controversial that fixes and improvements should be given back, since they are basically a commons used by all the vendors to provide mbed support. I would have expected the licensing to reflect that. On libwebsockets, I use LGPL2.1 with a Static Linking Exception, which would seem to be a good fit: it's explicitly not viral, but it's explicit that the core library code wants your changes.</p>
<p>For other things, like examples, or pieces that are expected to be built on project by project, sure, Apache makes sense: they are there to enable individual usage that can perfectly well be proprietary.</p>
<p>Anyway it's not critical from the user's perspective, but it may negatively affect the amount and quality of contributions / fixes / discussion on the core code.</p>
<h2 id="contributions-cannot-be-accepted-due-to-cla-not-ready">Contributions cannot be accepted due to CLA not ready</h2>
<p>I sent four separate pull requests fixing broken builds and other bugs, plus code improvements like removing needless include files. I got a variety of responses, from "we already fixed it" (but they did not push to the registry...), to "we will take your patch later", and finally the truth, which is that they want an Ubuntu-style Contributor License Agreement in place before accepting any patches.</p>
<p>So they are on github, with all that entails, turning away fixes, because their <em>Apache</em> licensed project doesn't have a contributor contract yet, which they rather optimistically think I would sign up to. Actually I, and a lot of other people, do not like Ubuntu's CLA enforcement and won't sign up on principle.</p>
<p>For a while now there has definitely been a thing you can call "github culture": many developers subscribe to a philosophy of fork-and-send-pull-requests, specifically on github, very casually as they find problems. Plugging into that is a big part of why you would put your project on github in the first place. These are a spread of developers, some of them top-notch, with a current - as in this second - interest in your project, who are willing to take time to improve it for you and other users, immediately and unpaid. This can be a huge advantage, sometimes dwarfing the internal developer effort.</p>
<p>I guess the people who want the CLA are thinking about maintaining a clear copyright on the code, for, eg, relicensing later. But... it's <em>Apache</em> licensed already. ARM can just fork it and relicense according to Apache terms today. Yes, they don't have standing to fight copyright battles for code that came from elsewhere, but as of now all the core guts have come from internal devs; it's not like their code will disappear. And it's <em>Apache</em> licensed, so what copyright battles are they expecting?</p>
<p>Anyway let's hope they rethink the CLA thing entirely, or if not, clearly state in their README.mds that contributions involve a CLA (which will turn a lot of people right off).</p>
<h2 id="lws-porting">lws porting</h2>
<p>So... the first move was to modify the tcp echo sample application to be a very dumb http server itself... that worked fine. I then cleaned it up and changed the style to kernel style, since that is easier on my eyes. For reasons that will be explained, you can find this trivial conversion on github</p>
<p><a href="https://github.com/lws-team/mbed3-dumb-http-test">https://github.com/lws-team/mbed3-dumb-http-test</a></p>
<p>That represents the guts of the "mbed3 way" for tcp networking, so I sat and stared at it for quite a while, trying to figure out how to apply it in the posix-based libwebsockets code. The first choice was whether to abandon trying to use lws as it is - a CMake project that works on Linux, Windows, OSX and many other (Posix) platforms - and just rip code out of it to build on the dumb webserver until it worked.</p>
<p>I seriously considered that, but there are already many synergies with the existing library. Internally mbed uses CMake, and lws uses it; mbed is event-driven, so is lws; mbed needs nonblocking, so does lws. And there is just a huge investment in the existing library code, over 5 years now, with nearly 100 different contributors. It's better if the end result of this is to increase the value of lws, rather than to fork off into some mbed-specific project that is no longer in sync. And there are many features and code paths in lws that are well-debugged as they are, which I can take advantage of.</p>
<p>So I decided to go down the road of porting the library to mbed rather than mine pieces out of it, even though that is going to be a lot more work in the short term.</p>
<h2 id="yotta-project-structure">yotta project structure</h2>
<p>Yotta's strategy is to have a top level "app" which has its dependencies inside a generated ./yotta_modules subdir. For development, in git, that means your toplevel app is one git project, excluding ./yotta_modules, and in the toplevel ./yotta_modules, the libraries you are working on have their own git repos (also excluding their ./yotta_modules).</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">./lws-test-server
./yotta_modules
./websockets
CMakeLists.txt
</code></pre></div>
<p>Currently lws contains his own test apps, but that can't fly with Yotta. So the structure is a new test server project which has lws as a dependency, with lws developed at the same time as a git repo in the dependency directory. It implies that if you have n test apps, you need n new toplevel git projects, each symlinking to the library git repo for development, but that's a problem for another day since the test server is all I care about at the moment.</p>
<h2 id="yotta-cmake-support">yotta CMake support</h2>
<p>Yotta is evidently designed to work well with "modules" (libraries) as dependencies that themselves use CMake, it seems to label them "existing" modules.</p>
<p>When I built the stub toplevel test app, Yotta understood he was dealing with an "existing" project and built it well immediately. Unfortunately, since it's a mature CMake project that builds on many platforms, that won't work without some CMake option configuration. I looked for how to control that from the toplevel app, but there doesn't seem to be a way. I asked about it</p>
<p><a href="https://github.com/ARMmbed/yotta/issues/556">https://github.com/ARMmbed/yotta/issues/556</a></p>
<p>(Edit: I got a very good reply when the UK woke up suggesting putting the mbed3 specific set()s in the generic library CMakeLists.txt conditional on YOTTA_<modulename>_VERSION_STRING existing... this worked great and I was able to remove my hacks forcing the default states at the definition).</p>
<p>Then it built, but with a buttload of warnings and errors.</p>
<h2 id="mbed3-default-compiler-options">mbed3 default compiler options</h2>
<p>Libwebsockets has -Wall -Werror in his configuration for Linux by default, so I was surprised to see such a spew coming.</p>
<p>It turns out mbed3's default C[XX]FLAGS for gcc include -Wextra, which, while I am a fan of listening to the compiler, brings in useless warnings like unused function parameter. There may be very good reasons why a function has a fixed set of parameters that, under some conditions like preprocessor options, don't get referenced. At any rate it does not directly imply a problem, and the only solution is to fill the code with (void)param; fake references to keep the compiler happy. -Wall already includes the more useful unused-variable warning.</p>
<p>Another very annoying, useless -Wextra warning comes because the code takes advantage of the sparse C99 designated initializer syntax, leaving the other members at 0: it complains that we did not elaborate every member in the initializer... once for each missing member. Both cases look like the sketch below.</p>
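<p>A miniature of both complaints (the names are hypothetical; both snippets are perfectly legal C):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stddef.h>

/* -Wextra's -Wunused-parameter fires on 'user' even though the
 * callback signature is fixed by the API; the conventional fix is a
 * fake reference. */
static int callback_stub(void *context, void *user)
{
    (void)user;     /* silence -Wunused-parameter */

    return context != NULL;
}

/* -Wextra's -Wmissing-field-initializers fires once for every member
 * left implicitly zero by this sparse C99 designated initializer. */
struct protocol_stub {
    const char *name;
    int (*cb)(void *context, void *user);
    int rx_buffer_size;
    int id;
};

static const struct protocol_stub proto = {
    .name = "http-only",    /* .rx_buffer_size and .id stay 0 */
    .cb   = callback_stub,
};
</code></pre></div>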
<p>I gritted my teeth and "fixed" them in the code though, because it did also enable a useful warning about comparisons between signed and unsigned.</p>
<p>Still it would be cool if the yotta project definition JSON had a way to add / remove flags on the default compiler options as overrides on the target definitions Yocto-style, so we can snip out the warnings of dubious worth individually.</p>
<h2 id="lws-integration">lws integration</h2>
<p>After that it built... lws already has a "platform" concept, with mutually exclusive platform files that get built according to Windows, unix, etc. I added lws-plat-mbed3.c, and a .cpp counterpart, that get built for the mbed3 platform, and moved the networking class stuff into there from the dumb http server test app.</p>
<p>So after flashing the K64F, he could work as the dumb test app, and was built against the library, although not yet using it.</p>
<p>After that, I moved the dumb test app's class into the library, and his member functions to lws-plat-mbed3.cpp, and adapted the toplevel test application to start the library.</p>
<p>That kinda worked...</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">IP: 192.168.2.205:80
4: Initial logging level 255
4: Libwebsockets version: 1.5 2e89c3c
4: per-conn mem: 104 + 2108 headers + protocol rx buf
4: Listening on port 7681
4: Protocol: http-only
callback_http: reason 27
onIncoming
callback_http: reason 17
callback_http: reason 29
4: mbed3_tcp_stream_accept
callback_http: reason 19
4: onDisconnect
</code></pre></div>
<p>He properly starts the library, the logging is working, the callbacks are coming for the http protocol handler in the library when we connect. But he never sees the actual RX payload from the browser any more.</p>
<p>After puzzling about what changed, I eventually realized nothing changed, except that we now take a bit longer after the socket accept before returning to the event loop. So I checked with telnet on port 7681, where he connects but doesn't send anything until I type something, and that RX packet was processed properly.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">IP: 192.168.2.205:80
4: Initial logging level 255
4: Libwebsockets version: 1.5 2e89c3c
4: per-conn mem: 104 + 2108 headers + protocol rx buf
4: Listening on port 7681
4: Protocol: http-only
callback_http: reason 27
onIncoming
callback_http: reason 17
callback_http: reason 29
4: mbed3_tcp_stream_accept
callback_http: reason 19
4: onRX
3: x
4: onSent
</code></pre></div>
<p>The corresponding telnet session is like this, which is correct</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ telnet 192.168.2.205 7681
Trying 192.168.2.205...
Connected to 192.168.2.205.
Escape character is '^]'.
x
HTTP/1.1 200 OK
Ahaha... hello
</code></pre></div>
<p>I made a github issue report about it, and as part of that confirmed, using the dumb test app, that this was not related to library integration: RX packets are lost if there is even a small delay after the accept. If the RX packet from the client is delayed relative to the accept, as with telnet, there is no problem.</p>
<p><a href="https://github.com/ARMmbed/sockets/issues/35">https://github.com/ARMmbed/sockets/issues/35</a></p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Arm can't accept patches right now (on their <strong>github</strong> projects...) due to sorting out a CLA</p></li>
<li><p>The networking stack is complex and has some bugs (they say as much in the README.md)...</p></li>
<li><p>Yotta recognizes if a module has his own CMake and uses it automagically</p></li>
<li><p>Existing CMake in a module can be configured cleanly by using CMake set()s conditional on existence of YOTTA_<MODULENAME>_VERSION_STRING</p></li>
<li><p>-Wextra is a pain due to some warnings being basically noise</p></li>
<li><p>mbed3 TCP listening doesn't work reliably right now</p></li>
</ul>
Mbed3 diving into network2015-11-01T00:00:00+08:00https://warmcat.com/2015/11/01/mbed3-diving-into-network<h2 id="mbed-example-network">mbed-example-network</h2>
<p>After a couple of days struggling and fixing things (I sent another pull request fixing a #warning in mbed-hal-ksdk-mcu), today I can look at the actual mbed3 examples without build errors (mbed's fault, or mine for not setting the target) or missing libraries (Fedora's fault).</p>
<p>So the first thing was leave mbed-client-examples, which is aimed at their cloud protocol / servers, and checkout the git version of mbed-example-network, which is aimed at generic network activity.</p>
<p>He builds four examples, the most interesting of which for me is "helloworld-tcpclient.bin". This does one fixed thing using "by hand" http directly on the socket. The one fixed thing is to reach out to (the old...) server and fetch some text... surprisingly enough, "Hello world!".</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">
TCP client IP Address is 192.168.2.205
Starting DNS lookup for developer.mbed.org
DNS Response Received:
developer.mbed.org: 217.140.101.30
Connecting to 217.140.101.30:80
Connected to 217.140.101.30:80
Sending HTTP Get Request...
HTTP Response received.
HTTP: Received 473 chars from server
HTTP: Received 200 OK status ... [OK]
HTTP: Received 'Hello world!' status ... [OK]
HTTP: Received message:
HTTP/1.1 200 OK
Server: nginx/1.7.10
Date: Sun, 01 Nov 2015 00:42:37 GMT
Content-Type: text/plain
Content-Length: 14
Connection: keep-alive
Last-Modified: Fri, 27 Jul 2012 13:30:34 GMT
Accept-Ranges: bytes
Cache-Control: max-age=36000
Expires: Sun, 01 Nov 2015 10:42:37 GMT
X-Upstream-L3: 172.17.42.1:8080
X-Upstream-L2: developer-sjc-indigo-1-nginx
X-Upstream-L1-next-hop: 217.140.101.34:8001
X-Upstream-L1: developer-sjc-indigo-border-nginx
Hello world!
</code></pre></div>
<p>That worked great. But only after reading the source and seeing that, unlike mbed-client-examples, this code (I guess ported from earlier mbed) sets the serial port to 115200.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">void app_start(int argc, char *argv[]) {
(void) argc;
(void) argv;
static Serial pc(USBTX, USBRX);
pc.baud(115200);
...
</code></pre></div>
<p>The other app defaults to 9600, at least on the one recommended supported board, K64F. So you will see nothing when switching between these apps, even though they are both official example apps.</p>
<p>I made a bug about it on the github project and offered to send a fix</p>
<p><a href="https://github.com/ARMmbed/mbed-client-examples/issues/32">https://github.com/ARMmbed/mbed-client-examples/issues/32</a></p>
<h2 id="three-notable-things">Three notable things</h2>
<h3 id="1-it-39-s-c">1) It's C++</h3>
<p>Although you can write mainly in C, you have no choice but to frame it inside C++, because mbed3 apis themselves are in C++</p>
<h3 id="2-the-callbacks-are-sophisticated">2) The callbacks are sophisticated</h3>
<p>In app_start(), which is the mbed3 equivalent of main(), he instantiates some things and then immediately goes back to the scheduler (minar), after scheduling a callback to start the test. It's what you would do in a generic event loop: set the state for the next thing you want, then return to the event loop.</p>
<p>The callback also has a specific object instantiation associated with it. They also have some ghetto varargs where you can fix how many args the callback wants and have that delivered at callback time.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> mbed::util::FunctionPointer1<void, const char*> fp(hello, &HelloHTTP::startTest);
minar::Scheduler::postCallback(fp.bind(HTTP_PATH));
</code></pre></div>
<p>So it's basically doing schedule_the_callback(callback_t cb, void *context, ...); in two steps. Since it's mandatory to be defining callbacks a lot, I guess this will be very handy once you get used to it. But it's a C++ believer's way (C would have a state enum, and a switch() to do the callback work).</p>
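<p>For contrast, a minimal sketch of that C-believer's alternative (all names hypothetical):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">/* One callback, a state enum, and a switch() doing the work, with the
 * context passed around as a void pointer. */
enum test_state {
    STATE_START_TEST,
    STATE_RESOLVED,
    STATE_CONNECTED,
};

struct test_ctx {
    enum test_state state;
    const char *http_path;
};

static void test_callback(void *context)
{
    struct test_ctx *ctx = context;

    switch (ctx->state) {
    case STATE_START_TEST:
        /* kick off DNS resolution... */
        ctx->state = STATE_RESOLVED;
        break;
    case STATE_RESOLVED:
        /* connect to the resolved address... */
        ctx->state = STATE_CONNECTED;
        break;
    case STATE_CONNECTED:
        /* send the HTTP GET for ctx->http_path... */
        break;
    }
}
</code></pre></div>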
<h3 id="3-the-class-hierarchy-for-socket-is-cool-send-is-unposixy">3) The class hierarchy for Socket is cool, send is unposixy</h3>
<p>There is a TCPStream class which understands TCP connection state, and fires events as they change (bear in mind though, all events are serialized and never preempt other event handlers). TCPStream has a few carefully-chosen apis</p>
<ul>
<li>connect(), which you point at your callback to handle the moment it actually connects</li>
<li>setOnDisconnect(), which... yes the same idea</li>
</ul>
<p>He also inherits from the Socket class (this is nicely done)</p>
<ul>
<li><p>setOnReadable(), again you tell it your handler for when some data has arrived and can be read, like posix poll() POLLIN</p></li>
<li><p>setOnSent(), I guess this has the same semantic as posix poll() POLLOUT... the implication is that if the last thing got sent, you may try to send() something else. It looks like it's the caller's problem to hold the buffer between send() and OnSent() arriving... in contrast posix assumes the kernel will buffer it, and on return from send() you don't need to keep it around.</p></li>
<li><p>resolve(), again you tell the stream the handler to call after name resolution completes</p></li>
</ul>
<p>So particularly the send flow has a critical deviation from posix... it's not a ding; these SoCs have far fewer resources and a different OS architecture than posix assumes. But, for example, even though libwebsockets is otherwise quite compatible with mbed3, being singlethreaded and nonblocking, he assumes in many places he can do things like make a buffer on the stack, "send" it, and exit the function, as you can in posix. So that requires some thought; a sketch of the pattern follows.</p>
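<p>A sketch of what that means in practice, in C pseudocode with hypothetical names (stream_send() stands in for the mbed3 send, on_sent() for the OnSent notification):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <string.h>

#define TX_LEN 256

struct conn {
    char tx_buf[TX_LEN];    /* must stay valid until on_sent() */
    int tx_pending;
};

/* hypothetical transport call: queues buf, later notifies via on_sent() */
extern int stream_send(struct conn *c, const char *buf, int len);

static int queue_send(struct conn *c, const char *payload)
{
    if (c->tx_pending)
        return -1;          /* previous send not yet notified */

    /* copy into a buffer that outlives this function: the posix
     * stack-buffer + send() + return idiom is not safe here */
    strncpy(c->tx_buf, payload, TX_LEN - 1);
    c->tx_buf[TX_LEN - 1] = '\0';
    c->tx_pending = 1;

    return stream_send(c, c->tx_buf, (int)strlen(c->tx_buf));
}

/* called by the stack when the last send completed: only now may we
 * reuse the buffer or send the next thing */
static void on_sent(struct conn *c)
{
    c->tx_pending = 0;
}
</code></pre></div>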
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>mbed3 needs a little more consistency about what baud rate the official example apps will use</p></li>
<li><p>HTTP test app works great</p></li>
<li><p>Your code can be C-flavoured C++, but C++ it must be</p></li>
<li><p>class hierarchy seems to be used very wisely and neatly in Socket and friends</p></li>
<li><p>the callbacks carry with them object context and varargs, varargs needing some help by hand in the code</p></li>
<li><p>send() expects you to hold the buffer until it was sent, and you get a callback to tell you that, which is very reasonable under the circumstances. But we're not in Kansas any more posix-wise.</p></li>
</ul>
<p><a href="../../11/03/mbed3-starting-libwebsockets-port.html">Next post about mbed</a></p>
Mbed3 registry and deps2015-10-31T00:00:00+08:00https://warmcat.com/2015/10/31/mbed-registry-and-deps<h2 id="dependency-hell-mbed3-style">Dependency Hell mbed3 style</h2>
<p>As I mentioned, although yotta acts like Make in terms of 'yotta clean' and 'yotta build', it also feels it should be a package manager, where the "packages" are "yotta modules", ie, chunks of code. And the logic that led to that leads to the need for a repo to get packages from, which they are bravely calling a "registry".</p>
<p>mbed3 projects all have some JSON to describe them, including versioning, which is a good idea, so the metadata to manage this is around.</p>
<p>Yotta can dump his local package state... as we left it in the last post, I can't build mbed-client-examples and my package state is like this</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ yotta ls
mbed-client-examples 1.0.0
┣━ mbed-client 1.2.0
┃ ┣━ mbed-client-c 1.1.1 yotta_modules/mbed-client-c
┃ ┃ ┗━ nanostack-libservice 3.0.8 yotta_modules/nanostack-libservice
┃ ┣━ mbed-client-linux 1.1.0 yotta_modules/mbed-client-linux
┃ ┗━ mbed-client-mbedtls 1.0.7 yotta_modules/mbed-client-mbedtls
┃ ┗━ mbedtls 2.2.0-rc.1 yotta_modules/mbedtls
┣━ sockets 1.0.2
┃ ┣━ sal 1.0.2 yotta_modules/sal
┃ ┃ ┗━ cmsis-core 1.0.1 yotta_modules/cmsis-core
┃ ┣━ core-util 1.0.1 yotta_modules/core-util
┃ ┃ ┗━ ualloc 1.0.2 yotta_modules/ualloc
┃ ┃ ┗━ dlmalloc 1.0.0 yotta_modules/dlmalloc
┃ ┗━ minar 1.0.1 yotta_modules/minar
┃ ┣━ compiler-polyfill 1.1.1 yotta_modules/compiler-polyfill
┃ ┗━ minar-platform 1.0.0 yotta_modules/minar-platform
┃ ┗━ minar-platform-posix * missing
┗━ mbed-example-network 0.1.8
┗━ mbed-drivers 0.6.9 yotta_modules/mbed-drivers
┗━ mbed-hal 0.6.4 yotta_modules/mbed-hal
</code></pre></div>
<p>So the minar-platform-posix package / module / chunk of code is needed by minar-platform, which presumably came from the "registry", but</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ yotta install minar-platform-posix
info: get versions for minar-platform-posix
error: minar-platform-posix does not exist in the modules registry. Check that the name is correct, and that it has been published.
</code></pre></div>
<h2 id="when-it-comes-to-targets-posix-mbed">When it comes to targets, posix != mbed</h2>
<p>So you might think the state of their registry is inconsistent and incomplete right now, but it seems some other platform incompatibility might be involved. When I look at minar-platform JSON</p>
<p><a href="https://github.com/ARMmbed/minar-platform/blob/master/module.json">https://github.com/ARMmbed/minar-platform/blob/master/module.json</a></p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> "dependencies": {},
"targetDependencies": {
"mbed": {
"minar-platform-mbed": "^1.0.0",
"cmsis-core": "^1.0.0"
},
"posix": {
"minar-platform-posix": "*"
}
},
</code></pre></div>
<p>So from this we can see "posix" and "mbed" are mutually exclusive platforms that can be targeted by... what... not mbed3... yotta then, and have completely separate dependency requirements.</p>
<p>I assumed the posix apis were some plugin to mbed3, since they appear as a module that can be installed. But it seems to be aimed at targets that already have a posix OS, like Linux, and you choose to build for one or the other. Actually, if posix stuff can't work in mbed3, I don't want posix stuff right now; I just want to see mbed3 do something, other than blink an LED, that I have sources for end to end on this K64F board.</p>
<p>Looking at the deps, the guy who wants "posix" pieces is none other than minar</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> ┃ ┣━ minar 1.0.1 >=0.6.0,<0.7.0 yotta_modules/minar
┃ ┃ ┣━ compiler-polyfill 1.1.1 yotta_modules/compiler-polyfill
┃ ┃ ┗━ minar-platform 1.0.0 yotta_modules/minar-platform
┃ ┃ ┗━ minar-platform-posix * missing
</code></pre></div>
<p>But as we learned yesterday, minar is an mbed3-specific event loop and I'm building for mbed3, not posix (at least, I hope I am).</p>
<h2 id="digging-into-the-registry">digging into the registry</h2>
<p>There's a http registry frontend here</p>
<p><a href="https://yotta.mbed.com/">https://yotta.mbed.com/</a></p>
<p>and although there is no catalogue, you can see everything in there with</p>
<p><a href="https://yotta.mbed.com/#search/*">https://yotta.mbed.com/#search/*</a></p>
<p>which is a bit perplexing: 6 of the 60 packages are dummy tests; 20 of them are related to embedded targets as you would expect (k64f-gcc, Nordic variations, etc); 8 are related to x86_64 or OSX / iOS... it seems yotta predates mbed3 and has another life somewhere. The rest are mainly frameworks related to security etc. There's nothing in there like "ethernet and tcp support for k64f".</p>
<h2 id="digging-into-yotta">Digging into Yotta</h2>
<p>Yotta has its own site on the mbed.com domain related to mbed3</p>
<p><a href="http://yottadocs.mbed.com">http://yottadocs.mbed.com</a></p>
<p>On there is a tutorial</p>
<p><a href="http://yottadocs.mbed.com/tutorial/tutorial.html">http://yottadocs.mbed.com/tutorial/tutorial.html</a>
</p>
<blockquote>
<p>In ARM we use yotta to build software for embedded devices - not just desktop computers. When you're compiling the same software for lots of different devices you need a mechanism to do different things, and often to include different dependencies, for each of the different devices.</p>
<p>The yotta target command lets you do this. It defaults to the system you're building on (x86-osx-native on mac, x86-linux-native on linux, etc.) You can display the current target by running yotta target with no arguments.</p>
</blockquote>
<p>Oh so...</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[agreen@build mbed-client-examples]$ yotta target
x86-linux-native 1.0.0
linux-native 1.0.0
</code></pre></div>
<p>... it looks like the active targets get stored in the project directory. So I saw k64f-gcc in the 'registry'...</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[agreen@build mbed-client-examples]$ yotta target k64f-gcc
info: get versions for k64f-gcc
info: download k64f-gcc@0.0.1 from the public module registry
warning: k64f-gcc has invalid target.json:
warning: value OrderedDict([(u'name', u'k64f-gcc'), (u'version', u'0.0.1'), (u'similarTo', [u'k64f', u'ksdk-mcu', u'mk64fn1m0vmd12', u'mk64fn1m0', u'mk64fn', u'freescale', u'cortex-m4', u'armv7-m', u'arm', u'gcc', u'*']), (u'toolchain', u'CMake/toolchain.cmake'), (u'debug-server', [u'echo', u'debug server command not defined']), (u'debug', [u'echo', u'debug command not defined'])]) is not valid under any of the given schemas
warning: similarTo.10 value u'*' does not match u'^[a-z]+[a-z0-9-]*$'
warning: similarTo.10 value u'*' is too short
</code></pre></div>
<p>Ha... the registry seems to have some rotted things in it. The fact that 10% of it is temporary test pushes suggests this might be an ongoing problem until they add a catalogue, so people maintaining it can see what needs deleting or updating. Better yet, get rid of it and replace it with a git tree: that way the global state of packages can be tagged and returned to if the package set loses self-consistency.</p>
<p>Since the targets seem to be per 'project directory', I went back to my modified blinky I had been able to build and looked at its targets</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[agreen@build test1]$ yotta target
frdm-k64f-gcc 0.2.0
mbed-gcc 0.1.3
</code></pre></div>
<p>Yes he liked that</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[agreen@build mbed-client-examples]$ yotta target frdm-k64f-gcc
info: get versions for frdm-k64f-gcc
info: download frdm-k64f-gcc@0.2.0 from the public module registry
info: get versions for mbed-gcc
info: download mbed-gcc@0.1.3 from the public module registry
</code></pre></div>
<p>I checked: that package does also exist in the registry. So they should delete the dead k64f-gcc package from there.</p>
<h2 id="change-of-dependency-hell">Change of dependency hell</h2>
<p>Now that the target context for the project has been changed to a k64f-specific mbed3 one, yotta uses the mbed set of deps from the JSON, and the unfulfillable POSIX package deps disappear.</p>
<p>We get new missing deps, but they look like they are on the right track</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">agreen@build mbed-client-examples]$ yotta ls
mbed-client-examples 1.0.0
┣━ mbed-client 1.2.0
┃ ┣━ mbed-client-c 1.1.1 yotta_modules/mbed-client-c
┃ ┃ ┗━ nanostack-libservice 3.0.8 yotta_modules/nanostack-libservice
┃ ┣━ mbed-client-mbed-os ^1.0.0 missing
┃ ┗━ mbed-client-mbedtls 1.0.7 yotta_modules/mbed-client-mbedtls
┃ ┗━ mbedtls 2.2.0-rc.1 yotta_modules/mbedtls
┃ ┗━ cmsis-core 1.0.1 yotta_modules/cmsis-core
┃ ┗━ cmsis-core-freescale ^1.0.0 missing
┣━ sockets 1.0.2
┃ ┣━ sal 1.0.2 yotta_modules/sal
┃ ┃ ┗━ sal-stack-lwip ^1.0.0 missing
┃ ┣━ core-util 1.0.1 yotta_modules/core-util
┃ ┃ ┣━ ualloc 1.0.2 yotta_modules/ualloc
┃ ┃ ┃ ┗━ dlmalloc 1.0.0 yotta_modules/dlmalloc
┃ ┃ ┗━ mbed-drivers 0.6.9 >=0.11.1,<0.12.0 yotta_modules/mbed-drivers
┃ ┃ ┗━ mbed-hal 0.6.4 yotta_modules/mbed-hal
┃ ┃ ┗━ mbed-hal-freescale ~0.5.0 missing
┃ ┗━ minar 1.0.1 yotta_modules/minar
┃ ┣━ compiler-polyfill 1.1.1 yotta_modules/compiler-polyfill
┃ ┗━ minar-platform 1.0.0 yotta_modules/minar-platform
┃ ┗━ minar-platform-mbed ^1.0.0 missing
┗━ mbed-example-network 0.1.8
</code></pre></div>
<p>Clearly we're on the right path, but we are not able to satisfy everything from the registry</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[agreen@build mbed-client-examples]$ yotta install mbed-client-mbed-os
info: get versions for mbed-client-mbed-os
info: download mbed-client-mbed-os@1.1.0 from the public module registry
info: dependency mbed-client-mbed-os: ^1.1.0 written to module.json
info: get versions for cmsis-core
info: download cmsis-core@0.2.7 from the public module registry
info: get versions for ualloc
info: download ualloc@0.0.10 from the public module registry
info: get versions for minar
info: download minar@0.6.7 from the public module registry
info: get versions for core-util
info: download core-util@0.0.16 from the public module registry
info: get versions for mbed-hal-freescale
info: download mbed-hal-freescale@0.5.2 from the public module registry
info: get versions for mbed-hal-ksdk-mcu
info: download mbed-hal-ksdk-mcu@0.5.7 from the public module registry
info: get versions for uvisor-lib
info: download uvisor-lib@0.7.25 from the public module registry
info: get versions for mbed-hal-k64f
info: download mbed-hal-k64f@0.3.6 from the public module registry
info: get versions for mbed-hal-frdm-k64f
info: download mbed-hal-frdm-k64f@0.4.6 from the public module registry
info: get versions for cmsis-core-freescale
info: download cmsis-core-freescale@0.1.4 from the public module registry
info: get versions for cmsis-core-k64f
info: download cmsis-core-k64f@0.1.5 from the public module registry
info: get versions for dlmalloc
info: download dlmalloc@0.0.6 from the public module registry
info: get versions for compiler-polyfill
info: download compiler-polyfill@1.0.4 from the public module registry
info: get versions for minar-platform
info: download minar-platform@0.3.4 from the public module registry
info: get versions for minar-platform-mbed
info: download minar-platform-mbed@0.1.5 from the public module registry
info: get versions for sal-stack-lwip
info: download sal-stack-lwip@1.0.1 from the public module registry
info: get versions for sal-driver-lwip-k64f-eth
info: download sal-driver-lwip-k64f-eth@1.0.2 from the public module registry
info: get versions for sal-iface-eth
info: download sal-iface-eth@1.0.0 from the public module registry
error: sockets does not meet specification ~0.3.0 required by mbed-example-network
error: mbed-drivers does not meet specification ~0.7.0 required by minar
error: cmsis-core does not meet specification ^1.0.0 required by mbedtls
error: minar does not meet specification ^1.0.0 required by sockets
error: core-util does not meet specification ^1.0.0 required by sockets
</code></pre></div>
<h2 id="clue-about-mbed3-hal-and-driver-structure">Clue about mbed3 hal and driver structure</h2>
<p>As an aside, even though this thing is stuck in dependency hell, there are interesting signs of how the SoC and board-specific IP are handled. Sockets support looks like it calls through to a "Socket Abstraction Layer", which then instantiates drivers in a HAL...</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">┣━ sockets 1.0.2
┃ ┣━ sal 1.0.2 yotta_modules/sal
┃ ┃ ┗━ sal-stack-lwip 1.0.1 yotta_modules/sal-stack-lwip
┃ ┃ ┣━ sal-driver-lwip-k64f-eth 1.0.2 yotta_modules/sal-driver-lwip-k64f-eth
┃ ┃ ┗━ sal-iface-eth 1.0.0 yotta_modules/sal-iface-eth
┃ ┃ ┗━ mbed-drivers 0.6.9 yotta_modules/mbed-drivers
┃ ┃ ┣━ mbed-hal 0.6.4 yotta_modules/mbed-hal
┃ ┃ ┃ ┗━ mbed-hal-freescale 0.5.2 yotta_modules/mbed-hal-freescale
┃ ┃ ┃ ┗━ mbed-hal-ksdk-mcu 0.5.7 yotta_modules/mbed-hal-ksdk-mcu
┃ ┃ ┃ ┣━ uvisor-lib 0.7.25 yotta_modules/uvisor-lib
┃ ┃ ┃ ┗━ mbed-hal-k64f 0.3.6 yotta_modules/mbed-hal-k64f
┃ ┃ ┃ ┗━ mbed-hal-frdm-k64f 0.4.6 yotta_modules/mbed-hal-frdm-k64f
</code></pre></div>
<p>That looks like a pretty nice way to deal with the messy underlying reality. I'm still not sure how 'drivers' getting interrupts in the HAL part trigger minar events, but presumably there is a way to register and trigger events in the HAL that become serialized and queued at minar.</p>
<h2 id="still-stuck">Still stuck</h2>
<p>At any rate the remaining problems look like inconsistent content in the 'registry'</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">error: sockets does not meet specification ~0.3.0 required by mbed-example-network
error: minar does not meet specification ^1.0.0 required by sockets
error: mbed-drivers does not meet specification ~0.7.0 required by minar
error: cmsis-core does not meet specification ^1.0.0 required by mbedtls
error: core-util does not meet specification ^1.0.0 required by sockets
</code></pre></div>
<p>For example, the mbed-drivers I just got from the registry is 0.6.9, but the minar I got from the same registry demands ~0.7.0. That's something that could be eased a lot if the last known working packageset could still be referenced (a 'stable' registry, or put it all under git).</p>
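<p>For reference, ~0.7.0 means "at least 0.7.0 but below 0.8.0", and ^1.0.0 means "at least 1.0.0 but below 2.0.0", so 0.6.9 can never satisfy ~0.7.0. In principle a consumer can sidestep a broken registry combination by pinning exact versions in its module.json; a hypothetical fragment (the package names are real, but the pinned pair is illustrative, not a tested combination):</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">"dependencies": {
    "mbed-drivers": "0.6.9",
    "minar": "1.0.1"
}
</code></pre></div>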
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>In each project dir you will build with Yotta, you must set 'yotta target', in my case to frdm-k64f-gcc, because it defaults to trying to build for your build OS</p></li>
<li><p>The package registry has quite a lot of deprecated and unusable junk piling up in it</p></li>
<li><p>The package registry has no history, if you put inconsistent deps in there you break the whole world with no recourse to last-known-good packagesets</p></li>
<li><p>Deps seem broken in there (minar vs mbed-drivers) at the time of writing</p></li>
<li><p>yotta seems to go to github to look for things if they're not in the 'registry' (it seems it looks for projects at <a href="https://github.com/ARMmbed">https://github.com/ARMmbed</a>)</p></li>
</ul>
<p><a href="../31/mbed-fixing-the-client-app-and-library.html">Next post about mbed</a></p>
Mbed3 fixing the client app and library2015-10-31T00:00:00+08:00https://warmcat.com/2015/10/31/mbed-fixing-the-client-app-and-library<h2 id="it-39-s-not-me-it-39-s-broken">It's not me... it's broken</h2>
<p>I posted on the mbed forum and got some help from Stanly88 there</p>
<p><a href="http://forums.mbed.com/t/difficulties-building-mbed-client-examples/614">http://forums.mbed.com/t/difficulties-building-mbed-client-examples/614</a></p>
<p>He at least confirmed that even though he had built mbed-client-examples before, it was now broken for him too.</p>
<p>After some headscratching about the build problems, I realized the stuff in git is generally a lot more advanced than some of the package JSON is asking for as dependencies.</p>
<p>After I forced the dep versions to match what's in git, it also became clear that what yotta install can fetch is still lower than the versions in git. I'm not sure if that's because someone's JSON was making it fetch old things from git or something else.</p>
<p>In the end I replaced what it had fetched down ./yotta_modules for uvisor-lib and mbed-example-network with git clones of those projects, and then the dependency stuff was finally satisfied.</p>
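<p>For anyone following along, the swap is mechanical; something like this, assuming (as the registry errors above suggest) the modules live under github.com/ARMmbed with the same names:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ cd yotta_modules
$ rm -rf uvisor-lib mbed-example-network
$ git clone https://github.com/ARMmbed/uvisor-lib
$ git clone https://github.com/ARMmbed/mbed-example-network
</code></pre></div>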
<p>But there were two kinds of compile error left.</p>
<h2 id="yotta_cfg_uvisor_present">YOTTA_CFG_UVISOR_PRESENT</h2>
<p>Basically the stuff in git currently seems not to have been tested with YOTTA_CFG_UVISOR_PRESENT absent, which it is on K64F.</p>
<p>The first problem was just a missing include, but that also was conditionally included based on YOTTA_CFG_UVISOR_PRESENT.</p>
<p>The second set of problems was caused by the headers declaring fallthrough functions in the preprocessor, AND declaring functions to handle the same thing in code. I added a patch to defeat building the code when YOTTA_CFG_UVISOR_PRESENT is not defined, since the preprocessor way is probably smaller and faster.</p>
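<p>To make the collision concrete, the pattern was something like this (a minimal reconstruction with invented names, not the actual uvisor-lib source):</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">/* header: with no uvisor, the API is meant to fall through via the preprocessor */
#ifndef YOTTA_CFG_UVISOR_PRESENT
#define uvisor_thing(x) plain_thing(x)
#endif

/* source: but a function of the same name was also built unconditionally, which
 * collides with the macro above; the patch adds the same guard around the code,
 * since the preprocessor route is probably smaller and faster anyway */
#ifdef YOTTA_CFG_UVISOR_PRESENT
int uvisor_thing(int x)
{
    /* ... the real uvisor-backed version ... */
    return x;
}
#endif
</code></pre></div>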
<h2 id="yay">Yay</h2>
<p>After that it could build and run.</p>
<p>The mbed-client-examples app is tied up with Arm's cloud solution stuff, which is not what I am interested in at the moment, but it is able to run the ethernet and acquire an IP over DHCP. So that has got the network basics working finally. </p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>It's beta quality, as they say; this kind of problem arises because it's being worked on heavily using git versions, and not every variation is getting tested</p></li>
<li><p>We can replace dirs down ./yotta_modules that came in by yotta install (from the repository) with git clones of the same modules from github, and work on them like that conveniently</p></li>
<li><p>I pushed my modified trees to github and sent pull requests for the patches.</p></li>
<li><p>Plus or minus some fixes, on K64F it works</p></li>
</ul>
<p><a href="../../11/01/mbed3-diving-into-network.html">Next post about mbed</a></p>
Mbed3 and Minar2015-10-30T00:00:00+08:00https://warmcat.com/2015/10/30/mbed-and-minar<h2 id="fedora-bug-with-the-arm-none-eabi-gcc-cs-patch">Fedora bug with the arm-none-eabi-gcc-cs patch</h2>
<p>I found there was already a bug about the lack of libs in Fedora, but nobody had made a patch yet, so I sent mine there</p>
<p><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1260439">https://bugzilla.redhat.com/show_bug.cgi?id=1260439</a></p>
<h2 id="going-past-hello-world">Going past Hello World</h2>
<p>Now the "blinky" hello world app can build and run, it's a bit of a brick wall how to start writing code. Blinky looks like this</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">#include "mbed-drivers/mbed.h"
static void blinky(void) {
static DigitalOut led(LED1);
led = !led;
printf("LED = %d \r\n",led.read());
}
void app_start(int, char**) {
minar::Scheduler::postCallback(blinky).period(minar::milliseconds(500));
}
</code></pre></div>
<p>The docs for mbed3 are still a "work in progress".</p>
<p><a href="https://docs.mbed.com/docs/getting-started-mbed-os/en/latest/Full_Guide/app_on_yotta/">https://docs.mbed.com/docs/getting-started-mbed-os/en/latest/Full_Guide/app_on_yotta/</a></p>
<p>has nothing really beyond blinky.</p>
<p>However, even just looking at blinky, immediate questions arise, like "what the hell is minar"?</p>
<h2 id="minar-is-mbed-39-s-event-loop">Minar is Mbed's event loop</h2>
<p>In the mbed3 "Hello World" app, it initializes via static functions from a class "Scheduler" in a namespace "minar"</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">minar::Scheduler::postCallback(blinky).period(minar::milliseconds(500));
</code></pre></div>
<p>Basically minar is a very simple event loop, as described here (the link to it from the "app_on_yotta" page mentioned above is broken)</p>
<p><a href="https://github.com/ARMmbed/minar">https://github.com/ARMmbed/minar</a></p>
<p>So from that we can learn mbed3's general game plan: everything should be singlethreaded and nonblocking, and should get back into minar as quickly as possible.</p>
<p>Which to me sounds very good, since Libwebsockets has had a lot of success using basically that scheme even on processors that have multicore.</p>
<p>(However they seem to have given up trying to convince everyone of the benefits and say they will reintroduce threading in 2016.)</p>
<h2 id="minar-not-preemptive">Minar not preemptive</h2>
<p>Unlike an RTOS, which uses at least one interrupt to enforce what should be happening at a given time, Minar seems to be simply an event loop. If something it calls doesn't return for a while, the whole show blocks.</p>
<p>This method may sound trivial to ears that are used to dealing with multithreaded and multicore code, but actually there are some solid advantages to doubling down on this approach.</p>
<ul>
<li>There is no locking</li>
<li>There are no locking bugs (some of which can be very low probability)</li>
<li>There is no locking or thread overhead</li>
<li>The determinism increases</li>
<li>Worst case resource allocation scenarios (memory, power) are easier to prove</li>
<li>Serialized resource allocation means stack or other dynamic areas can be used by multiple functions mutually exclusively, reducing peak usage</li>
<li>It's really simple... Minar has one real function "postCallback"</li>
</ul>
<p>On the other hand, unless all the callbacks Minar may visit are written with this in mind, Minar may not always manage to look like it is multitasking in the way people expect. But the good news is that, at least in POSIX land, it is not difficult to write in nonblocking style once you adjust your thinking a little, and since mbed is relying on this it shouldn't be any tougher here.</p>
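<p>As a trivial example of the adjustment, here is the classic "poll in a loop" shape rewritten for minar; a sketch using only the API already shown in blinky, where work_ready() and do_chunk() are hypothetical application functions:</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">#include "mbed-drivers/mbed.h"

// hypothetical application hooks, assumed to live elsewhere
extern bool work_ready(void);
extern void do_chunk(void);

// instead of   while (1) { if (work_ready()) do_chunk(); }   ...
static void service(void) {
    if (work_ready())   // is there something to do?
        do_chunk();     // must return quickly: no blocking, no busy-waiting
}

void app_start(int, char**) {
    // minar calls service() every 10ms; between calls, other callbacks run
    minar::Scheduler::postCallback(service).period(minar::milliseconds(10));
}
</code></pre></div>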
<p>So I think that's a good choice for this kind of "userland application" architecture, considering the restrictions of the type of silicon it's targeting.</p>
<p>So far I have not seen how interrupt-driven hardware should interface to this event loop. I saw mentions of "drivers", so I guess it can be supported there.</p>
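<p>If I had to guess at the shape of it, the natural pattern would be for the ISR to do the bare minimum and defer the rest as a posted callback, something like the sketch below. Whether postCallback is actually safe to call from interrupt context in mbed3 is exactly the part I have not confirmed, so treat this as pure speculation.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">// SPECULATIVE sketch: irq_handler() is a hypothetical driver ISR, and the
// assumption that minar::Scheduler::postCallback is ISR-safe is unverified.
#include "mbed-drivers/mbed.h"

static void bottom_half(void) {
    // runs later from the event loop, serialized with every other callback,
    // so it needs no locking against the rest of the application
}

void irq_handler(void) {
    // ack the hardware and capture any volatile state, then defer the work
    minar::Scheduler::postCallback(bottom_half);
}
</code></pre></div>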
<h2 id="extreme-confusion-between-mbed3-and-quot-classic-quot">Extreme confusion between mbed3 and "classic"</h2>
<p>Google and the official site are full of info that does not seem to apply to mbed3, and this info does not come with "mbed Classic" written on it. For example the official developer.mbed.org site has this two clicks from the homepage... "Components" and then "Ethernet"...</p>
<p><a href="https://developer.mbed.org/components/cat/ethernet/">https://developer.mbed.org/components/cat/ethernet/</a></p>
<p>But the Freescale K64F board is not mentioned.</p>
<p>And again, two clicks from the homepage (Handbook | TCP IP protocols and APIs) I can get this</p>
<p><a href="https://developer.mbed.org/handbook/TCP-IP-protocols-and-APIs">https://developer.mbed.org/handbook/TCP-IP-protocols-and-APIs</a></p>
<p>So eg the http server example there from 2013 is clearly not for mbed3</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">int main (void)
{
</code></pre></div>
<p>That doesn't use the Minar event loop, and nor does this</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> //listening for http GET request
while (serverIsListened) {
...
}
</code></pre></div>
<p>So all the stuff on <a href="https://developer.mbed.org">https://developer.mbed.org</a> is specific to "mbed Classic" and, if you want to use it, it needs a rewrite for mbed3.</p>
<p>Only the stuff on <a href="https://www.mbed.com/en/">https://www.mbed.com/en/</a> is related to mbed3.</p>
<h2 id="quot-mbed-classic-quot-mass-migration-to-mbed3-shortly">"mbed classic" mass migration to mbed3 shortly</h2>
<p>Arm should really mark up those pages as "mbed classic", the same way they mark up <a href="http://www.mbed.com">www.mbed.com</a> as "mbed beta site" on the top of every page, because people come to these pages via google. It doesn't have to say "Deprecated" but it should show the scope of its relevance.</p>
<p>They are actually explicitly preparing to deprecate everything on developer.mbed.org as written here:</p>
<p><a href="https://www.mbed.com/en/development/software/mbed-os/mbed-os-migration-plan/">https://www.mbed.com/en/development/software/mbed-os/mbed-os-migration-plan/</a></p>
<blockquote>
<p>When mbed, together with partners and developers, have fininshed porting all compatible platforms to mbed OS, we will deprecate and eventually make <a href="http://developer.mbed.org">http://developer.mbed.org</a> read-only. </p>
</blockquote>
<p>They have a timeline which shows a "technology preview" release next month, in Nov 2015.</p>
<p>And in the migration plan, they show the release of the technology preview as the time when people should migrate their board support.</p>
<p>So I think even though it's not really ready, it's better to persist with mbed3 and try to get somewhere by the time it is ready.</p>
<h2 id="blinky-big-gap-device-connector">blinky, big gap, Device Connector</h2>
<p>So I have been looking for how to move off blinky and configure the Ethernet on K64F using mbed3.</p>
<p>After I realized I should ignore developer.mbed.org, I found this repo linked from the mbed3 site</p>
<p><a href="https://github.com/ARMmbed/mbed-client-examples">https://github.com/ARMmbed/mbed-client-examples</a></p>
<p>He really contains a very small bit of cpp that connects to some Arm cloud server that the README says may not be deployed yet. However, looking at the README and the code, it seems it can bring up the K64F Ethernet with DHCP, evidently via some other library.</p>
<p>I cloned the repo and followed the build instructions in the README, but</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ git clone https://github.com/ARMmbed/mbed-client-examples
Cloning into 'mbed-client-examples'...
remote: Counting objects: 361, done.
remote: Total 361 (delta 0), reused 0 (delta 0), pack-reused 361
Receiving objects: 100% (361/361), 1.89 MiB | 684.00 KiB/s, done.
Resolving deltas: 100% (186/186), done.
Checking connectivity... done.
$ cd mbed-client-examples/
$ yotta build
info: get versions for x86-linux-native
info: download x86-linux-native@1.0.0 from the public module registry
info: get versions for linux-native
info: download linux-native@1.0.0 from the public module registry
info: get versions for mbed-client
info: download mbed-client@1.2.0 from the public module registry
info: get versions for mbed-client-c
info: download mbed-client-c@1.1.1 from the public module registry
info: get versions for mbed-client-linux
info: download mbed-client-linux@1.1.0 from the public module registry
info: get versions for mbed-client-mbedtls
info: download mbed-client-mbedtls@1.0.7 from the public module registry
info: get versions for nanostack-libservice
info: download nanostack-libservice@3.0.8 from the public module registry
info: get versions for mbedtls
info: download mbedtls@2.2.0-rc.1 from the public module registry
info: generate for target: x86-linux-native 1.0.0 at /home/agreen/projects/mbed/mbed-client-examples/yotta_targets/x86-linux-native
-- The C compiler identification is GNU 5.1.1
-- The CXX compiler identification is GNU 5.1.1
-- Check for working C compiler using: Ninja
-- Check for working C compiler using: Ninja -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler using: Ninja
-- Check for working CXX compiler using: Ninja -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /home/agreen/projects/mbed/mbed-client-examples/build/x86-linux-native
[86/124] Building C object ym/mbedtls/source/CMakeFiles/mbedtls.dir/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbedtls/source/entropy_poll.c.o
/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbedtls/source/entropy_poll.c: In function ‘getrandom_wrapper’:
/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbedtls/source/entropy_poll.c:93:13: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]
return( syscall( SYS_getrandom, buf, buflen, flags ) );
^
[121/124] Building CXX object source/CMakeFiles/mbed-client-examples.dir/home/agreen/projects/mbed/mbed-client-examples/source/main.cpp.o
FAILED: /bin/c++ -Dmbed_client_examples_EXPORTS -O2 -g -DNDEBUG -I/home/agreen/projects/mbed/mbed-client-examples -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-c -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-linux -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-mbedtls -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/nanostack-libservice -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbedtls -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-c/nsdl-c -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-c/source/libNsdl/src/include -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-c/source/libCoap/src/include -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/mbed-client-linux/mbed-client-libservice -I/home/agreen/projects/mbed/mbed-client-examples/yotta_modules/nanostack-libservice/mbed-client-libservice -I/home/agreen/projects/mbed/mbed-client-examples/source -include "/home/agreen/projects/mbed/mbed-client-examples/build/x86-linux-native/yotta_config.h" -MMD -MT source/CMakeFiles/mbed-client-examples.dir/home/agreen/projects/mbed/mbed-client-examples/source/main.cpp.o -MF source/CMakeFiles/mbed-client-examples.dir/home/agreen/projects/mbed/mbed-client-examples/source/main.cpp.o.d -o source/CMakeFiles/mbed-client-examples.dir/home/agreen/projects/mbed/mbed-client-examples/source/main.cpp.o -c /home/agreen/projects/mbed/mbed-client-examples/source/main.cpp
/home/agreen/projects/mbed/mbed-client-examples/source/main.cpp:16:40: fatal error: mbed-net-sockets/UDPSocket.h: No such file or directory
compilation terminated.
[121/124] Building CXX object ym/mbed-client/test/CMakeFiles/mbed-client-test-mbedclient_linux.dir/mbedclient_linux/main.cpp.o
ninja: build stopped: subcommand failed.
error: command ['ninja'] failed
</code></pre></div>
<p>Something is missing from the instructions...</p>
<h2 id="what-we-learned-this-time">What we learned this time</h2>
<ul>
<li><p>Minar is an event loop</p></li>
<li><p>mbed3 wants you to write singlethreaded, nonblocking code (until "2016", when it will supposedly grow threading)</p></li>
<li><p>You can schedule callbacks from Minar</p></li>
<li><p>Scheduling callbacks is basically the only Minar API</p></li>
<li><p>developer.mbed.org is for "mbed classic" and nothing there is directly useful for mbed3</p></li>
<li><p><a href="http://www.mbed.com">www.mbed.com</a> is for mbed3</p></li>
<li><p>mbed3 has a trivial example (blinky) and a complex example with undocumented dependencies (mbed Client) that I can't build yet, and nothing in between that I've found yet</p></li>
</ul>
<p><a href="../31/mbed-registry-and-deps.html">Next post about mbed</a></p>
The mbed maze2015-10-29T00:00:00+08:00https://warmcat.com/2015/10/29/the-mbed-maze<h1 id="mbed-and-fedora-arm-none-eabi">Mbed and Fedora arm-none-eabi</h1>
<h2 id="getting-started-is-simple">Getting started is simple</h2>
<p>I decided to take a look at mbed, since it looks like it is going to rule the class of chips around Cortex M3/M4 that don't have enough resources to run Linux.</p>
<p>Mbed clearly has a long history and, well, some "changes in direction".</p>
<p>First the good news: I bought a Freescale K64F eval board, and they have done a great job making it easy to work with. The board has the Cortex M4 that was expected, plus a second physical Cortex M4 chip which acts as a JTAG and other management interface. So it actually has a JTAG TAP controller built on the board and already connected. As we'll see, that comes in handy.</p>
<p>Connectivity-wise it presents as a USB composite device, one member of which is Mass Storage: you just mount it (VFAT) and cp your binary file there, it flashes it (literally: it flashes an LED while it does it), and then you reset it into the new firmware.</p>
<p>(There seems to be a "bootloader" concept that needed updating to "support mbed"... the update worked OK but I have no idea what that's about yet, or which Cortex chip took that update).</p>
<p>Another of the composite members is an ACM "modem" (ie, serial port) that's supposedly useful for debugging.</p>
<p>There's a nice getting started page that explains this linked to from a file on the mass storage device.</p>
<h2 id="problem-1-quot-hello-world-quot-doesn-39-t-print-anything">Problem 1, "hello world" doesn't print anything</h2>
<p>I can't tell if ttyACM works or not, because the canonical "hello world" binary just flashes an LED. Well then we can just put a printf or something in "hello world", right? -----></p>
<h2 id="problem-2-how-are-you-meant-to-build-things-in-mbed">Problem 2, how are you meant to build things in mbed?</h2>
<p>Mbed has been around a while and there is evidence of "geological strata" in Google about how you are supposed to actually use it.</p>
<p>Originally it had a concept around running the build system serverside, so you would paste your C into your browser and it would give you back a binary file. Although that's quite an interesting way to deal with building casually, it doesn't seem very scalable, and I am sure the many people making proprietary software weren't too keen on continuously sending all their sources over the internet.</p>
<p>So that seems to have died, and if you clone the mbed github <a href="https://github.com/mbedmicro/mbed">https://github.com/mbedmicro/mbed</a>, he has a list of 7 toolchains he supposedly supports; actually 5 of the 7 are some form of GCC, but still.</p>
<p>And if you believe that README, there are dozens of boards it supports; however, after much googling you will find that is what they now call "mbed classic".</p>
<p>The new shiny mbed that is completely different, invalidating all the accreted google knowledge, is "mbed3". A lot of the knowledge the mbed forums and google accreted was of limited usefulness anyway, eg</p>
<p><a href="http://forums.mbed.com/t/mbed-os-what-does-it-really-mean/477">http://forums.mbed.com/t/mbed-os-what-does-it-really-mean/477</a></p>
<p>What this means is that when googling, you first have to categorize whether what you're looking at is related to mbed3 or not. If "not", it's just going to confuse you. The right starting point for mbed3 is here --></p>
<p><a href="https://docs.mbed.com/docs/getting-started-mbed-os/en/latest/FirstProjectmbedOS/">https://docs.mbed.com/docs/getting-started-mbed-os/en/latest/FirstProjectmbedOS/</a></p>
<h2 id="yotta">Yotta</h2>
<p>Yotta is the new mbed3 python-based build-and-package-management system. I think this is a good move, because the previous mbed seemed to have a huge job cut out supporting many different compilers and IDEs. Yotta seems to take config plugins that are each in their own git repo to configure it for the various toolchains. Yotta is fundamentally doing the job of "make". It has concepts like "yotta clean", "yotta build"... I guess as I see more of it, it might become apparent why something new got created.</p>
<p>I was able to install it by following the docs and the provided Fedora-specific dependency install stuff. But I was surprised the first time I set the target, he wants me to log in to yotta.mbed.com with a verified email. That's quite a cultural difference compared to, eg, github that they also use and integrate with. Their default license is Apache, but this "public repo" wants a verified email, I am not quite sure of the logic of that.</p>
<h2 id="yotta-builds-something">Yotta builds something...</h2>
<p>However once I told it my target was frdm-k64f-gcc, in one step it (as suggested by the target name) also understood it should be using the gcc toolchain. Like Ubuntu, Fedora has a handy prepackaged prebuilt arm-none-eabi toolchain and associated newlib you can just install, so this seems pretty sane.</p>
<p>After that I cut and pasted the 10-line new mbed3 hello world code that actually prints something on serial as well as flash the LED, and it nearly built.</p>
<p>It got confused because the build system it uses, ninja, has the executable name "ninja-build" on Fedora, not "ninja" that it was expecting. I just symlinked to it</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ sudo ln -sf /usr/bin/ninja-build /usr/bin/ninja
</code></pre></div>
<p>and it completed the build, but with warnings</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ yotta build
info: generate for target: frdm-k64f-gcc 0.2.0 at /home/agreen/projects/mbed/test1/yotta_targets/frdm-k64f-gcc
GCC version is: arm-none-eabi-g++ (Fedora 5.2.0-2.fc22) 5.2.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/agreen/projects/mbed/test1/build/frdm-k64f-gcc
[88/131] Building C object ym/mbed-hal...lities/src/fsl_os_abstraction_mbed.c.o
In file included from /home/agreen/projects/mbed/test1/yotta_modules/mbed-hal-ksdk-mcu/source/TARGET_KSDK_CODE/utilities/src/fsl_os_abstraction_mbed.c:19:0:
/home/agreen/projects/mbed/test1/yotta_modules/mbed-drivers/mbed/wait_api.h:19:2: warning: #warning mbed/wait_api.h is deprecated. Please use mbed-drivers/wait_api.h instead. [-Wcpp]
#warning mbed/wait_api.h is deprecated. Please use mbed-drivers/wait_api.h instead.
^
[130/131] Building CXX object source/C...n/projects/mbed/test1/source/app.cpp.o
In file included from /home/agreen/projects/mbed/test1/source/app.cpp:1:0:
/home/agreen/projects/mbed/test1/yotta_modules/mbed-drivers/mbed/mbed.h:19:2: warning: #warning mbed/mbed.h is deprecated. Please use mbed-drivers/mbed.h instead. [-Wcpp]
#warning mbed/mbed.h is deprecated. Please use mbed-drivers/mbed.h instead.
^
[131/131] Linking CXX executable source/test1
</code></pre></div>
<p>Well they said mbed3 was just in beta, these don't seem fatal. It generated a .bin file down ./build.</p>
<p>Actually that was all pretty painless.</p>
<h2 id="problem-3-it-built-but-doesn-39-t-boot">Problem 3: ... it built but doesn't boot</h2>
<p>I copied it to the mass storage mountpoint and umounted, but it just sits there not flashing an LED.</p>
<p>As I mentioned at the beginning, this Freescale K64F board is very cool: he has an OpenOCD compatible companion chip already wired up to the main chip's JTAG. So it is very easy to start up some JTAG-fuelled GDB necromancy on its dead body.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">[root@localhost test1]# yt debug test1
info: found test1 at source/test1
info: preparing PyOCD gdbserver...
info: finding connected board...
info: new board id detected: 02400201A0BA1E755D44E3CD
info: board allows 5 concurrent packets
info: DAP SWD MODE initialised
info: IDCODE: 0x2BA01477
info: K64F not in secure state
info: 6 hardware breakpoints, 4 literal comparators
info: CPU core is Cortex-M4
info: FPU present
info: 4 hardware watchpoints
info: starting PyOCD gdbserver...
info: Telnet: server started on port 4444
info: GDB server started at port:3333
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=arm-none-eabi".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/test1/build/frdm-k64f-gcc/source/test1...done.
info: One client connected!
HardFault_Handler ()
at /home/agreen/projects/mbed/test1/yotta_modules/mbed-hal-k64f/source/bootstrap_gcc/startup_MK64F12.S:259
259 /home/agreen/projects/mbed/test1/yotta_modules/mbed-hal-k64f/source/bootstrap_gcc/startup_MK64F12.S: No such file or directory.
(gdb)
</code></pre></div>
<p>Hmm, OK... I am running this on an intermediary laptop that is physically close to the board, unlike my build machine, so I missed copying some source files. But looking at line 259, it's the fault handler itself. So I tried a gdb backtrace</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">(gdb) bt
#0 HardFault_Handler ()
at /home/agreen/projects/mbed/test1/yotta_modules/mbed-hal-k64f/source/bootstrap_gcc/startup_MK64F12.S:259
#1 <signal handler called>
#2 _GLOBAL__sub_I___cxa_allocate_exception ()
at ../../../../gcc-5.2.0/libstdc++-v3/libsupc++/eh_alloc.cc:307
#3 0x0000773e in __libc_init_array ()
at ../../../../../../newlib/libc/misc/init.c:41
#4 0x00001ac0 in _start () at ../../../../../libgloss/arm/crt0.S:416
#5 0x00001ac0 in _start () at ../../../../../libgloss/arm/crt0.S:416
#6 0x00001ac0 in _start () at ../../../../../libgloss/arm/crt0.S:416
#7 0x00001ac0 in _start () at ../../../../../libgloss/arm/crt0.S:416
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb)
</code></pre></div>
<p>Uhhhh... he blew up while trying to init an array in newlib? I installed the matching newlib sources, and line 41 in init.c there is calling an array of init functions that seems to live in its own hidden section.</p>
<p>The Fedora arm-none-eabi-newlib package did not provide a couple of _nano library variants, for libstdc++ and libsupc++... I symlinked them to the non-nano versions of the libraries. It sounded a bit risky, but things linked without errors; it seems that will not fly though, and it has to be solved another way. Still, the root cause of this problem in Fedora turned out to be interesting.</p>
<h2 id="ubuntu-is-not-the-only-distro">Ubuntu is not the only distro...</h2>
<p>The mbed stuff basically assumes you got your toolchain from Ubuntu's gcc-arm-none-eabi package, like this</p>
<p><a href="https://launchpad.net/ubuntu/+source/gcc-arm-none-eabi/15:4.9.3+svn227297-1">https://launchpad.net/ubuntu/+source/gcc-arm-none-eabi/15:4.9.3+svn227297-1</a></p>
<p>however Fedora does have his own arm-none-eabi-newlib package. I downloaded the source RPM and compared what it builds against what's in the specfile. There are differences, but since the Fedora one is able to generate the non-nano libstdc++.a and _nano versions of some libraries, clearly the actual problem was something a bit subtle. After poking around in there for a while I noticed a renaming hack in the build scripting, preparing the libs it's interested in into the right install place. However, hacking that and rebuilding the RPM from scratch did not help, and nor did checking the build flags used when the Ubuntu package generated the C++ libraries.</p>
<h2 id="the-actual-differences-with-_nano-libraries">The actual differences with _nano libraries</h2>
<p>Finally I realized that although these are from newlib, newlib and the compiler have a very complicated and messy relationship: the missing libraries are packaged in arm-none-eabi-gcc-cs, and any fixing will have to be done in that package.</p>
<p>Googling around I found Arch had the same problem a while ago, and found their packaging delta used to solve it:</p>
<p><a href="https://projects.archlinux.org/svntogit/community.git/commit/trunk?h=packages/arm-none-eabi-gcc&id=628c8cbc26ee5773937615f91a79a65819cbdc9a">https://projects.archlinux.org/svntogit/community.git/commit/trunk?h=packages/arm-none-eabi-gcc&id=628c8cbc26ee5773937615f91a79a65819cbdc9a</a></p>
<p>Basically this _nano malarky for libstdc++ and libsupc++ boils down to one different compiler option: -fno-exceptions. But in order to get that, we have to rebuild the whole gcc toolchain again, then patch in just those two .a archives into the binary RPM.</p>
<p>So we learned something about the _nano.a C++ libraries.</p>
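<p>A quick way to sanity-check the result once the rebuilt RPMs are installed (my own heuristic, not an official test): ask the toolchain where the _nano archive landed and compare sizes, since dropping the exception machinery should make it noticeably smaller.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">$ arm-none-eabi-gcc -print-file-name=libstdc++_nano.a
$ du -b $(arm-none-eabi-gcc -print-file-name=libstdc++.a) \
        $(arm-none-eabi-gcc -print-file-name=libstdc++_nano.a)
</code></pre></div>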
<p>I got the Fedora arm-none-eabi-gcc-cs source rpm, unpacked it, and edited the spec file to build twice to different places, and set it going up to the packaged file install to DESTDIR phase (-bi). After letting it build once and seeing what it installed in DESTDIR, I added some shellscript to transfer just the libstdc++ and libsupc++ .a's as _nano.a versions to the main DESTDIR.</p>
<p>The final diff is</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">--- a/arm-none-eabi-gcc-cs.spec 2015-10-30 10:09:04.605086992 +0800
+++ b/arm-none-eabi-gcc-cs.spec 2015-10-30 13:11:01.587794224 +0800
@@ -91,8 +91,12 @@
%build
-mkdir -p gcc-%{target}
+mkdir -p gcc-%{target} gcc-nano-%{target}
+
+#### normal version
+
pushd gcc-%{target}
+
CC="%{__cc} ${RPM_OPT_FLAGS} -fno-stack-protector" \
../gcc-%{gcc_ver}/configure --prefix=%{_prefix} --mandir=%{_mandir} \
--with-pkgversion="Fedora %{version}-%{release}" \
@@ -127,6 +131,48 @@
%endif
popd
+######### nano version
+
+pushd gcc-nano-%{target}
+
+export CFLAGS_FOR_TARGET="$CFLAGS_FOR_TARGET -fno-exceptions"
+export CXXFLAGS_FOR_TARGET="$CXXFLAGS_FOR_TARGET -fno-exceptions"
+
+CC="%{__cc} ${RPM_OPT_FLAGS} -fno-stack-protector " \
+../gcc-%{gcc_ver}/configure --prefix=%{_prefix} --mandir=%{_mandir} \
+ --with-pkgversion="Fedora %{version}-%{release}" \
+ --with-bugurl="https://bugzilla.redhat.com/" \
+ --infodir=%{_infodir} --target=%{target} \
+ --enable-interwork --enable-multilib \
+ --with-python-dir=%{target}/share/gcc-%{version}/python \
+ --with-multilib-list=armv6-m,armv7-m,armv7e-m,armv7-r \
+ --enable-plugins \
+ --disable-decimal-float \
+ --disable-libffi \
+ --disable-libgomp \
+ --disable-libmudflap \
+ --disable-libquadmath \
+ --disable-libssp \
+ --disable-libstdcxx-pch \
+ --disable-nls \
+ --disable-shared \
+ --disable-threads \
+ --disable-tls \
+%if %{bootstrap}
+ --enable-languages=c --with-newlib --disable-nls --disable-shared --disable-threads --with-gnu-as --with-gnu-ld --with-gmp --with-mpfr --with-mpc --without-headers --with-system-zlib
+%else
+ --enable-languages=c,c++ --with-newlib --disable-nls --disable-shared --disable-threads --with-gnu-as --with-gnu-ld --with-gmp --with-mpfr --with-mpc --with-headers=/usr/%{target}/include --with-system-zlib
+%endif
+# --enable-lto \
+
+%if %{bootstrap}
+make all-gcc INHIBIT_LIBC_CFLAGS='-DUSE_TM_CLONE_REGISTRY=0'
+%else
+make INHIBIT_LIBC_CFLAGS='-DUSE_TM_CLONE_REGISTRY=0'
+%endif
+popd
+
+
%install
pushd gcc-%{target}
@@ -138,6 +184,25 @@
make install DESTDIR=$RPM_BUILD_ROOT
%endif
popd
+
+##### nano version
+
+pushd gcc-nano-%{target}
+mkdir -p $RPM_BUILD_ROOT/nano
+make install DESTDIR=$RPM_BUILD_ROOT/nano
+pushd $RPM_BUILD_ROOT/nano
+for i in libstdc++.a libsupc++.a ; do
+ find . -name "$i" | while read line ; do
+ R=`echo $line | sed "s/\.a/_nano\.a/g"`
+ echo "$RPM_BUILD_ROOT/nano/$line -> $RPM_BUILD_ROOT/$R"
+ cp $line $RPM_BUILD_ROOT/$R
+ done
+done
+popd
+
+rm -rf $RPM_BUILD_ROOT/nano
+popd
+
# we don't want these as we are a cross version
rm -r $RPM_BUILD_ROOT%{_infodir}
rm -r $RPM_BUILD_ROOT%{_mandir}/man7
</code></pre></div>
<p>After upgrading to the RPMs created by the modified specfile, the missing libstdc++_nano.a and libsupc++_nano.a libraries get installed properly.</p>
<p>Yotta can link against them... and the LED flashes on the target.</p>
<h2 id="end-result-of-initial-mbed-quot-getting-started-quot">End result of initial mbed "Getting Started"</h2>
<ul>
<li><p>Only mbed3 seems to have a future</p></li>
<li><p>mbed3 needs Yotta to manage packages and do the build, that seems to work well</p></li>
<li><p>Mbed3 doesn't work on many boards yet, this Freescale one is the only fully supported one; they also have two others kinda working. What they're already calling "mbed classic" supports many more boards and IDEs but seems to have no future.</p></li>
<li><p>Fedora arm-none-eabi toolchain is not adequate to work with mbed and needs the fix in the package specfile above</p></li>
<li><p>The new mbed3 'hello world' app does print stuff on the ttyACM0 composite member, but you must set the baud rate to 9600 (at least by default).</p></li>
<li><p>The Freescale approach of a built-in OpenOCD JTAG + mass storage + ttyACM is very, very, very cool. You can literally walk up to it and use gdb on it even if it crashed to hell, that and the reflash and ttyACM is coming from a single USB connection. And using JTAG for reflash means it's unbrickable. However the goodies in the companion chip software seem to be closed licensed.</p></li>
<li><p>Using Apache license might cause some challenges, since in this market many companies are fiercely proprietary. The Linux-style culture of giving back may not properly appear, there are many unanswered questions on the forums for example. Similarly it's a bit strange Arm want a confirmed email address to get things into yotta.</p></li>
</ul>
<p><a href="../30/mbed-and-minar.html">Next post about mbed</a></p>
HDMI Capture and Analysis FPGA Project 62015-10-25T00:00:00+08:00https://warmcat.com/2015/10/25/hdmi-capture-and-analysis-fpga-project-6<h1 id="part-6-6-hdmicap-vs-hikey-hdmi">Part 6 / 6: HDMICAP vs Hikey HDMI</h1>
<h2 id="recap-on-hikey-video-flow">Recap on Hikey video flow</h2>
<p>If you recall, hikey has a bit of a complex video output path.</p>
<p><img src="/hikey-video-path.png" alt="Hikey video path"></p>
<p>In this system, the CRTC part is ultimately responsible for generating the video timings, and he is driven by a clock we will call the "pixel clock". So ultimately the time allotted for one frame is decided in units of the CRTC's pixel clock</p>
<p><img src="/crtc-vsync.png" alt="Hikey video path"></p>
<h2 id="crtc-vsync-is-the-boss">CRTC VSYNC is the boss</h2>
<p>Everything else that happens to the video stream in the processing scheme turns out to be secondary to that fact.</p>
<p>Just before I decided to build hdmicap, I found a clue about the video modes that had a lot of incompatibilities with TVs: their VSYNC was not aligned with their HSYNC.</p>
<p><img src="/TEK0017.JPG" alt="Broken HSYNC to VSYNC relationship"></p>
<p>However nobody understood why this should be so, there are no registers in hikey that allow offsetting of VSYNC horizontally.</p>
<h2 id="debugging-the-vsync-offset">Debugging the VSYNC offset</h2>
<p>As you can see in the picture, the DSI + HDMI parts of the processing take a different clock than the CRTC part.</p>
<p>In the modes where the VSYNC is offset, a different clock is used for each part; eg, on 576p the CRTC gets 26.66MHz, but the DSI part gets 8 x 26.8MHz (DSI sends a byte per lane, so this translates to whole pixels at a rate of 26.8MHz). The HDMI clock is directly derived from the DSI clock, so that too is 26.8MHz (with data coming at 10 x that, as explained before).</p>
<p>The existing code in the hikey video stack sets the intended timing exactly at the CRTC, and modifies the blanking numbers at the DSI IP to take account of the difference in clocks... it computes the absolute time in pixel clocks and then divides that by the period of the DSI byte lane clock.</p>
<p>This gives the output timing shown below</p>
<p><img src="/Screenshot%20from%202015-09-29%2011-59-47.png" alt="Output timing with CRTC set correctly and DSI adaptively"></p>
<p>Basically the timing is all slightly off from the correct, official CEA numbers on the left (hdmicap has acquired the VIC sent by the ADV7533 and automatically shown the correct numbers there).</p>
<p>I assumed that this was the basic problem, so I altered the video stack code so that it sets the DSI timing registers with the canonical, CEA numbers, and adaptively computes the timing numbers for the CRTC part, taking into account the different clock rates. That got me a "working" result, with perfect timing numbers coming through DSI and the HDMI encoder chip...</p>
<p><img src="/Screenshot%20from%202015-09-30%2007-04-17.png" alt="Output timing with DSI set correctly and CRTC adaptively"></p>
<p>... but it did not eliminate the horizontal offset of the VSYNC.</p>
<h2 id="fudging-by-one-pixel-per-line">Fudging by one pixel per line</h2>
<p>I also tried fudging the blanking by one pixel more or less per line; that did not solve the VSYNC offset problem either, just moved it around by some amount.</p>
<p>Finally after some consideration I realized... the VSYNC is effectively asynchronous to whatever happens on the DSI clock domain: the VSYNC is coming directly from the CRTC pixel clock domain.</p>
<p>If it wasn't like that, because the clocks in the two domains are different, over time the DSI side would slip further and further away from matching the CRTC timing, until we are having to buffer whole frames, and there'd be no end to it as it kept going.</p>
<p>As it is, in the DSI and / or HDMI encoder there is at least one line buffer of pixels kept in a FIFO, to hide the accumulating small differences in line time between the two clock domains. Here is a capture of what happens when I fudged an extra DSI pixel time into the blanking as a test, as mentioned above...</p>
<p><img src="/Screenshot%20from%202015-09-30%2005-19-13.png" alt="Excessive drift exceeds DSI line buffer"></p>
<p>Partway through the raster, the CRTC's opinion on where it is in the raster, using the pixel clock and CRTC timings, and the DSI + HDMI side's opinion of where it is in the raster, using the DSI clock and DSI timings (fudged to use an extra DSI clock time in HBP), got so far out of whack that the pixel FIFO could no longer cover for the difference.</p>
<h2 id="realization-about-retimed-hsyncs">Realization about retimed HSYNCs</h2>
<p>I could get rid of this by removing the extra fudged pixel time, but after thinking about it, this has another very important implication. We know, with that set of timings on the CRTC and DSI sides, that during the frame the timing drifts further and further apart; the FIFO does its best to cover for it, but eventually it runs out. Yet generally, for the top 1/3rd before it breaks down, the left edge of the active region is a straight line.</p>
<p>However we KNOW that our DSI hsync is drifting by one or more pixels per line: it was already inexact due to the different clocks involved, and we added one more fudge DSI pixel time. The fact that it comes out of HDMI aligned with the pixels from the start of the original CRTC HSYNC means that DSI is issuing retimed HSYNCs, as well as getting pixel data shifted in time thanks to the FIFO - it's not trying to stay consistent with the original CRTC HSYNC time.</p>
<p>In other words, after DSI, we are issuing retimed, DSI-originated HSYNCS and pixel data delayed to match, but original, CRTC-originated VSYNCS.</p>
<h2 id="effect-on-vsync-relationship-to-hsync">Effect on VSYNC relationship to HSYNC</h2>
<p>The original CRTC-originated VSYNC has no clue that later IPs retimed the HSYNC; he just does his thing. That is why we see VSYNC coming partway through a line: we have "bulged" our retimed frame, because there is no way to express the same period exactly using integer amounts of both CRTC clocks (26.66MHz) and DSI clocks (26.8MHz).</p>
<p><img src="/video-timing-2.png" alt="HSYNC drift during frame"></p>
<p>The end result is the original VSYNC comes at his original time and chops off part of the last line. Since VSYNC is defined to be an integer number of HSYNCs, this fatally confuses some TVs.</p>
<p>The only tool to affect it is to add or remove one pixel per line in blanking... but it's a very blunt instrument: if there are 625 lines in the mode, you can only affect it in units of 625 pixel-times, so actually there is no chance to do anything.</p>
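<p>To put rough numbers on that (my own arithmetic from the clock rates above, assuming the nominal 864-pixel x 625-line 576p raster):</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">CRTC line period          = 864 / 26.66MHz      = 32.41us
ideal DSI pixels per line = 32.41us x 26.8MHz   = 868.54
programmable DSI htot     = 868 or 869 (integers only)
residual drift            ~ 0.5 DSI pixel-times per line
                          ~ 0.5 x 625 = ~300 pixel-times per frame, which the
                            line FIFO and the final VSYNC position must absorb
</code></pre></div>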
<h2 id="conclusion">Conclusion</h2>
<p>The end result of all this was I could finally understand what was going on leading to the offset VSYNC problem. But having understood it, there was nothing I could see to do about it. It is possible to retime syncs in the ADV7533, but whatever we do there cannot hide the timing of the original CRTC frame sync, because DSI itself gets reset to that each frame, which means there is a discontinuity in HSYNC timing at each new frame.</p>
<p>Several modes on Hikey seem unaffected and have much better compatibility, such as 1080p and 800x600... these happen to have the same clock on both CRTC and DSI sides. So Linaro now recommends that, if using this video output scheme, you try to arrange for the same clock source to provide the clock for both the CRTC and DSI / HDMI sides.</p>
<p>I hope you enjoyed reading about hdmicap and the debugging adventures with it ^ ^</p>
HDMI Capture and Analysis FPGA Project 52015-10-25T00:00:00+08:00https://warmcat.com/2015/10/25/hdmi-capture-and-analysis-fpga-project-5<h1 id="part-5-6-build-your-own-hdmicap">Part 5 / 6: Build your own HDMICAP</h1>
<p>Since HDMICAP was designed as a one-off tool to scratch an itch, the only version of it is basically a prototype.</p>
<p>So there is no support on the Z-turn dev board for hdmi input, and it was unclear if the whole thing would even work when I started out.</p>
<p>For that reason I chose to basically cut up an HDMI cable and wire it to the Z-turn by hand. Even if you are motivated to make the simple PCB necessary to remove this step, you will probably want to do it this way one time for the same reason, to confirm everything works.</p>
<h2 id="z-turn-connector">Z-turn connector</h2>
<p>Everything connects to one 80-pin connector on the Z-turn, all at one end of it. Only TMDS data capture is supported right now, so with the exception of shorting hpd to 5V (so the source's hpd feels he is connected), signals unrelated to the TMDS data are currently unconnected.</p>
<p><img src="/hdmi-cn1.png" alt="Z-turn CN1"></p>
<p>Basically then we need to hook up four differential pairs and for each of + and - in the pair, provide 51R termination to 3.3V from the connector.</p>
<p><img src="/hdmi-term.png" alt="termination"></p>
<p>First apply some insulating tape to the back of the Z-turn board so you won't accidentally short anything.</p>
<p>When you cut the HDMI cable, you will find 19 conductors plus an outer braided shield.</p>
<p>The differential pairs are twisted together and separately shielded; there's an uninsulated individual shield wire along with the differential pair.</p>
<p>Before soldering anything, you must lay out the cable and firmly anchor it to the Z-turn board using cable ties on at least two mounting holes, preferably at right angles. Because of the weight of the HDMI cable and the fragility of the soldered connections, we cannot allow any flex or twist in the cable to get near the soldering.</p>
<p>The pairs consist of a white wire (- in the pair) and a coloured partner (obviously, +). The colour indicates which pair it is as below...</p>
<p><img src="/hdmi-cable.png" alt="HDMI cable structure"></p>
<p>As shown the brown wire (5V) needs to be shorted to the white wire in the spare pair (hpd).</p>
<p>The last step is to hook up all the shields (the individual pair shield wires, and the outer braid) and connect them to 0V on the CN1 connector.</p>
<p>You must keep the wire lengths the same during this, and as short as possible; the end result should look something like this...</p>
<p><img src="/20151020_193805.jpg" alt="Wired up cable"></p>
<p>It's not going to win any beauty contests but it should take about an hour, cost almost nothing, and assuming you wired it correctly, it works and has enough data integrity for 720p as well.</p>
<h2 id="hdmicap-software">HDMICAP software</h2>
<h3 id="kernel">kernel</h3>
<p>Z-turn comes with a uSD card with an old kernel.</p>
<p>HDMICAP uses a mainline kernel 4.2-rc8 with both the hdmicap kernel driver added and a heavily fixed driver for Xilinx's generic dma IP, which is not in mainline and completely broken in the versions sent on lkml and in Xilinx's own kernel repo.</p>
<p>You can find the kernel git repo <a href="https://git.linaro.org/people/andy.green/hdmicap-linux.git">here</a></p>
<h3 id="hdmicap-userland-app">hdmicap userland app</h3>
<p>The userland app is written in C and is easiest built natively. It has one dependency, on current libwebsockets.</p>
<p>hdmicap git repo is <a href="https://git.linaro.org/people/andy.green/hdmicap-server.git">here</a> and libwebsockets is <a href="http://git.libwebsockets.org">here</a></p>
<h3 id="fpga-firmware">fpga firmware</h3>
<p>The fpga firmware is a bit difficult to provide as a project since it's intimately connected with nonfree Xilinx IPs to make it work.</p>
<p><a href="https://warmcat.com/hdmicap-boot.tar.gz">here</a> is a tarball of fpga bitsream and kernel binary pieces to configure hdmicap fpga and boot to a matching mainline kernel.</p>
<p>A git repo with the vhdl for hdmicap function itself that I wrote is <a href="https://git.linaro.org/people/andy.green/hdmicap-vhdl.git">here</a> but this is incomplete in terms of reproducing the bitstream. If there's enough interest I'll try to figure out how to share the whole project in a way the Xilinx IPs can be added back in so it's buildable.</p>
<h2 id="continued">Continued</h2>
<p>The <a href="../25/hdmi-capture-and-analysis-fpga-project-6.html">the last part</a> in this series discusses what I found about Hikey / 96boards using the hdmicap analyzer.</p>
HDMI Capture and Analysis FPGA Project 42015-10-23T00:00:00+08:00https://warmcat.com/2015/10/23/hdmi-capture-and-analysis-fpga-project-4<h1 id="part-4-6-hdmicap-driver-and-userland-software">Part 4 / 6: HDMICAP Driver and userland software</h1>
<p>Capturing the stats and data in hardware is only half the problem; we need to make it available and easily consumable. We use Linux on the CA9s in the Zynq SOC, with a kernel driver plus a userland networking stack.</p>
<h2 id="software-overview">Software overview</h2>
<p><img src="/hdmicap-software-stack.png" alt="HDMICAP software stack"></p>
<p>The Xilinx tools generate a binary file which is loaded into the programmable logic at bootloader time.</p>
<p>The Linux kernel gets a driver added which allows the hardware to be queried and controlled using sysfs, plus a debugfs node which provides file semantics compatible with dd for the bulk DMA data. So DMA captures may be dd'd out into userland, opened in gimp, etc.</p>
<h2 id="sysfs-interface">sysfs interface</h2>
<p>The main sysfs node, 'state', issues a JSON structure that contains all the captured statistics that are currently valid.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> # cat /sys/devices/soc0/amba/43c10000.hdmicap/state
</code></pre></div>
<p>At the lowest level, the MMCM (PLL) used to regenerate the x5 clock must report it is locked, otherwise no other information is valid.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">{
"pll_locked":"0",
}
</code></pre></div>
<p>If it is locked, it checks to see if the sync level (whether HSYNC and VSYNC are active low or high) has been successfully autodetected by the driver yet.... without that, none of the measurements related to raster timing will be valid.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">{
"pll_locked":"1",
"syncs_detected":"0",
"sanity":"0",
"pxclk_MHz":"74.255587",
"ctrl_ch0_pc":"37067.5",
"ctrl_ch1_pc":"9755.0",
"ctrl_ch2_pc":"9755.4",
"dvi":"0",
}
</code></pre></div>
<p>The ctrl_chx_pc members contain the number of control period symbols seen on that channel during the (short) sample period. If there are no data island periods, then these numbers should be about the same for all channels. If there are data island periods, ch0 should have many more control symbols. This is useful if you suspect problems with, eg, physical connectivity or readability of one of the data channels.</p>
<p>The "sanity" member is a summary of if the timing measurements pass some simple tests for making sense.</p>
<p>If the PLL lock is there and the driver found the HSYNC and VSYNC polarity, the whole JSON state is sent.</p>
<div class="highlight"><pre><code class="language-text" data-lang="text">{
"pll_locked":"1",
"syncs_detected":"1",
"sanity":"1",
"pxclk_MHz":"74.255587",
"ctrl_ch0_pc":"37067.5",
"ctrl_ch1_pc":"9755.0",
"ctrl_ch2_pc":"9755.4",
"dvi":"0",
"hpol":"+",
"vpol":"+",
"hact":"1280",
"vact":"720",
"hrate_kHz":"55.214",
"vrate_Hz":"73.619",
"htot":"1980",
"hbl":"700",
"hsa":"40",
"hfp":"440",
"hbp":"220",
"vtot":"750",
"vbl":"30",
"vsa":"5",
"vfp":"5",
"vbp":"20",
"px_in_frame":"1485000",
"htot_in_frame":"750.000",
"integer_htot_in_frame":"1",
"line_timestamp":"457",
"frame_timestamp":"17577",
"data_px_active":"416045",
"data_px_active_line":"577.840",
"data_vb":"23281",
"ctrl_px_active":"85640",
"ctrl_px_active_line":"118.944",
"ctrl_vb":"33977",
"est_video_clocks":"924617",
"actual_video_clocks":"921600",
"terc4_errs_frame":"766",
"vsync_onset_hpx":"0"
}
</code></pre></div>
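<p>Consuming this from userland is straightforward; here's a minimal sketch that reads the node and pulls one field out (the naive string search stands in for a real JSON parser, and the helper name is mine, not hdmicap's):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdio.h>
#include <string.h>

/* naive lookup of a "key":"value" pair in the sysfs JSON; a real
 * consumer would use a proper JSON parser */
static int json_field(const char *json, const char *key,
		      char *out, size_t len)
{
	char pat[64];
	const char *p, *q;

	snprintf(pat, sizeof(pat), "\"%s\":\"", key);
	p = strstr(json, pat);
	if (!p)
		return -1;
	p += strlen(pat);
	q = strchr(p, '"');
	if (!q || (size_t)(q - p) >= len)
		return -1;
	memcpy(out, p, q - p);
	out[q - p] = '\0';

	return 0;
}

int main(void)
{
	char buf[2048], val[32];
	FILE *f = fopen("/sys/devices/soc0/amba/43c10000.hdmicap/state", "r");
	size_t n;

	if (!f)
		return 1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	/* only trust the timing fields once the PLL reports lock */
	if (!json_field(buf, "pll_locked", val, sizeof(val)) &&
	    val[0] == '1' &&
	    !json_field(buf, "pxclk_MHz", val, sizeof(val)))
		printf("pixel clock: %s MHz\n", val);

	return 0;
}
</code></pre></div>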
<p>There are also nodes that control various things and can trigger data capture.</p>
<h2 id="debugfs-interface">debugfs interface</h2>
<p>The driver also presents a debugfs node that provides the file semantics needed by, eg, dd</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> /sys/kernel/debug/framestore
</code></pre></div>
<p>To grab one complete frame, raw but with active video converted to RGBA</p>
<div class="highlight"><pre><code class="language-text" data-lang="text"> # echo 0 > /sys/devices/soc0/amba/43c10000.hdmicap/frames
# echo 1 > /sys/devices/soc0/amba/43c10000.hdmicap/grab
# dd if=/sys/kernel/debug/framestore of=dump.rgba
11601+1 records in
11601+1 records out
5940004 bytes (5.9 MB) copied, 0.184539 s, 32.2 MB/s
</code></pre></div>
<p>The end of the "file" in debugfs is decided by the end of the DMA action, for capture of frames that's at the end of the last frame. In this case it's one 720p frame (htot x vtot, ie, active video and blanking together).</p>
<h2 id="automatic-statistics-and-image-polling">Automatic statistics and image polling</h2>
<p>The kernel driver's shell-level interface is actually quite usable, but for casual users, production test, or continuous debugging, it's not the friendliest UI.</p>
<p>So there is also a userland application, 'hdmicap', which reads the sysfs nodes and automatically cycles between reading the statistics counters, taking video frame samples, and capturing frame data island content and stats.</p>
<h2 id="summarizing-data-islands">Summarizing Data Islands</h2>
<p>When analyzing the data island content, hdmicap summarizes counts by type, and extracts selected information from the data packets.</p>
<p>One particularly interesting piece of information lives in the AVI infoframe (type 0x82): a "VIC" byte that can be sent by the source to summarize which standardized mode it is sending. There's a separate standard, CEA-861, which defines the timing of a whole bunch of common modes, usually known as "CEA modes".</p>
<p>If we see the VIC byte, it implies what the rest of the timing and clock "should" be. So we can compare the CEA info against what we actually measure, to assess the correctness of what was actually sent against what the source claims it is sending.</p>
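<p>A couple of entries from such a table give the idea; a sketch (the 720p CEA-861 timings here are standard, but the table and check are illustrative, not hdmicap's actual code):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <math.h>
#include <stdio.h>

/* two CEA-861 entries relevant to 720p; a real table has many more */
static const struct cea_mode {
	int vic, htot, vtot;
	double pxclk_mhz;
} cea[] = {
	{  4, 1650, 750, 74.25 }, /* 1280x720p60 */
	{ 19, 1980, 750, 74.25 }, /* 1280x720p50 */
};

/* compare a claimed VIC against the measured timing */
static void check_vic(int vic, int htot, int vtot, double pxclk_mhz)
{
	for (size_t i = 0; i < sizeof(cea) / sizeof(cea[0]); i++) {
		if (cea[i].vic != vic)
			continue;
		if (cea[i].htot == htot && cea[i].vtot == vtot &&
		    fabs(cea[i].pxclk_mhz - pxclk_mhz) < 0.01)
			printf("VIC %d matches measured timing\n", vic);
		else
			printf("VIC %d claimed, but timing differs\n", vic);
		return;
	}
	printf("VIC %d not in table\n", vic);
}

int main(void)
{
	/* measured values taken from the example JSON earlier */
	check_vic(19, 1980, 750, 74.255587);

	return 0;
}
</code></pre></div>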
<h2 id="adding-http-and-ws-in-userland">Adding HTTP and WS in userland</h2>
<p>The 'hdmicap' userland application also uses HTML and Javascript to make a browser-compatible server that can be used over the network.</p>
<p>In addition, when serving to the network, hdmicap expands the kernel's JSON with extra information, such as counts of data island types and, if the CEA VIC appeared, information on the CEA mode timing.</p>
<h2 id="rendering-captured-hdmi-data">Rendering captured HDMI data</h2>
<p>Although we can capture one or more frames to DDR in raw or "active video as RGBA" modes, these are difficult to use in a UI. The full raster 720p capture for one frame is 6MB. So hdmicap spawns imagemagick's convert tool for each captured frame to convert to png. This is then made available at <a href="http://ip/dump.png">http://ip/dump.png</a>.</p>
<p>Unfortunately on Cortex A9s this operation is slow, and how slow depends on the complexity of the image data, so captures lag realtime by 2-5s. However, since they complete in a forked process, the rest of the UI is not affected and updates many times a second continuously. The browser updates automatically each time a new png is ready.</p>
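<p>The spawning itself is simple; a sketch of the idea (the real hdmicap plumbing differs, and the -size argument depends on the capture mode):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork off imagemagick to convert the raw capture to png; the parent
 * returns immediately so the stats UI keeps updating meanwhile */
static pid_t spawn_convert(void)
{
	pid_t pid = fork();

	if (!pid) {
		execlp("convert", "convert", "-size", "1980x750",
		       "-depth", "8", "rgba:dump.rgba", "dump.png",
		       (char *)NULL);
		_exit(1); /* only reached if exec failed */
	}

	return pid; /* reap later with waitpid(pid, NULL, WNOHANG) */
}
</code></pre></div>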
<h2 id="continued">Continued</h2>
<p>The <a href="../25/hdmi-capture-and-analysis-fpga-project-5.html">the next part</a> shows how to reproduce your own hdmicap analyzer.</p>
HDMI Capture and Analysis FPGA Project 32015-10-22T00:00:00+08:00https://warmcat.com/2015/10/22/hdmi-capture-and-analysis-fpga-project-3<h1 id="part-3-6-implementing-the-analyzer-fpga-design">Part 3 / 6: Implementing the analyzer FPGA design</h1>
<h2 id="weakest-link-in-the-chain">Weakest link in the chain</h2>
<p>The current generation of Zynq has a limitation on the speed of its receivers in TMDS33 IO mode: the bandwidth is only 700-800MHz. Since the data rate is 10x the pixel rate, it means no matter what we do, we can't reach 1080p's 1.485GHz data rate using these FPGAs.</p>
<p>However, 720p data rate is only 742.5MHz, which we can handle, along with all the slower modes using 74.25MHz or below.</p>
<p><img src="/hdmi-freq-720-vs-1080.png" alt="Frequencies in HDMI"></p>
<p>So HDMICAP is targeted at supporting 720p and below. The next generation of FPGAs coming soon are made on a smaller process and may extend the ability of TMDS33 inputs to achieve higher bandwidths, but we'll have to see.</p>
<p>Even though it can't reach 1080p, a feature like HDMI audio that works at 720p will also work at 1080p so long as enough data islands are sent there; so being able to analyze at 720p and below is useful even for the resolutions that are out of range.</p>
<h2 id="capturing-and-deserializing-the-hdmi-data">Capturing and deserializing the HDMI data</h2>
<p>As I mentioned earlier, back in the day a lot of tricks were needed if you wanted FPGAs to operate at high speeds. That has improved a huge amount, but there are still some design choices required to work around the fact that the programmable logic is slower than a direct implementation in silicon.</p>
<p>The cheap Zynq variants 7010 and 7020 don't have built-in hard SERDES (Serializer / Deserializer) blocks needed for high speed serial - parallel conversion.</p>
<p>That's OK, because we have an FPGA... we can roll our own deserializer. However since the data is coming at 742.5MHz, that itself is too fast for the FPGA.</p>
<p>To work around this, the SelectIO blocks in the FPGA support a DDR capture mode, where they capture on both edges of a clock at half the data rate. In this way we can drive it at 5x px clock rate, or 371.25MHz, and get two samples per clock. That's within the range of the FPGA logic capability, although only just, requiring extreme simplification of what's in this fast clock domain.</p>
<p>So the combination of the TMDS33 bandwidth limitation and DDR SelectIO support in the FPGA means we don't need the hardware SERDES - can't get any advantage from using it - and so the cheapest FPGAs found on Z-Turn are fine.</p>
<p><img src="/hdmi-front-end.png" alt="FPGA HDMI front end"></p>
<p>The clock and basically 3 bits of data come in at the left and go through differential -> single-ended conversion. The clock is then multiplied by 5 and used to sample the data on the 3-bit data bus on each edge, generating 3 x 10 samples per original pixel clock.</p>
<p>This is then captured into a 20-bit shifter for each channel. That's all that's in the 371.25MHz domain. The rest of the HDMI processing logic runs at the pixel clock rate of 74.25MHz which the FPGA can deal with much easier.</p>
<p>Because it's not known where we are in a 10-bit word in the channel bitstream, there is a per-channel programmable alignment unit that selects 10 bits from the last 20 bits. The driver uses this channel-by-channel to compare how many control symbols are captured on the channel for each of the 10 possible offsets, and picks the offset yielding the most, as sketched below.</p>
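<p>In software terms the scoring pass looks something like this sketch; the four 10-bit symbols are the DVI-defined control period codes, while the register access and bit-ordering details are elided:</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdint.h>

/* the four DVI control period symbols (the HSYNC/VSYNC combinations) */
static const uint16_t ctrl_sym[4] = { 0x354, 0x0ab, 0x154, 0x2ab };

static int is_ctrl(uint16_t sym)
{
	for (int i = 0; i < 4; i++)
		if (sym == ctrl_sym[i])
			return 1;

	return 0;
}

/*
 * Score each of the 10 possible bit offsets over a buffer of raw
 * 20-bit windows and return the offset that saw the most control
 * symbols: that is the word alignment for this channel.
 */
static int best_alignment(const uint32_t *win20, int count)
{
	int best = 0, best_score = -1;

	for (int off = 0; off < 10; off++) {
		int score = 0;

		for (int i = 0; i < count; i++)
			if (is_ctrl((win20[i] >> off) & 0x3ff))
				score++;

		if (score > best_score) {
			best_score = score;
			best = off;
		}
	}

	return best;
}
</code></pre></div>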
<p>At that point, we have recovered the 30 bits per pixel as a 30-bit bus coming at the pixel clock rate, which is the end of the front-end processing.</p>
<h2 id="per-channel-decoders-and-protocol-state-machine">Per-channel decoders and protocol state machine</h2>
<p>HDMI has three main codings</p>
<ul>
<li>10b2b Control period coding</li>
<li>10b8b Active video coding</li>
<li>10b4b TERC4 Data Island coding</li>
</ul>
<p>There are also a couple of extra control symbols used as part of transitioning between control periods and data islands, and into the active video period.</p>
<p>The FPGA runs all the decoders in parallel and the state machine sorts out what it must mean for what state we are in, and uses the output of the appropriate decoder accordingly.</p>
<p><img src="/hdmi-raw-processing.png" alt="FPGA HDMI raw processing"></p>
<p>The state machine is quite complex, but its main trick is that seeing a control period symbol resets it to knowing it is in blanking; in this way it can reliably acquire and keep sync with the HDMI data stream.</p>
<p>The state machine also implements the logic to choose where the hsync and vsync state is taken from during blanking; as mentioned before, in HDMI that information is carried in a completely different coding when in a data island, where the first channel carries it using TERC4. The state machine produces a consistent view of the syncs hiding these details, and also DE, an "Active Video Period" signal indicating the decoded pixel data is valid.</p>
<p>Lastly it collects together 36-byte data island packets and buffers them.</p>
<h2 id="raster-measurement-and-statistics-collection">Raster measurement and statistics collection</h2>
<p>Now that reliably decoded syncs are available, they can be measured to discover the timings of the video raster.</p>
<p><img src="/hdmi-fpga-stats.png" alt="FPGA HDMI raster statistics"></p>
<p>In addition, the HDMI pixel clock frequency is measured by comparing it to the known 100MHz AXI clock also available in the FPGA.</p>
<p>There is a specific counter for the number of pixel clocks between VSYNCs, ie, in one frame. Normally this would not be needed, since you can just multiply HTOT x VTOT to know it. However it was added because I found that that is not always the case, as we will discuss later. It's an interesting case where a specific measurement was added to an FPGA-based analyzer to capture debugging information we had no reason to expect to be useful at the outset.</p>
<p>There are actually several other measurements done in this block such as VSYNC horizontal offset related to the above.</p>
<h2 id="interface-to-soc-axi-bus">Interface to SoC AXI Bus</h2>
<p>Finally an AXI peripheral is instantiated to contain the read/write registers that control everything. This appears as a normal IP to the SoC: it's set up in Device Tree and we use writel() and readl() the same as we would for a hardwired SoC peripheral.</p>
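<p>In the driver that ends up looking like any other memory-mapped peripheral; a sketch with hypothetical register offsets (the real ones are defined by the AXI IP):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <linux/errno.h>
#include <linux/io.h>
#include <linux/of.h>
#include <linux/of_address.h>

/* hypothetical register offsets, for illustration only */
#define HDMICAP_REG_CTRL	0x00
#define HDMICAP_REG_STATUS	0x04

static void __iomem *regs;

static int hdmicap_hw_init(struct device_node *np)
{
	regs = of_iomap(np, 0); /* map the reg property from Device Tree */
	if (!regs)
		return -ENOMEM;

	/* same readl()/writel() as for a hardwired SoC peripheral */
	writel(1, regs + HDMICAP_REG_CTRL);

	return readl(regs + HDMICAP_REG_STATUS) ? 0 : -EIO;
}
</code></pre></div>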
<p>A Scatter-Gather DMAC is also instantiated so the captured data can be stored on the DDR3 in realtime. There's a choice of what is DMA'd:</p>
<ul>
<li><p>completely raw 30-bit captures which are unsynchronized, so you can capture data even if something is completely broken with, eg, syncs in what you are sending</p></li>
<li><p>raw data but frame-synchronized and the active video part converted to RGB888</p></li>
<li><p>capture only the data island packets (this lets you, eg, capture the raw pcm audio samples and reconstruct them into a wav file)</p></li>
</ul>
<p><img src="/hdmi-fpga-dma.png" alt="FPGA HDMI AXI IF"></p>
<p>The DMAC is a full AXI DMAC with scatter-gather capability; its one limitation is that the maximum DMA length is 8MB for a single action.</p>
<p>In raw capture mode, you can literally hexdump the capture and see every bit that went out on the HDMI cable exactly as it was captured.</p>
<h2 id="continued">Continued</h2>
<p>Now that the FPGA hardware has been described, in the <a href="../23/hdmi-capture-and-analysis-fpga-project-4.html">next part</a> we take a look at the software arrangements that run under Linux on the same SoC as the FPGA.</p>
HDMI Capture and Analysis FPGA Project 22015-10-21T00:00:00+08:00https://warmcat.com/2015/10/21/hdmi-capture-and-analysis-fpga-project-2<h1 id="part-2-6-hdmi-on-the-wire">Part 2 / 6: HDMI on the wire</h1>
<h2 id="details-of-hdmi-protocol-layers">Details of HDMI protocol layers</h2>
<p>So I guess readers know broadly how DVI and HDMI work in outline: pixel data is transmitted serially and reconstructed at a receiver.</p>
<p>But there is a surprising amount of detail, and room for implementation differences.</p>
<p>At the bottom-most layer is the electrical interface.</p>
<h2 id="tmds33">TMDS33</h2>
<p>DVI and HDMI specify that the same differential signalling is used on all the high-speed lines. Differential signalling is a very old and good technique for increasing signal-to-noise ratio and resistance to common-mode noise. In the olden days, transmission lines using this method were called "balanced".</p>
<p>Instead of sending the signal on one wire (referenced say to 0v), you send the signal on two wires, known as + and -. The - signal is the same signal as +, but inverted (so if + is 1, - is 0 and vice versa). When the signals are combined, "common-mode" noise, noise that appears the same on both signals, tends to be cancelled out. And when differential signals are routed for long lengths, the wires are twisted together so their emissions also tend to cancel.</p>
<p>HDMI needs more bandwidth than can be sent on one differential pair. So it uses 3 data pairs and one clock pair, plus some auxiliary signals. (Two of the auxiliary signals are also in a twisted pair physically, but they are not TMDS33 or related to the HDMI data stream.) In total there are 19 conductors in the familiar HDMI cable.</p>
<p><img src="/cable-wiring.png" alt="HDMI Cable wiring"> </p>
<p>In DVI and HDMI the 3 data lanes and the clock lane are differential pairs using TMDS33 levels. TMDS requires 51R pullups to 3.3V at the receiver. However the differential voltages themselves are much smaller, on the order of 100 - 200mV. Smaller levels are quicker to reach and allow faster transmission rates.</p>
<p>The presence of these termination pullups is detected by the transmitter and it may suppress data transmission until they are seen, known as "receiver detection".</p>
<p>My 'scope is too weak to record the data faithfully at the 742.5MHz data rate used by 720p, but below gives you an idea of what the differential signals look like when pulled up to 3.3V and transmitting.</p>
<p><img src="/TEK0019.JPG" alt="TMDS33 signal levels"></p>
<p>It also shows how inadequate a 200MHz bandwidth scope is for looking at this, there are actually ten bit-times shown between the two vertical lines (roughly one 74.25MHz period). HDMI receivers (and the FPGA used here) have higher bandwidth inputs that can resolve the individual bits properly.</p>
<h2 id="tmds33-channels">TMDS33 Channels</h2>
<p>The DVI or HDMI cable itself carries the clock pair and three data pairs that transmit in parallel.</p>
<p><img src="/hdmi-tmds33.png" alt="HDMI data transmission"></p>
<p>Although the actual data rate is very high, to ease carrying the clock on real cables and reduce RF emissions, the clock sent on the HDMI clock differential pair is 1/10th of the rate of the data on the data pairs. This makes the HDMI clock period represent one pixel period; in other words, considering there are three data channels, there are 30 bits transmitted per HDMI clock period (== per pixel).</p>
<p>A PLL in the receiver reconstructs the x10 clock and uses it to capture the data on the channels.</p>
<h2 id="tmds-codings">TMDS codings</h2>
<p>Unlike a VGA or earlier cable, there are no wires reserved to carry the sync signalling. Instead, syncs are carried as part of the 10-bits per pixel data from each channel, and they are carried differently according to what else is being sent.</p>
<p>The reason for this is that HDMI is derived from the earlier DVI standard, which had a very simple plan for carrying the syncs using reserved symbols during the whole of the blanking period. But HDMI builds on DVI by allowing new "data islands", using a different coding scheme, to be carried at arbitrary points during blanking alongside "control periods", and makes explicit "video periods" that contain the active video data. Since DVI doesn't have these extra concepts, HDMI ended up with two completely different ways to express HSYNC and VSYNC state in the stream (syncs cannot change during active video data).</p>
<p><img src="/dvi-hdmi-codings.png" alt="DVI + HDMI codings"></p>
<p>The direct logical codings in HDMI then are</p>
<ul>
<li>Active video data using 10b8b coding (providing 8b each for RGB in RGB mode)</li>
<li>Control periods using one of four special 10-bit control codings to express the 4 possible HSYNC + VSYNC states (10b2b)</li>
<li>Data islands using TERC4 (10b4b) coding</li>
</ul>
<p>During the active video region, where pixel data is being sent and 10b8b coding is used, there is actually a choice of two codings per byte. One choice has more zeros and the other more ones. The transmitter selects which coding to use per pixel based on whether it has sent more ones or zeros lately: it keeps a running count and selects the coding that keeps the overall ratio of ones to zeros at 50:50.</p>
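<p>The published DVI algorithm for this coding is small enough to show as a software model; a sketch (per-channel running disparity carried in *cnt, starting at zero):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdint.h>

static int ones8(uint32_t v)
{
	int n = 0;

	for (int i = 0; i < 8; i++)
		n += (v >> i) & 1;

	return n;
}

/*
 * 10b8b-encode one video byte: stage 1 minimizes transitions into a
 * 9-bit intermediate, stage 2 picks the true or inverted form to pull
 * the running ones/zeros disparity in *cnt back towards zero.
 */
static uint32_t tmds_encode(uint8_t d, int *cnt)
{
	int i, n0, inv, n1 = ones8(d);
	int xnor = n1 > 4 || (n1 == 4 && !(d & 1));
	uint32_t q_m = d & 1, q_out;

	for (i = 1; i < 8; i++) {
		uint32_t b = (((q_m >> (i - 1)) ^ (d >> i)) & 1) ^ xnor;

		q_m |= b << i;
	}
	if (!xnor)
		q_m |= 1 << 8; /* bit 8 records which stage-1 coding was used */

	n1 = ones8(q_m);
	n0 = 8 - n1;
	inv = !!(q_m & 0x100);

	if (!*cnt || n1 == n0) {
		q_out = ((inv ^ 1) << 9) | (q_m & 0x1ff);
		if (!inv)
			q_out ^= 0xff;
		*cnt += inv ? n1 - n0 : n0 - n1;
	} else if ((*cnt > 0 && n1 > n0) || (*cnt < 0 && n0 > n1)) {
		q_out = (1 << 9) | (inv << 8) | (~q_m & 0xff);
		*cnt += 2 * inv + n0 - n1;
	} else {
		q_out = (inv << 8) | (q_m & 0xff);
		*cnt += -2 * !inv + n1 - n0;
	}

	return q_out; /* 10-bit symbol, ready to serialize */
}
</code></pre></div>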
<p>TERC4 is a sparse 10b4b coding that is used to carry both HSYNC and VSYNC data and generic data such as HDMI audio samples. TERC4 has its own structure and subchannels, and inside the overall 36-byte packets sent using it, various packet types can be found (including PCM or other audio samples).</p>
<p>Although the Control Period reserved symbols are unique, you actually can't interpret HDMI data overall without parsing what has been going on before, to understand, eg, whether you are in an active video period or a data island, and tracking it using a state machine.</p>
<h2 id="types-of-stream-sync-in-hdmi">Types of stream sync in HDMI</h2>
<p>At the high data rates of HDMI, skew between clock and data, or between data channels, or between differential pair members on a single channel or clock can destroy data integrity. These skews depend on the transmitter and cable as much as the receiver, and they vary somewhat with temperature.</p>
<p>So receivers typically have to hunt for the best clock phase with the least error rate... FPGAs have some support for delaying the clock or individual data channels by small amounts to compensate. All high speed serial buses have some need for this, including DDR, SD and PCIe, which call it "tuning" or "training".</p>
<p>If we are able to collect bits with a low error rate, we next need to decide which of the 10 bits per HDMI clock is the first bit. This can be done by trying all 10 and finding out which exhibits the most valid control period symbols.</p>
<p>At that point we are aligned enough we can receive the 30 bit pixel data as it was sent.</p>
<p>The reserved control symbols are then used to align the higher-level decoding state and track where we are in the raster, since they don't appear in either TERC4 or 10b8b data.</p>
<h1 id="refresher-on-generic-video-timing">Refresher on generic video timing</h1>
<p>The structure of video streams is still basically the same as used in the first black and white TVs, with CRT type displays.</p>
<p><img src="/generic-video-timing.png" alt="Generic Video Timing"></p>
<p>Basically, there are two kinds of period, active video (black) and blanking (grey). Normally only the active video part is shown on your display, but for our purposes, we are interested in capturing ALL of it.</p>
<p>Originally the blanking was used as the time required to swing the electron beam illuminating the CRT phosphor back to the start of the next line (on HSYNC) and back to the top left (on VSYNC) and it had no other purpose. On analogue TVs audio is sent separately on a different but related frequency. When Colour was added in analogue video, a chroma burst used to sync the phase of a chroma subcarrier was added in the back porch of each line. The sync pulses were encoded as different analogue voltage excursions not used by the video part.</p>
<p>Digital video formats kept the basic HSYNC and VSYNC timing and the concept of blanking intervals. So they still have front and back porches and HSYNC and VSYNC durations. One of the reasons the blanking interval won't die is that it allows you to tune the refresh rate of the video at a given pixel rate without changing the active region; this is how it is possible for HDMI to do both 720p50 and 720p60 using the same 74.250MHz pixel clock... the active video area is the same, but the number of blanking pixels is traded off against the time needed to send ten extra frames per second: 720p50 has an artificially large blanking period, as you can see from a real capture below.</p>
<p><img src="/720p50-test-card-768.png" alt="720p50 Active Region vs Blanking"></p>
<p>It actually uses 1980px for each line even though only 1280px contain active video.</p>
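<p>The arithmetic is easy to check; a quick sketch using the standard CEA totals (1650 is 720p60's htot):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdio.h>

/* same 74.25MHz pixel clock, only the blanking (htot) changes */
int main(void)
{
	double pxclk = 74.25e6;

	printf("720p50: %.1f Hz\n", pxclk / (1980 * 750)); /* 50.0 */
	printf("720p60: %.1f Hz\n", pxclk / (1650 * 750)); /* 60.0 */

	return 0;
}
</code></pre></div>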
<p>Once it was decided that there was an unused blanking interval, basically padding, in a digital protocol minds naturally turn to using the data passed there for something good. On DVI it simply passes 10b2b control period codes that encode the HSYNC and VSYNC state. On HDMI you can still do that, but you have the option to place "Data Islands" in the blanking (these are the red areas in the picture above). After error correction codes are removed, these consist of a 3-byte header and 4 x 7-byte subpackets.</p>
<p><img src="/data-island-packet.png" alt="Generic Video Timing"></p>
<p>The three-byte header is defined in HDMI to contain a packet type, version and length information, and there is a list of packet types, such as those carrying PCM Audio samples: at 24 bits per sample, the 4 x 7 bytes are enough to carry the 8 channels supported by HDMI with 4 bytes left over.</p>
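<p>Decoded, ie, with the error correction bytes already stripped, a data island packet can be modelled as something like this sketch (the field names are illustrative, not from the HDMI spec):</p>
<div class="highlight"><pre><code class="language-c" data-lang="c">#include <stdint.h>

/* a decoded data island packet: 3 header bytes plus 4 subpackets of
 * 7 bytes each */
struct hdmi_di_packet {
	uint8_t type;      /* eg, 0x82 = AVI InfoFrame */
	uint8_t hb1;       /* version / type-specific */
	uint8_t hb2;       /* length / type-specific */
	uint8_t sub[4][7]; /* 4 x 7-byte subpackets */
};
</code></pre></div>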
<p>Because HDMI sends these data islands using TERC4 (10b4b) on two of the three channels, and includes error correction, 36 bytes must be passed; each data island packet "costs" 36 pixel-times of blanking plus some overhead to start and stop a data island in the protocol. During this time, the other data channel is used to send HSYNC and VSYNC state, also in TERC4.</p>
<h2 id="hdmi-defined-data-island-packet-types">HDMI defined Data Island Packet Types</h2>
<p>HDMI defines the following packet types... it's legal to just ignore them and act like it's DVI, if your HDMI sink doesn't need features like audio. However we are interested in audio and the other information.</p>
<table><thead>
<tr>
<th>Type code</th>
<th>Type meaning</th>
</tr>
</thead><tbody>
<tr>
<td>0x00</td>
<td>Null</td>
</tr>
<tr>
<td>0x01</td>
<td>Audio Clock Regeneration (N/CTS)</td>
</tr>
<tr>
<td>0x02</td>
<td>Audio Sample (L-PCM and IEC 61937 compressed formats)</td>
</tr>
<tr>
<td>0x03</td>
<td>General Control</td>
</tr>
<tr>
<td>0x04</td>
<td>ACP Packet</td>
</tr>
<tr>
<td>0x05</td>
<td>ISRC1 Packet</td>
</tr>
<tr>
<td>0x06</td>
<td>ISRC2 Packet</td>
</tr>
<tr>
<td>0x07</td>
<td>One Bit Audio Sample Packet</td>
</tr>
<tr>
<td>0x08</td>
<td>DST Audio Packet</td>
</tr>
<tr>
<td>0x09</td>
<td>High Bitrate (HBR) Audio Stream Packet (IEC 61937)</td>
</tr>
<tr>
<td>0x0A</td>
<td>Gamut Metadata Packet</td>
</tr>
<tr>
<td>0x81</td>
<td>Vendor-Specific InfoFrame</td>
</tr>
<tr>
<td>0x82</td>
<td>AVI InfoFrame</td>
</tr>
<tr>
<td>0x83</td>
<td>Source Product Descriptor InfoFrame</td>
</tr>
<tr>
<td>0x84</td>
<td>Audio InfoFrame</td>
</tr>
<tr>
<td>0x85</td>
<td>MPEG Source InfoFrame</td>
</tr>
</tbody></table>
<p>0x83 (SPD) is quite interesting: the source device can send a vendor and product description in ASCII. On my laptop, it sends "Intel" and "IntegratedGfx".</p>
<h2 id="continued">Continued</h2>
<p>Now that we have discussed the problem and the wire protocol, the <a href="../22/hdmi-capture-and-analysis-fpga-project-3.html">next part</a> discusses how the solution was implemented on the FPGA side.</p>
HDMI Capture and Analysis FPGA Project2015-10-20T00:00:00+08:00https://warmcat.com/2015/10/20/hdmi-capture-and-analysis-fpga-project<h1 id="part-1-6-why-an-hdmi-analyzer">Part 1 / 6: Why an HDMI Analyzer</h1>
<p>This series of posts is about an FPGA-based HDMI analyzer I made recently while working on HDMI-related issues for Linaro. It discusses why you might want one, exactly how HDMI works at the wire level and above, how the analyzer works, and how to build one yourself for ~US$150. The fpga vhdl (minus the Xilinx pieces), the kernel pieces and the userland pieces are available in git under gpl2.</p>
<p>Here's what the browser-based UI looks like in use on a real 720p HDMI source.</p>
<p><img src="/hdmicap-screenshot.png" alt="HDMICAP screenshot"></p>
<h2 id="brief-detour-on-hikey">Brief Detour on Hikey</h2>
<p>Linaro have recently started making low-ish cost credit-card sized boards
for various SoCs, <a href="https://www.96boards.org">96boards</a>.</p>
<p>It's a very crowded market, but since Linaro have access
to their members at a high level, their first one, the Hisilicon 'Hikey'
96board, is quite interesting: it's a proper 64-bit ARM SoC on there.</p>
<p>Note 1: I have worked for Linaro almost since they began, and so have an
indirect interest in Hikey/96boards. But this is not actually the subject of these posts, as you'll see.</p>
<p>Note 2: The HDMICAP design described here was paid for by Linaro.</p>
<p>It's still difficult to buy the cheapo *pi type boards with 64-bit. At
any rate, since its launch in Feb, although it's pretty usable there are
still some software deficits. Actually these are by now rather less than
has been typical in this market; for example it has working Mali.
And because they base on Linaro's maintained LSK kernel, there are some
other synergies there, like various things backported.</p>
<h2 id="hdmi-on-hikey-and-other-96boards">HDMI on Hikey and other 96boards</h2>
<p>Some months ago I was asked to look at the HDMI situation on it; although it
had working hdmi since Feb, that was fixed at 720p and had various
compatibility issues. I had worked on hdmi and hdmi audio drivers for a couple of other SoCs already, but the setup on Hikey and other 96boards was different
and, as it went on, quite interesting.</p>
<p>Hi6220 on Hikey, like several other SoCs on 96boards, doesn't have onchip hdmi
IP, so the scheme used is that the onchip DSI IP output is sent to a DSI-HDMI encoder IC,
<a href="http://www.analog.com/media/en/technical-documentation/data-sheets/ADV7533.pdf">ADV7533</a> (unfortunately the useful datasheet is not public, but kernel drivers are on the way). Sounds simple, right?</p>
<p>Well, the DSI unit is really a kind of 'pixel bus encoder' IP; it doesn't
itself know how to create the basic raster (VSYNC, HSYNC, DE) timing or do the DMA to get the pixel data. So it relies on a generic CRTC, also in the SoC, to take care of that. The whole system overview is then like this:</p>
<p><img src="/hikey-video-path.png" alt="Hikey video path overview"></p>
<h2 id="improving-hikey-hdmi-kernel-code">Improving Hikey HDMI kernel code</h2>
<p>After some debugging and general orientation I was able to improve the situation, supporting 1080p and other modes using the EDID returned from the monitor. Those patches are now on the main hikey branch. There are still issues though:</p>
<ul>
<li><p>made 1080p work by increasing the number of usable DSI lanes to 4</p></li>
<li><p>compatibility with various monitors and TVs differs according to the mode. 1080p is pretty good, so we are lucky there, since that'll be the default mode in probably most cases nowadays. These are basically the same compatibility problems found on the original "720p" that was the only mode supported (its timing was completely unrelated to the standard 720p modes).</p></li>
<li><p>Generally compatibility is better on monitors than on TVs. On some modes TVs will show the picture correctly, then blank and reacquire the image briefly, repeating that endlessly</p></li>
<li><p>Hotplug is forced to appear to always be present at ADV7533 (mainly because ADV7533 likes to reset half the chip if it sees hotplug go away)</p></li>
</ul>
<p>To make this remaining compatibility situation a bit more survivable, I came up with and implemented two hacks:</p>
<ul>
<li><p>Alt-Gr SysRq G will make DRM cycle through the modes. So if you are confronted with a non-workable default mode, you can try other modes from your EDID "blind" via the keyboard. (You can force the mode from the kernel commandline once you know what works).</p></li>
<li><p>If you boot with the hdmi connector out, and no EDID appears, a canned mode of 720p60 is used. You can then plug the connector after kernel boot and always get 720p.</p></li>
</ul>
<p>However these are basically palliative hacks, and I was very curious what the exact problem was.</p>
<h2 id="difficulties-working-blind-with-hdmi">Difficulties working blind with HDMI</h2>
<p>Basically your TV or monitor is your debugger when it comes to getting HDMI working. If it displays something, you can often get a clue, but more often than not it will just go blank if there's something wrong.</p>
<p>Unlike a traditional system where you basically program the CRTC and see the output directly, in this system the data is only visible after it has been through two separate protocol encoders. So if something is broken, it becomes very hard to be sure where the problem was introduced.</p>
<p>Part of the problem with getting the output modes to bear some relationship to the correct timing was DSI introducing additional delays when going to low power mode at different places in the blanking... that was only found by trial and error, since nobody at Linaro has the tools to look at the DSI or HDMI stream.</p>
<p>You can use inferences, like measuring the VSYNC IRQ period as exactly as possible, and even to some extent get a little idea by looking at the HDMI differential data with a 'scope. However you can't get hard data like the exact timings, or semantic information that may be encoded in the non-active part of the raster.</p>
<p>So during the weeks I was working blind on it I was increasingly motivated to design something that would get around these problems and let me know what was happening on the wire. Finally since I had some spare time on my main job, I decided to build it. And that's the subject of this and future articles.</p>
<h2 id="hdmi-analyzers">HDMI analyzers</h2>
<p>You can buy HDMI analyzers, but there are basically two kinds, very expensive, presumably nice ones, and ones re-using existing HDMI decoder chips with an MPU to control and read out info from registers in the HDMI decoder chip. Often they just put this info on OSD overlay on the video. Linaro doesn't have a budget for this kind of development hardware.</p>
<p>However what I was looking for was anyway something a bit different, basically 100% wire capture at wire speed. So every bit that is transferred on the HDMI cable should be available in capture memory, not just the active area but the blanking too, where HDMI sends data islands and control periods. For convenience, rather than parse the stream in software, hardware should perform summary measurements on the same data so we can have the data quickly and without effort at the same time.</p>
<p>And although HDMI is today's problem, since it looks like Linaro will be involved with many SoC designs with 96boards, probably another protocol like DSI or something else will be tomorrow's problem, so it should be flexible.</p>
<p>Since I have long experience with Xilinx FPGAs (going back to XC2064s in the late 1980s...) I chose a <a href="http://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html">Xilinx Zynq FPGA</a>; these are cheap and include dual Cortex A9s. There are a few ready-made boards around with them; I chose <a href="http://www.myirtech.com/list.asp?id=502">Z-turn</a> from MYIR, since it's only US$119 for the larger 7Z020 version.</p>
<p><img src="/20151020_193901.jpg" alt="Z-Turn with 7Z020"></p>
<h2 id="fpga-design-2015-style">FPGA design 2015-style</h2>
<p>It had been a few years since I last used an FPGA, since I have mainly been working on the Linux kernel. Although I learned VHDL a decade ago, and that's still the same, the density and capability of the programmable logic fabric has really matured since I last used it.</p>
<p>Xilinx offer their Java-based tools that will work on Linux natively for free (although they are not Free) and the standard basis for app IP design is the AXI bus, same as it is in real SoC design. In fact the general level of IPs available <em>is</em> the same as midrange SoC designers use. Again Xilinx offer free (but not Free) sophisticated enabling IPs such as scatter-gather DMAs, that you can just instantiate how you like. And even the low-end FPGAs have enough gates and assets in the programmable logic fabric that you can make serious digital designs that work at respectable clock rates (this was not always the case, the further back you go the more tricks were needed).</p>
<p>At any rate, the cost of entry into using FPGA designs is less than US$120 now; the tools come for free for lowend devices.</p>
<h2 id="continued">Continued</h2>
<p>In <a href="../21/hdmi-capture-and-analysis-fpga-project-2.html">the next part</a> I'll discuss exactly how HDMI works on the wire. I thought I knew enough about it before I started this but the details are quite interesting and since you don't need to know most of them from the software point of view, not that widely known.</p>
Nokia Msft Lol2013-09-23T00:00:00+08:00https://warmcat.com/2013/09/23/nokia-msft-lolAdios Wordpress2013-09-19T15:47:25+08:00https://warmcat.com/2013/09/19/welcome-to-jekyll<p>... and take your enabler buddy PHP with you...</p>
libwebsockets.org2013-01-12T00:00:00+08:00https://warmcat.com/2013/01/12/libwebsockets-org<a href="http://libwebsockets.org"><img class="size-full wp-image-146 alignleft" title="warmcat-trac" src="http://warmcat.com/warmcat-trac.png" alt="" width="315" height="67" /></a>
<p>In 2012 it turns out there was a great deal of interest in libwebsockets, the lightweight, portable C websockets library, that I did not entirely stay on top of due to work commitments. In order to take care of that better going forward I have created a trac and mailing list at <a title="http://libwebsockets.org" href="http://libwebsockets.org">http://libwebsockets.org/ </a>where support and cooperation on improvements should be easier.</p>
<p>Recently libwebsockets has absorbed almost all of the patch delta in Dave Galeano's github branch and added new features like configurable logging; the approach to autotools has also been fixed so it is much more compatible. If you're interested in libwebsockets please consider visiting <a title="http://libwebsockets.org" href="http://libwebsockets.org">http://libwebsockets.org/</a> and joining in. Edit: http://git.libwebsockets.org and git://git.libwebsockets.org are both working now. <a href="http://git.warmcat.com/cgi-bin/cgit/libwebsockets/">http://git.warmcat.com/cgi-bin/cgit/libwebsockets/</a> is a symlink of the same repo, so you can use either.</p>
libwebsockets new features2011-03-06T00:00:00+08:00https://warmcat.com/2011/03/06/libwebsockets-new-features<img src="http://warmcat.com/jump1.png" alt="" class="alignleft" />
<h2>More platforms</h2>
In addition to Linux, there are now users of libwebsockets on win32, OSX and iOS who have contributed patches, now in git at <a href="http://git.warmcat.com/cgi-bin/cgit/libwebsockets">http://git.warmcat.com/cgi-bin/cgit/libwebsockets</a>. OSX wasn't too hard thanks to some patches from Darin Willits; win32 was a bit tougher despite a big patch from Peter Hinz. I was really surprised to find that the Microsoft compiler doesn't support simple C99 features like named member initialization of structs. Anyway, after dumbing down the few structs that used it and adapting a few other things, it's now possible to use libwebsockets on win32 thanks to Peter's work.
<h2>v05 and v06 protocol support</h2>
Both v05 and v06 support was added within 12 hours of the spec being released, at the time of writing v06 is the current version.
I also added v00 client support by request from a developer using the socket.io server library, which doesn't support anything except 76 / 00. That means that in libwebsockets, both server and client support now covers all combinations of v00, v04, v05 and v06.
<h2>Extensions and deflate-stream</h2>
There's a new extension support infrastructure added, including an implementation of zlib compression to provide the standardized "deflate-stream" extension.
The benefit of running with compression depends a lot on the payload size and content; in order to reduce latency the compression buffers are partially flushed every frame. But it is an important part of the websockets standard that libwebsockets now supports.
<h2>Integration with external poll arrays</h2>
Libwebsockets is being integrated into a fork of ircd, the daemon which runs the IRC network. The challenge there is to interoperate with the existing single poll() loop in a way that maximizes blocking and maintains serialization of poll service in a single thread, to avoid the need for locking.
In addition to its default private poll array management libwebsockets now provides poll array callbacks into the user code which enables integration of websocket event loop functionality into an existing, master poll() array.
Callbacks occur into user code when file descriptors must be added, removed or have their event masks changed. There are hash tables implemented in libwebsockets to allow everything to remain opaque to the host code using just the file descriptor as an index.
With this technique, it has been possible to integrate websocket functionality into an existing irc server while keeping all details about websocket functionality and protocol versioning in the library.
Nokia failure2011-02-11T00:00:00+08:00https://warmcat.com/2011/02/11/nokia-failure<img src="http://warmcat.com/cop-horror.png">Nokia has seemed to be in a downward spiral for some years; now they have really jumped the shark, losing sovereignty over their software stack.
<a href="http://conversations.nokia.com/2011/02/11/open-letter-from-ceo-stephen-elop-nokia-and-ceo-steve-ballmer-microsoft/">http://conversations.nokia.com/2011/02/11/open-letter-from-ceo-stephen-elop-nokia-and-ceo-steve-ballmer-microsoft/</a>
Can you imagine more worthless Microsoft-servitude bullshit than this (from the nokia link above)
<blockquote>There are other mobile ecosystems. We will disrupt them.
There will be challenges. We will overcome them.
Success requires speed. We will be swift.
Together, we see the opportunity, and we have the will, the resources and the drive to succeed.</blockquote>
Quelle leadership, Nokia employees.
QT is looking a bit pale too <a href="http://www.engadget.com/2011/02/11/nokia-notifies-developers-that-qt-is-out-for-windows-phone-devel/">since it has no path forward after the Microsoft takeover of Nokia</a>.
In other news, <a href="http://git.warmcat.com/cgi-bin/cgit/libwebsockets/">libwebsockets</a> was updated to -05 within a few hours of the new spec being released.
libwebsockets now with 04 protocol and simultaneous client / server2011-01-22T00:00:00+08:00https://warmcat.com/2011/01/22/libwebsockets-now-with-04-protocol-and-simultaneous-client-server<h2><img class="alignleft" title="cat stance" src="http://warmcat.com/cat-stance.png" alt="" width="150" height="194" />76 + 4 = 80</h2>
The websockets protocol reached its 80th version recently; after v76 was widely implemented they renamed it 00 and continued meddling with it. Many of the changes for 04 make a lot of sense, like moving keys and nonces into base64 encoded headers rather than raw 8-bit data. One particularly expensive change is to require a SHA1 per payload frame and XOR munging of all payload data in the client -> server direction; the server -> client frames remain unmunged.
<h2>Standards politics</h2>
The reasoning behind that particular change makes no sense and is an entirely political decision AFAICT; after a "security problem" was identified, Firefox and Opera disabled websockets by default in their dev builds, putting the whole effort in danger of collapse. However on closer inspection, this "security problem" does not appear to identify any actual problem that exists in this world, and if it did exist it would be in the form of a broken intermediary / proxy that would remain broken and open to abuse no matter what websockets did to avoid enabling its theoretical exploitation (should it ever be discovered to exist). The end result is a pointless and expensive payload munging scheme that doesn't protect anything inserted into the standard, in order to encourage the browser vendors to re-enable support for it.
<h2>libwebsockets client support</h2>
I don't have a browser with 04 support yet, but it is implemented in libwebsockets and tested via libwebsockets' new client websocket support, which is 04-only; the server support covers 76/00 and -04, depending on what the individual client asks for on each connection.
The client support means you can connect to an -04 server as if you were a browser.
With support for 04 client transmission, the prepadding constant LWS_SEND_BUFFER_PRE_PADDING is increased to 14 reflecting the maximum needed to contain the new frame nonce along with the length coding, but you shouldn't have to worry about that if you are using "LWS_SEND_BUFFER_PRE_PADDING".
Since the client support is so integrated into the server support, I changed the name of the common init API to "libwebsocket_create_context()" instead of ...create_server.
<h2>Breaking out the service loop</h2>
A libwebsocket user hit a problem with the forked service loop approach the library was using, because on his platform, iOS, fork() is not supported. To allow libwebsockets to work in single-threaded environments I exposed a new api, libwebsockets_service(), that just needs to be called periodically to perform the poll() on all the sockets and handle incoming websocket traffic. There is a new configure option --enable-nofork which disables any references to fork() and similar in the sources, and implies the user will call the service api periodically (as shown in the test server sources).
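The shape of the forkless usage is just a simple loop; a sketch against the entry points as named in this post (the API has evolved since, so treat the exact names and arguments as historical assumptions):
<pre><code class="c">/* single-threaded, forkless service: no fork(), no threads, just a
 * periodic poll of all the sockets; entry point names as in this post */
struct libwebsocket_context *context;

context = libwebsocket_create_context(/* port, protocols, ... */);
for (;;)
	libwebsockets_service(context, 50); /* 50ms poll timeout */
</code></pre>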
Support for client sockets is integrated into the server side; it means that even if your application is simultaneously a websocket server and a client to other servers, there is still just a single service call / poll action.
Because of the integration, it means that a single protocol callback can handle both client and server callback reasons; I added a new reason LWS_CALLBACK_CLIENT_RECEIVE for client rx payloads so they are handled separately from the server rx payload callback reason LWS_CALLBACK_RECEIVE.
You can see all these changes in the client and server test apps that are part of the sources and built along with the library.
The test client connects to the test server with two websockets, one using the "dumb-increment" protocol where the server just keeps sending the client an incrementing number in ascii, and the other uses the "lws-mirror" protocol to draw circles in the canvas of any browser that is also connected to the same server. On both the server and client side this is done fork and threadlessly, and the same server is able to deal with say a -76 version browser and an -04 libwebsocket client connected and interoperate between them.
New NXP LPC32x0 in Qi bootloader2010-11-29T00:00:00+08:00https://warmcat.com/2010/11/29/new-nxp-lpc32x0-in-qi-bootloader<img class="alignleft" src="http://warmcat.com/catkey.png" alt="" width="150" height="191" />
<h2>LPC3250 from scratch</h2>
NXP's new LPC32x0 is a very cheap and feature-filled ARM926. According to Digikey anyway, it's the cheapest ARM chip going with at least the v5 instruction set. That's important not just because of the extra processor strength over the older ARM9 core, but because ARM Fedora is built requiring the armv5 or newer instruction set. Being able to use ARM Fedora and RPM as a basis means freedom from compromise and from having to own the building of an integrated, self-consistent rootfs; you can just focus on doing your specialized code on top, using the reliable Fedora quality basis.
There are four chips in the series, they differ in having an LCD controller and Ethernet MAC or not; also the smallest guy LPC3220 has "only" 128KBytes of Static IRAM and the others 256KBytes. Well, having worked with the 2KBytes of internal static RAM on the iMX31 for SD boot on Qi, having to shoehorn an SD card driver in there, even 128KBytes is crazy amounts.
They have support for resistive touchscreen, USB OTG, NAND controller and Mobile DDR, and up to 266MHz CPU clock at 1.4V Vcore (208MHz at 1.2V Vcore but as we will see that is not entirely true). They don't support SD Card boot from ROM, but that can be solved for about US$0.30 as will be shown.
In short they're ready to do some serious embedded work at a budget price.
<h2>Embedded Artists EA3250 Dev kit</h2>
<img src="http://warmcat.com/ea3250-400.jpg" alt="" />
There are a few dev kits around for LPC32x0, Hitex have a <a href="http://www.hitex.com/?id=1458">cheap USB stick format one</a> that has been permanently two weeks away from availability since I first looked at it a month or so ago, and it still is two weeks away.
NXP anointed two real dev boards they evidently worked with the vendors on during development; they don't actually make an NXP branded dev board, it's Phytec and Embedded Artists. Since the EA one is in Digikey, that's what I ended up with.
The dev board is well made but there are some problems with it: like many dev boards it comes in two halves, a cheaper, large breakout board and an 8-layer DIMM type board that has the actual CPU BGA and memory. In an act of supreme lunk-headedness, the large breakout board re-uses the Pn.m nomenclature that the CPU uses for GPIO, with no care to retain the CPU mapping. So for example a header is marked with having a pin P1.27; very confusingly this is nothing to do with the CPU GPIO P1.27. This is also true in the schematics for the baseboard and CPU board, making for complete confusion when trying to trace a signal between the two boards or looking for a misnamed signal on the baseboard.
<h2>DDR trouble #1</h2>
There's also a more serious problem, the DDR on the CPU card is marginal and Embedded Artists have made a recall where they will replace the board with one with a different DDR DRAM for free. The CPU board I got was affected but not at room temperature; they want the old card sending back and I am not finished with it yet, so I will take advantage of this recall later.
<h2>DDR trouble #2</h2>
There's another problem with DDR, NXP issued an errata confessing their inverted signal for the differential DDR clock is skewed by no less than 1.2ns from the uninverted partner of the differential pair, a huge skew. This issue removes a lot of comfort zone from designing with DDR and means only some memory devices will tolerate it. However in the EA board case, they have not used the workaround suggested by NXP which is to nuke the inverted output entirely and make the clock unipolar, so the situation can't be that bad.
<h2>DDR trouble #3</h2>
The last problem with DDR... operation at 208MHz with 1.2V Vcore is fine for the CPU, in fact while screwing with the PLL I had the CPU running fine at 400MHz, although there is no way to divide anything useful down for the memory clock at that speed and it's illegal for the PLL over temperature, which tops out at 320MHz. However at 1.2V and 208MHz, the CPU side of the DDR bus is unreliable: it requires cranking to 1.4V to operate DDR even at 104/208MHz. That's annoying because since 1.2V is needed anyway for other circuitry, it could have saved a regulator.
<h2>Unbrickability of LPC32x0</h2>
LPC32x0 chips feature UART-based bootloader injection... if you pull down the SERVICE_N pin, then next boot the ROM in the CPU will bring up UART5 at 115200 n81 and issue a simple protocol byte allowing for bootloader download.
Since I couldn't find a Linux tool for injecting bootloaders, just a Windows one, I wrote a commandline tool for it and added it to Qi build.
<a href="http://git.warmcat.com/cgi-bin/cgit/qi/tree/tools/lpcboot.c?h=lpc">http://git.warmcat.com/cgi-bin/cgit/qi/tree/tools/lpcboot.c?h=lpc</a>
No matter how broken your nonvolatile image gets, it's still possible to recover the device via this UART scheme with a USB <-> LVTTL serial cable.
<h2>Bootloader Hell</h2>
The LPC32x0 bootloader situation is ugly. Basically NXP provided a huge suite used for chip verification called CDL ("common driver library"), this is a sort of chopped down OS in bootloader form. It has all kinds of functions to drive the chip peripherals and test memory, but nothing to actually boot Linux!
What EA shipped, and what you are meant to do as a system integrator, is get an implementation of CDL in the form of "S1L" -- stage one bootloader -- to load U-Boot, which will then load Linux. Both U-Boot and S1L -- itself like 130KBytes! -- store "state" on the board. It leads to this insane situation that two bootloaders with two kinds of state must be right in order to boot. Things are further complicated by the fact that SPI boot only allows the first 56KBytes to be loaded by ROM into IRAM and executed, but the bloated bootloaders are too big to do this in one step.
<h2>Bootloader Heaven</h2>
I added support for LPC32x0 to Qi last week; this is a single < 30KBytes image that can boot itself from SPI Flash or UART5 injection, and pull Linux from an SD Card VFAT partition or also from SPI Flash. Boot from cold, with Qi and kernel in SPI Flash, to a Fedora 12 bash prompt takes less than 4 seconds.
<a href="http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=lpc">http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=lpc</a>
This replaces both S1L and U-Boot, and in accordance with Qi philosophy it holds no state at all on the device.
Its strategy is if it finds that it is running via injection on UART5, it copies itself into SPI Flash / EEPROM so it will run next boot from there, and if it finds an SD Card kernel image it will also copy that into SPI Flash.
When it finds it is running from a non-injection source, ie, a normal boot from SPI Flash, it favours any kernel it can find on the first, VFAT, partition of an SD Card if found, otherwise it boots from the kernel also in SPI Flash.
This is why the lack of ROM -> SD Card boot is not critical: the cheapest, smallest SPI EEPROM can be used to contain Qi, which will then load the kernel and rootfs from SD Card if that's what's needed during development. If SD Card is overkill for the job, then Qi, kernel and initrd can all be pushed into a single US$2 32MBit SPI Flash.
Since I only have the Embedded Artists board right now, it wants to see a kernel image called k-ea3250.img on the SD Card; the way Qi works, you add a new file for each supported board in ./src/cpu/lpc32x0/, copied from embart-steppingstone.c in that directory. The bootloaders need some way to identify what they're running on at runtime, since there is only a single image per cpu that supports all devices. See <a href="http://git.warmcat.com/cgi-bin/cgit/qi/tree/src/cpu/lpc32x0/embart-steppingstone.c?h=lpc">http://git.warmcat.com/cgi-bin/cgit/qi/tree/src/cpu/lpc32x0/embart-steppingstone.c?h=lpc</a> for an idea of what's involved in supporting a new board in the bootloader image.
libwebsockets now with SSL / WSS2010-11-08T00:00:00+08:00https://warmcat.com/2010/11/08/libwebsockets-now-with-ssl-wss<h2><img class="alignleft" src="http://warmcat.com/phonegirl.png" alt="happy phone" width="150" height="145" />SSL encrypted websockets</h2>
The websocket protocol allows for two kinds of transport, unencrypted ws:// sockets and encrypted wss:// ones. The server on a given port is either listening unencrypted initially for http:// connections, or encrypted for https:// ones using SSL.
Today I added optional SSL support to libwebsockets using OpenSSL, so it now supports encrypted or unencrypted connections. To connect encrypted, you simply use an https:// URL to the server. The server returns the script over the encrypted link, and the script on the client side opens a wss:// websocket back to the server. Otherwise the encryption is completely transparent. In particular, the callback the library makes back into the user code on the server is totally unaware of whether it is being used over SSL or not.
I adapted the javascript that the test server sends to open ws:// or wss:// according to whether its own URL was http:// or https://.
The test server builds its own test https:// certificate; browsers correctly warn that the CA is not recognized, but otherwise the certs work correctly in Firefox 4.0b6 and Chrome 8.0.552.28 beta, both current on Fedora F15 rawhide.
<h2>Changed license to lgpl2.1</h2>
I realized that GPL2 isn't the best idea for this as a library, so I changed the terms to LGPL-2.1, making it easier to integrate with systems using other licenses.
<h2>Autotools</h2>
The build system has also been moved to autotools / libtool so it has a traditional ./configure structure that should survive crossplatform builds better. It now has an --enable-openssl switch to control if openssl is needed.
You can get libwebsocket via git by:
git clone git://git.warmcat.com/libwebsockets
libwebsockets - HTML5 Websocket server library in C2010-11-01T00:00:00+08:00https://warmcat.com/2010/11/01/libwebsockets-html5-websocket-server-library-in-c<img class="alignleft" src="http://warmcat.com/rage.png" alt="" width="150" height="180" />
<h2>Browser vs Apps</h2>
It's been clear since browsers first started becoming popular in the 90s that they were going to be the answer to standardized cross-platform support, but somehow there were never quite enough pieces of the puzzle to replace applications outright. Java or Flash or me-toos like Silverlight were needed, and despite Flash solving the problem of video delivery, there hasn't really been a shift away from old-style apps to the browser. (When I wrote Penumbra in 2007, I was able to use an exclusively https browser interface, but that's only because it was fundamentally a filesharing app that didn't challenge simple HTML.)
The issue has never been more urgent, because the number of incompatible platforms in wide use has been increasing, with iPhone, Android, Macs and Linux boxes alongside Windows. Making native apps for each platform is still possible, but it's now a very large effort to cover and support all the platforms well natively.
<h2>HTML5 vs flash</h2>
HTML5 looks like it might have enough firepower to eliminate flash; it has already proven with WebM that it will be able to replace flash for the most critical job it does for the internet as a whole, video delivery, without having to worry too much about patents. Because of that it has increasing mindshare, and there's already a lot of support in place in recent browsers, eg, Chrome and Firefox 4.0b6 at the time of writing; and considering Chrome is webkit, that covers many embedded scenarios too. Apple have committed themselves to HTML5 support in order to screw over Adobe... uh... I mean as part of their love of open standards.
Adobe did make actionscript a standard, but they have never been able to get away from being denounced as the main cause of browser crashes. HTML5 moves all the hard work Adobe tried to do by themselves in terms of cross-platform media support to the people writing the browser and eliminates the need for Flash.
<h2>Websockets</h2>
Websockets are a new part of HTML5 that allow the client to get away from the ancient bias of browsers that any network connection is ultimately there to serve some kind of ...ML, HTML or XML or whatever. A websocket starts off life as an HTTP connection, but the client immediately sends a request to the HTTP server to "upgrade" the protocol to websocket protocol.
<img src="http://warmcat.com/websocket-lifecycle.png">
After a complex handshake confirming both sides really speak websocket, the websocket protocol is MUCH simpler than HTTP. In the case of UTF-8 text packets, it's as simple as sending 0x00 <vari-size payload> 0xff to terminate. Binary payload packets have a slightly more complex length descriptor, then the payload with no terminator.
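For the UTF-8 case that framing is literally three writes; a sketch of sending one text frame under the pre-04 framing described above (hypothetical helper, raw fd for clarity):
<pre><code class="c">#include <string.h>
#include <unistd.h>

/* send one UTF-8 text frame using the 0x00 ... 0xff framing of the
 * early websocket drafts described above (hypothetical helper) */
static int ws00_send_text(int fd, const char *utf8)
{
	unsigned char start = 0x00, end = 0xff;
	size_t len = strlen(utf8);

	if (write(fd, &start, 1) != 1 ||
	    write(fd, utf8, len) != (ssize_t)len ||
	    write(fd, &end, 1) != 1)
		return -1;

	return 0;
}
</code></pre>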
The value over http is that the javascript on the client side just gets the raw binary or UTF-8 payload, and the socket stays open for async traffic in either direction. There is no HTTP header overhead on each packet; as mentioned, for UTF-8 the protocol overhead is only 2 bytes per packet. There's no huge XML encode / decode overhead either, so this is a great transport for low-latency data like speech, and its no-messing async nature lets it carry event information too, ajax-style.
Because (once the connection is established) the protocol overhead is so low, it's very suitable for weak embedded devices that have some kind of network connectivity but no real UI capability or CPU cycles for bloating data into formats browsers otherwise prefer.
<h2>Websocket servers</h2>
Sounds good, right? Well, to use it practically you need server-side support, because you are literally using a new socket-level protocol other than http. There are Java and Python implementations suitable for Apache... but... unlike http there are no C library implementations suitable for embedded devices. So I wrote libwebsockets, to allow embedded devices to participate in the new UIs possible with HTML5 and websockets.
<h2>Introducing libwebsockets</h2>
libwebsockets (in git at <a title="libwebsockets git" href="http://git.warmcat.com/cgi-bin/cgit/libwebsockets/">http://git.warmcat.com/cgi-bin/cgit/libwebsockets/</a> ) is a lightweight GPL2 http and websocket server that hides all the protocol handshakes and detail from the user code driving the server.
<img src="http://warmcat.com/websocket-library.png">
Because it supports file serving on http, it is able to provide a single listening socket that can serve your html script page normally and then when the browser starts running your script, come back and make websocket connections to the same port.
A test server is provided at
<a title="test-server.c" href="http://git.warmcat.com/cgi-bin/cgit/libwebsockets/tree/test-server/test-server.c">http://git.warmcat.com/cgi-bin/cgit/libwebsockets/tree/test-server/test-server.c</a>
Because everything to do with the protocols is handled by the library, it is able to serve http and websockets very simply, using a single callback.
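<p>The shape of that is roughly as below; this is an illustrative sketch only, with invented names -- see test-server.c above for the real API:</p>
<pre><code class="c">#include <stddef.h>

/* Illustrative sketch ONLY: these names and reasons are invented for
 * this post, not the library's real API -- see test-server.c linked
 * above for the genuine callback.  The point is the shape: one
 * function receives both http and websocket events. */
enum demo_reason { DEMO_HTTP_REQUEST, DEMO_WS_ESTABLISHED, DEMO_WS_RX };

static int demo_callback(void *conn, enum demo_reason reason,
			 const unsigned char *in, size_t len)
{
	switch (reason) {
	case DEMO_HTTP_REQUEST:
		/* serve the html page over plain http */
		break;
	case DEMO_WS_ESTABLISHED:
		/* the same listening socket got upgraded to websocket */
		break;
	case DEMO_WS_RX:
		/* "in" is len bytes of raw payload, no headers to parse */
		break;
	}

	return 0;
}
</code></pre>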
Don't let Production Test Be Special2010-02-12T00:00:00+08:00https://warmcat.com/2010/02/12/dont-let-production-test-be-special<h2>Lesson 3: Test is not special</h2>
Commonly in embedded work, test is the "red-headed stepchild": nobody wants to take care of it, and by common, silent consent it is always left until last. Eventually the need for a test plan becomes overwhelming as the date to go to the factory nears, and the task is assigned to the most junior engineers available, since everybody knows that test is the death knell of your career.
Coming cold to an already-existing project, and excluded from its inner workings, the engineers try to create some kind of test coverage the best way they can. At Openmoko two giant test suites were created, DM1 and DM2, written by people who were learning C for the first time. I got the job of modernizing this code, so I know from experience that it was truly terrible and bitrotting at an alarming rate. However I had to admire the guys who wrote it: with everything against them and little experience, they did manage to create something that provided test coverage at the factory, however much it was on life-support.
<h2>Totentanz</h2>
Similarly, Openmoko used production test jigs: special additional PCBs that formed a kind of custom test environment for the PCB under test. At one GTA03 revision there were so many test points added that it was a serious concern whether the board would survive the overall pressure needed to mate the spring-loaded test probes to the test points.
Jigs and test points have an obvious advantage in terms of test throughput, but there are some big disadvantages.
First, you have to design and build the jig, and track changes to the actual device with it. This effort is completely disconnected from moving your actual product on, except that it's meant to help in production.
Second, test points don't test your connectors; the test point may be connected OK but not the connector pin the user actually accesses.
Third, you need something else outside the device to assess what is happening on the test points; the code for that also has to be written and maintained against changes in the actual product. It also means the tests can't be casually performed outside the factory, except perhaps by the original engineers if they have access to the ATE gear themselves.
<h2>Pain into torture</h2>
Additionally, the bringup of GTA02 required special versions of U-Boot and the kernel with added "test magic" created by the test guys and unknown to anyone else. These versions were seldom uplevelled.
Since GTA02 had raw NAND, it needed filling up at the factory with the rootfs. The way to do this was a very fragile OpenOCD setup driving a custom bitbanged USB-serial device, which only worked with certain versions of the USB library needed to talk to it.
All of these quirks and requirements at the factory made production runs difficult and expensive to get right.
<h2>I only hurt you because I love you</h2>
I spent a lot of time thinking about how to avoid this end result the next time I designed something. I concluded the mistakes started with having anything special for test at all. The jig: special, and so evil. Test kernels or bootloaders: special -> evil. Test rootfs -> evil. Test software, like Openmoko's DM1 and DM2: evil. The device should naturally be able to test itself with the arrangements that already exist inside it in order to operate at all.
The answer to the problem of "production test" is to completely subsume it into the rest of the design. So it is the responsibility of the Linux drivers to provide enough functionality, via probe errors or sysfs features, that one can perform test and diagnosis. The "test suite" should boil down to a bash script using features exposed in a normal shipping rootfs and kernel. Bash is ideal because most of the test action is calling existing commandline tools like ifconfig, ping and l2ping, grepping their output or looking at their return codes -- this is what bash is best at. It's also easily understood and edited by anyone who has worked with Linux for a while.
The bootloader is required for test in only one capacity: it is the only part of the system capable of running the SDRAM test, since once you enter Linux you can't perform a full SDRAM test any more. But even that should be done by the one shipping bootloader image.
In many cases, device interfaces can be tested by external loopback connectors, this proves connectivity through the connectors and it leaves open the possibility of end-users being able to run the same tests on the shipping rootfs.
Fosdem and the Linux Cross Niche2010-02-08T00:00:00+08:00https://warmcat.com/2010/02/08/fosdem-and-the-linux-cross-niche<img class=" alignleft" title="fosdem" src="http://warmcat.com/fosdem.png" alt="fosdem" width="121" height="116" />
I was at Fosdem over the weekend. There were several interesting talks I attended, but the most interesting one for me was a roundtable about the future of cross distributions. I was invited to give a 5 minute talk there, which I gave, but unfortunately it was right at the end and the people before had overrun, so there was no time to make much of a coherent case. So I am going to write some articles covering the issues involved here.
<h2>Cross as a niche</h2>
Cross itself remains absolutely necessary for systems below a certain level of horsepower. For example, 8051, ARM7 and Cortex M3 cannot realistically consider native build. But processors get faster each year; a lot of things we would have used an 8051 on use an ARM7 or Cortex M3 now, and in a few years it is likely that baseline has moved further up to an ARM9 equivalent. What I am suggesting is that over time, the niche where you need cross is shrinking.
All four of the cross distros at FOSDEM target a CPU that's powerful enough to run Linux, but not powerful enough to build its own binaries. That is the niche that I believe will shrink to the point that it won't support all these cross Linux distro projects, possibly none of them in the end.
<h2>My background with cross Linux</h2>
A few years ago I created an RPM-based cross distro singlehanded, and used it on a product for a customer. This was AT91RM9200-based, a 200MHz ARM9 with 32MBytes of SDRAM. The amount of effort needed to create a set of cross packages sufficient for a workable rootfs was huge; it took me many weeks. Some packages like perl were just so cross-unfriendly that they were basically out of reach (although I later saw other people have done the invasive magic necessary). It did work well, and I added patches for busybox RPM support that allowed it to do more useful things like erase packages and keep a package database. The packaging was valuable in itself, but a nice advantage was that the source RPMs it generated ensured GPL compliance.
<h2>My background with Openmoko</h2>
Subsequently I spent 14 months as (mainly) the kernel maintainer for Openmoko. Openmoko had an OpenEmbedded basis for its rootfs, also a cross system. I attempted to use it for "hello world" while I was at Openmoko, but it broke because I was on a newly released Fedora. How it broke was very revealing: the official way to get started with it was to run a huge script that wgetted and locally built 1100 packages. It died due to some assumption somewhere breaking while it tried to build <strong>host</strong> dbus libraries.
What I wanted was a cross toolchain that would let me package "hello world". What I got was a massive host build action including host dbus libs. I had perfectly good host dbus libs in my Fedora install; when I enquired about it I was told they were the "wrong" libs for the expectation of the rest of the packages, so they had to be rebuilt.
I gave up on trying to use OpenEmbedded, as I guess most of Openmoko's customers did.
After Openmoko imploded, I designed the software architecture (and influenced the hardware design in some aspects) for the txtr reader device. On this device, I put into action various lessons I had learned in how not to do things from Openmoko. I will write further about the other lessons in future articles, but here's the first one:
<h2>Lesson #1: Don't compile your own rootfs</h2>
I was told by a manager at Openmoko that Openmoko had hired most of the main devs of OpenEmbedded and were paying for that accordingly. This was a pretty big drain on their resources over a long period.
In contrast, nowadays you can head over to <a title="Fedora ARM project" href="http://fedoraproject.org/wiki/Architectures/ARM" target="_blank">http://fedoraproject.org/wiki/Architectures/ARM</a> and download a generic <a title="rootfs tarball" href="http://ftp.linux.org.uk/pub/linux/arm/fedora/rootfs/rootfs-f12.tar.bz2">rootfs tarball </a>of prebuilt binaries for ARMv5 and above[1]. It's made from unpacking prebuilt binary packages. Once you boot into it, you can install further packages with the usual yum install type action. You can be up in a high quality rootfs in five minutes flat.
You do not need to go around compiling everything personally when binary packages already exist from a reputable distro. Normal distros provide -dev and -devel packages for you to link against too, so you do not need to recompile the universe just because you want to build "hello world" either. That's how we do things on desktop and server systems; as the processors involved get stronger, embedded does not have to be different.
If you want to cross-build specific packages, you install the <a title="cross toolchain" href="http://fedoraproject.org/wiki/Architectures/ARM/CrossToolchain">Fedora ARM Cross Toolchain RPMs</a> on your host via yum and you are ready to go in a couple of minutes. This is very useful for cooking the kernel on your host, both to get started and during development; you can't native-build the bootstrap stuff needed to boot your platform. But that's just a cross compiler and related pieces, it's not a cross distro. (The guy from emdebian at this FOSDEM talk also made the point that you do not need to get into making your own toolchain; your distro should have one you can just install.)
Fedora ARM's strategy is native build. So you install gcc and other dependencies onto the actual device, and use standard rpmbuild to build your package there; you can also just ./configure; make; make install for development down there. If something's missing on the rootfs you can yum install it.
<em>(1 To make the comparison fair to Openmoko: Fedora ARM came along too late for them to choose it from the start, and the GTA02 s3c2442 was not a v5 class processor, so they would have been into a distro recook after changing the distro-level compile options. However my concern is with not repeating Openmoko's errors, and today Fedora ARM is available.)</em>
<h2>Quality and Quantity</h2>
Another major issue is distro quality. I was surprised to hear at Fosdem Dr Mickey Lauer of OpenEmbedded boast about the number of devices that managed to use that distro (including the sad shape of the GTA02) and say that, unlike the other cross distros, OpenEmbedded focused on "Quantity not Quality". From my experience I think he's right about the not focusing on quality, and he did go on to explain that there are problems with OpenEmbedded they are trying to address.
In the near future there will be a car crash between these difficult cross distros, with their relatively poor quality and strange usage requirements, and standard "proper distros" like Fedora ARM, because on higher-end ARMv5s, say 400MHz and above, it is already perfectly possible to compare the two worlds on the same device. I think many devs are currently trained by their experience with buildroot-type systems to assume they have to personally build everything Gentoo-style. However as CPUs increase in power at the same price point, the ways of working with these systems efficiently change, and desktop / server "treat it like a PC" lessons, like the value of packaging, start to really show their traditional advantages over rootfs tarballs.
Like Debian, Fedora has all kinds of rules and requirements about packaging to ensure high quality, and there are a huge number of users of these two normal distributions, which leads to tested and debugged basic packages and their dependencies. OpenEmbedded's boast about number of users is not even a blip in comparison to Fedora or Debian's consumers and contributors.
<h2>Cross distros are locked into local patch hell</h2>
An even worse problem for their quality than the lack of users is the patch load these projects are carrying. I think all of the cross distro projects bemoaned that they were carrying huge patchsets across a large number of packages just to get them to build cross at all, and that most upstreams did not care to take the patches (I assume they don't want to have to get into testing them). Uplevelling packages, which distros with a large package universe must do daily, can become a nightmare of breakage because of the private patchsets being dragged around.
(BTW I also saw in another presentation that the <a title="limo" href="http://www.limofoundation.org/">limo foundation</a> are carrying around more than 80MBytes of diff between their distro and the upstream projects, and these are the guys who sent out a <a title="limo whitepaper" href="http://www.limofoundation.org/images/stories/pdf/limo%20economic%20analysis.pdf">whitepaper</a> explaining the massive cost of delaying sending patches upstream in dollar terms.)
A unified crossbuild patch-promotion effort was proposed, but it seemed to consist only of a domain like "sends-patches.org" that you could use when sending patches instead of your own project name, which is tea and sympathy rather than a solution.
It's clear that quality will tend to be higher if you are getting packages built with normal distro specfiles and no pile of local patches to get them to build cross (because they were built native). Combined with higher quality thresholds at the project level and sheer number of users, native Fedora (or Debian) rootfs basis will provide Quantity <strong>and</strong> Quality if your processor is appropriate.
A couple of hours after the talk I had an interesting conversation with <a title="openinkpot" href="http://openinkpot.org">OpenInkpot</a> dev Mikhail Gusarov, who I found also <a title="openinkpot and openembedded" href="http://openinkpot.org/wiki/FAQ#Whyareyouusing.debsandIPlinux">shared my lack of enthusiasm for OpenEmbedded</a>, although he is trapped still in the cross niche generally by the weak processors he targets at the moment.
[update Feb 10 09:00] Mikhail has <a href="http://fossarchy.blogspot.com/2010/02/cross-build-systems-and-their-future.html">written his own response</a>, he still likes the speed of cross (and still hates OpenEmbedded). But there's some confusion about what Fedora ARM offers, it's a generic ARMv5 rootfs, it doesn't care what exact kind of CPU, vendor or peripherals available. Build farms are less of a requirement when you are no longer building your rootfs but installing it from distro binary packages. <a href="http://en.wikipedia.org/wiki/SheevaPlug">Sheevaplug</a> makes available a 1.2GHz Marvell ARM compatible with 512MBytes of SDRAM that Fedora ARM can work on if you need a native build machine. Shortly fast dual processor Cortex A9 machines will become available.
Bootloader Envy2010-02-08T00:00:00+08:00https://warmcat.com/2010/02/08/bootloader_envy<h2>Lesson #2: A bootloader is to load and boot Linux</h2>
<img class="alignleft" title="Qi" src="http://warmcat.com/qi.png" alt="" width="126" height="183" />On the first day of FOSDEM I sat through a presentation on what could be called another "U-Boot derivative". One of the greatest asspains at Openmoko was the various kinds of Hell caused by the U-Boot bootloader and its philosophy, which can be summed up as "I wanna be Linux when I grow up".
<h2>Configure system is a bad alternative to good bootloader design</h2>
First, it has a config system. That should be good though, right? The problem with the config system is that if anything differs from your current config, you must build another incompatible binary with another config and take care of that. When you have more than a handful of different boards, you are in a maze of incompatible bootloaders. Openmoko took it one step further and mandated a different bootloader binary per PCB revision, so left unchecked there would have been a continuous proliferation of incompatible bootloaders, all basically the same.
<h2>All persistent bootloader private state is EVIL</h2>
Second, U-Boot thinks it's a good idea to have these environment "scripts", because it's "configurable". Actually, the job of a bootloader is to Load, then Boot Linux. You don't need any configurability for that if the bootloader can figure out what it's running on, and therefore where the memory is and how much there is. These scripts expose a really deadly trap I call "private bootloader state". It means the bootloader stores stuff in nonvolatile memory on the PCB and acts differently according to what it hides there. The end result is that two boards from the same factory may act totally differently, even with the same rootfs, due to "bootloader secrets". This is totally needless, and ALL private bootloader state can be eliminated by correct design of the bootloader, leading to completely deterministic boot action per rootfs.
A good example of how that leads you down the path to hell is hardcoding, in the U-Boot environment, the amount of kernel image to copy from storage. People commonly set it to 2MBytes, forget about it, and one day they generate a 2.1MB kernel image and wonder why decompress blows up. Actually, that whole procedure is insane: the kernels are uImages that report their length in a header. The bootloader should examine the header and compute the length of image to pull. But that doesn't fit with this "environment" nonsense.
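<p>As a sketch of how little work that is -- the 64-byte uImage header carries a big-endian magic at offset 0 and the payload size at offset 12; error handling trimmed:</p>
<pre><code class="c">#include <stdint.h>

#define UIMAGE_MAGIC    0x27051956
#define UIMAGE_HDR_SIZE 64

/* Assemble a big-endian 32-bit value from 4 bytes. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Given a pointer to the start of a uImage, return the total number
 * of bytes to copy (header + payload), or 0 if the magic is wrong. */
static uint32_t uimage_total_len(const unsigned char *img)
{
	if (be32(img) != UIMAGE_MAGIC)
		return 0;

	return UIMAGE_HDR_SIZE + be32(img + 12); /* ih_size at offset 12 */
}
</code></pre>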
<h2>Do Linux Stuff In Linux</h2>
In any of these bloated U-Boot style bootloaders, is there even one feature they do better than the same feature in Linux? The startup time should be better by a few hundred ms. Other than that, no: every single bloated "I will add it to the bootloader because I can" feature is shittier than what you get in Linux. Every single feature!
If you need some advanced capability or backup / recovery boot action, check for a button held down at boot-time in the bootloader and go fetch a different Linux partition + kernel. Use standard Linux tools and shells. In return, get really high quality network stack, proper USB support, NAND access that's compatible to your main Linux system access in BBT / ECC terms, and all the other advantages of Linux.
<h2>Do your peripheral bringup in drivers in Linux</h2>
Typically you do not need ANY bringup in the bootloader except SDRAM controller and chip init, since initialized RAM is a prerequisite for putting Linux into it.
That's right, all the megabytes of source spent in U-Boot providing support for so many kinds of peripheral are a waste of time, effort and maintenance. I am being kind saying "maintenance", because the drivers in U-Boot are typically "dumbed down" versions of the equivalent Linux driver that were forked irretrievably the moment all the Linux APIs were ripped out, so there's no coherent effort to keep them up to date with the Linux ones. Lately I saw that they try to ape some Linux APIs there... why not go the whole hog and just <strong>load and boot real Linux</strong>? After all, modern CPUs can be running your driver probes in Linux in ~2 seconds from power using a bootloader that doesn't get in the way.
You typically don't even need to talk to the PMU in the bootloader; after all, you are running code fine already, right? Otherwise you wouldn't be able to run the bootloader code itself.
<h2>Fat girl in Ibiza</h2>
At least at Openmoko, code quality inside U-Boot was awful. I called U-Boot on the lists there "the fat girl in Ibiza" because you know she's going to do anything you want. All kinds of constant-only code, weird new scripting keywords added for test and left undocumented, you name it. Hardware guys felt up to writing such code secretly by themselves once they learned the software engineering marvel that is *((unsigned int *)0x...) = 0x...;
<h2>Your bootloader just tests SDRAM</h2>
There's only one test action your bootloader is suited to do, and that is the SDRAM test. Once you are in Linux, it can't perform a full SDRAM test while it's running in it. But the bootloader is typically starting from on-CPU SRAM, so it can actually run a true SDRAM test from there. Otherwise, the bootloader should be completely absent from the test plan. All other tests should be performed in Linux via standard driver and rootfs tools.
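<p>A minimal sketch of the kind of destructive test that is practical from SRAM before anything has been loaded into the SDRAM -- the base address and size are assumptions for your particular board:</p>
<pre><code class="c">#include <stdint.h>

/* Destructive SDRAM test run from on-CPU SRAM: write each word's own
 * address into it, then read everything back.  Catches stuck or
 * shorted address and data lines across the whole array. */
static int sdram_test(volatile uint32_t *base, uint32_t words)
{
	uint32_t n;

	for (n = 0; n < words; n++)
		base[n] = (uint32_t)(uintptr_t)&base[n];

	for (n = 0; n < words; n++)
		if (base[n] != (uint32_t)(uintptr_t)&base[n])
			return -1;	/* fault at word n */

	return 0;
}
</code></pre>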
More about test and board bringup will feature in another lesson-learned report.
<h2>Qi</h2>
While at Openmoko (mainly) I wrote a bootloader that meets these ideals; you can find it <a title="Qi git" href="http://git.warmcat.com/cgi-bin/cgit/qi/log/?h=txtr">in git here</a>. One of the nicest things about it is that unlike the bloated bootloaders, whose job of becoming Linux cargo-cult style never finishes, Qi has been pretty much complete for a few months. It's a new job to support a new CPU, a much smaller job to add a new board, and it doesn't want to talk to your peripherals anyway, so no problem there.
Qi creates one binary per CPU that supports all boards with that CPU. That sounds like a big job, but we don't care about your peripherals, so all boards with the same CPU look almost identical. You have to find something that can detect your particular board at runtime, for example a NOR device ID read check. So there is zero build-time config, and Qi generates all CPU support when it's built, which typically takes 3 sec or so.
Typical bootloader binary size per CPU is 28-30KBytes. That supports VFAT, ext2/3/4 and typically the SD controller as well. The single Qi image also supports being booted from NAND, JTAG or SD Card on processors that support it, just by being copied into place and without any changes.
There is zero bootloader private state, however Qi can look in the rootfs and append kernel commandline text from the content of a filesystem file. This maintains the rule that boot should be completely deterministic per rootfs.
Whirlygig Verification and rngtest analysis2009-05-21T00:00:00+08:00https://warmcat.com/2009/05/21/whirlygig-verification-and-rngtest-analysis<h2>ENT</h2>
Here is 300MB of random from the device checked by ENT (notice I am not using -b as I was before; without it, ENT checks entropy at BYTE scale, which is tougher):
<pre>$ ./ent dump
Entropy = 7.999999 bits per byte.
Optimum compression would reduce the size
of this 306380800 byte file by 0 percent.
Chi square distribution for 306380800 samples is 253.74, and randomly
would exceed this value 51.06 percent of the times.
Arithmetic mean value of data bytes is 127.5022 (127.5 = random).
Monte Carlo value for Pi is 3.141608288 (error 0.00 percent).
Serial correlation coefficient is 0.000074 (totally uncorrelated = 0.0).</pre>
ENT gives better results for Whirlygig in line with how much you feed it; with a 40MB test file, it reported entropy of 7.999996. That makes sense when you consider the data is really random: it shows its true colours only in the longer term, since sample by sample it can be doing anything at all.
<h2>rngtest</h2>
rngtest had always puzzled me, so most of this post is devoted to picking apart the meaning of these results from 1.27Tbits of Whirlygig randomness (1271Gbits).
<pre>rngtest: bits received from input: 1271367467008
rngtest: FIPS 140-2 successes: 63517865
rngtest: FIPS 140-2 failures: 50508
rngtest: FIPS 140-2(2001-10-10) Monobit: 6560
rngtest: FIPS 140-2(2001-10-10) Poker: 6444
rngtest: FIPS 140-2(2001-10-10) Runs: 18865
rngtest: FIPS 140-2(2001-10-10) Long run: 18947
rngtest: FIPS 140-2(2001-10-10) Continuous run: 12
rngtest: input channel speed: (min=39.329; avg=8626.930; max=19531250.000)Kibits/s
rngtest: FIPS tests speed: (min=332.192; avg=105801.561; max=114217.836)Kibits/s
rngtest: Program run time: 155670833366 microseconds</pre>
Considering it calls itself "rngtest", at first sight there are a shocking number of "failures": over 63,568,373 "tests", 50,508 "failed". Is something wrong with Whirlygig? I went and studied the <a href="http://gkernel.cvs.sourceforge.net/viewvc/gkernel/rng-tools/fips.c?revision=1.5&view=markup">rngtest sources</a> to figure out what it was actually doing.
<h2>FIPS 140-2</h2>
rngtest is based on a <a href="http://www.scribd.com/doc/11305936/NIST-Statistical-Test-Suite-for-Random-and-PseudoRandom">document</a> from NIST which goes into detail about assessing random output. It works on 2500-byte blocks of random data which have various tests applied to them. But since the source is meant to be truly random, what does it mean to "test" a packet? Any bit pattern can come in there, each equally likely as any other, including a whole packet of 0s or 1s. How can some be considered "bad"?
Actually a "bad" packet cannot be considered "bad" in isolation. Instead you have to compare the spread of packets meeting and "failing" the test criteria against the theoretical probability of their occurrence over time, to see if your random source has one kind of bias or another. An individual "bad" packet can't be said to be bad unless the history of failures suggests there is a bias towards generating these bad packets.
Unfortunately, I could not find any documentation about rngtest that explained the expected rate of failures from a genuinely random source. I managed to calculate two of the five.
<h2>Monobit</h2>
The monobit test is just looking for a 50% distribution of 1s in each 20000-bit packet. If a packet's count of 1s lands more than 275 above or below the expected 10,000, it's a fail. Obviously a packet with 1 or 10 extra bits is highly probable. I found out that these counts should follow a "normal distribution", but I was unable to calculate where on the curve "275 more or fewer 1s" should fall -- it's 0.0275 skew on the expected figure of 10,000... if anyone can help me it would be most welcome.
In our case, the observed probability of a monobit failure from my Whirlygig was 0.000103, or 1:9690.
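<p>For what it's worth, a quick numerical sketch of the usual normal approximation (an assumption on my part, not rngtest's documented math) lands close to the observed rate:</p>
<pre><code class="c">#include <math.h>
#include <stdio.h>

/* Under a fair-coin model the count of 1s in a 20000-bit packet is
 * approximately N(10000, 20000 * 0.25), sd = sqrt(5000) ~= 70.7.
 * The two-sided tail beyond +/-275 from the mean is erfc(z / sqrt(2))
 * with z = 275 / sd. */
int main(void)
{
	double sd = sqrt(20000.0 * 0.25);
	double p  = erfc((275.0 / sd) / sqrt(2.0));

	printf("P(monobit fail) ~= %g (1 : %.0f)\n", p, 1.0 / p);
	/* prints roughly 1e-04, i.e. about 1:10000, vs 1:9690 observed */
	return 0;
}
</code></pre>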
<h2>Poker</h2>
Poker is just looking at the distribution of nybbles. It takes each byte as two 4-bit nybbles, and for each of the 5000 nybbles in the test packet maintains a count of occurrences of 0 - 0xf. These counts are squared and summed; a sum greater than 1576928 or less than 1563176 gets you a fail.
Again I have no idea how to calculate the theoretical probability of a "failure" here, but our observed probability is 0.000101 or 1:9864.
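<p>The mechanism itself is simple to state in code, though; here's a sketch of one packet's worth of the test exactly as described, using the constants above:</p>
<pre><code class="c">#include <stdint.h>

/* Poker test as described above: count each nybble value across the
 * 5000 nybbles of a 2500-byte packet, sum the squared counts, and
 * compare the sum to the two magic constants.  0 = pass, -1 = fail. */
static int poker_test(const uint8_t *pkt /* 2500 bytes */)
{
	uint32_t counts[16] = { 0 };
	uint32_t sum = 0;
	int i;

	for (i = 0; i < 2500; i++) {
		counts[pkt[i] >> 4]++;
		counts[pkt[i] & 0x0f]++;
	}

	for (i = 0; i < 16; i++)
		sum += counts[i] * counts[i];

	if (sum > 1576928 || sum < 1563176)
		return -1;

	return 0;
}
</code></pre>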
<h2>Run</h2>
A run is a series of all-1s or all-0s. rngtest counts how many times it sees a run of each length from 1 through 6 (a run longer than 6 bits is counted as 6 bits). The count of occurrences at each run length is then compared against a magic table:
<pre>1-bit: 2315 < run < 2685
2-bit: 1114 < run < 1386
3-bit:  527 < run <  723
4-bit:  240 < run <  384
5-bit:  103 < run <  209
6-bit:  103 < run <  209 (sic)</pre>
Once again I couldn't find any estimate of the probability of failing this test with a true random source. Our observed probability of failing it was 0.000296 or 1:3369.
<h2>Long run</h2>
For rngtest a "long run" is seeing 26 or more bits at the same level in a row. For any 26 bits, the chance of seeing exactly a 26-bit run is 2 in 2^26, or once every 32Mbits (there are two chances because it can be 0x3ffffff or 0x0000000). However, to start the run it's also a requirement that the previous bit is at the opposite level, so it's 2 in 2^27, or 1 in 2^26 overall: 1.49 x 10^-8. For a 20000-bit test packet, that's 0.000298, or a 1:3355 chance per packet.
We observed 18947 of these out of 63,568,373 test packets, <strong>exactly</strong> matching the theoretical chance of 0.000298 or 1:3355.
<h2>Continuous Run</h2>
A "continuous run" is just seeing the same 32-bit pattern twice in a row, considering 32-bit boundaries. For every 32 bits generated, there's a 1 : 2^32 chance that it matches the previous one (without having to know what that was). So the theoretically expected number of these "failures" is <number of bits> / 32 / 4G; for the 1.27Tbits in our sample it comes to about 9.3. We observed 12, so this doesn't seem unreasonable.
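<p>Putting the two expectations I could derive next to the observed counts (all numbers taken from the run above):</p>
<pre><code class="c">#include <stdio.h>

/* Check the two calculable failure rates against the observed counts
 * from the 1271Gbit run. */
int main(void)
{
	double packets   = 63568373.0;		/* successes + failures */
	double bits      = 1271367467008.0;	/* total bits tested    */
	double p_longrun = 20000.0 / 67108864.0; /* 20000 * 2^-26       */

	printf("long run:       expect %.0f, saw 18947\n",
	       packets * p_longrun);		/* ~18945 */
	printf("continuous run: expect %.1f, saw 12\n",
	       bits / 32.0 / 4294967296.0);	/* ~9.3   */
	return 0;
}
</code></pre>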
So overall, after studying each test, it's clear that a random source must fail rngtest with specific probabilities for each test. In no way does a "failure" on the rngtest tests in itself indicate a problem with the random source. But if your source does not produce the right amount of failures over time, that does indicate a problem with your source.
It seems wrongheaded, then, that rngd will reject individual packets that "fail" the rngtest / FIPS 140-2 tests.
<h2>Dieharder with a vengence</h2>
Next I ran the current dieharder suite again, this time from the latest RPMs on Robert G. Brown's site http://www.phy.duke.edu/~rgb/General/dieharder.php. I started running it directly hooked up to the RNG device /dev/hwrng, but then I realized that since a lot of the tests are looking for lagged correlation, I actually needed to give it a file that it could meaningfully rewind into.
So I generated a 12GByte random file and fed it to dieharder -a (run all the tests). This got us the following summary (grepped just for the decision)
<pre>Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for Diehard Birthdays Test
Assessment: PASSED at > 5% for Diehard 32x32 Binary Rank Test
Assessment: PASSED at > 5% for Diehard 6x8 Binary Rank Test
Assessment: PASSED at > 5% for Diehard Bitstream Test
Assessment: PASSED at > 5% for Diehard OPSO
Assessment: PASSED at > 5% for Diehard OQSO Test
Assessment: PASSED at > 5% for Diehard DNA Test
Assessment: PASSED at > 5% for Diehard Count the 1s (stream) Test
Assessment: PASSED at > 5% for Diehard Count the 1s Test (byte)
Assessment: PASSED at > 5% for Diehard Parking Lot Test
Assessment: PASSED at > 5% for Diehard Minimum Distance (2d Circle) Test
Assessment: PASSED at > 5% for Diehard 3d Sphere (Minimum Distance) Test
Assessment: PASSED at > 5% for Diehard Squeeze Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: POSSIBLY WEAK at < 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for STS Monobit Test
Assessment: PASSED at > 5% for STS Runs Test
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: POSSIBLY WEAK at < 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: POOR at < 1% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for Lagged Sum Test</pre>
No way! Two "possibly weak" and one "poor". I read the manpage for dieharder and got the advice from there to run the tests more times, because if the data is bad, feeding it more skewed badness will make the failing distribution of p-values "unambiguous". Dieharder has a default of 10,000 tests; I cranked it up to 20,000 and ran them all again on the same 12GByte sample.
<pre>Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: POSSIBLY WEAK at < 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Bit Distribution Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Generalized Minimum Distance Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Permutations Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for RGB Lagged Sum Test
Assessment: PASSED at > 5% for Diehard Birthdays Test
Assessment: PASSED at > 5% for Diehard 32x32 Binary Rank Test
Assessment: PASSED at > 5% for Diehard 6x8 Binary Rank Test
Assessment: PASSED at > 5% for Diehard Bitstream Test
Assessment: PASSED at > 5% for Diehard OPSO
Assessment: PASSED at > 5% for Diehard OQSO Test
Assessment: PASSED at > 5% for Diehard DNA Test
Assessment: PASSED at > 5% for Diehard Count the 1s (stream) Test
Assessment: PASSED at > 5% for Diehard Count the 1s Test (byte)
Assessment: PASSED at > 5% for Diehard Parking Lot Test
Assessment: PASSED at > 5% for Diehard Minimum Distance (2d Circle) Test
Assessment: PASSED at > 5% for Diehard 3d Sphere (Minimum Distance) Test
Assessment: PASSED at > 5% for Diehard Squeeze Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Runs Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Diehard Craps Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for Marsaglia and Tsang GCD Test
Assessment: PASSED at > 5% for STS Monobit Test
Assessment: PASSED at > 5% for STS Runs Test
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for STS Serial Test (Generalized)
Assessment: PASSED at > 5% for Lagged Sum Test</pre>
So the "poor" and "possibly weak" guys became happy when we doubled the number of tests, and there's a new "possibly weak" guy. But when I looked up the new guy's p-value, it was only 0.02888045, which is a 1 in 34 chance -- it doesn't seem that improbable (real dieharder failures tend to look like 0.00000001 or 0.99999998, and should look more like that the more tests you run).
<h2>Conclusion</h2>
So far as I can tell these results are good.
If anyone has enough math power to calculate the theoretical failure distribution of the rngtest monobit, poker and runs tests I would be very grateful, so I can compare all the numbers. On the two I was able to calculate, we seem to be very close.
Dieharder seemed happy with double the tests, and the one test it still flagged had a p-value with only a 1:34 probability, which is not unreasonable.
Whirlygig PCB2009-05-21T00:00:00+08:00https://warmcat.com/2009/05/21/whirlygig-pcbI built the first prototype Whirlygig PCB last weekend, and it's working well. For testing I left out the noncritical inductors and some caps. I also found the total current consumption at the USB side is 250mA with the CPLD macrocells in low power mode and 350mA with them in high power mode, comfortably within the 500mA USB budget. I decided to use the higher power mode because it should increase the ring oscillator frequencies and hence the randomness. The CPLD runs hot, around 40 degrees C.
<img src="/wb-pcb1-top.jpg" alt="" align="left" />
<img src="/wg-pcb-1-bot.png" alt="" align="left" />
<h2>Improvements</h2>
I took the opportunity to make some improvements:
<ul>
<li>Added JTAG programming of the CPLD via the SiLabs microcontroller over USB. This allows change or update of the CPLD logic from the host PC without any extra hardware. However, because the kernel module holds the logical USB interface, it's safe from being rewritten while in use.</li>
<li>Changed the random logic. I'll explain the changes and results in the rest of this article.</li>
<li>Decreased the polling rate of the CPLD but increased the total USB random throughput to a sustained 1.0MBytes/sec (for as long as you like) by making the code in the microcontroller "multithreaded". You can also plug in more Whirlygig devices to linearly increase random production; the kernel module allows hotplug and unplug without problems and combines the output seamlessly, all in /dev/hwrng.</li>
<li>I was pleased to see the kernel module had hardly bitrotted at all; it only needed a one-line edit to build a working module against a current Fedora Rawhide kernel.</li>
</ul>
The second LED lights while the PC is requesting random packets from the device. It lights briefly on plugging it in while the driver's cache is filled, then it only lights when something is using the hard random numbers on the PC.
<h2>New random scheme</h2>
I had three main ideas about improving the random hardware inside the CPLD.
First I realized we can decrease predictability by having more oscillators than are used at one time to change an output bit. We have 8 output bits, but we now have 16 oscillator sets. Instead of combining them all, on average several will not be used on any given operation.
<img src="/ring-rng-block.png" alt="" />
The second idea was that now we have a pool of oscillators greater than needed at any one time, we can randomly select from them for each output bit operation. So I added an additional 32 oscillator sets (4 for each output bit) which are only used to select which of the pool of 16 we use for any operation. The end result is that at least 8 oscillators from the pool will be unused for each operation, and which oscillators do get used for which bit are individually "random" with "no" correlation between output bits. This makes any attacker's attempt to model the pool oscillator states very tough because there's no longer any knowledge about which bit contains information about which pool oscillator, or even if its state has affected any output bit.
Lastly we now operate from a clock (24MHz) that is 14 times faster than the sample rate. This lets us mix 14 randomly chosen oscillator states by xor before the output is sampled for each bit. Even if two output bits were mixed with the same 14 oscillators, the order would have to be the same as well to get the same result, since the oscillators are never standing still. For this same reason selecting a pool oscillator more than once in the 14 operations is not equivalent to a NOP.
I added another small tweak: all of the random generators shift their original state by 1 generator on each clock. This is intended to reduce the impact of any hard nonlinearity in an individual generator's routing on the CPLD.
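<p>As a software picture of the data flow -- a toy model only, since the entropy comes from free-running silicon oscillators that ordinary C code cannot reproduce:</p>
<pre><code class="c">#include <stdint.h>

/* Toy model of the mixing structure described above -- NOT a source
 * of randomness.  In hardware, pool[] is 16 free-running oscillator
 * sets and the selector oscillator sets choose one per mixing step;
 * here both are faked with counters purely to show the data flow. */
static uint8_t pool[16];	/* stands in for pool oscillator states */
static unsigned int fake_sel;	/* stands in for the selector sets      */

static uint8_t output_bit(void)
{
	uint8_t bit = 0;
	int i;

	/* 24MHz clock vs sample rate: 14 mixing steps per output bit */
	for (i = 0; i < 14; i++) {
		int osc = fake_sel++ & 15;	/* hardware: random pick */

		bit ^= pool[osc] & 1;	/* xor in one oscillator state */
	}

	return bit;
}
</code></pre>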
There were no problems with the PCB, but to save myself a headache working with the crossbar in the CPU I blobbed together pins 26 and 27 on the CPU.
In the next article we look at the random performance again with the new scheme.
Exhaustion and the GPL2008-05-23T00:00:00+08:00https://warmcat.com/2008/05/23/exhaustion-and-the-gpl<img class="alignleft" margin="5" src="http://warmcat.com/exhaustion.png" alt="exhaustion" />Some years ago I came across a guy called Alexander Terekhov, who then worked for IBM and had outspoken views about the viability of the GPL.
If I understood it, his opinion was that the license terms of the GPL would not survive resale, due to the well established <a href="http://en.wikipedia.org/wiki/First-sale_doctrine">"first sale doctrine"</a> and its EU equivalent <a href="http://en.wikipedia.org/wiki/Exhaustion_of_rights">"exhaustion"</a>. It basically means that the copyright holder cannot stop you reselling your software, and that the license terms will not apply to the guy receiving it.
I tried to understand this further, but Alexander was not always easy for me to comprehend, and he had a habit of linking to his own posts elsewhere to bolster his position, leading to a kind of echo chamber of Terekhovs all nodding vigorously at each other. He also, back then and evidently more recently too, explained legal decisions that did not fit his understanding by <a href="http://www.mail-archive.com/gnu-misc-discuss@gnu.org/msg06021.html">calling the Judges in question "morons"</a>, etc. Well, the forum I met him at had a very high trolling quotient, so in the end it just joined the rest of the anti-GPL sentiment there for me, and I ignored it.
<h3>GPL is a license too</h3>
But I was reminded of this last night when I read about a recent <a href="http://williampatry.blogspot.com/2008/05/first-sale-victory-in-vernor.html">decision against Autodesk</a> which is being widely seen as a victory for Joe Softwarebuyer. From the Patry blog post link above:
<blockquote>...many software companies have taken the position that they can convey the copy to the customer in an over-the-counter transaction for a one-time payment, but describe that transaction as a license; as a license, the first sale doctrine doesn't apply, meaning copyright owners can prevent further distribution of the copy...</blockquote>
Doesn't this vindicate Alexander's position? How can GPL terms stick past resale if Autodesk EULA ones don't? Nothing stops "built-in" or "automated" resale cleansing software of any licensing restriction.
A lot of people seem to be happy about the paid-for world being freed from license conditions; are they going to be happy if it turns out that everyone is also freed from GPL conditions?
<h3>Civil infringement and Punishment</h3>
What effect would this have on contribution, I wonder. It seems to me the real-world advantages of being active in a project by contributing will still apply. But it will enable private proprietary forking for products, the kind of thing that Harald Welte's <a href="http://gpl-violations.org">gpl-violations.org</a> has had success attacking and punishing to date. Contributors will see their work used in commercial products without the changes being open.
But the BSD folks seem to survive this outrage without it removing their motivation. And from time spent looking at music licensing over the years, I kind of recognize an element of proprietary vindictiveness in gpl-violations... of course the member companies hiding behind the RIAA attacks are "perfectly within their rights" to embark on much worse vindictive destruction, but the two are not entirely dissimilar, and that always bothered me.
<h3>Playing ball or going home?</h3>
Well, this decision is subject to appeal, will only apply to the jurisdiction of that court, etc, so the sky hasn't fallen in just yet. But there is quite a bit of harmonization of copyright law, thanks to the insistence of rich rightsholder companies mainly on the US side. So if this is upheld, it may come to contaminate most Western countries and turn GPL terms into unenforceable noise -- the choices would be, in effect, public domain or closed.
I guess some people will go closed rather than have their work exploited, but I expect most people will just continue on, and contributions will continue to come perfectly fine. The advantages of being a visible contributor and taking changes upstream directly are still going to apply, and so will the bitrot that happens to any additional code put on top and maintained privately.
<h3>Too mature to care?</h3>
Maybe we have now reached a point where the social, financial, engineering and public advantages of cooperation are ingrained enough that we don't need a license to protect them anyway? But I read that back and feel a sinking feeling about the naivety of such a proposal.
Whirlygig GPL'd HWRNG2007-11-24T00:00:00+08:00https://warmcat.com/2007/11/24/whirlygig-gpld-hwrng<img src="/whirlygig-logo.png" align=left hspace=5>
<h3>Hardware random for the masses</h3>
I made available the result of the ring oscillator random generator as a GPL project <a href="http://warmcat.com/_wp/whirlygig-rng/">called Whirlygig</a>. It's a 2.75cm x 4cm PCB with a mini USB connector; it provides a sustained 5.5Mbps (~620KBytes/sec) of apparently very high quality random bits using the Linux hw_random API. The large amount of randomness should make it useful for statistical tests as well as hard crypto.
I prototyped it using a couple of boards I had lying around, so I know it works fine, but I am waiting for the PCBs to come back from fabrication to actually build a final one. I placed the CPLD VHDL, the board hardware design, the driver software and the firmware for the USB controller into <a href="http://git.warmcat.com">http://git.warmcat.com</a>.
<h3>Dieharder</h3>
I spent some time worrying about how to test the quality of the result -- I found that "diehard", mentioned in an earlier post, has been superseded by <a href="http://www.phy.duke.edu/~rgb/General/dieharder.php">"dieharder"</a>. This has a much tougher general testing regime, even though many of its tests are reproductions of the diehard ones -- it runs each test many times, forms histograms of the p-value results from the many runs, and gives an assessment of fail, poor, possibly weak or pass on the spread of results rather than a single result.
At first the RNG failed three of the 18 tests, but on looking closer, one of the tests (#2) currently fails for all RNG input and is marked as not for use in assessing RNG quality, and the two others by default required more than the 400MBytes of randomness I had prepared. Unfortunately in that case they simply rewind the randomness file and re-use the same data to make up the balance! Of course this is no longer quite "random". When I adjusted those two tests to use a smaller sample that fitted into the 400MBytes without repetition, the output of the RNG got a "pass" on all 17 of the relevant dieharder suite tests.
<h3>Max Entropy</h3>
During the validation phase I changed the RNG algorithm in the CPLD significantly. The scheme is described on the project page, but basically I moved away from a bit-centric to a byte-centric design with 8 identical sets of 3 oscillators. To stop any characteristic of a particular oscillator's routing from being associated with a particular bit of the result byte and creating a bias, I introduced a "mixer" that first generates 8 random bits by combining six oscillator outputs each with XOR, then rotates these oscillator sets between the result bits sequentially at 24MHz. I also removed the toggling action and used the random bit directly.
I also found the Linux rng-tools suite, which repeatedly runs FIPS-140-2 tests on the bits; this fails 1 in 1200 or so packets over 20 billion bits of testing. I believe this is normal for a real random generator: it will produce low-probability sequences that don't look very random in the short term.
Aside from passing dieharder and FIPS-140-2, the changes also got me a reported 8.000000 bits of entropy per byte from the ENT test, so there are reasons to imagine the quality of the output is very good.
FIPS-140-2 and ENT validation vs ring RNG2007-11-15T00:00:00+08:00https://warmcat.com/2007/11/15/fips-140-2-and-ent-validation-vs-ring-rng<a href="http://csrc.nist.gov/groups/ST/toolkit/rng/batteries_stats_test.html">NIST</a> lists some more test suites. NIST also has its own suite, but it is now Windows-only, and lacks a DLL it needs even to run there. The last UNIX version segfaulted here before giving any results... sigh.
I ran the last 10MByte sample against <a href="http://www.fourmilab.ch/random/">ENT</a> and <a href="http://www.iro.umontreal.ca/~simardr/testu01/TestU01.zip">TestU01</a>... to cut a long story short
<blockquote><font size=-2>$ ./ent ../die.c/dump3
Entropy = 7.999980 bits per byte.
Optimum compression would reduce the size
of this 10002432 byte file by 0 percent.
Chi square distribution for 10002432 samples is 281.26, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.4958 (127.5 = random).
Monte Carlo value for Pi is 3.140111525 (error 0.05 percent).
Serial correlation coefficient is -0.000212 (totally uncorrelated = 0.0).</font></blockquote>
7.9999 bits of entropy per byte! TestU01 is less turnkey than the other suites -- it's literally a test library with some example code. I amended an example to call the FIPS-140-2 tests:
<blockquote><pre><font size=-2>============== Summary results of FIPS-140-2 ==============
File: dump3
Number of bits: 20000
Test s-value p-value FIPS Decision
--------------------------------------------------------
Monobit 9933 0.83 Pass
Poker 11.88 0.69 Pass
0 Runs, length 1: 2482 Pass
0 Runs, length 2: 1227 Pass
0 Runs, length 3: 630 Pass
0 Runs, length 4: 319 Pass
0 Runs, length 5: 161 Pass
0 Runs, length 6+: 166 Pass
1 Runs, length 1: 2466 Pass
1 Runs, length 2: 1302 Pass
1 Runs, length 3: 620 Pass
1 Runs, length 4: 311 Pass
1 Runs, length 5: 140 Pass
1 Runs, length 6+: 146 Pass
Longest run of 0: 16 0.14 Pass
Longest run of 1: 14 0.46 Pass
----------------------------------------------------------
All values are within the required intervals of FIPS-140-2</font></pre></blockquote>
So the design's output is compliant with FIPS-140-2, a requirement for many uses.
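For a flavour of what these tests do per 20,000-bit block, here is a minimal C sketch of just the monobit part of FIPS-140-2; the pass interval comes from the standard, and the full suite adds the poker, runs and longest-run tests shown above:
<code><pre>/* Minimal sketch of the FIPS-140-2 monobit test: count the '1' bits in
 * a 20,000-bit (2500-byte) block and require 9725 < ones < 10275.
 */
#include <stdint.h>
#include <stddef.h>

int fips_monobit(const uint8_t *buf)	/* buf: 2500 bytes */
{
	int ones = 0;

	for (size_t i = 0; i < 2500; i++)
		for (int b = 0; b < 8; b++)
			ones += (buf[i] >> b) & 1;

	return ones > 9725 && ones < 10275;	/* 1 = pass */
}</pre></code>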
Diehard validation vs ring RNG2007-11-14T00:00:00+08:00https://warmcat.com/2007/11/14/diehard-validation-vs-ring-rng<img src="/catbowl1.png" align=left hspace=5>
<h3>RNG Quality assessment</h3>
A timely article flew by on Reddit about the <a href="http://en.wikipedia.org/wiki/RANDU">RANDU</a> pseudo-random generator algorithm widely used in the 1960s, which it turns out was very flawed indeed. It was explained to one student that "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random". Basically it produced numbers that belonged to one of 15 "planar" groupings and nothing in the gaps between the planes. It isn't just a minor annoyance, because many statistical studies in the 60s and 70s used it, and it can easily have contaminated their results. That's definitely not what I am trying to reproduce with the ring oscillator device -- so how can I figure out how "good" the randomness is in an objective way?
<h3>RNG quality test suites</h3>
It turns out that empirically testing RNG outputs has been the subject of a lot of work for decades, and there are some established testing suites available online. A major one seems to be the "<a href="http://stat.fsu.edu/pub/diehard/">diehard</a>" suite -- I guess it is a pun on die, the singular of dice.
It needs you to fetch 10M bytes of random numbers or more and let it run a bunch of tests on them. The output was a little hard to assess initially: most tests issue a "p" number which only suggests something is bad if it is 0.000... OR 0.999.... All other numbers in between are to be taken as a good result, as I understood it. Except there is a warning that even good RNGs can produce the occasional test fail.
<blockquote> Thus you should not be surprised with occasional p-values near 0 or 1, such as .0012 or .9983. When a bit stream really FAILS BIG, you will get p`s of 0 or 1 to six or more places. By all means, do not, as a Statistician might, think that a p < .025 or p> .975 means that the RNG has "failed the test at the .05 level". Such p`s happen among the hundreds that DIEHARD produces, even with good RNGs. So keep in mind that "p happens"</blockquote>
I duly fetched 10M bytes of 115kbps randomness from the device and fed it to diehard. It seemed to give fine results except on "Count the 1s stream" and "Squeeze" (devastating p=0.000000), "Count the 1s specific" for bits 1-11 (p=0.000030) and 9-16 (p=0.000064), and OQSO 2-6 (p=0.000005). It passed the dozens of other tests, but it was disappointing: that looks like a big fat 'failed'.
<h3>Triple Scoop</h3>Well, since my test CPLD was an XC95288XL with 288 Macrocells to burn, I naturally wondered if I could improve matters by tripling the number of ring oscillators getting XOR-ed -- that is, to implement the three varying-sized oscillators 3 times each, totaling nine, and sum them with a big XOR. They'll all be drifting around individually as much as together; it should be a mighty noise-fest.
I edited the VHDL and blew it into the CPLD... visually the summed RNG output "bit" was an awful lot more noisy than before. I pulled another 10M bytes from that setup, but just looking at the byte distribution as I did before told me something was still up.
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-2.png"></p>
That sawtooth type distribution is "not random" to coin a phrase. If you look at the large jump at 0x80 (128) it is telling us that we are more likely to get 10000000 binary than we are to get 01111111; in other words, since this is over 10M bytes, there is a distribution problem favouring '0'. When I analyze the distributions of 1s and 0s I find
<table><tr><td><pre>0: 40436204, 1: 39563804... delta=872400, skew=1.090500%</pre></td></tr></table>
You can see the same thing even better looking at 0x00 (42,000 hits) vs 0xFF (36,000 hits); they are something like 8% off the median of 39,000. Clearly that distribution of 1s and 0s has to have a very small skew to stop these kinds of effects showing up, and equally clearly this is telling us something deep about the RNG hardware.
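The skew figures in these tables come from a simple counting pass over the dump; something along these lines would reproduce them (the default file name is illustrative):
<code><pre>/* Count the 1s and 0s, and the byte histogram, over a random dump,
 * reporting delta and skew the same way as the tables in this post.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	long ones = 0, zeros = 0, hist[256] = { 0 }, lo, hi;
	FILE *f = fopen(argc > 1 ? argv[1] : "dump", "rb");
	int c;

	if (!f)
		return 1;

	while ((c = fgetc(f)) != EOF) {
		hist[c]++;
		for (int b = 0; b < 8; b++)
			if ((c >> b) & 1)
				ones++;
			else
				zeros++;
	}
	fclose(f);

	lo = hi = hist[0];
	for (int i = 1; i < 256; i++) {
		if (hist[i] < lo) lo = hist[i];
		if (hist[i] > hi) hi = hist[i];
	}
	printf("0: %ld, 1: %ld... delta=%ld, skew=%f%%\n", zeros, ones,
	       labs(ones - zeros), 100.0 * labs(ones - zeros) / (ones + zeros));
	printf("byte count spread: min %ld, max %ld\n", lo, hi);

	return 0;
}</pre></code>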
<h3>Spiky</h3>Although the individual oscillators are quite slow thanks to the number of inverter stages, at 4 - 6MHz, the way they are being summed makes for trouble from bandwidth limitations inside the CPLD. At the moment it just uses a dumb asynchronous XOR action, which means that potentially very fast spikes can be seen when one "slow" oscillator changes state very shortly after another "slow" oscillator. For example:
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0002tek.jpg"></p>
You can see on the left (note this is 5ns/div) a runt pulse where this happened: the XOR was convinced to rise by one oscillator changing and then countermanded when another oscillator changed state less than 5ns later, resulting in a doubtful pulse that was probably not visible as a '1'. This also happens when going from '1' to '0', but maybe the threshold for the transistors in the CPLD is not at exactly 50% of the 3.3V supply. So we suddenly have it seeing more '0's than '1's on average when spikes are involved.
This whole high bandwidth summing step is completely needless, it's only there because it is a literal interpretation of the diagram in the original RFC. I changed it instead to have nine latches sample the nine oscillators every 125ns (there is an 8MHz clock on the prototype board) and sum those results with XORs into a single bit. In turn this output is sampled by another latch at 8MHz to hide any metastability.
<h3>Latched up</h3>
The latched summing version performs much better and has gotten rid of most of the bit skew, and the sawtooth behaviour:
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-3.png"></p>
...but there is still a problem with 0x00.... the bit skew looks like this
<table><tr><td><pre>0: 39960076, 1: 40039932... delta=79856, skew=0.099820%</pre></td></tr></table>
so the skew is now on the side of '1's but only by 0.1%. You can see the byte count spread is much tighter than before too -- 1800 instead of 6000 counts before.
<h3>Balancing out the skew</h3>
Well, if the remaining 0.1% skew is something to do with the ratio of rise to fall times, or some other non-squareness of the oscillator outputs, that is hard to do much about directly, especially as it may vary with the specific silicon die.
But it shouldn't matter -- now the bandwidth situation at the XOR summer is sane: if we invert the summed output 50% of the time it should spread any excess of '1's or '0's to the opposite side as well, cancelling any bias. I added a couple of terms to the summer to xor against the UART bit index LSB and a bit which toggles after every byte sent by the UART. It's the equivalent of xor with 0x55 for the first byte and then 0xAA for the second byte, over and over.
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-4.png"></p>
That glitch in the middle is actually at 134 (0x86), maybe it is random but I guess we will see.... the skew is further reduced as anticipated
<table><tr><td><pre>0: 39974218, 1: 40025790... delta=51572, skew=0.064465%</pre></td></tr></table>
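In C terms the balancing trick amounts to no more than this per-byte mask -- a sketch of the byte-level equivalence described above, not the actual CPLD logic:
<code><pre>/* Byte-level equivalent of XORing the summed bit with the UART bit
 * index LSB and a per-byte toggle: successive bytes are XORed with
 * 0x55 then 0xAA, spreading any residual bias both ways.
 */
#include <stdint.h>

uint8_t debias(uint8_t raw, long byte_index)
{
	return raw ^ ((byte_index & 1) ? 0xAA : 0x55);
}</pre></code>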
<h3>Diehard sequel</h3>
I ran 10M bytes from this version through Diehard again... the really bad p-value results are gone. For example Squeeze was a deadly 0.000000 before and is now 0.255260.
I made one last adjustment: I added the current state of the latched random value to the XOR term. That means it decides whether to keep or invert the latched value; it no longer directly accepts the value from the RNG. This got me to the promised land: 0.0005% skew between '1' and '0'.
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/byte-dist-10m-5.png"></p>
<table><tr><td><pre>0: 40000206, 1: 39999802... delta=404, skew=0.000505%</pre></td></tr></table>
This also gets me the apparently good diehard results with no obvious failures on any tests; you can see the actual results <a href="/diehard.txt">here</a>. So it seems the current version can tentatively be called a "real RNG".
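For clarity, that last keep-or-invert adjustment behaves like this sketch -- modelled in C rather than the VHDL, with illustrative names:
<code><pre>/* The final adjustment modelled in C: the (balanced) RNG bit decides
 * whether to toggle or hold the previous output bit, rather than being
 * taken as the output directly.
 */
static int latched;

int next_bit(int rng_bit, int balance_bit)
{
	latched ^= rng_bit ^ balance_bit;	/* invert or keep */
	return latched;
}</pre></code>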
Ring oscillator RNG performance2007-11-12T00:00:00+08:00https://warmcat.com/2007/11/12/ring-oscillator-rng-performance<img src="/dawg.png" align=left hspace=5>
<h3>Pretty random</h3>After some scrabbling around porting my Jtag SVF interpreter to Octotux and creating a kernel module for the PIO end of it -- and moving to a different board with an XC95288XL CPLD to prototype it -- the triple ring oscillator RNG is working. It issues results at 9600 baud, but after some initial confusion I modified it to sit out one sample time in eight, leaving "break" on the serial line. This should make sure that the receiving UART does not mistake data for a start bit. The true data rate is something like 800 random bytes per second at 9600 baud.
Here are the three chains of inverters (19, 23 and 29 long) oscillating at the different fundamentals
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0016tek.jpg" height=263></p><br>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0017tek.jpg" align=center height=263></p><br>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0018tek.jpg" align=center height=263></p>
... and here is what the xor summing looks like, first accumulated over 1s, then as a single-shot sample.
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0019tek.jpg" align=center height=263></p><br>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/f0020tek.jpg" align=center height=263></p>
Although the single shot sample doesn't look very random, the oscillators are drifting around all the time. If you wait a little while between samples (currently it is 104us, a 9600 baud bit-period) it's pretty hard to guess what phase all the oscillators have drifted to -- at least, that's the plan.
<h3>Distribution of binary levels</h3>The first test I did was to see what the distribution of '1' and '0' in the results was... clearly if the device is really random it should on average be 50% each. I fetched 1M random bytes, or 8Mbits:
<table align=center><tr><td>0: 4008913, 1: 3991095... delta=17818, skew=0.222725%</td></tr></table>
It's okay for a really random source to deviate from 50:50 at any given moment, although on average it should be 50:50.
<h3>Octet distribution</h3>Next I looked at the distribution of the results from 0x00 through 0xFF as the result "random byte". This would show up if the RNG fails to ever issue some result or favours certain results over others -- every result should on average have an equal chance of showing up and so an equal count. I ran it for 1M random bytes...
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/rng-dist-1.png" align=center></p>
This is pretty decent, every possible result is seen with a frequency within +/-200 counts of the 3,900 average after 1M bytes.
<h3>115200 baud results</h3>Encouraged by this I cranked the baud rate up to 115200, or 8.68us between samples and around 10K random bytes per second. The skew is increased somewhat and the spread of result counts is increased a little.
<table align=center><tr><td>0: 4028746, 1: 3971262... delta=57484, skew=0.718549%</td></tr></table>
<p style="text-align:center; margin-top:0px; margin-bottom:0px; padding:0px;"><img src="/rng-dist-2.png" align=center></p>
So far so good!
Adding entropy to /dev/random2007-11-07T00:00:00+08:00https://warmcat.com/2007/11/07/adding-entropy-to-devrandom<img src="/buffalo.png" align=left hspace=5>
<h3>A hard RNG is good to find</h3>The recent statistical analysis for drumbeat reminded me I could do with a proper source of random numbers, not generated by a pseudorandom feedback action. Back in the early 1990s I was looking at statistical profiling of execution on microcontrollers, I was surprised then to discover that only by making the sampling period random could I get a true picture of execution distribution. If the address bus was sampled at a fixed rate, say 100kHz, instead of a true picture it would be distorted by activity that was happening at some fraction or harmonic of the sampling frequency. So you would alias out pieces of loops completely or get a bloated count for other areas. Only by true randomness in the sampling timing could you see the reality -- a paradox.
<h3>Analogue RNG methodologies</h3>A Google or two around showed that most of the techniques are analogue one way or the other. Many of the methods suffer from a problematic need to amplify some very tiny source of noise, a Zener diode or avalanche transistor junction, by really huge amounts, 90dB or more. There are a couple of suppliers of RF "noise diodes" with flat spectra across a wide frequency range, but they are hard to source.
<h3>Digital non-pseudorandom technique</h3>However there is one technique which, while still relying on analogue noise, is basically digital -- to run multiple chains of unlocked inverting oscillators and xor the outputs. The unlocked oscillators have no reference at all; they're basically an inverter fed back on its own input -- in fact a chain of inverters. Such a circuit oscillates according to the period of the total delay through the inverter chain... and that is highly sensitive to temperature. Normally with synchronous digital design we choose a clock rate for a circuit that is just below the maximum possible at the worst temperature it is expected to operate at -- and after that we can forget about temperature. But with this asynchronous unlocked oscillator concept, the micro- and macro-temperature dependence is revealed in all its freaky glory, causing the oscillation to drift slightly and unpredictably every cycle, and over larger periods with gross temperature fluctuations.
<h3>RFC4086</h3><a href="http://tools.ietf.org/html/rfc4086">RFC4086</a> mentions a recommendation for an RNG based on unlocked inverter chains that is found in IEEE 802.11i.
<blockquote><pre>
|\ |\ |\
+-->| >0-->| >0-- 19 total --| >0--+-------+
| |/ |/ |/ | |
| | |
+----------------------------------+ V
+-----+
|\ |\ |\ | | output
+-->| >0-->| >0-- 23 total --| >0--+--->| XOR |------>
| |/ |/ |/ | | |
| | +-----+
+----------------------------------+ ^ ^
| |
|\ |\ |\ | |
+-->| >0-->| >0-- 29 total --| >0--+------+ |
| |/ |/ |/ | |
| | |
+----------------------------------+ |
|
Other randomness, if available ---------+</pre></blockquote>This has three unlocked, wandering oscillator chains of different lengths being summed at an XOR gate.
<h3>Implementing the RFC4086 RNG</h3>Since it needs 71 inverters -- 12 74hc04 packages or similar -- it makes more sense to put it all in one CPLD. I have an old XC95108 lying around, so I wrote up the design in VHDL and added a UART interface to issue the sampled random data. This brings up the issue of how quickly it can be sampled and still get high quality randomness... clearly if we sampled it every 10ps it wouldn't be very random at all, since it didn't have time to change between samples. On the other hand if we sample it at some high multiple of the fastest free-running oscillator period, then there is a lot of opportunity for each oscillator phase to have been affected over the longer time. By using the UART we can control how often we sample the RNG by the baud rate... I initially set it to 9600 baud, or 104us/sample. The oscillators should have periods on the order of 150 - 200ns (5 - 6MHz), so this is allowing 500+ cycles of jitter to accumulate in each oscillator before the summed sample is taken.
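As a thought experiment in C, this toy model XOR-sums three free-running square waves with slightly jittering periods and samples the result at the 9600 baud rate; the period and jitter figures are illustrative, not measured:
<code><pre>/* Toy model of the RFC4086 arrangement: three free-running square
 * waves with ~1% random period jitter, XOR-summed and sampled every
 * 104us. All the numbers here are illustrative, not measured.
 */
#include <stdio.h>
#include <stdlib.h>

static double phase[3];
static const double period[3] = { 150e-9, 181e-9, 229e-9 };

static int osc(int i, double dt)
{
	double jitter = 1.0 + 0.01 * (rand() / (double)RAND_MAX - 0.5);

	phase[i] += dt / (period[i] * jitter);	/* advance in cycles */
	return (int)(phase[i] * 2) & 1;		/* square wave level */
}

int main(void)
{
	for (int n = 0; n < 80; n++) {	/* 80 bit-periods at 9600 baud */
		int b = osc(0, 104e-6) ^ osc(1, 104e-6) ^ osc(2, 104e-6);

		putchar('0' + b);
	}
	putchar('\n');

	return 0;
}</pre></code>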
I'm currently waiting for a programming tool to be delivered so I can program another device to allow programming the XC95108 -- I realized yesterday that I no longer have any PCs with a printer port. I am very interested to see what the performance and quality of the randomness is like!
Drumbeat2007-10-25T00:00:00+08:00https://warmcat.com/2007/10/25/drumbeat<img src="/drumbeat.png" align=left hspace=5>The magic code project has gained a name and there are some new results to share.
The full sources for the experimental modem using the correlator codes is available at <a href="http://git.warmcat.com">http://git.warmcat.com</a> under GPL2+ license.
The big change is what I call "scrambling". The 126-bit magic correlator sequence is disordered and xor-ed in 258 random ways, which are selected to not correlate well with each other at any offset (less than a 43/128 match). This allows us to issue whole bytes in one code by selecting which disordered correlation code to transmit. The other two codes are for start and end of packet markers. I first tried 22 scrambles to allow 4 bits per code and some extra codes; this worked fine, but I was able to extend it to 258 without really damaging the "best" scramble-scramble false correlation score too much.
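The selection process can be sketched like this in C. make_candidate() is a hypothetical generator of one disordered, XORed variant of the base code, and checking only the positive correlation threshold is my assumption:
<code><pre>/* Sketch of collecting scramble codes: accept a candidate only if its
 * correlation score against every already-accepted code, at every
 * cyclic offset, stays below the threshold.
 */
#include <string.h>

#define NBITS	126
#define THRESH	43

extern void make_candidate(unsigned char *code);	/* hypothetical */

static int score(const unsigned char *a, const unsigned char *b, int ofs)
{
	int m = 0;

	for (int i = 0; i < NBITS; i++)
		m += a[i] == b[(i + ofs) % NBITS] ? 1 : -1;

	return m;
}

int collect_scrambles(unsigned char set[][NBITS], int wanted)
{
	int n = 0;

	while (n < wanted) {
		unsigned char cand[NBITS];
		int ok = 1;

		make_candidate(cand);
		for (int s = 0; s < n && ok; s++)
			for (int ofs = 0; ofs < NBITS && ok; ofs++)
				if (score(cand, set[s], ofs) >= THRESH)
					ok = 0;
		if (ok)
			memcpy(set[n++], cand, NBITS);
	}

	return n;
}</pre></code>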
The gain here is that the self-ordering and threshold properties of the code now reach up to entire bytes: you always get a clean, aligned byte or you don't get anything; with high probability you don't get a wrong byte.
I also changed the demodulator code to something that is currently quite expensive to run. Instead of tracking a reference phase at the receiver, which is tough to do in the presence of extreme noise and phase wrapping, at each sample it tries to demodulate using that sample as "0 degrees" and a 50% phase slicer to get the bits. It's expensive because it has to do that against each of the 258 scrambles every sample -- but on the plus side the correlator does not run <strong>at all</strong> for 124 symbols out of 126 (98.4% of the time) after there has been a successful decode, since it knows it is partway through a 126-symbol code. Still the number of demodulation attempts can definitely be reduced, maybe by subsequently just doing it on the last winning phase and a few either side.
I found that the noise performance of the demodulator is strongly dependent on the relationship between the sample rate and the carrier frequency. It's much less dependent on the number of carrier cycle periods per symbol, so long as you have 4 or 5 or above (2 works but is killed by hardly any noise). Since there is plenty of filtering to the carrier going on I didn't really expect that. Here is a graph of the noise performance when there are 96 samples per carrier (48kHz sample rate, 500Hz carrier, 100baud -- these are intra-code symbols, it's 6.25bits/sec effective):
<img src="/scram258-48kHz-500-100.png" align=center>
The noise performance doesn't look too bad, at least with these synthetic tests and unrealistic 48kHz sampling. It can recover all 7 bytes that are sent even at -28dB, when only 4% of the received power is the signal and the rest is white noise. And because of the code properties, these are definite bytes being captured with quite high probability, not the kind of uncertain decode that would normally be expected in such noise.
The "average quality %" shown in the graph is the percentage of demodulated bits in the matched code scramble that actually were "right", averaged over all the bytes. If a byte is missing because its quality was below the threshold of 70/128, it counts as 0% quality. Up until -20dB, the demodulator is doing well and the recovering individual bits almost perfectly. Without the Correlator code, after -20dB you would be dealing with a rapid increase in bit errors from the demodulator and relying on ECC. By using the Correlator coding though, we are able to push performance another 8dB into the noise (we even recover half the symbols at -30dB, or 10dB further) and still maintain the alignment and high probability of correct decode advantages.
This shows the effect of keeping everything else the same, but bringing the sample rate down to 8kHz from 48kHz
<img src="/scram258-8kHz-500-100.png" align=center>
You can see the BPSK demodulation starts to fail at -10dB instead of -20dB and the correlation code again buys you another 8 - 10dB into the noise after that.
Here is the spectrum after bandpass and lowpass filtering that is actually "transmitted", the peak is the carrier at 500Hz in this case.
<img src="/drumbeat-tx-spectrum.png" align=center>
Another aspect of this setup is that although the demodulator is currently pretty expensive in CPU (and power), the modulator is much simpler. It just requires a 1KByte table for the scrambles and a precooked integer sine table if you are running a separate carrier than the RF one itself. Then it just needs enough logic to walk through the scramble table entry at the symbol rate (and the sine tables at the sample rate if you're using it). That can fit in part of a tiny flash controller, using a 1-bit output of a shifter to switch the phase of the transmitter. It can be simplified even further by not using scrambles but just sending a bit per code sending the code forwards or backwards to signal a '1' or '0'.
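A sketch of how small that transmit side can be, assuming precooked scramble and sine tables; all the names, table shapes and rates here are illustrative, not the actual test code:
<code><pre>/* Sketch of the lightweight modulator: look up the scramble for the
 * byte being sent, walk its bits at the symbol rate, and use each bit
 * to flip the BPSK phase of a table-driven carrier. Assumes 48kHz
 * sample rate, 500Hz carrier (96-entry sine table) and 100 baud.
 */
#include <stdint.h>

#define NBITS		126
#define SAMPLES_SYM	480	/* 48000 / 100 baud */

extern const uint8_t scramble[258][(NBITS + 7) / 8];	/* precooked */
extern const int16_t sine[96];		/* one 500Hz cycle at 48kHz */
extern void dac_out(int16_t sample);	/* hypothetical output hook */

void send_code(int code_index)
{
	for (int bit = 0; bit < NBITS; bit++) {
		int invert = (scramble[code_index][bit >> 3] >>
							(bit & 7)) & 1;

		for (int s = 0; s < SAMPLES_SYM; s++) {
			int16_t v = sine[s % 96];

			dac_out(invert ? -v : v); /* BPSK: 0 / 180 deg */
		}
	}
}</pre></code>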
It seems that I can begin to understand where this system fits among the existing high noise codings already used by ham radio folks. The correlator code is a special case of using an ECC code, in this case where 8-ish bits are exploded into a 126-bit "space" filled with correct decodes, damaged but recoverable decodes and invalid decodes. The error correction performance of the 8-10dB "coding gain" is probably a bit poorer (not so much, from what I can work out) than an optimal use of 126 bits to code for 8, but the advantages that attracted me to the code in the first place can offset that for some applications: the guaranteed self-alignment right down from demodulation to byte boundaries, and a quite firm decode success threshold. In addition, it seems some of the more optimal decoding schemes like Viterbi can be very compute-intensive, whereas although at times we do a lot of it, the correlation action is simple and lends itself to being done in parallel. Turbo codes are patented. So it's not a one size fits all technique, but it has its niche.
I guess the next stage is looking to BPSK modulate and recover an RF carrier directly, but that will need some thinking on because it will be a very frequency-specific design, unlike these baseband tests where you can just edit the important variables and recompile. For example if it can be done initially at the UK 40MHz ISM band it will be possible to consider logic looking at carrier zero-crossings for phase assessment easily enough, and to autocorrelate just the averaged recovered phase at 4 times the symbol rate.
CE Technical Documentation2007-10-25T00:00:00+08:00https://warmcat.com/2007/10/25/ce-technical-documentation<img src="/standards-compliance.png" align=left hspace=5>The other week I went on a workshop to learn more about the new 89/336/EEC regulations that came into force in the UK on 20th July 2007. Here are some notes cribbed from the ones I took on the day; they're intended to be an overview, and obviously for something this important you should get your own advice.
<h3>Out with the old ways...</h3>
For a long time it has been a requirement to certify that any product you manufacture for sale in the EU meets the "relevant standards", so it can have a "CE" mark. Until this July in the UK you could either do that by:
<ul>
<li>paying for test at a "competent body", a company with a ton of test gear that will empirically test your device against the emission and immunity standards, or</li>
<li>writing up a Technical Construction File, or TCF, which describes the product design in a deep way, and includes tests and logic showing why you are compliant</li>
</ul>
<blockquote><h3>Device torture at the House of Pain</h3>
For the last design I completed, a smart 4-channel analogue telephony device that can hook to the Internet, I went down the empirical test route. At a total cost of several thousand pounds the production device was tested at a real competent body with calibrated receivers and emitters, blasted with wideband radio signals, and zapped with +/- 8kV discharges. The resulting report gave a very clear okay except on a minor issue to do with the AC power supply we had used. We also had to do specialized testing for the analogue telephony end, which we again passed, although not until getting a component supplier to make a special that actually complies with the standard.
(Actually walking the device through this testing is a pretty sweaty business, since time is literally money at the test facility and wiggle room in the case of trouble is also in short supply. In one instance for example I was able to patch the sources in realtime when an issue came up during ESD testing that broke normal operation but wasn't enough to trigger the watchdog, turning a fail into a pass. So I would never send a device unaccompanied for testing, or go to a test house without a laptop with full sources to expect the unexpected.)</blockquote>
<h3>In with the new ways...</h3>
However the demands in the new regulations have changed significantly. You must now generate "Technical Documentation" for any product you will be selling in the EU. This is basically the old TCF route to compliance, but it doesn't itself necessarily remove or even perhaps reduce the need for absolute tests for a given device.
Less well known is that if, come 20 July 2009, you are still selling devices first sold before 20 July 2007, you will need to have made new-style Technical Documentation for them, or stop selling them. A lot of tech products from 2007 will be old hat by 2009, solving the problem by itself, but that is not true in all markets.
Typically the Technical Director of the company is the "responsable", as the French say, who must sign off that the device meets the standards. What you are signing off on is that WHEN it is properly installed and maintained, and used for the purpose it is intended for:
<ul>
<li>the device creates an EM disturbance low enough that radio and telecoms equipment can operate as intended</li>
<li>it has a level of intrinsic immunity which is adequate to enable it to operate as intended</li>
</ul>
<h3>So what is done with this "Technical Documentation"?</h3>
Nothing if you're lucky. The only people who can ask to look at it are the regulatory authorities, OFCOM in the UK. You don't publish it or register a copy of it. But you have to keep it for ten years after the last sale of the device for the authorities to ask for. It literally only exists to keep the signatory out of jail if the authorities ask for it. Not kidding about the jail -- if you don't have a satisfactory Technical Documentation to show, the criminal penalties can include a GBP5,000 fine and/or 3 months in jail.
The key words about the Technical Documentation are that it should be "reasonable" and "duly diligent", as in "All reasonable steps are exercised and all due diligence to avoid committing the offense". <strong>That really sums up the job of writing it, you are trying to have an answer for anything that could be said was unreasonable or not duly diligent.</strong> While meeting budget constraints from the customer :-/
<h3>Spread of outcomes</h3>
The ways that problems might pan out were discussed informally. Roughly a third of companies, the presenter reckoned, have their heads totally in the sand about it, and could expect trouble. Another third had made some effort in the right direction, and another third spent the money and were golden. Another factor in how much shit would rain down in the event of problems was the number of devices sold: if it was millions and they were crap, expect maximum warp to jail. If it is five and they don't quite comply despite obvious efforts to prove it, maybe that won't be so bad. But who knows; some overkill is called for.
<blockquote><h3>How likely is my Technical Documentation to be demanded?</h3>
In Germany, we were told, the authorities have a system of testing 10,000 models of devices a year, spread over the various types of product. In the first year (IIRC) it resulted in 105 prosecutions :-/
Another tidbit is in the UK, OFCOM are allegedly looking at training up 85 new enforcement officers. The mobile phone companies, due to the ruinously expensive spectrum auctions of a few years ago, are apparently agitating for more enforcement of the cleanliness of their expensive 3G spectrum.</blockquote>
<h3>What goes in the Technical Documentation then?</h3>
Here is the briefest outline:
<ul>
<li>Description of apparatus - brand/model/manufacturer, intended function, limitations on operation... Technical description - block diagram, technical drawings, interconnections, variations, versions of design documents referenced</li>
<li>Procedures used to ensure conformity - Technical Rationale: what you're testing against, why you did particular tests; Details of design: EMC features, component specifications, QA to control variation; Test data: Logical processes to decide if the tests are adequate, EMC tests and their results, external test reports on subassemblies/components</li>
</ul>
You can also get a Competent Body to "comment" on your Technical Documentation, as some fairly convincing assurance that it is adequate. This is really a seal on the "due diligence" aspect so you can really show you totally ticked every box to make sure it was compliant, but I guess only large companies can afford it.
<h3>Conclusion</h3>
If you manufacture or import stuff to sell in the EU, you are going to have to have Technical Documentation to keep yourself out of jail.
For a standalone device, that means you're really going to have to not only look to dealing with EMC early in the design, with some kind of inhouse testing ability, but find the budget of a few thousand pounds to take it for testing at a Competent Body so you have something convincing and calibrated to put into that Technical Documentation.
Not only that, but even determining which are the applicable standards is a huge headache if you try to do it yourself, there are hundreds of them: a Competent body can also help select the basic issue of which tests you are targeting.
But it's not all bad -- if you make product variations around the same base, you can choose which variation to actually test as a baseline, and then for each variation show that it would not push the original base design over the edge. I have done this in the last couple of weeks, creating Technical Documentation for a customer for a sister device to one that went through actual testing at a Competent Body, and using the very large similarities to limit the amount of retesting needed.
There are definite advantages to requiring this level of design scrutiny and justification, but the change to requiring Technical Documentation and the trend to increased enforcement over the ten years you must keep the documentation has definitely pushed the minimum cost and effort of bringing something to market up several notches.
Heading deeper into the noise2007-09-28T00:00:00+08:00https://warmcat.com/2007/09/28/heading-deeper-into-the-noise<img src="/trout.png" align=left hspace = 5>
<h3>QPSK abandoned for BPSK</h3>The noise performance of the QPSK decoder wasn't what I was hoping for after several iterations. I pretty quickly threw out the Costas loop because loss of phase lock was the weakest link in the chain, and moved to a system with a tight bandpass filter at the carrier frequency, measuring zero crossings of the carrier against a local oscillator. But still the noise performance was not impressive at 300baud / 3kHz carrier.
I had hoped to go back to a damaged QPSK code recovery and to correct the data phase on the broken code bits, because we know what they should be according to the code. But there was no visible commonality between what was happening on the phase information that carried the code and the phase information carrying the data.
So I have moved back for now to a two-phase BPSK system interleaving the code and the payload symbol by symbol, in order to understand better how far the code can be pushed with noise.
<h3>Improved noise performance</h3>Currently I use a 4.8kHz carrier and 1200baud symbols. Here is the performance today:
<img src="/1200baud-psk-noise.png" align=center>
The RED line is the best decode seen for the correct match offset, the BLUE line is the best decode for all other (wrong) offsets. The PURPLE line is the mean decode for the correct match offset over the 100 runs. GREEN is the "quality cutoff" basically keeping us from getting near the blue line's false hits: because of the properties of the code it is extremely unlikely noise will get near the green line.
Here is a close-up on the noisy end of things:
<img src="/1200baud-psk-noise-2.png" align=center>
So far in terms of detecting the code, we can on average do so when the input energy is 82% noise (-15dB SNR). We can still detect codes ~1% of the time even at 92% noise (-21dB SNR). Here is a further close-up (it is 1000 runs, not 100)
<img src="/1200baud-psk-noise-3.png" align=center>
suggesting you can still get good recoveries ~0.1% of the time at 95% noise (-25.5dB SNR). And of course we are now measuring the whole system performance here including the demodulator part, not just damaging the code bits directly.
<h3>Current BPSK receiver</h3>Here is the receiver for the current BPSK method:
<img src="/rx-psk1.png" align=center>
I spent several days meddling with the QPSK version and arriving at the carrier zero-crossing method for phase detection. The original plan to have a symbol sampler running from a locked LO hung around causing lots of problems. The indications of a change in symbol -- detected by zero crossings -- were variously delayed by the phase itself and the filters used, making it difficult to convert their jittery indications into a guide for the symbol recovery clock. This resulted in double bits being sampled.
<h3>The code is the symbol clock</h3>Because of this I eventually realized that no symbol clock was needed: by running the correlator at the sample frequency and sampling symbols at a fixed period, the false match rejection properties of the code let it "discover" the correct phase and offset from the behaviour of the correlator output. And when the code is recovered best, the data interleaved with it will be recovered best too. This sounds power-unfriendly, but after the first offset is found, you don't run the correlator until enough time has passed for 256 symbols to be acquired, and then you should still be locked from when you did the first 256 symbols: if not, you run the correlator a few dozen times to find lock again.
Also of note is that what is stored in the ringbuffer is a weighted average of the last four phase results, this includes information from the phase of multiple carrier cycles for the same symbol, helping to reduce the effect of noise on the decode.
<h3>The code is the data!</h3>Well, recovering the code in such heavy noise is interesting, but how useful is it if the data bits interleaved with it have been subject to the same beating without the properties of the code to protect them? We can use the autosyncing properties of the code to help with trying to get some payload signal gain through averaging, but I think I have seen where this is headed now... the code IS the signalling system for the payload data. That means throwing out the interleaved data concept.
A '0' can be signalled as the normal code, and a '1' as a time-reversed code. Because it's intended for low data rate communication at VERY low signal levels compared to noise, it's okay if we are reduced to 1 byte/sec, which will be the end result of this at 1200 baud. Basically with this we carry over all of the great robustness qualities of the code to be attributes of the payload data.
So what is the point compared to just blasting 128 symbol times of the same phase carrier, which is a hell of a lot simpler?
<ul><li>Three high accuracy results, "no result", a '0' or a '1' detected. The carrier-only method will happily return a bogus result if there is noise energy at the carrier -- if the code method ever claims a '0' or a '1' you can be almost certain it is genuine</li><li>"Fuzzy" robust damage-tolerant signal detection, better performance than a simple threshold comparator</li><li>Automatic bit sync ("bit clock recovery") with almost no chance of wrong sync, sync recaptured each symbol; bit sync for the carrier-only version in high noise is unreliable</li><li>Absolute phase polarity can be recovered from one symbol despite the 180 degree lock uncertainty for BPSK -- you can even lose lock one or more times inside the symbol and the code with absolute phase can still be recovered; the carrier-only concept needs a coding at a higher level to determine absolute recovered phase, in turn needing multiple correct symbols recovered without losing lock</li></ul>
QPSK demodulator / slicer / correlator vs noise2007-09-18T00:00:00+08:00https://warmcat.com/2007/09/18/qpsk-demodulator-slicer-correlator-vs-noise<img src="/of-cats.png" align=left hspace=5>Well, the first cut of the QPSK demodulator, bit slicing and correlation code works, and this stochastic performance graph sums it up.
It shows 1,000 Monte Carlo runs each from 0% to 100% noise on the raw channel, considering correct correlator matches only, the blue line shows the WORST correlator match result at that noise level, the red line is the BEST correlator result seen at that noise level, and the purple line is the mean correlator match result seen at that noise level.
The Green line is the +64 threshold as a reference... below this we don't consider the correlator to have matched, something we chose based on studying the code response to noise a couple of posts ago.
<img src="/corr-demod-vs-noise.png" align=center>
<b>Basically it shows that up to about 20% channel noise there is a very strong probability we return perfect results</b> (+128 result means that no bit errors were present in the recovered correlator code).
<b>After that there is a region up to about 50% noise where perfect results are sometimes seen, normally there is some corruption, but on average we can still recover a corrupted but high probability correctly sync'd code</b> (whether the attached data payload can survive that beating is another issue... we can at least have reasonably correct bit-sync to whatever is there, allowing payload averaging for example).
<b>But after 60% noise, the probability of finding a usable recovered code is less than 1 in 1000.</b>
If you consider the purple mean line as the overall average recovery capability, it's clear that <b>at the moment after 40% noise things stop being much fun</b>.
These figures reflect the whole reception system: at the moment the data slicer is a primitive edge-triggered type thing, and I'm pretty sure that is the limiting factor here. When I eyeball the recovered data payload when it is challenged by heavy channel noise, it is often broken in a "sticky" way:
<code><pre>hello magic code
hello ma--------------------------
hello magic code
gllo magic co$------
hello magic code
--------------oo magic code
hello magic code
Yello magic co------7
hello magic code
hello magic cod</pre></code>
I think the sequential bit errors will be harder to create if a more robust bit-slicer is figured out. Here is an idea of what 45% noise in the channel looks like:
<img src="/corr-signal-noise45.png" align=center>
and what happens to that nice clean, digital-looking demodulation when you give it that instead of the nice clean sine waves:
<img src="/corr-demod-noise45.png" align=center>
That last one is the input into the bit slicer, which has the job of choosing where the bit boundaries are... at the moment it looks at zero-crossings; if it were possible to do a more sophisticated job than that, you would think a lot more packets could be saved. I have an extremely cool idea about this I will look into next.
Magic Correlator and baseband QPSK2007-09-18T00:00:00+08:00https://warmcat.com/2007/09/18/magic-correlator-and-baseband-qpsk<img src="/hothothot.png" align=left hspace=5>Time for some practical experiments with the robustness of the magic correlator code. Rather than build any hardware, although some interesting RF hardware is available, I decided to first model the system in software, so I can change things around much easier while there is a lot uncharacterized about the performance and capabilities of the coding.
<h3>Testing plan</h3>
The general plan is to bind payload data to the magic correlator code, such that the correlator code alignment acts as both a frame sync and assurance that the recovered bit clock is correct (in fact it should act as a clock recovery synthesis pilot as well, since we know the sequence). Because of the properties of the correlator code, it should be possible to just add FEC-coded payload without further ado, either interleaved bit-by-bit or bound another way.
At the moment I am sending plain payload without FEC. Further, I am going to initially do all the testing at baseband directly, in fact specifically at audio frequencies. This will eventually allow real world testing using a laptop "transmitting" the coding through its speakers while I move another microphone-enabled laptop around the house, seeing how far we can push the data recovery vs what I can hear myself. My house is pretty noisy, with cars going by, kids jumping around and so on; there is a good mix of whitish noise and short-duration dropout crud in the audio spectrum. I am really interested to see how far it can be pushed, particularly if averaging is possible to get working.
<h3>Modulation strategy</h3>
Currently the test C code modulates the data and the magic correlator code bits together on QPSK. QPSK has four "modulation states" encoded as four phase angles of the carrier, so you are signalling two bits per symbol -- one payload bit and one magic correlator code bit. Eventually, because we can recover even badly damaged correlator code quite often, it will be possible to score the likelihood of false <b>payload</b> recovery bit by bit based on looking back at the magic code bits that turned out to be wrong in a particular symbol: the payload and the coding bits were transmitted in the same symbol. This "distrust" indication per payload bit can open the door to some novel, if possibly expensive, error correction.
The current C code uses 44.1kHz sample rate and a 3kHz carrier modulated with QPSK carrying 300 baud symbols, ie, there are 10 carrier cycles per symbol. Each symbol contains two bits due to the QPSK coding. Maybe as we go on it will be necessary to drop the symbol rate to allow more cycles and better recovery, but this is a starting point. You can listen to the QPSK coded WAV file <a href="/magic-coding-qpsk-3kHz-300baud.wav">here</a>, this has the magic code and a 16 byte ASCII message payload modulated on QPSK. The symbol transitions look like this:
<img src="/corr-tx.png" align=center>
So, the "transmitting" part is fine... don't be fooled by those little spikes where it changes phase, those easily lost spikes are not carrying the information. The whole rest of the bit period carrier goes on at the new phase after that discontiguity, the next ten cycles of carrier phase encodes the information. The spike could and probably should be completely filtered out and not cause trouble on recovery.
<h3>QPSK symbol recovery</h3>
QPSK recovery requires a coherent oscillator in frequency and phase sync with the incoming carrier. You recover the symbols by watching the gyrations the local oscillator has to perform to keep the phase sync.
The RF guys were on to all this stuff back in ancient times; a version of a Costas loop capable of separately dealing with the quadrature "carriers" in QPSK was invented way back when. Its basic idea is to feed the coherent local oscillator to two mixers, one with the local oscillator via a 90 degree phase shift, then lowpass filter the result from the mixers and use that to feed back an error term to the local oscillator. The lowpassed mixer outputs are the "result" from the loop for the two bits encoded in the QPSK modulation.
As described there and in the literature one can find via Google, it's simple, right? Some of the PDFs on QPSK recovery from Google even had beautiful smooth digital recovery pictures. But the actual implementation is a lot trickier. The problems come from the need to tune various filters in the Costas loop; they need to be tuned for the carrier and VCO lock performance considering the symbol rate. I got started by looking at the Costas loop implementation in GnuRadio, but this has no filters at all in it (I guess for flexibility, so you can add the filters outside it). No doubt someone somewhere has been through all this back in 1970 and written up all the equations, but I couldn't find it. In the end I found some general advice about matching the lowpass filter 3dB point to half the symbol rate and meddled around until it worked. Of course doing it in software I didn't even have the nightmares of filter matching for the two mixers which bedevil an analogue implementation. I also found a lot of 2 x carrier noise in my loop, which I tried to notch out with some success. Anyway here is what the recovery looks like right now, being fed a "perfect" noise-free signal... it looks okay but I am pretty uncertain about how it will react to noise
<img src="/corr-recovery.png" align=center>
<h3>QPSK absolute phase uncertainty</h3>
The receiver locked phase compared to the transmitter 0 degrees phase is unknown, therefore the decoded bits can appear on either of the two output bitstreams and be inverted. This has to be taken into account when looking at what you're getting. Initially at least you have to run the four possible correlations against the magic code to find out the effective phase offset / symbol coding you have locked at. While the receiver does not lose lock (one can study the error term in the Costas loop, I guess) you can just use the lock you previously determined.
Incidentally received absolute phase determination is yet another exploitation of the magic correlation code properties of robust matching.
<h3>Next stop</h3>
Next task is to recover the bit clock from the received bits and perform the correlation action, and to try to recover the message. All of this will still be in pure software with no noise yet.
AT91RM9200 FIQ FAQ and simple Example code / patch2007-09-17T00:00:00+08:00https://warmcat.com/2007/09/17/at91rm9200-fiq-faq-and-simple-example-code-patch<img src="/parachute.png" align=left hspace=5>One of the coolest features of the AT91RM9200 we have been designing with for a couple of years now is the FIQ, or Fast Interrupt Request. This is basically the NMI of the ARM world. It is a bit difficult to get working (Milosch Meriac helped me with the initial version some years ago) because its job is to interrupt WHATEVER is happening and start running your FIQ handler within 1us or so, NO EXCUSES. This is the very very hard end of hard realtime: it does what it claims, but it does not respect any privacy that Linux may need as an OS, and we will see that needs care.
<a href = "/at91-fiq.patch">Here is a patch against 2.6.20 with the AT91RM9200 patches</a>: it should probably apply okay to later kernels. The patch adds many comments so you probably want to read that and this at the same time. First the things you can't do, which will save you much pain from finding them out yourself.
<h3>Things you can't do with FIQ</h3>
One of the "private times" Linux needs to itself is the virtual memory pagetable management action. It shuts off all interrupts and rewrites the pagetable at intervals, and then goes on as before. That stops driver interrupts coming in and trying to do stuff while the pagetable is empty, or incomplete or just full of garbage.
However FIQ ignores any claim to privacy performed by shutting off normal interrupt response. That means your FIQ ISR code can come in at a "bad time", if it tries to access memory areas mapped through the pagetable it will instead find nothing or the wrong thing or... the end result is that <b>the FIQ ISR cannot touch any memory mapped by vmalloc</b>.
Unfortunately, when a kernel module is loaded, its various memory footprints including the module code are allocated by... yep, vmalloc. That means <b>your FIQ ISR code cannot live in a module, it has to be part of the monolithic kernel</b>.
Finally all sorts of Linux code also wants "private time", or to guard against multiple access to objects by spinlocks or whatever. FIQ ISRs cannot play those games: it comes in in the middle of anything and has to get out quickly again too. So <b>unless it is a simple macro, you can't use any Linux APIs in the FIQ ISR</b>.
<h3>Things you can do with FIQ</h3>
Well reading those constraints, you're probably wondering if it is still useful. It sure is!
You can touch the memory-mapped IO in the chip using the AT91RM9200 APIs.
FIQ has the super power that it will run your ISR within ~1us, NO EXCUSES. That means you can rely on the ISR code to act like hardware: you trigger it and 1us later your programmed sequence occurs without fail. In turn that means FIQ is perfect for many hardware interfacing tasks, in particular management of low latency (small buffer) PDC DMA setup.
Low latency for audio traffic for example is highly desirable, but of course if there are ANY delays setting up the next PDC DMA, you get dropouts and clicks. If you allow the FIQ to handle generation of samples and management of PDC DMA, there won't be any delays for sure, you will have perfect audio.
We found that the AT91RM9200 at 180MHz can easily handle 8kHz FIQs (a common rate for telephony) with an ISR duration of ~8us, without affecting the Ethernet or USB performance.
<h3>IPC between the FIQ and kernel worlds</h3>
The general communication for FIQ ISRs with the "real world" in the patch is to define a struct in include/asm-arm/arch-at91rm9200/at91rm9200_fiq_ipc_type.h that contains all of the data that is shared between FIQ and normal kernel code. The example one looks like this:
<code><pre>struct at91rm9200_fiq_ipc {
int nCountFiqEvents;
};</pre></code>
One of these structs is defined in the main part of the patch code in arch/arm/mach-at91rm9200/at91_fiq.c like this
<code><pre>struct at91rm9200_fiq_ipc at91rm9200_fiq_ipc;
EXPORT_SYMBOL(at91rm9200_fiq_ipc);</pre></code>
That means in your other kernel code -- which can be in a module, only the FIQ ISR must be in the kernel -- you can have
<code><pre>#include <arch/at91rm9200_fiq_ipc_type.h>
extern struct at91rm9200_fiq_ipc at91rm9200_fiq_ipc;</pre></code>
and use the same struct to communicate with the FIQ ISR.
<h3>How to customize summary</h3>
1) Change struct at91rm9200_fiq_ipc in include/asm-arm/arch-at91rm9200/at91rm9200_fiq_ipc_type.h to have the data types you need
2) Add your FIQ ISR code to arch/arm/mach-at91rm9200/at91_fiq.c where it says "your C code goes here"
3) Import extern struct at91rm9200_fiq_ipc at91rm9200_fiq_ipc; in your own kernel module and communicate with the FIQ ISR using that.
<h3>FIQ shadowing with IRQ</h3>
Ultimately the FIQ actions are going to need to interface to Linux kernel objects sooner or later, perhaps there has to be some locking or blocking action for usermode access. But we are banned from using Linux APIs in the FIQ ISR.
A powerful solution is to physically tie the FIQ signal to an IRQ input additionally. Code that is REALLY hard realtime, like the PDC DMA management, goes in FIQ, and a count of FIQs is kept. Data out of FIQ can go into a software FIFO. The less reliable IRQ watches the count of FIQs and compares it to its own count of IRQs, if it sees it has blacked out and has fallen behind, it will loop up to a certain number of times "catching up", using the data placed by FIQ in the FIFO.
In this way, by splitting the code into "no excuses" realtime in the FIQ ISR, and "reliable only on average" realtime in the IRQ ISR, it is possible to bind actions in the FIQ to code in the ISR which can execute Linux APIs, and have the best of both worlds.
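A sketch of that split in C -- the names here are illustrative, not the ones used in the patch:
<code><pre>/* Sketch of FIQ/IRQ shadowing: the FIQ ISR only writes to plain memory
 * -- a counter and a power-of-two FIFO -- and the IRQ handler, which
 * may be delayed, catches up by comparing counts.
 */
#include <stdint.h>

#define FIFO_SZ 256	/* power of two */

static volatile uint32_t fiq_count, irq_count;
static volatile uint8_t fifo[FIFO_SZ];

void fiq_isr_body(uint8_t datum)	/* hard realtime: no Linux APIs */
{
	fifo[fiq_count & (FIFO_SZ - 1)] = datum;
	fiq_count++;
}

void irq_handler(void)			/* soft realtime: catches up */
{
	while (irq_count != fiq_count) {
		uint8_t datum = fifo[irq_count & (FIFO_SZ - 1)];

		irq_count++;
		/* ... safe to use Linux APIs with datum here ... */
		(void)datum;
	}
}</pre></code>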
Magic correlator code analysis2007-09-12T00:00:00+08:00https://warmcat.com/2007/09/12/magic-correlator-code-analysis<img src="/waitress.png" align=left hspace=5>Intrigued by the magic correlator possibilities, I wrote some code to run a proper worst-case Monte Carlo analysis of the performance vs noise, with fascinating results. (Although I tried to choose a reasonably large number of random runs considering the CPU time needed, please bear in mind the numbers identified in the rest of this are only as accurate as the number of runs allows.)
<h3>False indication rejection when given only noise</h3>
What about the reaction to having no signal at all... can it tell there is no transmission or does it falsely detect correlation? What is the highest false correlation result seen when challenged with noise? Here is the distribution of highest correlation results for 50 Million runs feeding it only white-ish binary noise with no correlation sequence component.
<img src="/corr-noise-dist.png" align=center>
The largest false response seen even once in the runs is +58, out of a full-scale match of +128 at a 0% bit-error rate: one can put it that the probability of seeing a match better than +58 from noise is something less than 1 in 50M. So we learn from this we can't trust any correlation result lower than, say, +64, to allow some margin. (This +64 requirement is shown with a blue line in the following graphs).
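For concreteness, the scoring behind these numbers adds 1 for a bit matching the pattern and subtracts 1 for a mismatch (as described in the conclusion below), so a perfect match over the 128-bit pattern is +128; a minimal sketch:
<code><pre>/* The correlation score: +1 per bit agreeing with the reference
 * pattern, -1 per disagreement, so full scale is +128 for the 128-bit
 * pattern and pure noise hovers around 0.
 */
int correlate(const unsigned char *rx, const unsigned char *pattern,
	      int nbits)
{
	int score = 0;

	for (int i = 0; i < nbits; i++)
		score += rx[i] == pattern[i] ? 1 : -1;

	return score;
}</pre></code>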
<h3>Random bit-error rate response</h3>
In this graph I ran the self-correlation 10,000 times per offset with different noise each time; it picks the worst (lowest) correct "position 0" sync match value (red) and plots it against the best (highest) wrong-offset match value (green) in absolute match quality. The thin blue line shows the absolute correlation value of +64 we selected based on the first graph.
On the left where there is no noise, we can tell the correct sync by a wide margin. Where the red line crosses the green, at around 0.2 bit-error probability, it means the correct sync position can no longer be distinguished from a false match. But before then, the absolute correlation value for the correct offset has fallen below our +64 limit (selected because noise can create a +58 result) so detection is lost first at a 0.12 ber.
<img src="/corr-noise.png" align=center>
Here is a plot of the ranking of the correct offset vs all of the other offsets. I expected the correct one to start at #1 and then slip down the rankings, but instead it starts at #1 and falls right to the bottom when it can't be selected as #1 any more.
<img src="/corr-ranking.png" align=center>
What it means is that up to around 0.12 - 0.15 ber (equating to 15 - 19 randomly selected flipped bits of the 128 in the pattern) you can detect the pattern VERY reliably. At any higher ber - with randomly selected bit errors - your probability of detecting the pattern is very low.
<h3>Multibit dropout tolerance</h3>
From my WiFi work I know that a common failure mode in RF packets is a multibit continuous dropout, which is different from the random bit errors introduced above. These graphs show the effect on the worst correct-offset margin of dropouts of all possible lengths randomly placed in the packet, where the dropout is filled with white noise, all zeroes or all ones.
<img src="/corr-drop-white.png" align=center>
<img src="/corr-drop-0.png" align=center>
<img src="/corr-drop-1.png" align=center>
Clearly it is beautifully insensitive to multibit contiguous dropouts. If the problem is white noise crapping on the transmission, the loss of 39 contiguous bits can be sustained without dropping below the +64 result limit. If the problem is events that cause continuous static 1 or 0 to be read during the disturbance, the pattern is <b>very insensitive</b> to this and can still be detected with fully half of the bits sequentially zeroed out, or with up to 50 set to '1'. So sync detection in the face of contiguous dropouts actually outperforms detection with the same number of random bit errors.
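For reference, simulating such a dropout only needs a small change to the noise-injection step of the test program quoted with the next post below; this is a sketch assuming the received bits are unpacked into an array, not the exact harness that produced these graphs:
<code><pre>
#include <stdlib.h>

/* overwrite nDrop contiguous bits, starting at a random position,
 * with white noise (nMode 0), all zeroes (1) or all ones (2) */
void InjectDropout(char *abBits, int nLen, int nDrop, int nMode)
{
	int n = rand() % (nLen - nDrop + 1);
	int nEnd = n + nDrop;

	while (n < nEnd) {
		switch (nMode) {
		case 0:
			abBits[n] = rand() & 1;	/* white noise fill */
			break;
		case 1:
			abBits[n] = 0;		/* read as static '0' */
			break;
		default:
			abBits[n] = 1;		/* read as static '1' */
		}
		n++;
	}
}
</pre></code>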
This last dropout graph shows performance when there are TWO dropped-out areas randomly placed in the packet (5,000 runs at each dropout length) at various dropout lengths; the dropout length is the same for both and they can overlap, which explains the noise at the right-hand end as they grow larger.
<img src="/corr-drop-dual.png" align=center>
Again looking at the absolute result values (blue line), the +64 absolute result cutoff is still met with two blocks of 18 contiguous bits each contaminated with noise. These are very severe insults that still allow a correct sync detection.
<h3>Conclusion</h3>
This means (to the accuracy of these simulations) if you draw a line at 15% bit-error rate, <b>if you ever see any offset of the correlator giving an absolute result of +64 or better, there is a very high probability that:</b>
<ul><li>there is a genuine transmission in progress,</li><li>the offset reporting that result is the correct sync offset, and</li><li>your bit-error rate is 15% or less.</li></ul><b>Conversely, if no correlator offset gives +64 or better:</b><ul><li>the bit-error rate is higher than 15%, or</li><li>there is no transmission.</li></ul>
This is a very robust correlator pattern! It can be improved further: at the moment the "score" for correlation adds 1 for a matched binary bit and subtracts 1 for a mismatch. If the demodulator that provides these bits gives a probability of a '1' or a '0', instead of a hard binary '1' or '0', then the result can be made from more information: a few "looks a bit like a 0" inputs will only weakly offset many "definitely a 1" inputs, for example.
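A sketch of what that soft-decision scoring could look like; it assumes the GetAc() pattern helper from the test code given with the next post below, and a demodulator reporting per-bit confidence on an arbitrary +/-127 scale:
<code><pre>
#define AC_LEN 128

extern char GetAc(int n);	/* bit n of the 128-bit pattern */

/*
 * abSoft[n] is the demodulator's confidence for received bit n:
 * +127 = definitely '1', -127 = definitely '0', 0 = no idea.
 * The expected pattern bit selects the sign, so a few weak
 * "looks a bit like a 0" samples only weakly offset many
 * "definitely a 1" samples.
 */
int SoftCorrelate(const signed char *abSoft, int nOffset)
{
	int n, nSum = 0;

	for (n = 0; n < AC_LEN; n++)
		if (GetAc(nOffset + n))
			nSum += abSoft[n];
		else
			nSum -= abSoft[n];

	return nSum;	/* full-scale perfect match is 128 * 127 */
}
</pre></code>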
There is another great advantage to interleaving this pattern with the payload. If the sync pattern can be recovered despite the 15% bit-error rate that is allowed, it is possible to identify which bits of the pattern were corrupted. Because the correlator code bits are interleaved with the payload, if the payload is broken the problem is likely to be coming from the payload bits next to the known-bad correlator code bits. For example, if it is shown that, say, three contiguous bits of the correlator code channel are wrong, one has to wonder about the two payload bits in between them. If only a small number of bits are involved, it can be possible to "fuzz" the suspected bad payload bits to see if an otherwise unrecoverable ECC error can be solved.
One more advantage is that the robustness margin of 15% allows the channel bit-error rate to be continually assessed during reception.
Autocorrelation code and weak signal recovery2007-09-12T00:00:00+08:00https://warmcat.com/2007/09/12/autocorrelation-code-and-weak-signal-recovery<img src="/grin.png" align=left hspace=5>I am looking at weak signal capture at the moment; there has been considerable work done on this by radio hams. The extreme cases for these guys are bouncing signals off the moon or meteors to reach other places on the planet. The most recent protocol I could find is called <a href="http://www.arrl.org/FandES/field/regulations/techchar/18JT65.pdf">JT65</a>, and it makes some pretty extraordinary claims for data recovery: 100% recovery at -27dB SNR, ie, with the noise floor 27dB above the signal. Unfortunately it seems the author of this otherwise cool and interesting protocol took it a step too far, and used <a href="http://www.sm2cew.com/jt65.html">"forbidden Black Magic"</a> in his implementation to get results at that level.
However, with the black magic removed, the claim of 100% recovery at -22dB SNR using another "forbidden" but less magical technology is not disputed. That is a patent-encumbered "soft" Reed-Solomon decoder which is able to recover from more damage than the normal "hard decision" decoder: it means you have to give up another few dB to get a distributable implementation. An open source implementation exists at <a href="http://developer.berlios.de/projects/wsjt/">berlios</a> but it's written in freaking Fortran. Multithreaded Fortran with a Python GUI. It provides a normal Reed-Solomon FEC implementation which is used if you don't have the external forbidden one.
One awfully limiting "trick" and two really interesting techniques are used in the protocol. The bad news is that very, very long symbol-times are used for transmission: 372ms per 6-bit symbol. Considering the various bloatages it's about one byte per two seconds. They send one of 64 "tones" to encode the six bits during that time... obviously the symbol duration helps with recovery. This "trick" is the core feature of weak signal recovery: repeat what you are doing a lot, in this case repeat the "tone" cycles a lot, to "amplify" the signal at a receiver which knows to look for something happening multiple times, increasing the probability of detection.
The first interesting trick is just the amount of Reed-Solomon used... this is not new to me since I used it as part of Penumbra. But in this protocol, every 72-bit packet has an additional 306 bits of error correction attached to it :-O. That's more than 4 times as much ECC as data, and despite that it still pays off for capturing the signal.
The second cool technique is to interleave the payload data with a binary autocorrelation "clock". Since the noise level is so crazy, it's of little use to expect a 1-bit channel in the data to be usable as a "start of frame" marker or somesuch as you would normally expect with digital serialized communication. Instead, they spread the sync information in this interleaved "channel" using a 126-bit sequence which has a magically cool property... if you autocorrelate the sequence with itself, even in the presence of a fair bit of noise, every correlation offset except the right one matches MUCH worse than the 1:1 lineup. Here is the sequence extended to 128 bits and correlating with itself. The y axis is the number of bits that match.... obviously that is 128 when it compares itself to itself at the 0 offset on the X axis. The cool part is how low the self-correlation is everywhere else, no better than 20, or a 14dB "SNR" between a match and a non-match.
<img src="/ac0.png" align=center hspace=5>
This remains the case even under pretty bad noise, up to 25% of the bits being trashed (still 9dB sync SNR):
<img src="/ac25.png" align=center hspace=5>
but at 30% of the bits being trashed, the performance falls off a cliff:
<img src="/ac30.png" align=center hspace=5>
Not only does the noise floor rise due to falsely improved correlations, but the one true correlation is also falsely degraded. After about 28% bit errors the reliability is gone. (Note the noise is one-shot with the test program, rather than being Monte Carlo'd, but I ran it several times and the graphs shown are representative).
But that isn't the end of the story for this code. First the correlation action is a filter for transmission presence all by itself. And if you detect the transmission by the presence of the correlation code, you have also sync'd the receiver to the transmitted frame, since the correlation bits are interleaved with the actual data and the "0" offset marks the start of the frame.
With deep memory and a known period of retransmission from the source, temporally averaged autocorrelation can take place to increase the chances of finding the presence of a transmitter and syncing up to its data. After a transmitter "sync" has been found in the averaged data with high probability, the averaging memory can be turned to storing only the times when a transmission is expected from the known schedule of the transmitter.
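A sketch of the accumulation step (the frame length and names are invented for illustration): each received repeat is summed per bit position, so uncorrelated noise cancels towards zero while the repeated transmission reinforces, and the correlation is then run over the averaged values.
<code><pre>
#define FRAME_BITS 4096		/* bits per retransmission period (invented) */

static int anAccum[FRAME_BITS];	/* running per-position sum of soft bits */
static int nFrames;

/* fold one retransmission period's worth of soft bits into the average */
void Accumulate(const signed char *abSoft)
{
	int n;

	for (n = 0; n < FRAME_BITS; n++)
		anAccum[n] += abSoft[n];
	nFrames++;
	/* correlate against anAccum[] / nFrames to hunt for the sync */
}
</pre></code>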
Here is the magic code with the 128-bit sequence and the test loops
<code><pre>
#include <stdio.h>
#include <stdlib.h>

/* the 128-bit autocorrelation sequence, packed LSB-first */
static unsigned char u8Auto[] = {
	0x19, 0xbf, 0xa2, 0x89, 0xf3, 0xf6, 0x58, 0xcd,
	0x2a, 0x81, 0x01, 0x4b, 0xab, 0x4c, 0xc2, 0xbf
};

#define AC_LEN 128

/* return bit n of the sequence, wrapping modulo AC_LEN */
char GetAc(int n)
{
	n = n & (AC_LEN - 1);

	return (u8Auto[n >> 3] >> (n & 7)) & 1;
}

int main(int argc, char **argv)
{
	int n, n1, nSum;
	int nPercent = 0, nNoise;
	int nSeed;
	FILE *f = fopen("/dev/urandom", "r");

	fread(&nSeed, sizeof(nSeed), 1, f);
	fclose(f);
	srand(nSeed);

	/* optional argument: percentage of bits to corrupt */
	if (argc == 2)
		nPercent = atoi(argv[1]);
	nNoise = (1024 * nPercent) / 100;
	fprintf(stderr, "Noise: %d%%\n", nPercent);

	/* slide the noisy copy across every offset of the clean sequence */
	for (n = -(AC_LEN - 1); n < AC_LEN; n++) {
		nSum = 0;
		for (n1 = 0; n1 < AC_LEN; n1++) {
			char c = GetAc(n + n1);

			/* simulate white noise: flip this bit? */
			if ((rand() & 1023) < nNoise)
				c = c ^ 1;
			if (GetAc(n1) == c)
				nSum++;		/* match: +1 */
			else
				nSum--;		/* mismatch: -1 */
		}
		printf("%d %d ", n, nSum);
	}

	return 0;
}</pre></code>
and the graph command that generated the graphs (the 28 is the percentage of noise to graph)
<code>gcc test.c -o test ; ./test 28 | graph -Tpng --bitmap-size 1200x1200 -FHersheySans > temp.png && convert temp.png -scale 300x300 png:temp1.png</code>
Embedded procmail and dovecot2007-09-06T00:00:00+08:00https://warmcat.com/2007/09/06/embedded-procmail-and-dovecot<img src="/spiral.png" hspace=5 align=left>For over a year I have been using a 32MB ARM9-based board I designed, with a 1GB USB stick, as my mailserver. It is powered from a USB port on my firewall box and takes around 1W.
I use our <a href="http://octotux.com">Octotux Linux distro</a> with Postfix as the MTA, gps for the greylisting and Dovecot IMAP to provide secure access to the mailstore over SSL. This has worked out really well, the warmcat.com MX record points directly to the external IP here, and the firewall box port-forwards port 25 to the embedded device. It's silent and runs cold 24 hours a day and has never missed a beat.
A couple of weeks ago I had to look at the box again because the greylisting software was hanging. I discovered that we were being bombarded with spam, one new spam every two seconds on average, from all over the world. I adjusted the ordering of the filtering in postfix to reject on an unknown username first, which stopped so many concurrent gps sessions being needed. The server weathered that storm and the spam people gave up a few days later without getting a single one through. (They were also targeting the warmcat.com A record IP, I suppose in case it was a backup for the real MX, but they had zero luck with that either.)
However it reminded me of the one inadequacy of this mailserver... when you wake up your laptop in the morning, thunderbird takes ages to run all the filters and move the new mails remotely into the right IMAP folders. That's pretty annoying when you can see the titles of mails you want to read, but the USB stick on the server is maxxed out for a couple of minutes sorting eight hours' worth of new emails into folders on the server. I had been pondering changing the box to one with USB2 High Speed, but it occurred to me that otherwise the existing USB 1.1 "Full Speed" (12Mbps) is completely adequate: changing folders and moving between emails in thunderbird is snappy; it's just the client mail filters that struggle under the load of 500 mails in the morning. So I decided to port procmail to ARM9 Octotux, in effect doing the folder sorting as each mail comes in, so there would no longer be any processing done at the client for that.
<!--more-->
Procmail was a little bit of a beast: the sources look horrible and it uses a nonstandard autotools script, which of course does not support crosscompile. After a couple of hours I got it to build nicely in an Octotux RPM (Octotux is entirely crosscompiled and RPM-based). Not having used it before, I was surprised to see there was no configuration down /etc in the package, just four small binaries in /usr/bin and some manuals (which I broke out into a procmail-docs package so as not to waste space on the target). The binary package only came to 60K all told.
The first move was to direct postfix to deliver not to the maildir as before but through procmail. This simply involved editing /etc/postfix/main.cf and adding this
<code>mailbox_command = /usr/bin/procmail</code>
After some googling I found that procmail looks in the homedir of the unix user that the mail is directed to for a file ~/.procmailrc, in order to find out what it should do. Now my embedded mailserver is set up to use Dovecot IMAP with the mailstore symlinked to the USB stick, so my user "andy" on the embedded mailserver has this in the home dir
<code># ll /home/andy
lrwxrwxrwx 1 root root 25 Jan 17 2007 Maildir -> /media/usbstick/mail/andy</code>
Looking around the Dovecot Maildir structure in there, I found that mail needs to be delivered into /home/andy/Maildir/.(foldername)/new/ in order for Dovecot to understand that it was given a new mail in a folder. The resulting /home/andy/.procmailrc looks like this:
<code>SHELL=/bin/sh
PATH="/usr/bin:/usr/local/bin:/bin:/sbin:/usr/sbin"
LOCKFILE=/home/andy/lockfile.loc
DEFAULT=/home/andy/Maildir/
BITBUCKET=/dev/null
LOCKTIMEOUT=10
LOGFILE=/tmp/procmail_log
LOGABSTRACT=no
VERBOSE=no
:0:
* ^List-Id: For users of Fedora <fedora-list.redhat.com>
$DEFAULT/.fedora-list/new
:0:
* ^List-Id: Development discussions related to Fedora
$DEFAULT/.fedora-devel-list/new</code>
You simply repeat the last stanzas with some unique identifying header part of mails that are to be filed in a given folder, and give the destination folder name preceded by a '.' and followed by /new as shown. You don't need to run anything or restart anything after making changes here, they will be immediately used on the next email that is delivered. If your folder lives in a hierarchy, the folder on disk looks like, for example, .INBOX.CentOS for a folder CentOS that appears as a child of INBOX. The stanza for that would look like
<code>:0:
* ^List-Id: CentOS mailing list <centos.centos.org>
$DEFAULT/.INBOX.CentOS/new</code>
You can also direct procmail to use a central procmailrc file, presumably in /etc, by giving the path to it on the procmail invocation line in /etc/postfix/main.cf. In my case I only take mails for myself, so I stuck with the per-user ~/.procmailrc.
No changes were needed anywhere else in the server setup, all I had to do was turn off all my thunderbird filters and enjoy immediate access to the emails via Dovecot IMAP in the morning, without upgrading any hardware.
selinux magic for gitweb2007-09-05T00:00:00+08:00https://warmcat.com/2007/09/05/selinux-magic-for-gitweb<img src="/elbow.png" align=left hspace=5>The last remaining problem for the F7 upgrade was a conflict between the gitweb cgi and selinux. I fixed it by allowing the transgression that was reported in the log. There is quite a bit of conflicting information on the web about how to make a local policy change.
First I found out, using audit2allow, what would allow the action that was being denied
<code><strong># echo 'avc: denied { read } for pid=3736 comm="gitweb.cgi" name="cgi-bin" dev=md7 ino=5079272 scontext=system_u:system_r:httpd_sys_script_t:s0 tcontext=system_u:object_r:httpd_sys_script_exec_t:s0 tclass=dir' | audit2allow</strong>
#============= httpd_sys_script_t ==============
allow httpd_sys_script_t httpd_sys_script_exec_t:dir read;
</code>
Basically the gitweb cgi calls some perl that does the equivalent of getcwd(), and this was being disallowed. The advice that was correct for setting local policy on F7 was found <a href="http://docs.fedoraproject.org/selinux-faq-fc5/">here</a>. In short I did
<code># <strong>mkdir /root/tmp; cd /root/tmp</strong>
# <strong>touch local.te local.if local.fc</strong>
# <strong>yum install selinux-policy-devel</strong>
# <strong>vi local.te</strong>
policy_module(local, 1.0)
require {
attribute httpdcontent;
type httpd_sys_script_t;
type httpd_sys_script_exec_t;
}
allow httpd_sys_script_t httpd_sys_script_exec_t:dir read;
# <strong>make -f /usr/share/selinux/devel/Makefile</strong>
# <strong>semodule -i local.pp</strong></code>
Immediately after doing this gitweb was back working normally again.
Forcing 1&1 to make F72007-09-05T00:00:00+08:00https://warmcat.com/2007/09/05/forcing-11-to-make-f7<img src="/running-man.png" align=left>The new server at 1&1 has been showing signs of unreliability: it has crashed and died mysteriously three times, the last while I was away for a couple of days. Late at night when I got back, I decided it was time to actually make it into a Fedora box with a kernel later than 2.6.16 and to get rid of the xfs-formatted partitions, which I suspect of causing the instability. So here are my notes on how to force-upgrade the weird FC4-based OS on those boxes to fully true Fedora 7, grub, ext3 and selinux. The notes might not be complete, but they contain all the major steps and will be useful for anyone contemplating changing their server over to "Genuine Fedora".
<strong>Don't embark on this unless you have some Linux-fu and know how to get yourself out of trouble, because at every step you can easily trash your server and lose all your data. We are literally going to format the main filesystems and install a new bootloader on a remote server... we can call that "not a beginner project".</strong>
<!--more-->
<h3>Sanity check that we have the same layout</h3>
Zero'th move is a sanity check. Confirm that your 1&1 server still has the same Raid 1 layout (the result of fdisk /dev/sd[a-b] and pressing p):
<code>Disk /dev/sda: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 123 987966 fd Linux raid autodetect
/dev/sda2 124 367 1959930 82 Linux swap / Solaris
/dev/sda4 368 19457 153340425 5 Extended
/dev/sda5 368 976 4891761 fd Linux raid autodetect
/dev/sda6 977 1585 4891761 fd Linux raid autodetect
/dev/sda7 1586 19457 143556808+ fd Linux raid autodetect
Disk /dev/sdb: 160.0 GB, 160041885696 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 123 987966 fd Linux raid autodetect
/dev/sdb2 124 367 1959930 82 Linux swap / Solaris
/dev/sdb4 368 19457 153340425 5 Extended
/dev/sdb5 368 976 4891761 fd Linux raid autodetect
/dev/sdb6 977 1585 4891761 fd Linux raid autodetect
/dev/sdb7 1586 19457 143556808+ fd Linux raid autodetect
# mount
/dev/md1 on / type ext3 (rw)
none on /proc type proc (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md5 on /usr type xfs (rw)
/dev/md7 on /var type xfs (rw,usrquota)
none on /tmp type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
</code>
If that is the same you are in with a chance with these instructions.
<h3>Create missing raid config file</h3>
First create /etc/mdadm.conf ... NOT /etc/mdadm/mdadm.conf
<code>ARRAY /dev/md5 devices=/dev/sda5,/dev/sdb5
ARRAY /dev/md6 devices=/dev/sda6,/dev/sdb6
ARRAY /dev/md7 devices=/dev/sda7,/dev/sdb7
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1</code>
Grub will need this later and mkinitrd in the next step for the new kernel will use it also.
<h3>Dist-upgrade to F7</h3>
Next move is to dist-upgrade to F7 from inside the mutant FC4 environment.
<code># <strong>wget http://www.mirrorservice.org/sites/download.fedora.\
redhat.com/pub/fedora/linux/releases/7/Fedora/i386/os/\
Fedora/fedora-release-7-3.noarch.rpm</strong>
# <strong>rpm -Uvf --nodeps fedora-release-7-3.noarch.rpm</strong>
# <strong>yum clean all</strong>
# <strong>yum update yum</strong>
# <strong>yum update</strong>
# <strong>yum install grub</strong></code>
This package installs the yum repo files down /etc/yum.repos.d configured for the Fedora 7 repos. So the next yum update you do is going to "update" you to F7, since all the packages it can see are now of F7 vintage. However, on reboot, you are going to come back up in the old mutant 2.6.16 Debian kernel. If you try to use the Fedora kernel, you will have immediate deadly trouble with lilo and xfs support. Nice "Fedora" they got there, with freaking lilo.
<h3>Reformatting all xfs into ext3</h3>
Next move is to get rid of this xfs-format crud and replace it with ext3. I started out doing it inside the normal boot environment for /usr, and then was forced to change to using the rescue environment to do /var, so I will describe both ways as I did them.
I logged into the serial console from my local machine here using
<code>$ <strong>ssh (magic userid)@sercon.onlinehome-server.info</strong></code>
and gave my "original root password". You can find this and the (magic userid) in your 1&1 control panel under "serial console".
Then I logged into the box there using my real root credentials, and did
<code># <strong>telinit 1</strong></code>
this will kill all your networking and disable all services. Confirm everything went down with ps -Af and kill anything that is still up, except your sh session.
Now we get rid of any junk in the yum cache and then back up /var
<code>sh-3.1# <strong>yum clean all</strong>
sh-3.1# <strong>tar czf /usr/backup-var.tar.gz /var</strong>
sh-3.1# <strong>umount /var</strong></code>
At this point, the 2.6.16 weirdo kernel blew a warning, which I ignored because that XFS formatted filesystem is about to get a well-deserved deletion
<code>Badness in mutex_destroy at kernel/mutex-debug.c:458
Call Trace: <ffffffff8014557f>{mutex_destroy+109} <ffffffff802593f4>{xfs_qm_destroy+140}
<ffffffff802594ed>{xfs_qm_rele_quotafs_ref+165} <ffffffff8025a265>{xfs_qm_destroy_quotainfo+18}
<ffffffff80298c4a>{xfs_mount_free+160} <ffffffff80299feb>{xfs_unmountfs+171}
<ffffffff8029f4a3>{xfs_unmount+275} <ffffffff802af365>{vfs_unmount+34}
<ffffffff802aee50>{linvfs_put_super+49} <ffffffff801755c7>{generic_shutdown_super+153}
<ffffffff801760b1>{kill_block_super+36} <ffffffff80175495>{deactivate_super+103}
<ffffffff80188952>{sys_umount+111} <ffffffff8010b4f1>{error_exit+0}
<ffffffff8010a85e>{system_call+126}</code>
Oh well, let's destroy the /var filesystem by rewriting it ext3.
<code>sh-3.1# <strong>mkfs.ext3 /dev/md7</strong>
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
17956864 inodes, 35889184 blocks
1794459 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1096 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.</code>
Edit <strong>/etc/fstab</strong> to reflect our change (shows from -> to)
<code>/dev/md7 /var xfs defaults,usrquota 0 2
<strong>/dev/md7 /var ext3 defaults 0 2</strong></code>
Okay now we mount our clean empty ext3 /var back in place
<code>sh-3.1# <strong>mount /var</strong>
kjournald starting. Commit interval 5 seconds
EXT3 FS on md7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
sh-3.1# <strong>ll /var</strong>
total 16
drwx------ 2 root root 16384 Sep 4 20:49 lost+found
sh-3.1# <strong>cd /</strong>
sh-3.1# <strong>tar zxf /usr/backup-var.tar.gz</strong>
sh-3.1# <strong>rm /usr/backup-var.tar.gz</strong></code>
At this point /var is back how it was, but it is now on ext3. Now we backup /usr and /home into our nice new /var
<code>sh-3.1# <strong>rsync -a /usr /var</strong>
sh-3.1# <strong>rsync -a /home /var</strong></code>
Now unfortunately I was unable to get /usr into a state that I could umount it cleanly... the sh had handles open to /usr/lib/ libraries. So I had to use a different technique to reformat /usr and /home in place.
Go to your 1&1 control panel and select the "recovery tool" option. Make sure "reboot now" is unchecked, and select "Linux Rescue System (debian/woody - 2.6.x) ". Confirm it and you will get a one-time login password generated for the rescue system. Wait a couple of minutes and then
<code>sh-3.1# <strong>shutdown -r now</strong></code>
your server. When it reboots, it will come up in the recovery system, which is a network boot with none of your local partitions mounted... this is most excellent and a really powerful solution for the kind of work we are doing on this server. However, save yourself some time and go back to the "recovery tool" page now, and again with "reboot now" unchecked, select "Normal System" again and confirm it. Otherwise you keep coming back into the rescue system in future boots too.
Next we reformat the /usr partition, /dev/md5
<code>rescue:~# <strong>mkfs.ext3 /dev/md5</strong>
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
611648 inodes, 1222912 blocks
61145 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1254096896
38 block groups
32768 blocks per group, 32768 fragments per group
16096 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.</code>
Let's mount our new, empty ext3 /usr partition at /mnt in the rescue filesystem
<code>rescue:~# <strong>mount /dev/md5 /mnt</strong>
kjournald starting. Commit interval 5 seconds
EXT3 FS on md5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.</code>
and we can mount our ext3 /var partition we made earlier at /opt
<code>rescue:~# <strong>mount /dev/md7 /opt</strong>
kjournald starting. Commit interval 5 seconds
EXT3 FS on md7, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.</code>
Restore the contents of /usr from the copy we made in /var, and nuke the redundant copy: finally unmount the new, filled /usr
<code>rescue:~# <strong>rsync -a /opt/usr/* /mnt</strong>
rescue:~# <strong>rm -rf /opt/usr</strong>
rescue:~# <strong>umount /mnt</strong></code>
Do the same for /home
<code>rescue:~# <strong>mkfs.ext3 /dev/md6</strong>
rescue:~# <strong>mount /dev/md6 /mnt</strong>
rescue:~# <strong>rsync -a /opt/home/* /mnt</strong>
rescue:~# <strong>rm -rf /opt/home</strong>
rescue:~# <strong>umount /mnt</strong></code>
Alright, xfs is GONE, everything is ext3 and has its old content back in it :-)
Let's get our / filesystem mounted at /mnt now and update our fstab to record the demise of xfs
<code>rescue:~# <strong>mount /dev/md1 /mnt</strong></code>
and edit <strong>/mnt</strong>/etc/fstab (notice the /mnt!) to reflect that we got rid of xfs on /usr and /home and replaced it with ext3 (from -> to again)
<code>/dev/md5 /usr xfs defaults 0 2
<strong>/dev/md5 /usr ext3 defaults 0 2</strong>
/dev/md6 /home xfs defaults,userquota 0 2
<strong>/dev/md6 /home ext3 defaults 0 2</strong>
</code>
Alright, the filesystem messing is done.
<h3>Goodbye lilo</h3>
Next job is to expunge lilo and replace it with grub.
<code>rescue:~# <strong>grub-install --no-floppy /dev/sda</strong>
rescue:~# <strong>grub-install --no-floppy /dev/sdb</strong></code>
You need to change the bogus /boot/grub/grub.conf that 1&1 misled you with into this (change the kernel and initrd version numbers to what you actually have from your F7 kernel in /mnt/boot).
<code>rescue:~# <strong>vi /mnt/boot/grub/grub.conf</strong>
serial --unit=0 --speed=57600 --word=8 --parity=no --stop=1
terminal --timeout=5 console serial
default=0
timeout=10
title Normal Fedora
root (hd0,0)
kernel /boot/vmlinuz-2.6.22.2-42.fc6 ro root=/dev/md1 console=ttyS0,57600 panic=30
initrd /boot/initrd-2.6.22.2-42.fc6.img</code>
<h3>Reboot into F7 with F7 kernel</h3>
Finally we should umount and reboot into your new Fedora kernel
<code>rescue:~# <strong>umount /mnt</strong>
rescue:~# <strong>umount /opt</strong>
rescue:~# <strong>reboot</strong></code>
On reboot you should see a grub menu on the serial console which will timeout and boot you (all being well).
<h3>Adding selinux</h3>
The bogus FC4 install from 1&1 did not include any selinux. This is a pretty bad omission, and we can fix it now. Edit /etc/sysconfig/selinux and set initially to be
<code>SELINUX=permissive</code>
Then
<code># <strong>touch /.autorelabel</strong>
# <strong>reboot</strong></code>
This will cause the initscripts to relabel all your filesystems according to your Fedora 7 policy. It will reboot automatically after doing this; when it comes back up, selinux will be working in a "firing blanks" mode: it just reports any errors and lets the access occur anyway. You can judge from this what will break when you enable it properly. In my case there were three areas that were broken. First, one user has ~/public_html; it needed to be enabled in selinux and then marked up as okay for httpd to serve
<code># <strong>setsebool -P httpd_enable_homedirs 1</strong>
# <strong>chcon -R -t httpd_sys_content_t /home/user/public_html</strong></code>
Second, for some reason named couldn't start because it wasn't allowed to write its own pid in the chroot; I worked around it with this
<code># <strong>setsebool -P named_write_master_zones on</strong></code>
The third problem was gitweb; I am asking about it on the selinux mailing list and will update when I have a resolution. UPDATE 2007-09-05: No response from the fedora-selinux ml, so I resolved it myself as <a href="http://warmcat.com/_wp/?p=36">described here</a>.
When you are sure that any remaining avcs (you can find them in /var/log/messages) are trivial or there are no more being generated, you can properly turn selinux on by editing /etc/sysconfig/selinux again and this time setting
<code>SELINUX=enforcing</code>
and rebooting.
<h3>Conclusion</h3>
Hopefully I recorded everything that was needed to convert the craptastic bogus unsafe FC4 install on 1&1 servers to clean and true Fedora 7. Certainly being able to yum update kernels as usual is a major step forward, getting you from the original non-Fedora 2.6.16 to 2.6.22 (in fact I installed a Fedora development repo kernel, which is 2.6.23-rc5). And it's crazy not to have selinux when it's provided by normal Fedora.
It's Fedora, Jim: but not as we know it2007-07-30T00:00:00+08:00https://warmcat.com/2007/07/30/its-fedora-jim-but-not-as-we-know-it<img src="/noodle-girl.png" align=left>Pretty strange version of Fedora running on 1&1 dedicated Linux servers.
First, it is FC4, which is out of security update coverage, and Fedora Legacy has gone away too. I update it to FC6 via yum (worried about the libata change in the F7 kernels making it unbootable... needn't've worried, since I can make it unbootable all by myself). After the update the /boot/grub/grub.conf looks a bit strange: grubby did not make an entry for the FC6 kernel, so I add it by hand.
On reboot, it ignores the new kernel and boots the old one. Further digging reveals that it is set up to use LILO, not grub. They provide and cook their own 2.6.17 kernel which was built on a Debian box and does not use an initrd: it has all the drivers it needs built into the monolithic kernel. Hm.
<!--more-->
I google through pages from 1998 to learn about LILO, I make a mistake: I saw they had a symlink /boot/vmlinuz to point to the kernel they boot from, so I changed the symlink, reran lilo and rebooted... it doesn't come back up. Now 1&1 have a cool serial console server concept, you can ssh into their central site with your per-server credentials and you are looking at your server's serial console. From this I see it can't find the root filesystem. Well no problem, I will choose one of the backup lilo.conf configurations at the prompt, right? Nope, they all rely on the same vmlinuz symlink I changed.
So at this point after an hour or so of having the shiny new server, it is borked. However 1&1 offer a free network boot recovery feature you can select from the web page: I did this and came up in a Debian recovery boot presumably over PXE. From there I could mount my root fs /dev/md1 on /mnt and undo my kernel symlink change, and so got back a working system. Whatever else, that is a pretty robust setup, I could trash the thing into an unbootable state and recover it all by myself without any tech support or even having to wait. Great!
However, considering they advertise it as a "Fedora" system, aside from not being able to use Fedora kernels, the jarring strangeness continued. There is no selinux set up. This is pretty bad considering the support is everywhere in Fedora for it and it doesn't cause any trouble nowadays. Nor is it possible to enable selinux simply, because the partition-happy Debian admins that set it up decided to format some (not all) of the partitions as xfs.
There is no firewall enabled... all of your entrails like network MySQL access are hanging out for the world to see. I installed system-config-securitylevel and had it set up a bare Fedora-style firewall on top of which I copied over my long, long (and growing) list of DROP netblocks.
Some evil and perverse web admin stuff was on by default, dozens of PHP apps, that involved redirecting your mailserver log to somewhere crazy on /usr. This seems like an invitation for bad things to happen, so I tore them all out with yum remove.
After some hours all of the virtual hosts on Apache were back up except yahoeuvre, which was creating problems in the error logs and not working properly. Since it has been deprecated for a long while due to Yahoo format changes, I didn't bother fixing it and set it to redirect here instead.
However, I am left wondering... is it fair to call that... well, "heavily customized" Debian-Fedora hybrid OS "Fedora"? The Fedora kernel does have an xfs module, but they don't allow you to format stuff xfs in Anaconda, so it's "not really supported". They provide great admin tools though, not the PHP garbage but the serial console server and the recovery netboot: fantastic remote server admin powers, really allowing you to get out of jail when you need to. Maybe it will be possible to come up in the recovery console, copy out the contents of the xfs partitions somewhere, reformat them ext3 and gradually convert the thing to "proper Fedora".
EDIT 2007-09-05: In fact I have now converted this 1&1 server to "proper Fedora 7", see <a href="http://warmcat.com/_wp/?p=35">this post for details</a>.
mac80211 Injection patches accepted in Linus git tree2007-07-16T00:00:00+08:00https://warmcat.com/2007/07/16/mac80211-injection-patches-accepted-in-linus-git-tree<img src="http://warmcat.com/driving.png" alt="ultracompact car" align=left valign=top />After getting on for four months, my mac80211 injection patches have been accepted by the powers-that-be and have made their way into the Linus git repo, the crucible from which vanilla kernel versions are forged and the upstream on which all major distros ultimately base their kernels. (Edit: they are present in 2.6.23-rc1 also.)
Assuming nothing bad happens in the next few weeks leading to their being reverted (unlikely I would think, since they don't interfere with much existing code), then a standardized driver-independent ieee80211 packet injection methodology will soon be available by default in all major distro kernels. Currently if you want to perform packet injection, you enter into a dark underworld of individual driver patching, having to cook custom kernels and make animal sacrifices to forgotten Gods. But now with the injection patches, for the devices with mac80211 drivers all that crap is blown away and every 2.6.23 kernel will offer the capability built-in.
Here is the list of mac80211 drivers and whether I have actually seen good injection. All of the drivers are expected to work, but I don't have all the hardware.
<table><tr><td><b>mac80211 driver</b></td><td><b>Personally Tested</b></td></tr>
<tr><td>adm8211</td><td>no</td></tr>
<tr><td>bcm43xx</td><td>yes</td></tr>
<tr><td>iwl3945</td><td>yes</td></tr>
<tr><td>iwl4965</td><td>no</td></tr>
<tr><td>p54</td><td>no</td></tr>
<tr><td>rt2x00</td><td>no (pending)</td></tr>
<tr><td>zd1211rw</td><td>yes</td></tr>
</table>
I have also provided the <a href="http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer">Packetspammer</a> commandline applet to show how to use the injection API from userspace; this provides a simple, tested GPL2 base for making your own injection code for your own apps.
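Not Packetspammer itself, but a minimal sketch of the general shape of userspace injection, assuming a mac80211 device already in monitor mode as "mon0": a zero-option radiotap header is prepended to a raw ieee80211 frame and written through a packet socket. Treat it as an illustration of the idea rather than a complete injector.
<code><pre>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <net/if.h>

/* minimal radiotap header: version 0, length 8, no option fields */
static const unsigned char u8Radiotap[] = {
	0x00, 0x00,		/* version, pad */
	0x08, 0x00,		/* header length, little-endian */
	0x00, 0x00, 0x00, 0x00	/* present-flags bitmap: empty */
};

int main(void)
{
	unsigned char pkt[256];
	struct sockaddr_ll sa;
	int s = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

	if (s < 0) {
		perror("socket (needs root)");
		return 1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sll_family = AF_PACKET;
	sa.sll_ifindex = if_nametoindex("mon0");  /* monitor-mode iface */

	memset(pkt, 0, sizeof(pkt));
	memcpy(pkt, u8Radiotap, sizeof(u8Radiotap));
	/* ... a real ieee80211 header + payload goes here, after the
	 * radiotap header; the zeroes are only a placeholder ... */

	if (sendto(s, pkt, sizeof(u8Radiotap) + 24, 0,
		   (struct sockaddr *)&sa, sizeof(sa)) < 0)
		perror("sendto");
	close(s);

	return 0;
}
</pre></code>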
Work started on this in Dec 2006 with the old Linux stack driver patches. The real reason for the sustained effort is to enable <a href="http://penumbra.warmcat.com">Penumbra</a> to work "out of the box" on not only Linux desktops and laptops but generic embedded devices as well.
Jamendobox2007-06-08T00:00:00+08:00https://warmcat.com/2007/06/08/jamendobox<img align="left" alt="icon" title="icon" src="http://warmcat.com/blobber.png" />Well I already knew that <a title="Jamendo" href="http://jamendo.com">Jamendo</a> allowed you to download their catalogue in Ogg, but by itself it was just an interesting side-note. (Jamendo being the primary site to get liberally licensed music, often biased towards francophone nymphs)
But to my surprise I tried Fedora 7 Rhythmbox a few minutes ago, expecting nothing much but the pile of Gnomic crud that assaulted me last time I tried it. Ah no: mashed up with Jamendo (though apparently not yet with Magnatune, as planned), Rhythmbox has become more than the sum of the parts and has crashed uninvited into iTunes territory.
It has become a native client app you can start up and listen to a big catalogue of liberally licensed music, not only without the expectation of having money sucked from your living veins but even with the expectation of being moved to voluntarily donate to the people sending you their hard work for free, when you especially appreciated what they offered.
And there on the menu bar is the legal, sanctioned, intentional "download album" button. In the face of this must you go and give money to the lawyer loving corporate coke snorting beast-creatures for mainstream crud? Or should you not set out to make a direct connection with the artists and show them your appreciation in a direct and personal way?
An extraordinary advance for standard Linux media players!
The Alignment Monster2007-05-25T00:00:00+08:00https://warmcat.com/2007/05/25/the-alignment-monster<img align=left src="http://warmcat.com/illustration-sausagehead.png" alt="insanity at the laptop" />Currently I am working on an embedded Linux ISDN-2 device I have designed... the hardware works fine but it's clear that the challenges lie in the software stack. ISDN uses a cryptic, stateful protocol called LAPD to manage call state and features many layers of protocol stacks to get the job done. You know you are dealing with the old school when they refer to 64kbps log coded PCM as "3.1kHz voice", meaning the audio bandwidth.
Naturally at this... mature... stage of ISDN development (ie, I am plundering the ancient dusty tombs of a dead protocol that happens to be in wide use) I am not anxious to become a guru capable of winning arguments at the bar on ISDN protocol trivia; I need the freaking thing to work. If it's a new technology, exploring the byways and understanding it closely can often pay off in the future, but there is much less chance of that when dealing with something old and basically deprecated (cf ADSL). So I chose to use <a href="http://www.misdn.org">mISDN</a>, an attractive proposition with a driver for the chipset I am using and capable of working in both NT and TE modes -- basically allowing it to act as either the exchange or the customer.
mISDN is getting some reasonable use as part of Asterisk via chan_mISDN, so I hoped for an easy ride. However it is clear that mISDN has not had much of a life outside of x86. The Makefile is not set up for crosscompile, and indeed the tree from git when I started on it would not compile against a current kernel source. In fact it caused a segfault in the kernel build process on contemporary kernels, the two-line fix for which represents my first contribution to the actual Linux kernel tree. (Rather a weird bug... the text CONFIG_MODULE appearing in any source file will cause the build to fail with a segfault in a script. This text appeared accidentally in mISDN -- CONFIG_MODULES was meant, which does not trigger the bug.)
<!--more-->
However, there are at least two other strugglers who suffered before me with ARM crosscompile and offered some support on the list. The actual crosscompile action I managed fine... the problem is that the resulting code did not work properly, in a very curious way. The downstream symptom, the one that was actually noticeable, was that an opaque handle for a resource was wrong. When the code later tried to dereference the handle to an actual in-memory object, no object matched that handle. But the handle was broken in a curious way. Here is the actual packet that returned the handle
<code>0000: 00 00 00 00 81 23 0F 00 00 00 00 00 08 00 00 00 .....#..........
0010: 80 01 00 40 00 00 00 00 ...@....
</code>
This chunk of data is represented in mISDN with an explicit struct for the first 16 bytes, and then an unformatted "argument" block of data follows, in this case a further 8 bytes of it.
<code>typedef struct _iframe {
u_int addr;
u_int prim;
int dinfo;
int len;
union {
u_char b[4];
void *p;
int i;
u_int ui;
u_int uip[0];
} __attribute__((packed)) data;
} __attribute__((packed)) iframe_t;</code>
The opaque handle where the problem comes from is found in the first 4 bytes of the arg region at +0x10 in the dump above, represented by ->uip[0]. The correct result is to walk away from the packet understanding that the opaque handle is 0x40000180, the ARM9 I am using being little-endian.
However, to my surprise, the code came away under the impression that the opaque int handle was 0x80. I confirmed that the pointer was at the right offset from the start of the struct: somehow, dereferencing a 32-bit int pointer that looked at memory containing 80 01 00 40 gave the result 0x80!
Further, if instead of using my own int * based on ->ui to do the dereferencing, I used ->uip[0] directly from the struct I got the correct result of 0x40000180. And I confirmed that & ->ui and & ->uip[0] are exactly the same!
Diego Serafin on the mISDN mailing list had seen this crap before, and he provided the explanation: on ARM, <strong>misaligned</strong> 32-bit reads -- that is, with b1 b0 of the address bus nonzero -- are silently BROKEN. What happens in this case is that the read occurs at the address & 0xfffffffc, BUT rotated according to the original address & 3. An example... if at address 0x0 one finds 11 22 33 44 55 66, then it's clear that dereferencing an int * pointing to 0x0 will result in 0x44332211 on both x86 and ARM. On x86, dereferencing an int * to 0x1 will give you 0x55443322. BUT on ARM, the same dereference of an int * to 0x1 gives you instead 0x11443322.
And indeed that is what had happened in the example case above, where the (address of the start of the struct & 0x3) was 0x3... it was some address ending in 0xf. In this case it read the 0x80 and then filled in the upper bits from the bytes at offset +0xd, which are all zeros.
The reason that ->uip[0] gave the right result is purely down to it being marked as __attribute__((packed)). In that case the compiler understands it cannot use a 32-bit bus access, and instead has to do four byte reads and or them together into the 32-bit result.
So: the takeaway from this is that it is not enough that the C code be "correct" for x86. If it is to be portable, ALL int accesses must be aligned to int boundaries. It is NOT enough that the compiler pads ints inside a struct definition to an int boundary either: because the start address of the struct may not be aligned to an int boundary.
I resolved this by adding macros to check pointers for int alignment (to find the instances where sensitive pointers are misaligned and can cause trouble) and macros to allow single-step allocation of storage on the stack that is aligned to int boundaries, disallowing misalignment. But still it is an education to find that perfectly sane C code which works on x86 can blow violent chunks on ARM or other processors that insist on width-based bus alignment.
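A minimal sketch of the kind of guards involved; these are illustrative, not the actual macros that went into the port:
<code><pre>
#include <string.h>

/* true if p can legally be dereferenced as an int on strict-alignment CPUs */
#define INT_ALIGNED(p)	((((unsigned long)(p)) & (sizeof(int) - 1)) == 0)

/* portable 32-bit read that is safe at any address: memcpy() forces the
 * compiler to use byte accesses where the target architecture needs them,
 * just as __attribute__((packed)) did for ->uip[0] above */
static inline unsigned int read_u32_any(const void *p)
{
	unsigned int v;

	memcpy(&v, p, sizeof(v));
	return v;
}
</pre></code>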
Bonsai code-kittens2007-03-31T00:00:00+08:00https://warmcat.com/2007/03/31/bonsai-code-kittens<img align=left src="http://warmcat.com/illustration-80col.png" alt="80 column limit" />The last few months I have been working on the <a href="http://penumbra.warmcat.com">Penumbra</a> project. I started off patching wireless drivers in and out of the kernel tree to achieve the anonymous broadcast action that the project needs, but it became clear this would be completely unworkable for general use... getting wireless up in Linux can still be a struggle and hoping people will patch their driver or kernel in addition isn't going to happen. After trying a couple of other methods in the end I created a radiotap-based packet injection patch for the mac80211 stack (formerly dscape / d80211), and bound it together with a patch from Michael Wu that provides radiotap-based Monitor mode. At the moment it is still in front of the linux-wireless folks and it's not clear what the result will be. If the patch is accepted, then the code should make it into the mainline kernel and all mac80211-based wireless drivers will work with Penumbra out of the box in the future. The patchset provides generic radiotap monitoring injection that "just works" with libpcap both ways, so I am hoping it will get accepted without people having to form an opinion about Penumbra.
But one of the biggest hurdles in creating the patch was not technical, since I already had the core functionality working, but in fact the Linux kernel coding style. In some ways the coding style fits well with my own personal style (formed over 20 years of writing C and C++), we basically use the same K&R style. There are some spacing and commenting rules that are actually better than my style and I will adopt them wholesale. But that's where the fun stops and the recrimination begins!
<!--more-->
The basic problem is the combination of three rules whose constraints have a terrible effect, eliminating what I consider good coding practice:
<ul>
<li>Tabs are 8 chars NO EXCEPTIONS</li>
<li>Lines are less than 80 cols NO EXCEPTIONS</li>
<li>Everything in { } is indented by a tab (except switch cases!)</li>
</ul>
Now almost everything is inside a function body, so that gets you down to 72 chars already. And if your function is doing something non-trivial, your code is probably inside a while() or a for(), and there are one or more levels of if() to decide whether to do it or not. Pretty soon you are writing code crushed up against the 80-col limit, with only 30 usable chars and 50 spaces behind them. It strongly puts me in mind of the Bonsai Kittens fake website that showed how to push kittens into bottles so they would grow into the shape of the bottle.
Under these abnormal circumstances, certain things become very difficult to do:
<ul>
<li>\t\t\t\t\t"any kind of long " \n\t\t\t\t\t "string has to be " \n\t\t\t\t\t "artificially chopped up"</li>
<li>nested if()s may make perfect sense to explain your code logic. But you can no longer afford them because of the tab each one adds. So you have to invert the if() sense and use a goto (I kid you not, this is preferred due to the coding style rules)</li>
<li>I strongly prefer descriptive variable names which include type. Type is part of the information you need to understand what that variable is when looking at it. "nCountWaysILoveHer" tells me (now, and in 6 months when I have forgotten the code) it is an int that is counting a specific thing. "i" or "cnt" could be anything, although "cnt" is better. But you can't afford a long variable name with the rules above, you can get into a situation where there is not even enough room left after the tabbing to hold just the variable name on a single line.</li>
</ul>
On that last point there is some handwaving nonsense in the coding style doc for Linux that "C programmers don't use long variable names"... well I call bullshit on that one. The truth is that because of the other tabbing and length rules, <strong>Kernel</strong> C programmers <strong>can't</strong> use long variable names even if they realized it was better: they ran out of room for it.
To be fair to the coding style doc it does have a point when talking about what to do when the indents get too much: it suggests to break the indented content out into a new function and to call through to it. It also says that massively indented code means you were screwed anyway, because the logic was too complicated, and that can also be true. But calling through to functions can be a very bad fit if the code you are migrating out touches many variables defined at the parent function top level.
I am still working through the style rules trying to see what I should take on board to replace my own style and what I have to "fake" just for kernel code, but it seems to me life would be better for everyone if they relaxed the line length to 120 chars instead of 80.
Nasty Crash at Luton Airport2007-03-04T00:00:00+08:00https://warmcat.com/2007/03/04/nasty-crash-at-luton-airport<img src="http://warmcat.com/luton-airport.jpg" alt="Plasma display at Luton Airport shows Windows BSOD" />
Makes no sense to have a license to Mordor for each of the hundreds of plasma displays at Luton Airport. It sparked a conversation with my stepson about why it was chosen: presumably because the devs didn't know anything else than Windows, they will spend the rest of their working life acting as agents for Microsoft. This is the fruit of the Jesuit priest Ballmer's creed of "give me a developer until he has 7 months in the industry and I will give you a Microsoft Trained Monkey".
Using the same or cheaper hardware, an embedded Linux implementation would have had at least the same performance and much more flexibility. Ogg Matroska is a patent-free high quality video solution that plays back out of the box on Fedora for example. And if you wanted a hundred or a thousand displays your software licensing costs would stay at exactly $0.
Out of your tree2007-03-03T00:00:00+08:00https://warmcat.com/2007/03/03/out-of-your-tree<img hspace=8 align=left src="http://warmcat.com/illustration-tree.png" alt="Out of your tree" />The willingness of the kernel devs to refactor stuff is both a huge strength and weakness for the kernel. The strength is in the extraordinary continual optimization and improvement in the codebase, not just locally to an area of code but for cross-kernel concepts, like the recent workqueue changes.
But this has a pretty harsh cost for people writing or maintaining code that is outside the kernel tree and which therefore does not get the reworking applied to it as part of the core kernel. Whatever code they put out is invalidated and broken again and again sometimes in just the space of a few weeks.
The freedom to refactor despite breaking external code is a huge luxury for the devs seldom seen elsewhere in the coding world. Some projects take some care to allow compilation of their drivers for all recent kernels, using conditional compilation based on the kernel tree it is being compiled against, but other projects have an attitude that it will only compile against the current Linus tree.
The foaming churn of change makes for pretty hard work for anyone trying to make kernel code that is not in the main tree work for any length of time. Greg KH at least is on record that his concept of the solution is to bring everything inside the kernel tree, but I don't know how that will ever scale, and it loads the devs with having to understand and work with an ever-growing amount of device-specific code. Aside from that, it makes the kernel devs gatekeepers for what will be accepted, and since not everything that can exist will be deemed acceptable, there will always be a class of device driver living outside the comforts of the main tree.
Anyway, the end result is that for many projects that people need drivers for, the shelf-life of any instance of the driver sources is extremely narrow. A Wifi driver, for example, touches many subsections of the kernel that have a history of recent changes, yet requires a pretty recent kernel to compile at all with the stuff that it actually needs. So each driver tree has a quite narrow slot of kernel versions that it will work with; annoyingly, current CVS of many drivers will not compile against current kernel source, nor -git, nor even -rc versions. It means that out-of-tree drivers are a lottery for any recent kernel, and any kind of out-of-tree driver is a high-commitment project that needs constant revisiting to keep it alive.
There doesn't seem to be an answer except that over time more and more critical subsystems in the kernel will surely mature to the point that they get fiddled with less and less, and things should therefore die down on the breakage front. But in truth the adolescent codebase of Linux shows no signs at the moment of slowing down its crazed foaming froth of reinvention and massive damage and breakage to the code around it.
Octotux Packaged Linux Distro2007-01-27T00:00:00+08:00https://warmcat.com/2007/01/27/octotux-packaged-linux-distroFor the last couple of years I have been working with the <a href="http://www.atmel.com/dyn/products/product_card.asp?part_id=2983">AT91RM9200</a> ARM9 chip from Atmel. This is a very nice, fast, cheap CPU when combined with some SDRAM and NOR flash. One of the bigger problems I faced getting started with it was managing the sheer amount of sources that go into a modern Linux implementation. Many embedded folk just generate a giant tarball which is their whole root filesystem; they don't track what versions are where, and as a result providing sources for their magic tarballs can be a pain in the ass for them.
I knew I didn't want to repeat this method from the start, so I began to work towards being able to use the RPM packaging system even with my crosscompiled stuff. Having the full RPM app and database doesn't make sense for an embedded system though; for example, our standard AT91RM9200 platform only has 8MB of flash for all nonvolatile storage. I found that the excellent busybox had the start of an rpm implementation already, but although the code was really well-written, the implementation was not complete enough to do anything useful: it did not keep a database, and there was therefore no possibility of erasing packages, for example.
Over a few months I added more functionality to the busybox applet, including keeping a "database" of installed RPM headers down /var/lib to emulate most of the capabilities of real RPM. With this in place, I then went on to package all of the apps that I was interested to run on my distro. There are numerous advantages to formalizing the management of code into packages once you accept that you will only place packaged stuff on the device.
Creating an RPM binary also creates a source RPM (SRPM) that contains all of the sources and the build specification for that binary, and both the binary and the source are versioned. Therefore, so long as you offer the matching SRPM at the same time as you give someone the binary RPM, your GPL requirements for providing sources are solved in one stroke.
The patchlevel of any device can now be interrogated and used to select updates as well, in a formal manner. You can also express the need in a package to have "required" dependent packages of specific versions, reflecting a common situation where after some release you need a library of a certain version for it to work.
The "cost" for this level of control over your device boil down to: no more ad-hoc building. Even development builds are done through rpm packages. We provide a script that will remotely install over scp and ssh the latest version of a given package from the host commandline, even one that rebuilds the package before installing it remotely, and for every single edit you can be certain the sources are captured in the SRPM. We found a typical system is using only 100K of uncompressed data (typ 50K on jffs2) to maintain the RPM headers that make the packaging work. That is not significant even on a box with only 8MBytes of Flash.
Check out the Wiki at http://octotux.org for more details and the repository which contains the packages.
Your code might be Free, but what about your data?2006-10-03T00:00:00+08:00https://warmcat.com/2006/10/03/your-code-might-be-free-but-what-about-your-data<strong>Two Three Letter Names of FOSS bring their ships around for a Vision Collision</strong>
RMS (Richard Stallman) just had an <a href="http://www.redherring.com/Article.aspx?a=18757">interview</a> with the fairly major investment magazine Red Herring. A lot of the content isn't new from him, but as usual some of it is spot on.
<blockquote>Q: Do you think you will ever achieve that goal?
A: I don't know because it depends on you. That's why I resist these self-fulfilling prophecies. If enough of us demand freedom we're sure to win and if few of us demand freedom we will almost surely lose. It's entirely up to the readers of this article. As so many issues are to the extent that we still have democracy; so if people were told by businesses they want this and you know you can't [oppose] businesses so just get used to it, go along, suffer. If people lie down and take it then they will lose. So what do [businesses] do? They are smart: they encourage people to lie down and take it.
[This happens on] many issues and not just this one. Pick any political issue in which things get worse and you'll find people telling the public: It's inevitable. Don't try and fight it; it's useless. Of course, if we did bother trying to fight it, we might win.
</blockquote>
A couple of items down on LWN where I found the link, there is another story, about ESR (Eric Raymond) <a href="http://www.prnewswire.com/cgi-bin/stories.pl?ACCT=104&STORY=/www/story/09-27-2006/0004440536&EDATE=">joining the "Leadership Team" of Freespire</a>, and it is the juxtaposition of the two stories that is the real story.
Although the Press Release says people were surprised "in recent weeks" about ESR's speaking out about the need to work with proprietary data formats, his interest in the issue goes back as far as March 2006. He was on fedora-devel then making the same suggestions with the same urgency, basically that although Linux was doing quite well, it was really let down by the lack of support for proprietary codecs. There followed a discussion about why there was no MP3 support out of the box in Fedora; the patent situation appeared to be news to him. He was <a href="http://www.redhat.com/archives/fedora-devel-list/2006-March/msg01286.html">basically campaigning</a> for Redhat to start defying patentholders and ship mplayer and other such contraband to allow the most complete possible proprietary data format support out of the box. It was <a href="http://www.redhat.com/archives/fedora-devel-list/2006-March/msg01296.html">explained to him</a> that if we as generally pretty penniless individuals decide to download and use mplayer, that is one thing; if a cash-rich American corporation like Redhat decided to start distributing some of the mplayer stuff, which fairly clearly contains copyright and patent infringements in the US at least, it would strongly motivate the rightsholders to mount an attack to separate the proposed Redhat-idiot from his pile of cash. Redhat are lawyered up enough to be completely alive to the danger, even with regards to MP3. After several days of back and forth, with Alan Cox weighing in quite negatively towards ESR, he moved on to greener, well, less Red, pastures.
So as it happened it was my good self that perhaps brought Linspire's more commercial attitude towards Linux media player apps to the attention of ESR. To underline a point I will be developing in a moment, I sent him a link to Linspire's legally patent-licensed DVD player app several months ago. (Michael Robertson, Linspire honcho, earned some serious gratitude from my family and me for funding, via a prize after the fact, the original Xbox Linux hacking work that I got a fair chunk of; in fact that kept us afloat for about a year.)
Now one of the things I realized during that thread, which is the point of this post, is that the conspiracy between copyright law - enabling the licensing of works however the rightsholder sees fit (consistent with the compulsories that exist in the case of music) - and patent law - enabling the holder of the patent rights to control the ability of people to play back content that can only be decoded according to their patent - gives proprietary software like Windows a niche that it can't be winkled out of by FOSS equivalents.
If content rightsholders insist on patented codecs for their content, well, that defines their content as needing licensed playback devices, and that in turn (exceptions like the recent weirdo-licensed free MP3 license for FOSS aside) insists that there is a paid-for player doing duty, which violates the share-and-share-alike basis of pure FOSS. If content rightsholders insist on end-to-end TPM-backed crypto lockups in addition, well, that requires a proprietary hardware system with a proprietary OS.
Therefore proprietary software is validated and given meaning by the rights conferred by Copyright and Patent laws. It's given a future by the deeply embedded and accepted laws that underpin expressions of creativity in developed countries. FOSS can't compete equally without violating laws that have been proven many times to have vicious teeth: this is an area where FOSS can't do what it is doing in the areas that are not so wrapped up with globally enforceable rights.
And now we come back around to the elements at the start of this post. RMS knew this long before I worked it out in the middle of an argument with ESR. RMS does not have a lot of time for the traditional media channels:
<blockquote>Q: What is the solution to making the free software movement successful?
A: People should boycott all digitally restricted media and if you can't get your computer to copy it then you shouldn't buy it. If you don't have free software to read a DVD you shouldn't get a DVD. We are calling a boycott on things like HD-DVD and Blu-ray. The solution is to eliminate DRM. There is no situation in which DRM is excusable. Maybe you will be able to access peer-to-peer networks to these songs and movies; I hope so. At least that won't put chains on you, so it's ethically legitimate.
Q: So, you don't watch any movies on DVDs?
A: I have a few DVDs that are not encrypted and I don't have anything that would play an encrypted DVD. Hollywood sets out to make crap and most people who see it already know that it's crap before they go to watch it. It's not quite the same as boycotting all movies. Boycott all movies that you don't have a reason to feel that they're good. And it's obviously different from the simple boycott but the practical result is the same.
Q: Which movies, according to you, are "good" movies that you have watched?
A: My memory isn't very good but I have seen movies that I feel like are not crap. I like Spike Lee movies and I also liked Galaxy Quest - it's a comedy which makes fun of Star Wars and its fans, and turns into science fiction. It's rather fun. I also like Spartacus.
Q: Which was the last movie you saw?
A: The last movie I saw was Spike Lee's Inside Man and I saw it on an airplane to India, which is where I end up watching most movies.</blockquote>
Well he goes too far with "There is no situation in which DRM is excusable": in fact I demand some kind of DRM on my bank account; you can call it privacy or encryption but it is in fact Digital Rights/Restriction Management. As he correctly points out in his nomenclature, success against DRM depends on the generic consumer rejecting it. In my philosophical terms, the evil is coming out of the DRM consumer. Without the gormless willingness of the consumer to accept the restrictions inherent in what they choose to give money for, an evil action of the rightsholder cannot bear fruit. But where my own data is being kept from me for a larger reason -- so I cannot manipulate my own bank balance by hacking it -- this is in a larger interest shared by everyone.
But still: RMS understands what ESR rejects - it is not enough that the code is Free; the data, the content, has to be Free too. mplayer, bittorrented mainstream content - they are free, but they are not truly Free, and they cannot be with the laws as they are. It is a shift as big as FOSS vs proprietary to move to a world where the data that you consume has the same Free rights as the code used to render it. How can the philosophical shift from proprietary to FOSS be played out on the content? jamendo.com shows the way, but how can the content take advantage of the same aggregation advantages that code can?
Rights and Wrongs of Hacking Source Licenses at Distribution Time2006-09-21T00:00:00+08:00https://warmcat.com/2006/09/21/rights-and-wrongs-of-hacking-source-licenses-at-distribution-timeThe highly interesting busybox license change process rumbles on. For me at least, it is the first time that I saw anyone really try to grapple with the implications of the upcoming GPL version 3 on projects that licensed some or all of their sources under GPL V2 "or later" terms.
As I described in my <a href="http://warmcat.com/_wp/?p=24">previous post</a>, until now nobody seems to have thought through the ramifications of allowing a distributor to specify the license version he chooses to distribute under, or how he should specify what he has done in a non-conflicting way.
There must be a surefire way to use and distribute a GPL 2 "or later" package under GPL 2 rules. For example, so that a recipient cannot try to apply the GPL 3 "give me your crypto keys" rules when the distributor honestly and correctly wants to play only under GPL2 rules. But as I described in the previous post, merely noting that you distribute the sources under specifically GPL2 rules leaves the package you pass on in conflict with itself. The source files in the package are still annotated to allow "modification" under the terms of the GPL 2 "or later".
What I proposed on the busybox list <a href="http://busybox.net/lists/busybox/2006-September/024566.html">here</a> and explained more clearly <a href="http://busybox.net/lists/busybox/2006-September/024568.html">here</a> was that a very strong way to make it clear how you are distributing would be to modify the GPL terms shown in the source files, ie, to show that you are distributing under GPL2 only, as the license allows, by removing the "or later" part from the originally GPL2 "or later" sources. This has the advantage that the distribution method is consistently shown throughout the sources, and the action is 'sticky', ie, if you were given those sources under GPL2 only then they remain GPL2 only through further distribution.
While the maintainer <a href="http://busybox.net/lists/busybox/2006-September/024570.html">seemed to like the idea</a>, there was a cogent objection from <strong>Glenn L McGrath</strong>, who says
<blockquote>
<pre>GPLv2 clause 9 states "If the Program specifies a version number of
this License which applies to it and "any later version", you have the
option of following the terms and conditions either of that version or
of any later version published by the Free Software Foundation."
I dont see where it implies the right of licensees to remove the "or
later" statement. For all i know removing the "or later" clause may be
adding a further restriction to the license which isnt allowed according
to clause 6.</pre>
</blockquote>
In addition to that concern, Clause 1 of GPL2 also seems to not like the idea as I noted in a reply
<blockquote>
<pre>''1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; <strong> <u>keep intact all the
notices that refer to this License</u> </strong> and to the absence of any
warranty; and give any other recipients of the Program a copy of this
License along with the Program.''</pre>
</blockquote>
Well, something will have to give. What is needed is some definitive recipe telling people who wish to distribute GPL2 "or later" projects under specifically GPL2 terms, as the license grants them the right to do, how to do so such that unfair GPL3 demands - unfair because they in good faith chose to play under GPL2 rules - cannot be applied to such distributors.
Bruce Perens and the current busybox maintainer ended their argument for now by agreeing to both consult with the <a href="http://www.softwarefreedom.org/">Software Freedom Law Center</a>, a place that has found its moment to leap forward with advice, I should think. So hopefully instead of IANAL there will shortly be an opinion from someone who IAL and who has a grasp of the confusing niceties of the text, for a change.
GPL2 "or later" distributor sends mixed signals when distributing as GPL22006-09-19T00:00:00+08:00https://warmcat.com/2006/09/19/gpl2-or-later-distributor-sends-mixed-signals-when-distributing-as-gpl2A couple of people have pointed out something that is important about the GPL2 "or later" license: the distributor chooses the license version they distribute under. (Actually the busybox maintainer Rob Landley pointed it out last week too). The actual language used on sources licensed as GPL2 "or later" looks like this:
<code> * This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of
* the License, or (at your option) any later version.</code>
Note that this text uses "you" and "your" to apply equally to everyone who gets distributed to and who is distributing.
In the "distributor chooses" reading of this text:
<code> * This program is free software; <strong>you can redistribute it</strong> and/or
* modify it <strong>under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of
* the License, or (at your option) any later version.</strong></code>
a new class of information is generated about the distribution action which I did not see expressed anywhere so far, that is a sort of distribution action license versioning metadata. For example if Nokia have a GPL 2 "or later" package on a secure (crypto-locked) phone platform, in the "distributor chooses license version" reading when distributing they can specify "Nokia gives you this specifically under the GPL2, so no keys". If the guy then distributes the package on, he can choose the terms afresh, but since he has no keys for the Nokia platform that has no impact on Nokia.
The distributor cannot make a GPL V2 "or later" distribution action stick when he gives it out under, say, GPL V3. If he says, here, have this GPL V2 "or later" package under GPL V3 only terms, the recipient need only redistribute it to himself or through a friend to have it validly on GPL V2 terms again. So it seems this distribution action license choice is only useful for one thing: getting the distributor out of distribution conditions found in later license versions; it can't be used as a forced upgrade off older license versions.
Simply by stressing 'modify' rather than 'distribute' from the and/or:
<code> * This program is free software; <strong>you can </strong>redistribute it and/or
* <strong>modify it under the terms of the GNU General Public License as
* published by the Free Software Foundation; either version 2 of
* the License, or (at your option) any later version.</strong></code>
the text can equally be read another very different way, which is that the distributed-to person is granted the right to modify the sources "under the terms of" the GPL version he chooses. Someone who makes this reading of the text can believe they have the right to apply, for example, GPL v3 duties to their distributor, and ask them for the keys needed to allow installation as specified by the current GPL3 draft[1]. Folks can argue that the distributor saying (where? when subsequently challenged?) that he distributed on the terms of GPL v2 beats out this reading, but it's not that clear to me, because it was also the distributor himself that gave the recipient the license telling him he had the right to modify and distribute according to the terms of GPL V2 "or later"; if modification and installation according to GPL V3 needs the distributor's keys, the GPL V3 gives a way to demand them of the distributor. There are at the least mixed signals coming from the distributor in this case, it seems to me.
These proposed considerations only apply to GPL V2 "or later" licenses; stuff like Linux, which is GPL2-only, seems to have no risk that GPL V3 provisions might be claimed as duties for the distributor.
[1] http://gplv3.fsf.org/gpl-draft-2006-07-27.html
<em>''1. Source Code ... The Corresponding Source also includes any encryption or authorization keys necessary to install and/or execute modified versions from source code...''</em>
I'll make you free if I have to lock you up!2006-09-18T00:00:00+08:00https://warmcat.com/2006/09/18/ill-make-you-free-if-i-have-to-lock-you-upInteresting goings-on over on the <a href="http://busybox.net">Busybox</a> list. Busybox is a single app that masquerades as a large set of common unix tools like ls, a shell and so on. The maintainer is planning to, well, sort of migrate the project to being GPL v2-only. It's a bit complex because there are a variety of copyright notices floating around in there at the moment. He discussed it on the list for some time and although there are some people that want to have GPL2+ (meaning, GPL v2 or any later version at the user's discretion) the proposal seemed to be gaining traction.
The issue is significant, because in the current GPL3 drafts there is language that would require any signing keys to be given up with the sources. If you plan to design a device which, for the security of the customer, would reject code, eg, updates, that were not signed by the manufacturer, then the GPL3 would appear to disallow using GPL2+ or GPL3-only licensed code with such a scheme.
Linus has already come out against this idea as one of the reasons he will be sticking with GPL v2 for Linux. But of course Linux is just one part of the puzzle, and Linux is fairly unusual in having changed the default language of the GPL v2 license from "version 2 or later" to fix it firmly with v2 only. There is a lot of code out there that may suddenly inherit this "I demand your crypto keys because I am treating your code as licensed under GPL v3" problem simply because they left the default "v2 or later" in there.
Bruce Perens, who was in at the start of Busybox but left it many years ago, argued on the busybox mailing list against changing the license to GPL v2-only, but I think he is mistaken. I think GPL v2 "or later" licenses may turn out to be a very ugly can of worms. In any event the result was that the maintainer a few hours later announced that busybox was going v2-only.
The problems over on busybox are the first sign for me of what may be a major licensing train crash brought on by the thoughtless handing over of the author's licensing terms to Richard Stallman. Great man that he is, that is a lot of power he is channelling right now: he alone, through the FSF, can randomly dictate the licensing terms of a vast body of "GPL v2 or later" code that is currently in use. Under the banner of "increasing freedom", he is in the position to disallow current usage of existing code that is currently used in accordance with the license. For an unlikely but illuminating example, if he decides that the GPL v4 requires the distributor to donate $10 to the FSF, then recipients who decide they will use "GPL v2 or later" code on the GPL v4 terms can force the distributor to do this. And this is on code that is currently used based on the fairly well understood GPL2 terms, where there is nothing like that.
Now you may scoff at such a wild straw man argument, but I discovered over the weekend that there are people who believe that the GPL v2 requires you to give up any signing keys you may use on a binary created from it! In a subthread starting <a href="http://busybox.net/lists/busybox/2006-September/024451.html">here</a>, Rich Felker proposes the idea as fact and I argue against it. Later in the thread Rich insists he will take people to court if they fail to deliver such keys; I bring up Redhat's own signing of GPL'd packages as a case where he should attack according to his principle. He deflects this by saying that since it is possible to install unsigned packages, he will not need to sue Redhat. However, yum by default will not install unsigned packages, and besides you cannot do so without the root password for the box. For many reasons, a user may not have the root password. Does Rich propose that everyone with a box with GPL v2 software on it must be given root access? There has been no reply.
Anyway, it is all getting a bit chilling, this talk of negating the possible actions of users of free software in order to make them free. It's starting to sound a little bit like the start of a tortured logic found in Socialist states, where the workers must build palaces "to be free", grub around in the fields to feed people in offices "to be free". Purely by that innocent-sounding "or later" found in the default GPL2, a huge amount of power, proportional to the amount of software with "or later" in the license, has landed on one person, Richard Stallman. Was everyone really aware they had elected a Great Leader, no matter how trustworthy, and that their package was part of a mobilization force under orders from above?
Old Tech2006-09-14T00:00:00+08:00https://warmcat.com/2006/09/14/old-techLast weekend I started on an ISDN-related design for a customer. A neat solution would have been based around a Zarlink chip; I downloaded the datasheet and saw that it was last updated in 2006, always a good sign. I had used Zarlink echo-cancellation silicon before - it was reasonably priced and did a nice job - so Zarlink was my first port of call. I read the datasheet and realized that it would do just fine. ISDN is broadly similar to Ethernet in the transport layer: it needs a specialized "magnetic" (a set of transformers and chokes) to isolate and condition the signals. Zarlink recommended a particular magnetic and I downloaded the datasheet for that too, and began to make the schematic over the weekend.
On Monday I first rang the UK representative for the transformers and asked approximately how much they cost. He sounded surprised and told me these transformers were "ancient" and would need making to order. Now ISDN is a, ahem, mature technology, but even so I experienced that sinking feeling, and my next call was to the Zarlink rep in the UK. Sure enough there too the device was "ancient", dating from the late 90s, and although still available (and in RoHS-compliant form too), considering the design called for 4 chips and 4 transformers, the whole thing was starting to smell bad.
I Google(TM)-ed around for a while and found CologneChip. They had a quad port device that would do the whole job for the price of two ports of the Zarlink method, and the magnetics were a third of the price too. I started reading the datasheet rather resistant to the idea of throwing away the work I had done on the Zarlink implementation, but it was quickly obvious that the CologneChip implementation was radically superior. In addition, the same family offered an E1 compatible part. Oh well. I threw out the Zarlink version and completed the initial CologneChip version in three days. I also had a sales call from a very knowledgeable guy from there offering what sounded like very good design-in support. But in fact the datasheets and an app note seemed to be enough to get started (or get in trouble, we will see!).
ISDN-2, the consumer ISDN, has three virtual channels: two 64kbps B "Bearer" channels that contain the bulk audio data in A-law PCM, and a 16kbps D channel that contains "signalling data", which appears to be a bidirectional packet-based setup. A software state machine is needed to service the D-channel actions according to a protocol stack that includes LAPD, I.465 and Q.931 specifications. Luckily there are some GPL'd implementations, or partial implementations anyway; the CologneChip guy was also saying that many exchanges do not implement all the options. Therefore it will be interesting to see what kind of packets are issued by the exchange as a start on understanding the protocol stack.
Next Generation2006-09-14T00:00:00+08:00https://warmcat.com/2006/09/14/next-generationIn the electronics and software treadmill world, practitioners are constantly having to re-skill themselves to keep up with solutions that make sense in size, power and cost. Software has 'bitrot', where things stop working properly if they are not maintained; hardware designs should keep on chugging away once made, but they may not remain manufacturable for long with components going obsolete, and even when the components are available, hardware designs don't stay competitive for long against the constant cycles of improvement in silicon.
Generally small design companies are pretty much as empowered as the very large companies in terms of designing and using the latest stuff. But currently there are two major and important technologies in hardware terms that are out of the reach of small design companies.
First is BGA (Ball Grid Array) packaging. Instead of legs sticking out of plastic shells, BGA has an array of solder balls on the underside of the package. The PCB has exposed metal pads under each ball; the BGA is placed on top and heated slowly according to a temperature profile. The solder balls melt and firmly attach all the connections through to the PCB pads.
The problems with BGA start with the PCB design: many BGA pinouts are far too dense to allow automated routing without taking up a crazy number of layers. Modern BGA chips have solder balls on a 0.5mm pitch(!), which further demands the expense of laser-cut vias. PCB autorouters which are perfectly fine for PQFP or other pin-based technologies make a miserable job of BGAs, and the balls can need to be fanned out (to get the signals spread out from the pads) by hand.
Small design companies are typically making their own prototypes, but with BGA this is no longer possible either, since all the connections are on the underside of the device. Instead an outside contractor must be used to place the BGAs on the PCB with an infra-red oven, and the result has to be inspected with technology that is beyond a small company too, using X-rays to see through the chip and confirm that all of the solder balls are melted and making contact.
The problem is that the most modern and desirable technologies are starting to appear ONLY in BGA form. Unless a designer can specify world-class technologies available to his customers' competitors, clearly he is at a disadvantage. So moving out of the pin-based ghetto into the BGA world is a major and growing concern here.
The second issue surrounds the problems of high-speed transmission-line based technology, found for example in using DDR DRAM. There are a bunch of stringent design rules surrounding DDR, the most difficult of which is length-matching 70 or more nets. Basically all the signals should arrive at the chip at the same time, which means ensuring that they all travel about the same distance on the PCB. If you look at a modern motherboard, you will see some tracks perform strange "squiggles": they are doing so to add length to themselves so they match the length of a signal that had to travel further. Trying to do this for 70 or more nets in a small region, where each change can impact the length of other nets, is... nontrivial... and completely beyond the midrange autorouters available to small companies. Higher-end autorouters like Cadence Specctra are capable of automating this task, but run to GBP30,000, and demand a king's ransom to keep the updates coming. Failure to maintain a sufficiently tight relationship between the lengths, or to keep signal quality to the necessary level for other reasons, will result in a design that can't operate at the intended frequency or is flaky at any speed.
A related problem is being even able to look at signals operating at these high speeds due to bandwidth restrictions on midrange oscilloscopes.
These problems are for the future: currently we master 90MHz SDRAM bussing and 180MHz ARM9 CPU technology on 4-layer boards, which is more than sufficient for many of today's and tomorrow's designs. But the very high end of today's needs is demanding the ability to attack BGA and DDR, and investing in it is going to have to be on the agenda in the coming months.
libtool and .la files2006-08-28T00:00:00+08:00https://warmcat.com/2006/08/28/libtool-and-la-filesWell, today's lesson was about libtool and what those curious .la files are for. I decided to go through the packages made so far (a couple of dozen) and start to clean out all the dirty hacks that I did to get started while I was learning. For example, I made a ~/.rpmmacros setting "crosspath" to hold the active crosscompiler base path instead of embedding it in the spec files all over the place; and where, when I got started, I had embedded the literal path to the devel-filesystem directory (you install package-devel packages here so that other packages can link to those libs during crosscompile), now I use a relative path.
While I was at it, I decided to fix the few packages that I left getting stored into /usr/local instead of /usr; well, they ended up like that because I didn't know what --bindir= was for in ./configure at that time. Curl and PHP fell into this category. Then I remembered that I had not understood how to link PHP to curl libraries at that time either, and I could fix that too now. (When you meet something as big as ./configure and crosscompiling and packaging all at the same time, it is a terrible struggle just to get through to a successful compile and corners get cut, but once the process is familiar it's embarrassing to go back and see what simple things need to be refactored from your early clueless thrashing around.)
Anyway I recooked curl and then did PHP. To my surprise PHP kicked out right at the last link stage, it had attempted to link the host /usr/lib/libcurl.so instead of the target-compiled ones which in my system are installed into ./rpm/BUILD/devel-filesystem/usr/lib (by doing an rpm -i on the -devel package with a fake root).
I wandered around trying to understand this for a while changing env vars like LDFLAGS, but it was clear the -L path to my devel-filesystem tree had the right precedence and nothing was on the libtool commandline telling it to go to /usr/lib.
In the end as a test I moved my local /usr/lib/libcurl.so to /root and tried again -- this time it linked fine using target libs as intended. Strange because it proved it was some kind of precedence issue, looking in local /usr/lib first and then using the right path if that didn't exist. But I noticed a warning... it said that /blah/blah/libcurl.la "had moved". I had a look in the la file -- it is actually pretty cool, it is a text file that seems to be generated at library link time and contains paths to the libraries that it linked in. In this case curl has openssl in it, so that was shown and was correct. BUT what is this at the bottom?
<pre># Directory that this library needs to be installed in:
libdir='/usr/lib'</pre>
Hm a bit suspicious since my problems come from /usr/lib. I changed the path to point to my devel-filesystem path, and the build completed correctly!
I tried commenting out the libdir line in the .la file, but the link blows chunks complaining it can't find the "library name" of libcurl.la. So the only way forward to allow usage of libs in the crosscompile environment seems to be to add a sed line to mess with the libdir setting in the library .la file!
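Concretely, the kind of sed line I mean looks like this - a minimal sketch, with <code>DEVELFS</code> standing in for wherever your devel-filesystem tree lives (the variable name is illustrative):
<pre># rewrite the .la so libtool resolves the lib inside the crosscompile tree
# instead of the host's /usr/lib (DEVELFS is an assumed name, not a standard)
DEVELFS=~/rpm/BUILD/devel-filesystem
sed -i "s|^libdir='/usr/lib'|libdir='$DEVELFS/usr/lib'|" "$DEVELFS/usr/lib/libcurl.la"</pre>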
Greylisting is back in town2006-08-16T00:00:00+08:00https://warmcat.com/2006/08/16/greylisting-is-back-in-townThe last thing that I missed from my oldstyle PC mailserver was the greylisting implementation for postfix, <a href="http://isg.ee.ethz.ch/tools/postgrey/">Postgrey</a>. This was not trivial to port over to the new embedded ARM mailserver box because it was written in Perl. I spent considerable time trying to get Perl packaged and crosscompiled; I did get Perl itself to build but got lost trying to get its modules to build correctly (I since learnt more about libtool that might let me finish the job if I ever have to).
Eventually I realized my life was being wasted struggling with Perl, so I looked around for an alternative greylisting solution for Postfix written in something more reasonable. I must have been having a bad day, because I landed on <a href="http://www.tummy.com/Community/software/tumgreyspf/">tumgreyspf</a> which is written in Python. Cue another couple of days spent beating my head on Python trying to package and crosscompile another crosscompile basketcase. In a slightly shorter period of time I realized my life was wasting away so this time I looked for a greylisting addon that did not need ten tons of arguably dying language dragged in with it, and I found <a href="http://mimo.gn.apc.org/gps/">gps</a>. This is written in C++ (my kinda language) and just needed a library and sqlite, or so it seemed.
I managed to finally get sqlite to crosscompile and be packaged nicely today, so at last I was ready to attack gps. gps uses an interesting database abstraction library called <a href="http://sourceforge.net/project/showfiles.php?group_id=23824">libdbi</a>. I packaged that, then noticed that it has a child project libdbi-drivers which actually contains the interfaces to the various SQL backends that are possible. Finally it was all packaged and sent over to the ARM board, after a little spring cleaning to make room in its 8MByte root filesystem.
Well, gps is broken for use with sqlite3, which it turns out is different from the "sqlite" target in the config file. Thanks to this commendable sharing of solutions from <strong><a href="http://phorum5.greennet.org.uk/read.php?22,1320,1503">Uffe Vedenbrant</a></strong> and a hack to allow sqlite3 in place of sqlite in the sources, I was able to get it cooking. Because it runs as 'nobody' via Postfix, a little care was needed to make sure the config file and the sqlite database file were accessible for that user. However it seems the gps maintainer is a little asleep at the wheel, since Uffe's patch has not been taken into the main project in nearly a year, and it means that out of the tarball gps doesn't work with sqlite3 at all.
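For reference, the Postfix side of wiring in a greylisting policy daemon looks something like this - a sketch: the listen address here is an assumption, use whatever your gps instance is actually configured for:
<pre># /etc/postfix/main.cf -- hand the accept/defer decision to the policy daemon
# (127.0.0.1:2501 is an illustrative listen address, not a documented default)
smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination,
    check_policy_service inet:127.0.0.1:2501</pre>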
Anyway the end result is that once again I have some protection from spam just like the old days, except that whereas before the box doing this was pulling ~250W 24/7, the embedded solution only pulls 1.5W, for the same capabilities!
Edit: Ooooh yes, two spams waiting for me this morning instead of 20, lovely
Dead Languages2006-08-14T00:00:00+08:00https://warmcat.com/2006/08/14/dead-languagesPerl and Python. Now with Perl there is already angst about its demise, and I can second it: Perl is dying. Netcraft may later confirm it, but I can do it now.
Python dying is a bit more eyebrow-raising, since bittorrent and yum and other GTK apps are done in it, and until today I assumed it had a future. But not with me. It seems to have become impossible to crosscompile the thing. In both cases the problem seems to have been a lust for bootstrapping because it was a "real language". Because it is a "real language", when it makes itself it just HAS to do it in its own language. So Perl first makes a miniperl app that is then used to complete the build of the rest of perl. Because perl must be so damn indispensable, you can't even think of making Perl without doing it in Perl.
Likewise Python insists on building its modules via a Python script, run using the python it just compiled. These conceits of the language devs mean that crosscompiling is impossible, because the python or miniperl that was built WILL NOT RUN on the host that is doing the building since they are compiled for another type of CPU. The bootstrapping concept, so they can hold their head up in academia as a Real Language, destroys the possibility for crosscompile.
Dudes, you killed your languages for crosscompile. There is a lot of crosscompiling going on and it is only going to increase.
Now PHP was easy to crosscompile. Lots of people sniff at it as not being a real language, but look, I can run it down on my ARM box no trouble, and I can't use Perl or Python. Sell Perl! Sell Python! Buy PHP!!!!
Conexant ADSL Binary driver damage2006-08-14T00:00:00+08:00https://warmcat.com/2006/08/14/conexant-adsl-binary-driver-damageA couple who are friends with Jenny and me asked, what must be getting on for two years ago now, what could be done about the constant virus problems they were having with their Windows box. Naturally, after making sure they did not need anything that Linux was poor at, ie, 3D games and so on, I recommended FC2 at that time. I nuked their box with it and the guy has been very happy all this time. I updated him to FC4 a while back.
But now he is upgrading from dialup to ADSL, and this needed taking care of. He had a Zoom PCI ADSL card, model 5506, with a Conexant chipset. I found a driver for it here:
<a href="http://patrick.spacesurfer.com/linux_conexant_pci_adsl.html">http://patrick.spacesurfer.com/linux_conexant_pci_adsl.html</a>
Hm, so the first sign all was not well was the age of the page and the results from Google: they are all from circa 2003. The project has continued to be worked on in the last couple of months though. After some struggle trying to avoid the 4kBytes/sec modem download we got the driver and the kernel-devel sorted out and compiled it. It quickly blew chunks, on a #error that our kernel had CONFIG_REGPARM defined. Well, we run the stock Fedora kernel and are not much interested in moving off it; why on earth should the driver care about this detail? Hm, closer inspection of the site showed:
<table border="1">
<tr>
<td><strong class="em1">''Note:</strong> Linux 2.6.* users should note that their kernel must be compiled without the "use register arguments" (CONFIG_REGPARM) option. This is an experimental option that will almost certainly never work reliably with this driver or any other driver that uses proprietary object code. Newer versions of Fedora and SuSE come with kernels that use this option, in these cases you will have to recompile the kernel.''</td>
</tr>
</table>
Ugh, so the reason it couldn't survive CONFIG_REGPARM is because it has a binary blob which demands stack args! No chance apparently to get two binary blobs compiled with and without. This is a stupid situation, because the site itself documents that Fedora kernels after 2.6.9 on FC3 are compiled with CONFIG_REGPARM, since it should speed things up at no cost. His solution is to insist on a vanilla kernel.org kernel solely to support the needs of the binary blob :-(
We had to give up trying to get it cooking, and instead the guy blew GBP20 on an ADSL router from ebuyer. Just what awesome secrets do they think that binary blob is concealing? What astounding concepts are in there that would set the world on fire if the sources were known?
Binary blobs, causing trouble and bitrotting wherever you find them.
I sent the guy running the project a polite email
<table border="1">
<tr>
<td>Hi Patrick - First thanks for your work on the Conexant ADSL project. I was trying to install a Zoom ADSL PCI card for a friend, we are both running Fedora Core 5. I saw after some time that I was on a loser because there is a binary blob in the project which was basically compiled with different compiler switches to cut a long story short. What is the situation with Conexant and this blob as you understand it? It seems that the chipset dates from 2002 or 2003, is there no chance that this far down the road they might be willing to be more liberal with the sources for it? My friend and I gave up on the PCI card and ordered a GBP20 ADSL router from ebuyer instead, simply due to there being a binary blob. -Andy</td>
</tr>
</table>
I got a reply a couple of hours later, the guy does not have a relationship with Conexant and says they are ignoring his mails.
Autotools crosscompile hall of shame2006-08-11T00:00:00+08:00https://warmcat.com/2006/08/11/autotools-crosscompile-hall-of-shameIn theory <a href="http://www.shlomifish.org/lecture/Autotools/slides/">autotools</a> has good support for configuring a project to be crosscompiled, and for some projects it works very well. Often it is simply enough to set the $PATH to your compiler dir and say ./configure --host=arm-linux. However there are some pathological projects that blow chunks violently when presented with a crosscompile action that they claim to support.
A good example of such an evil project is perl. This generates an executable called miniperl which is used in the build process to generate the final perl executable. But it does not take care to generate miniperl using a host compiler that is different from the target compiler, so the build process generates miniperl fine, just that it can't execute on the build host since it is an Arm ELF app. Therefore the build process drops dead.
A common issue is that the configure script wants to compile a short test app with the crosscompiler, which is fine, but it then wants to run it and look for a returncode. I added some infrastructure to the Octotux build tools scripts to enable this by prepending the helper script name to the app to be run remotely via ssh, and to capture the retcode transparently and return it to configure. Since the Octotux system captures any edits into a patchfile anyway, this is a relatively non-fragile solution.
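The helper amounts to no more than this kind of thing - a minimal sketch of the idea, assuming ssh key auth to the board is already set up; hostnames and paths are illustrative, not the actual Octotux script:
<pre>#!/bin/bash
# run-on-target: configure invokes this as "run-on-target ./conftest"; we copy
# the crosscompiled test app to the board, run it there over ssh, and return
# its exit code as our own so configure never knows the difference
TARGET="root@192.168.0.50"   # illustrative board address
REMOTE="/tmp/conftest.$$"
scp -q "$1" "$TARGET:$REMOTE" || exit 1
ssh "$TARGET" "chmod +x $REMOTE && $REMOTE; RET=\$?; rm -f $REMOTE; exit \$RET"</pre>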
One that caused much pain this last week is sqlite, which suffered from another common problem: the generated configure file wants to determine if include files or libraries exist, but decides that it cannot because you are crosscompiling. In this case it was the configure logic around the readline library that felt unable to survive the crosscompile action, with an explicit test and fatal error. Despite hacking that out and overriding various configure defines and makefile definitions in the RPM specfile that tries to build it, there are still problems.
Edit 2006-08-13 sqlite seems to be compiled and packaged okay now, needs readline and ncurses.
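The less invasive trick for this class of problem, where it works, is to preseed autoconf's cache so configure never tries to find out for itself - a sketch; the variable names follow autoconf's usual ac_cv_* pattern but the exact ones a given project tests are something you have to dig out of its configure:
<pre># preseed the answers configure cannot discover when crosscompiling
# (cache variable names here are illustrative examples)
ac_cv_header_readline_readline_h=yes \
ac_cv_lib_readline_readline=yes \
./configure --host=arm-linux --prefix=/usr</pre>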
Python is another monster like Perl. There are some patches to help here, but they are incomplete
<a href="http://mail.python.org/pipermail/python-dev/2003-October/039690.html">http://mail.python.org/pipermail/python-dev/2003-October/039690.html</a>
The guy who posted them had the whole crosscompiling deal politely rejected in 2003, yet the configure half supports it, even though the build process can't handle it at all. The patches in the link seem to magically require a host-compiled python and Parser/pgen, which are used on the host as part of the build process, but they are not generated anywhere; and although python itself can come from Fedora's python package, pgen isn't available even in the -devel package. So it looks like you must compile the damn thing twice and hack the host packages into the crosscompile build process!
RT73 Belkin stick depression2006-08-09T00:00:00+08:00https://warmcat.com/2006/08/09/rt73-belkin-stick-depressionSadly I have thrown three days down the toilet trying to get a Belkin "Wireless G Network Adapter", F5D7050, containing a Ralink RT2571 chip, to work using either the Ralink RT73 driver or the newer serialmonkey rt2x00 driver which contains the rt73usb.ko driver; this is on my AT91 platform.
Initially I started, like a happy idiot, trying to get either to work with wpa_supplicant, since we have a WPA2 80211g network here. The Ralink RT73 sources did not initially crosscompile cleanly - there is a bad reference to asm/i386/... in an include - but after that it went better. However, at least when crosscompiled with gcc 4.0.2, this driver is a useless piece of crap; I outlined the problems <a href="http://forums.ralinktech.com.tw/phpbb2/viewtopic.php?t=2373">here</a> but naturally there was zero response from Ralink.
Well okay, I knew about the alternative serialmonkey driver from getting my elder stepson's laptop working, which incorporated another Ralink chipset. They did not seem to have any support in the form of the modified Ralink drivers, but they do have a beta rt2x00 driver which supports the RT73 chipset. This got a lot further: the MAC address was correctly initialized, and in the end, with some coaxing, it can be made to show results from iwlist wlan0 scan that include our AP. But it won't associate and stay associated. After I removed the encryption from the AP temporarily, I was once - one time only - able to contact the DHCP server long enough to get an IP, but then it immediately deassociated again. And this is with no encryption! Again I posted to the forums <a href="http://rt2x00.serialmonkey.com/phpBB2/viewtopic.php?t=1743&sid=b5d497eeedba1b98361030f1e75ac857">here</a> and again there was zero response. Perhaps it is the Arm crosscompile that is freaking the devs out, but since it is littleendian and 32 bits, it's really not so wild to expect it to just work.
Another issue - actually here is the one bit of good news from the work - is that there are two versions of the RT73 firmware out there, in the form rt73.bin. One is shipped with the Ralink driver and is also available on their site, and claims to be version 1.7. The other was provided in the Win98 directory on the CDROM that came with the Belkin device and is referred to as version 1.0 in the debug output. The Ralink-supplied driver has its own code to grab the file from a specific path - /etc/Wireless/something - and also has a private copy of the firmware in the sources of the driver itself if it can't find the file in its magic path. The serialmonkey driver does it the proper Linux way, using the firmware API in the kernel. Anyway this was the good news: I learned how this worked and created a hotplug script that is compatible with it, allowing it to load the firmware successfully from /lib/firmware.
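The shape of such a hotplug agent, for anyone else fighting this - a minimal sketch of how the kernel firmware class of this era gets fed, with the paths being my assumptions for a typical setup:
<pre>#!/bin/sh
# firmware hotplug agent sketch: the kernel raises a hotplug event with
# $FIRMWARE and $DEVPATH set when a driver calls request_firmware()
FWDIR=/lib/firmware
[ "$ACTION" = "add" ] || exit 0
if [ ! -e "$FWDIR/$FIRMWARE" ] ; then
    echo -1 > "/sys$DEVPATH/loading"   # abort the load, file is missing
    exit 1
fi
echo 1 > "/sys$DEVPATH/loading"        # announce we are about to feed data
cat "$FWDIR/$FIRMWARE" > "/sys$DEVPATH/data"
echo 0 > "/sys$DEVPATH/loading"        # done; the driver's request completes</pre>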
Anyway, while I have been saying recently that the wifi driver problems are largely resolved in Linux, which has been my experience on x86 laptops, they sure as hell aren't resolved for crosscompile usage :-(((
Edit 2006-08-13: I <a href="http://sourceforge.net/mailarchive/forum.php?thread_id=30145643&forum_id=40708">posted</a> to the serialmonkey project mailing list about it; it's too tantalizingly close to forget about. Head serialmonkey replied, "send hardware". Trying to see how feasible that is, since they need a build env and so on.
Postfix relaying for Dynamic clients2006-08-09T00:00:00+08:00https://warmcat.com/2006/08/09/postfix-relaying-for-dynamic-clientsMy in-laws have a Fedora box on residential ADSL in Spain. They experienced trouble with the terra.es mailservers, so I installed Postfix for them and set Thunderbird to use this local Postfix to directly forward their mail to the destination.
This works well except for a correspondent in Australia, who is on bigpond.com. They use a blackholing service which has blackholed the whole residential ADSL netblock for their ISP, on the basis that a lot of spam comes directly from compromised Windows boxes, meaning that the Postfix on their box doesn't get anywhere talking to bigpond (and annoyingly bigpond rejects the mail with a 450 not a 550, delaying notification that there are problems).
I am in a similar position, I run my own MX at home where I work, so mail is sent directly to me, but I am unable to reliably send outgoing mail directly due to some blackhole lists including my whole netblock. My solution is to run a Postfix instance on warmcat.com, which is not used for incoming mail and is firewalled off from everyone except my home IP address. As a belt-and-braces, the Postfix on warmcat.com is configured to only relay from my IP address anyway.
So the obvious solution to the problem with the in-laws would be to also route their outgoing mail through warmcat.com, which pretty much everybody will talk to since it is sat in a server farm. But the fly in the ointment is that they are on residential ADSL: their IP address changes every boot. I don't want to add an authentication layer, because I don't want to disrupt their mail any further while I get it working, and the firewall-pinhole method is working fine for me too.
The first move was to regularize their dynamic IP using dyndns.org and the perl client from there. This gave me a reliable FQDN that always resolves to their machine. Then the problem was simplified to "how can I get Postfix to accept a list of clients allowed to relay using FQDNs? And to track changes where the DNS mapping is dynamic?". It seems that you can't: it only accepts netblocks.
To solve this problem I created the following script which runs from a cronjob. See inside the script for instructions.<!--more-->
<pre>#!/bin/bash
# update-valid-postfix-clients
# 2006-08-09 - andy@warmcat.com - v1.0
#
# Allows FQDNs to specify trusted clients to Postfix, including detection of IP address
# change and firewall opening and closing
# list the FQDNs you are allowing to see your server here, separated by spaces
# the DNS for these can be dynamic
# the script will open and close your firewall as the IPs for these change
# the script will take care to notify postfix to allow and disallow these IPs as they change
TRUSTED_FQDNS="home.warmcat.com some.domain.dyndns.org"
# list the networks that are trusted here separated by spaces
# notice that netblocks are handled by specifying the active part only
# eg, 192.168.0.0/24 --> 192.168.0 in this table
# You must open your firewall for these netblocks manually, the script does not do it
TRUSTED_NETS="127"
# VERBOSE=0 No output except fatal errors
# VERBOSE=1 Output only when something changes
# VERBOSE=2 Output each time run, even if nothing changed
VERBOSE=1
#
# Installation instructions
#
# 1) copy this file to /usr/local/bin/update-valid-postfix-clients
#
# 2) edit the above vars to configure for your situation
#
# 3) edit /etc/postfix/main.cf, comment out any existing mynetworks= line and uncomment the following line
# mynetworks = hash:/etc/postfix/network_table
#
# 4) Add this to /etc/crontab
# # open firewall and allow good users in postfix
# 00,05,10,15,20,25,30,35,40,45,50,55 * * * * root /usr/local/bin/update-valid-postfix-clients
#----------------------------------------
# no user serviceable parts below
DIRTY=0
function allow {
IP=`host "$1" | cut -d' ' -f4`
if [ ! -z "$IP" ] ; then
if [ -z "`cat /etc/postfix/network_table | grep $IP`" ] ; then # IP was not in force before
DIRTY=1
if [ $VERBOSE -gt 0 ] ; then echo "IP change $1 -> $IP" ; fi
OLDIP=`cat /etc/postfix/network_table | grep $1 | cut -d' ' -f1`
if [ ! -z "$OLDIP" ] ; then
if [ $VERBOSE -gt 0 ] ; then echo "Removing firewall setting for $1 -> $OLDIP" ; fi
iptables -D INPUT -p tcp -s "$OLDIP" --dport 25 -j ACCEPT
fi
fi
echo "$IP OK # $1" >>/etc/postfix/network_table-new
# note that we open the firewall always even if we are not marked as dirty for postfix
# this is so a local reboot will get the firewall fixed up even if the remote DNS is unchanged
if [ -z "`iptables -L INPUT -n | grep "dpt:25" | tr -s ' ' | cut -d' ' -f4 | grep "$IP"`" ] ; then
if [ $VERBOSE -gt 0 ] ; then echo "Opening port 25 for $1 -> $IP"; fi
iptables -I INPUT -p tcp -s "$IP" --dport 25 -j ACCEPT
else
if [ $VERBOSE -gt 1 ] ; then echo "(Port 25 for $1 -> $IP already open)" ; fi
fi
fi
}
# give us an empty network_table file if it doesn't exist to avoid harmless errors
if [ ! -e /etc/postfix/network_table ] ; then
if [ $VERBOSE -gt 0 ] ; then echo "Creating /etc/postfix/network_table" ; fi
touch /etc/postfix/network_table
fi
# regenerate list
for i in $TRUSTED_FQDNS ; do allow $i ; done
for i in $TRUSTED_NETS ; do echo "$i OK" >>/etc/postfix/network_table-new ; done
if [ $DIRTY = 1 ] ; then
if [ $VERBOSE -gt 0 ] ; then echo "reloading postfix due to changes" ; fi
rm /etc/postfix/network_table
mv /etc/postfix/network_table-new /etc/postfix/network_table
postmap /etc/postfix/network_table
service postfix reload
else
if [ $VERBOSE -gt 1 ] ; then echo "(No changes)" ; fi
rm /etc/postfix/network_table-new
fi</pre>
The script runs in a 5-minute cronjob, and takes care to do nothing if the IP address situation has not changed for the allowed FQDNs. If it does find a change, it removes the firewall pinhole to Postfix from that IP address, and creates a new pinhole for the new address. It also regenerates the list of allowed clients that can relay in Postfix, hashes the list and does a Postfix reload. The result is that just adding a FQDN to the script at the top will allow that FQDN access to the server no matter if it has a dynamic IP, but nobody else can even see the mailserver thanks to the firewall.
VMware networking in Fedora2006-07-27T00:00:00+08:00https://warmcat.com/2006/07/27/vmware-networking-in-fedoraHum, the wonderful Czech guy who provides
http://ftp.cvut.cz/vmware/
allows VMware to work fine on Fedora normally (I have an XP install that runs inside VMware to provide Protel). But since I migrated the VM to this laptop, networking seems to be stuck on host-only. I can ping the host from inside the VM but nothing else, despite (or perhaps because of) having selected "bridged".
Going to have a fiddle.
Edit: Hm, looks like arp proxying is broken on the wlan0 interface. Moving to a bridge on eth0 works fine.
Behind the Embedded sofa2006-07-16T00:00:00+08:00https://warmcat.com/2006/07/16/behind-the-embedded-sofaDoes your root filesystem's df -h sometimes seem a bit more than the sum of its du -h / parts? My main embedded filesystem has seemed that way for some months, yet when I run my script that lists all the files in the filesystem that did not come out of a package, nothing really showed up as out of place.
Yesterday my /etc/fstab was partially corrupted, so /proc and /tmp (a tmpfs) did not get mounted. To my surprise, ll /tmp showed a bunch of stuff in there from January that had accidentally gotten created at the root filesystem mountpoint dir /tmp before the tmpfs was placed there. The pile of 'invisible' (because you can't see it after the tmpfs takes up residence there at the real /tmp) junk amounts to ~800KBytes uncompressed; in an 8MB root filesystem that is a very welcome new injection of space. Only yesterday I got a board into a bad place by updating the busybox package when there was not enough space left to create all the symlinks... these symlinks are all of the shell commands like ln, rm, ls etc.... that required a boot into the serially loaded kernel to repair... I guess that won't be happening so much now I got some space out of my ass.
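Incidentally, you can go looking for this kind of hidden junk without needing a corrupted fstab: bind-mount the root filesystem somewhere else and the real on-disk contents of the mountpoint dirs become visible - a sketch, with the mount location being an arbitrary choice:
<pre># peek under active mountpoints without unmounting anything
mkdir -p /mnt/rootview
mount --bind / /mnt/rootview     # /mnt/rootview/tmp is the real on-disk /tmp
du -sh /mnt/rootview/tmp         # shows what the tmpfs is hiding
umount /mnt/rootview</pre>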
Yahoeuvre broken by Yahoo changes2006-07-15T00:00:00+08:00https://warmcat.com/2006/07/15/yahoeuvre-broken-by-yahoo-changes<a title="Yahoeuvre" href="http://yah.warmcat.com">Yahoeuvre</a> was a PHP project to capture and enhance the Yahoo boards related to the SCOG attack on Linux. It lasted a good couple of years, but today Yahoo have changed the format of their boards. It would require a fair amount of work to change the monitoring software to support the new format. Up until about 18 months ago Yahoeuvre served post content in various ways, including NNTP and email, provided full thread content on one page and so on. Somebody chose to complain about their content, freely visible and downloadable from Yahoo, being served additionally (unchanged) by Yahoeuvre, and I took all the content re-serving down and stopped visiting the forum myself (which was previously pretty addictive to me). It was still useful as a full-text search and archive of post actions by nyms, but it's not worth the effort to me to bring it into line. The sources are GPL'd and downloadable from the old site if someone else wants to try, but I'm not too sure how useful the mix of features chosen to complement the old board, with its limited threading support, are with the new board.
Interesting AT91 clock quirk2006-07-12T00:00:00+08:00https://warmcat.com/2006/07/12/interesting-at91-clock-quirkJust sent this to linux-arm-kernel. I saw the web archive screwed up my signature, so I include it here for it to eventually get to Google.
<div lang="x-western" style="font-family: -moz-fixed; font-size: 10px" class="moz-text-flowed">Hi folks -There appears to be a subtle problem with the otherwise neat PLL setting api for AT91 found in./arch/arm/mach-at91rm9200/clock.c
The nice code in at91_pll_calc() does a search at runtime for the best match for the requested PLL output frequency given the base clock rate. So if you tell it you have an 18.432MHz crystal and want 96MHz, it will find a good PLL multiplier and divider pair. This is commendable and cool.
The problem comes from the code not having the free hand that it thinks it does to choose the PLL ratios. This is because the physical external PLL filter components must be matched to the details of the PLL settings, and of course these are chosen at design-time.</div>
<!--more-->
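To make the search concrete, here is a minimal shell re-imagining of a brute-force divider/multiplier scan; it is not the kernel code, the field ranges are assumptions, and, like the kernel code, it ignores the >1MHz PLL input rule:
<pre><code class="bash"># Find the div/mul pair whose PLL output lands closest to the request
# (register field ranges here are assumptions, not the real hardware limits)
main=18432000	# crystal frequency, Hz
want=96000000	# requested PLL output, Hz
best=$want
for div in $(seq 1 255); do
	for mul in $(seq 2 2047); do
		out=$(( main / div * mul ))
		err=$(( out > want ? out - want : want - out ))
		if [ "$err" -lt "$best" ]; then
			best=$err; bdiv=$div; bmul=$mul
			[ "$best" -eq 0 ] && break 2	# exact match, stop looking
		fi
	done
done
echo "/$bdiv * $bmul -> $(( main / bdiv * bmul )) Hz (error ${best}Hz)"
</code></pre>
Run against 18.432MHz and 96MHz it duly lands on /24 * 125, exactly as described below.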
<div lang="x-western" style="font-family: -moz-fixed; font-size: 10px" class="moz-text-flowed">So for example, the cool code in there at the moment determines that 18.432MHz /24 * 125 --> 96MHz exactly, and (leaving aside the problem that the PLL is specified only to work with inputs > 1MHz after the divide action), this is correct.However, you must have downloaded this (hideous password-protected source VB) spreadsheet
<a class="moz-txt-link-freetext" href="http://atmel.com/dyn/resources/prod_documents/ATMEL_PLL_LFT_Filter_CALCULATOR_AT91_2v1.zip">http://atmel.com/dyn/resources/prod_documents/ATMEL_PLL_LFT_Filter_CALCULATOR_AT91_2v1.zip</a>
and plug the ratios into it (except that its cunningly password-protected code will not allow the <1MHz violation...) and compute the three passive components that will provide the correct PLL loop filter for this action. And then you must solder these uncomputable passives to your PCB in order to use the /24 * 125 ratio that the clock.c code has initialized PLLB to on boot. And the code does choose that ratio, on the basis that it is the closest available match.
If your loop filter is cribbed from the DK or EK, it will not work properly with the /24 * 125 ratio the kernel now chooses and sets, and will give flaky USB communication. In the case of my USB memory stick, for example, it will work for 30 seconds or so and then the stick is disabled by the kernel and the IO errors spew.
I don't know what a good solution is for this irritating situation, but I do know what an ugly hack is for it, which I attach, not at all recommending it for anyone except people with the same problem needing a quick fix. This forces clock.c to set PLLB to a ratio that, while it provides a slightly off 48.05MHz, does so with PLL ratios that are computable by the Spreadsheet Of Doom (/14 * 73). To go with this specific ratio, I have tried and recommend for stability these loop filter components calculated by the spreadsheet and rounded to real values:
<pre>R1 = 200R
C1 = 470nF
C2 = 47nF

PLLRC -----+--------+
           |        |
           |        <
           |        < R1
           |        <
       C2  =        |
           |        |
           |        = C1
           |        |
GNDPLL ----+--------+</pre>
With the ugly hack patch and these components, my USB memory stick is stable and happy.
-Andy</div>
<div lang="x-western" style="font-family: -moz-fixed; font-size: 10px" class="moz-text-plain">
<pre><hr width="90%" size="4" />
--- arch/arm/mach-at91rm9200/clock.c~ 2006-07-12 18:26:58.000000000 +0100
+++ arch/arm/mach-at91rm9200/clock.c 2006-07-12 21:11:31.000000000 +0100
@@ -599,6 +599,11 @@
 	unsigned i, div = 0, mul = 0, diff = 1 << 30;
 	unsigned ret = (out_freq > 155000000) ? 0xbe00 : 0x3e00;
 
+	if ((out_freq == 96000000) && (main_freq == 18432000)) { /* match the PLLB filter! */
+		return ret | ((73 - 1) << 16) | 0x0e;
+	}
+
+
 	/* PLL output max 240 MHz (or 180 MHz per errata) */
 	if (out_freq > 240000000)
 		goto fail;</pre>
</div>
That was a pretty hard day of "wandering around in the desert" before I realized that the 0.5Hz frequency modulation that was appearing on the USB clock was not actually the problem. I guess it is slow enough that the loop filters inside the USB device PLLs (if they have one) can track it.
The spreadsheet was pretty evil; it would not run in Open Office, so I guess OO scripting and Borg scripting are not compatible. Well, never mind, hopefully that was the first and last time I will ever care.
Chip of weirdness2006-07-11T00:00:00+08:00https://warmcat.com/2006/07/11/chip-of-weirdnessThe telephony box I am working on uses a pair of AKM2304 quad codecs; they work very nicely most of the time. But they have always been very sensitive to the power supply. With certain PSUs that issue too high a voltage, e.g. 5.4V instead of 5V, they are prone to stopping working and getting hot, too hot to touch. On being given the correct voltage they start working again. In addition, on fitting the chips to a board, they have a relatively high dropout rate, again either working or getting irretrievably hot.
Yesterday I decided to examine the problem more closely, since we are nearing production. I reviewed the datasheet and saw the digital and analogue power-supply decoupling configuration I remembered: the digital DVdd taking the 5V directly, and a 10R series resistor feeding the analogue supply. But then I did a double-take... in fact the datasheet showed the analogue supply fed directly and the 10R in series on the DVdd side. This made sense when combined with a warning note in the datasheet that AVdd must not fall below DVdd or there could be "damage"... their idea was to kneecap DVdd slightly and give AVdd the full 5V feed to avoid this. I shorted out the 10R series resistor I had wrongly placed in the AVdd feed, and now these codecs are happy with 5.4V... subtle...
Broadcom and WPA2006-07-11T00:00:00+08:00https://warmcat.com/2006/07/11/broadcomm-and-wpaAbout 18 months ago on a periodic trip to gawk at PC World (a superstore for PC stuff here in the UK) I purchased a Belkin PC Card 54g adapter with a Broadcom 4306 chipset. Of course I took a flyer on the chipset; it was relatively cheap and I figured I would have some fun trying to get it to work with Linux. Yes, the same madness that grips me every time in PC World. The cheaper peripherals that do not have standardized interfaces (unlike, say, USB Audio devices like headsets, which always just work) always have a very new chip from a company that regards the interface to it as part of what makes their IP such a special flower and Must Never Be Told. Webcams seem to be the chief culprits at the moment.
Periodically I took it down from its box of dead things and tried to get it working with a new version of Fedora. Well, I read that the bcm43xx driver was integrated into 2.6.17, and that is where Fedora are at (Fedora do a good job of tracking the latest kernels; there is a chart in a Linux magazine here in the UK this month showing Fedora has much later kernels than any distro except SuSE). Since I was going to upgrade the laptop Rohan uses here to FC5 anyway, I did this, and at the same time, without too much hope, tried the old Broadcom bookstop.
To my pleasure I was able to get it working here after extracting some firmware and sorting out wpa_supplicant, in which I had gained some experience from getting this Samsung laptop working. I sat there loading webpages and looking at its power and data lights, which I had never before been able to light. Good old Linux!
Hum, later that evening the behaviour became intermittent. I ran wpa_supplicant with a debug switch and saw it was having problems maintaining sync with the crypto. Bringing the (eth1) interface down and up got it working for a while, but then it would stutter into silence again. I modprobe -r'd the bcm43xx driver and pulled out the card; it was hot, but not that hot. I know that wpa_supplicant is working fine on FC5 because this laptop's wifi is super stable (ipw3945-based). So the problem is either in the bcm43xx driver, or is a physical (heat?) problem with the adapter; I guess it makes sense that a low-level problem could show up as WPA breakage.
Edit: a couple of days later, I changed the /etc/wpa_supplicant/wpa_supplicant.conf contents and that seemed to resolve the problem; we will have to see if the improvement is permanent. Here are the contents:
<pre><code>ctrl_interface=/var/run/wpa_supplicant

network={
	ssid="myssid"
	scan_ssid=1
	key_mgmt=WPA-PSK
	proto=WPA
	pairwise=CCMP TKIP
	group=CCMP TKIP
	psk=xxxxxxxx...xxxx
	priority=3
}
</code></pre>
Cursed AMD64 box2006-07-09T00:00:00+08:00https://warmcat.com/2006/07/09/cursed-amd64-boxMy AMD Athlon 64 X2 4400+ box had been working fine for about a year with a Western Digital 10KRPM SATA drive. This is a DFI Lanparty motherboard and a 450W PSU, IIRC. The machine was up 24/7 for most of that year since it was acting as my mailserver amongst other things.
A few weeks ago the drive started acting erratically; I would waken in the morning and find that the ext3 filesystem on there had been remounted read-only because filesystem corruption had been detected. I was able to fsck the filesystem back into sanity and the drive would act fine for several days. Well, these stories always end the same way, with a drive that won't complete a boot, and that was the case for this idiot too.
The particular disease was that the area of the disc that contained the LVM structure -- Fedora installs with LVM by default now -- was spewing hard IO errors when touched. Therefore the box couldn't get past trying to bring up the LVM on boot and simply dropped dead. I documented the evasive actions I took in this <a href="http://www.redhat.com/archives/fedora-list/2006-June/msg03297.html">fedora-list mail</a>; basically I was able to recover the ext3 filesystem that was inside the LVM block on to a new SATA drive. LVM's physical footprint is basically a 0x30000-byte header before the ext3 filesystem starts.
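In other words, once you know the offset, recovering the inner filesystem is a dd job. A sketch, with invented device names:
<pre><code class="bash"># Skip the 0x30000-byte LVM header (0x30000 = 196608 = 3 x 65536) and copy
# the bare ext3 image out to a plain partition; add conv=noerror,sync if the
# source drive is throwing hard IO errors
dd if=/dev/sda2 of=/dev/sdb2 bs=65536 skip=3
fsck.ext3 -f /dev/sdb2	# then sanity-check the recovered filesystem
</code></pre>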
I installed FC5 on the new drive and brought over most of the data from the copy of the ext3 filesystem from the damaged drive, and went on pretty much as normal, with brief interruptions while I fished something I had forgotten I needed out of the old filesystem. But then, to my disbelief, after just a week the new drive -- the only drive in the machine -- blew chunks in a similar way, hard IO errors one morning. I came into my work room and heard it performing the click of death.
I recovered from this rather grimly from backups; I did not fancy attempting a second recovery of 60GB of data from a second drive inside of a week. I stared at the AMD box for a minute or two though... I could think of two likely causes, the more likely one being the power supply. If it was having serious trouble with its 12VDC line, it might cause the drive to reset itself as if a power-on was happening repeatedly. It's not hard to imagine that a set of such resets at random intervals might eventually catch the drive out in its initialization phase and cause it to throw a fit ending with its head scratching the surface. The other possible cause is a bit more uncertain: both boxes were running the new FC5 2.6.17 kernel, which has had a lot of work going on with libata and the kernel code for SATA. I wonder if that is repeatedly attempting drive resets as a last resort.
Anyway, it had caused enough trouble; I swore off it and migrated back to running from this Centrino Duo laptop, which is plenty fast enough for a main workstation. One nice feature of VMware is that the XP I am running inside it has no idea that it has moved machines; there is no activation crap -- although this is of course a genuine retail copy of XP, one of two I own. I shall probably have cause to write about it another time, but I have to have XP for <a href="http://altium.com">Protel</a>. It runs on top of Fedora Core thanks to VMware Workstation.
Coolest Mailserver2006-07-09T00:00:00+08:00https://warmcat.com/2006/07/09/coolest-mailserver<strong>SATA-eating Monster</strong>
Losing my 24/7 local box to the SATA-eating monster left an immediate problem: nothing was taking my mail. My MX for warmcat.com ends up at my cablemodem, so incoming mail was just timing out with nobody to talk to. I thought for a bit about setting up an external Postfix and moving the MX, but I didn't like it as a permanent solution, so it would have been wasted effort to build it as a temporary one. Annoyingly the AMD64 box went down just before I had to take a trip to Spain with Jenny for a few days. I shrugged my shoulders and hoped anyone with important mail would retry, and that the mailing lists I am on would be understanding of the rather unrefined behaviour.
When I returned from the few days of overheated childcare-in-another-land (formerly, 16 years ago now, known as "holiday") I immediately fell ill with some kind of bad cold that laid me up in bed for a couple more days. So after nearly a week the pressure was on to get a permanent solution to the missing mailserver question.
<!--more-->
<strong>Replacing the fridge with something cooler </strong>
The huge fridge-with-fans that is the AMD64 box had never been a good fit for 24/7 duty; it must have sucked significant power over the year it was on all the time. Since I am working on embedded ARM systems at the moment, it had been in my mind for a while to port Postfix and run one of these low-power AT91-based systems as my mailserver, with a USB memory stick as the storage. As soon as I got back I began laying the groundwork.
The first problem was that my backed-up, transplanted buildroot gcc toolchain that used to run on the AMD64 box was an x86_64 --> ARM crosscompiler, but it had been restored on to what was basically a 32-bit x86 box. Therefore I could not now even run the crosscompiler toolchain I had been relying on for many months to generate ARM packages... a bigger problem even than the mailserver issue, since I can't do any work without a working toolchain. I decided to try to regenerate the crosscompilers from the original gcc 4.0.2 sources, which I had kept. This worked fairly smoothly; I moved the old toolchain directory aside, replaced it with the new x86 compilers, and started on porting Postfix, partly to get my mail up, but also to test that the x86 --> ARM compilers generated code that would interoperate with the piles of existing x86_64 --> ARM compiled packages that were out in the world.
<strong>Postfix with added signal 11 </strong>
Postfix needed db4, or so I thought; it seemed later that there was some kind of HAS_DB define, IIRC. Anyway, I packaged db4 into RPM, installed it into the crosscompile devel filesystem, and compiled and packaged Postfix. When I scp'd them onto the ARM box, all the Postfix apps blew chunks with segfaults. If you ran them through strace they still segfaulted immediately, before any dynamic libs were touched. This was a bit depressing... I assumed then that the problem was an incompatibility between the code generated by the 32-bit crosscompilers and the 64-bit ones. Such differences shouldn't exist, but they could easily enough if there were bugs in the compiler sources. Well, I half expected it, so I recompiled busybox, uClibc and the gcc libs under the new compiler and updated the ARM box. I was surprised when I got the same segfaults from Postfix but good behaviour from everything else, even things I hadn't recompiled running on newly compiled libs. It was clear the segfaults were not coming from differences between the new and old compilers after all.
<strong>Postfix Diet</strong>
Postfix is not set up to be thin and trim. It seems that because of the desire to have it run on OSes that don't support dynamic libs, something I would be applauding if I were stuck with uClinux, it builds 5 static libraries, and the dozen or so apps that make up Postfix each statically link a bunch of them into themselves. The result is that the Postfix package built like this totals 10MB compressed, since each app has copies of one or more libraries over and over. As I only have a meager 8MB root filesystem on these ARM boxes, that was not so optimal.
As part of the porting action I took the time to convert those five static libs into dynamic libs, so each would exist just once, and to make all the apps bring them in as runtime libs. This reduced the Postfix package to under 500KBytes. But suspicion for the segfaults fell first on the fairly widespread Makefile.in edits that had enabled this. After some futzing around, I learned an interesting lesson that it seemed many people already knew well... putting -shared in the LDFLAGS when generating an executable does not tell the linker to use .so's instead of any .a's that might be lying around; it quietly does the wrong thing, without generating warnings or errors.
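To spell the lesson out, -shared belongs on the link that produces the library; the apps then link against it conventionally. A sketch with invented file and toolchain names:
<pre><code class="bash"># Build the library itself as a shared object...
arm-linux-gcc -shared -fPIC -o libpfutil.so util1.o util2.o
# ...and link the app against it normally; putting -shared on this line would
# silently produce another shared object rather than an executable
arm-linux-gcc -o smtpd smtpd.o -L. -lpfutil
</code></pre>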
With this removed, half of postfix started up, the half that collects mail from port 25 and stores it in the internal postfix queues. The other half of it, "local", which takes mail from the internal queue and delivers to Mbox or Maildir for normal mail apps to see, continued to segfault. But I was very encouraged! I was at least collecting my mail again even if I couldn't easily read it yet. It was very cool to tail -f /var/log/messages and to see the familiar Postfix actions happening down on the little embedded ARM.
<strong>IRC -> GDB</strong>
I hit IRC, the #postfix channel on freenode, to try to get some clue as to what would make Postfix's "local" process blow chunks. For whatever reason -- I think because my questions are either stupid or difficult -- I never have much luck with IRC support. #kernelnewbies helped me out really well once, but in the other channels I have visited my explanations and questions tend to be met with silence, simply because nobody feels they have a good answer, I think. If you stick around and try to help with other people's questions, usually someone will suggest the mailing list or give some general advice, so it's not unfriendliness.
I broke out the gdb package that I had prepared some months ago but not had a use for, and used Postfix's /etc/postfix/master.cf to specify that the "local" processes being spawned should be debugged, using the -D switch. In /etc/postfix/main.cf you can define some bash script around gdb to account for the fact that there is no real console for the spawned process. This one worked for me:
<pre><code class="bash">debugger_command =
	PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
	echo where ; sleep 10 ) | /media/usbstick/usr/bin/gdb $daemon_directory/$process_name $process_id 2>&1
	>$config_directory/$process_name.$process_id.log & sleep 5
</code></pre>
<strong>Burning CFLAGS</strong>
The result was that a text file with a gdb backtrace was generated each time the "local" Postfix process blew chunks. Looking through these showed that each time it was dying when calling a function that lived in libdb4. This was a very strong hint that something was broken between the way libdb4 was compiled and the way the Postfix app thought it was compiled.
I started looking through the db4 sources and saw some preprocessor code to catch a mismatch between the db4 include file versioning and that of the library. Then I realized where the problem was coming from... the libdb4 build was okay because it was using its own includes, but the Postfix build was getting db.h from the default host path of /usr/include, so the db4 ABI in there reflected the version on the FC5 box, not the version that was going to be down on the ARM box. I added an -I to the CFLAGS in the Postfix Makefile to direct it to look at the devel-filesystem path, where I had installed the db4-devel package with the matching db.h, and the problem was gone.
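For reference, Postfix's build system has standard knobs for exactly this, CCARGS and AUXLIBS; something along these lines, with invented paths:
<pre><code class="bash"># Point the Postfix build at the devel filesystem's db.h and libdb
make makefiles CCARGS="-I/opt/devel-fs/usr/include" \
	AUXLIBS="-L/opt/devel-fs/usr/lib -ldb"
make
</code></pre>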
<strong>Dovecot</strong>
After noticing I had accidentally set /etc/postfix/main.cf to use the "all in one file" Mbox format, I converted the collected mail to Maildir and set it to use Maildir, and the MTA side of the equation was complete. Mail was getting collected and stored on the USB memory stick in Maildir format. The next job was to set up Dovecot so I could access the Maildir contents over secure SSL IMAP.
Happily I had already packaged OpenSSL and installed the -devel package into the host devel-filesystem, so all I needed to do was turn off most everything else in the Dovecot ./configure action and make sure it was looking at the devel-filesystem includes and libraries. Dovecot pretty much worked first time, though it took a little messing to get the SSL cert generated cleanly. Then to my surprise Dovecot sat in ssl-build-param for a very long time, maybe 30 minutes, generating ~120 bytes of crypto parameters of some kind. Afterwards it writes them to disk as a cache and reads them back, starting immediately. It seems that at intervals (settable by the user in /etc/dovecot.conf) Dovecot wants to regenerate these very precious bytes, for reasons I don't understand.
After this, I only had to configure /etc/dovecot.conf to listen for IMAP/S on port 993, add a user and password, and I was able to collect my mail securely in Thunderbird once again.
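For flavour, roughly the handful of dovecot.conf settings involved; the option names below are from memory of that era's Dovecot and may not match any particular version exactly:
<pre><code class="bash"># Fragment of /etc/dovecot.conf (option names approximate and era-specific)
protocols = imaps
ssl_cert_file = /etc/ssl/certs/dovecot.pem
ssl_key_file = /etc/ssl/private/dovecot.pem
default_mail_env = maildir:~/Maildir
</code></pre>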
<strong>No shades of grey</strong>
The only thing missing is postgrey, which I am working on. It requires perl, which is proving difficult to package in crosscompiled form, although it does have support of some sort for crosscompiling. It seems to build miniperl and then use it as part of the build action, which is a dumb idea when the miniperl you just built is compiled to run on the target ARM arch, not the host.
Anyway, I do have a silent, low power mailserver up 24/7 with no moving parts now, which is very much where I wanted to end up.
Blog logic2006-07-09T00:00:00+08:00https://warmcat.com//2006/07/09/blog-logicI lean very heavily on Google during my long working day... most of the time the collected wisdom inside Google gets me out of whatever technical trouble I am in, perhaps with a bit of headscratching and elbow grease on my part. Last week I was looking at a very specific problem that existed in a version of gcc when used with buildroot, and I found a mention of the problem from Rob Landley, who runs busybox now. He had some kind of blog type thing going where he noted stuff; Google got ahold of it and presented it to people who were interested in that specific thing -- in this case, me. The post of his was about a year old, and I don't know if he kept it up or it fell into disrepair as many of these ventures do, but I decided to try this style out.