It's Pi all the way down...


Last time we set up an NBD server, plus the associated DHCP machinery, to netboot our Pi. We then discovered a rather serious problem: when the Pi updated the content of the boot partition, the TFTP server wouldn’t notice and would serve stale files, because the server’s mount of the boot partition didn’t realize it was “shared” with the NBD server.

Come and take a chance with me?

As mentioned in the last post, in trying to solve this I ventured through a myriad of hacky or simple solutions, all attempting to work around the “caching” of pages by the file-system driver on the server side. Yet again, caching rears its head as one of the two “hard problems” of computer science [1], though the root of the issue here is the assumption that the file-system driver has exclusive access to the underlying block device (the cache is merely a consequence of this).

In hindsight, I should’ve looked at the pieces and concluded what I eventually did: just write your own TFTP server! One that can read files directly out of the boot partition of the OS image. This sounds like using a sledgehammer to crack a nut, but it’s actually less complex than it seems. All it has to handle is simple partitioning (MBR or GPT), reading a FAT file-system, and TFTP.

I may not be selling this as “simple”, but let’s break down what each of these actually entails:

MBR
Master Boot Record partitioning is an ancient method of partitioning a floppy disk, er, hard drive, er, USB stick, er, small storage medium. It’s very difficult to write accurately (in such a way that it’ll actually … boot things), but it’s not hard to read, apart from the minor mess that is logical partitioning (intended to get around the original four-partition limit). A few hours’ work. Maybe.
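To show why reading MBR is the easy direction, here is a minimal Python sketch that parses the four primary entries from the first sector. This is not nobodd’s code, and the names are hypothetical. It assumes 512-byte sectors and deliberately ignores extended/logical partitions (types 0x05/0x0F), which is where the mess lives.

```python
import struct
from typing import NamedTuple

class MBRPartition(NamedTuple):
    number: int       # 1-4 for primary partitions
    bootable: bool    # status byte 0x80 marks the "active" partition
    type_id: int      # e.g. 0x0C = FAT32 (LBA), 0x83 = Linux
    first_lba: int    # start sector (assumes 512-byte sectors)
    num_sectors: int

def read_mbr_primary(sector0: bytes) -> list[MBRPartition]:
    """Parse the primary partition entries of an MBR (first 512 bytes of a disk).

    Extended partitions (0x05, 0x0F) are returned as-is and NOT followed;
    walking the chain of logical partitions is left out of this sketch.
    """
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    parts = []
    for i in range(4):
        entry = sector0[446 + i * 16:446 + (i + 1) * 16]
        # status(1), CHS start(3, skipped), type(1), CHS end(3, skipped),
        # LBA start(4), sector count(4); all little-endian
        status, type_id, first_lba, num_sectors = struct.unpack("<B3xB3xLL", entry)
        if type_id == 0:  # empty slot
            continue
        parts.append(MBRPartition(i + 1, status == 0x80, type_id, first_lba, num_sectors))
    return parts
```

In practice you would read the first 512 bytes of the image file and pass them in. The CHS fields are skipped entirely because LBA values are what anything modern uses.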
GPT
GUID Partition Tables are the modern and (by comparison with MBR) frankly trivial means of partitioning storage media. No more than an hour needed here.
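For comparison, here is a similar sketch for GPT. It rests on several assumptions:

- sectors are 512 bytes;
- the whole image is held in memory (a real tool would seek within the file);
- only the primary header’s CRC is checked, with the backup header and the partition-array CRC ignored.

The names are hypothetical.

```python
import struct
import uuid
import zlib
from typing import NamedTuple

SECTOR = 512  # assumption: 512-byte logical sectors

class GPTPartition(NamedTuple):
    type_guid: uuid.UUID
    part_guid: uuid.UUID
    first_lba: int
    last_lba: int   # inclusive
    name: str

def read_gpt(disk: bytes) -> list[GPTPartition]:
    """Parse the primary GPT header (LBA 1) and its partition entry array."""
    (sig, _rev, hdr_size, hdr_crc, _reserved, _cur_lba, _backup_lba,
     _first_usable, _last_usable, _disk_guid, entries_lba, n_entries,
     entry_size, _array_crc) = struct.unpack_from("<8sIIIIQQQQ16sQIII", disk, SECTOR)
    if sig != b"EFI PART":
        raise ValueError("no GPT signature at LBA 1")
    # The header CRC32 is computed with its own CRC field zeroed
    raw = bytearray(disk[SECTOR:SECTOR + hdr_size])
    raw[16:20] = bytes(4)
    if zlib.crc32(raw) != hdr_crc:
        raise ValueError("GPT header CRC mismatch")
    parts = []
    base = entries_lba * SECTOR
    for i in range(n_entries):
        type_b, part_b, first, last, _attrs, name_b = struct.unpack_from(
            "<16s16sQQQ72s", disk, base + i * entry_size)
        if type_b == bytes(16):  # all-zero type GUID marks an unused entry
            continue
        parts.append(GPTPartition(
            uuid.UUID(bytes_le=type_b),  # GUIDs are stored mixed-endian
            uuid.UUID(bytes_le=part_b),
            first, last,
            name_b.decode("utf-16-le").rstrip("\0")))
    return parts
```

Most of the "work" is just knowing that GUIDs are stored in mixed-endian form (hence `bytes_le`) and that names are UTF-16LE.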
FAT

File Allocation Tables are another positively ancient piece of computing history, used for storing files on just about every kind of computing media, with the possible exception of tape drives (and I wouldn’t put it past someone to have tried that). The file-system isn’t complicated in principle (in fact, I’ve written a library for something very similar before [2]); the complexity comes from the myriad implementations that use the same structures but disagree, subtly and not so subtly, about the meaning or even the position of certain fields within them.

Still, a few evenings to get the basics done, then several more to work out the rough edges (FAT-12, long filename handling, etc. etc.), then a few more to build a nice Path-like API on top of it, and we’re good!
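One small example of the “same structures, different meanings” problem is deciding which FAT variant you’re looking at. Microsoft’s FAT specification says the type is determined by the count of data clusters, not by the informational type string in the boot sector. The BPB fields involved also move around: FAT32 zeroes the 16-bit FAT size and stores a 32-bit one at offset 36. The following is a sketch of that rule (hypothetical function name, not nobodd’s code). Other drivers may not agree exactly at the boundaries.

```python
import struct

def fat_type(boot_sector: bytes) -> str:
    """Classify a FAT volume by cluster count, per Microsoft's FAT spec."""
    # BIOS Parameter Block fields from offset 11 to 35 (little-endian)
    (bytes_per_sec, sec_per_clus, reserved, num_fats, root_entries,
     tot_sec16, _media, fat_sz16, _sec_per_trk, _heads, _hidden,
     tot_sec32) = struct.unpack_from("<HBHBHHBHHHII", boot_sector, 11)
    # FAT32 leaves the 16-bit size fields at zero and uses 32-bit ones instead
    fat_sz = fat_sz16 or struct.unpack_from("<I", boot_sector, 36)[0]
    tot_sec = tot_sec16 or tot_sec32
    # FAT12/16 have a fixed-size root directory; FAT32 has root_entries == 0
    root_dir_secs = (root_entries * 32 + bytes_per_sec - 1) // bytes_per_sec
    data_secs = tot_sec - (reserved + num_fats * fat_sz + root_dir_secs)
    clusters = data_secs // sec_per_clus
    if clusters < 4085:
        return "FAT12"
    elif clusters < 65525:
        return "FAT16"
    return "FAT32"
```

For instance, a classic 1.44MB floppy layout (2880 sectors, 224 root entries, two 9-sector FATs) comes out as FAT12.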

TFTP
Trivial File Transfer Protocol is, tautologically, trivial. Getting the basic protocol implemented took only a couple of hours (including testing against several other implementations). Then it was on to the extensions. I wound up only bothering with the options the Pi bootloader attempts to negotiate, plus one more that’s useful for testing (between them: tsize, blksize, and timeout). Hopefully, though, I’ve structured things suitably for the future addition of the windowsize option. Still, only a few more hours on that.
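To illustrate just how trivial the wire format is, here is a sketch of:

- parsing a read request (RRQ, RFC 1350) along with its RFC 2347 options;
- building the option acknowledgement (OACK) and DATA packets.

The blksize option comes from RFC 2348; tsize and timeout come from RFC 2349. Function names are hypothetical. Sockets, retransmission, and timeouts are left out entirely.

```python
import struct

RRQ, DATA, ACK, ERROR, OACK = 1, 3, 4, 5, 6  # TFTP opcodes

def parse_rrq(packet: bytes) -> tuple[str, str, dict[str, str]]:
    """Split an RRQ into filename, mode, and any option name/value pairs."""
    (opcode,) = struct.unpack_from("!H", packet)  # opcodes are big-endian
    if opcode != RRQ:
        raise ValueError(f"not a read request (opcode {opcode})")
    fields = packet[2:].split(b"\0")
    if fields[-1] != b"":
        raise ValueError("packet is not NUL-terminated")
    filename, mode, *opts = [f.decode("ascii") for f in fields[:-1]]
    # Options follow as name, value, name, value...; names are case-insensitive
    options = {name.lower(): value for name, value in zip(opts[::2], opts[1::2])}
    return filename, mode.lower(), options

def build_oack(accepted: dict[str, str]) -> bytes:
    """Acknowledge only the options the server agreed to."""
    body = b"".join(k.encode("ascii") + b"\0" + v.encode("ascii") + b"\0"
                    for k, v in accepted.items())
    return struct.pack("!H", OACK) + body

def build_data(block: int, payload: bytes) -> bytes:
    """DATA packet. Block numbers are 16-bit; this sketch simply masks them,
    as RFC 1350 doesn't specify rollover behaviour."""
    return struct.pack("!HH", DATA, block & 0xFFFF) + payload
```

A server that doesn’t recognize an option simply leaves it out of the OACK. That is what makes it reasonable to support only a handful.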

All in all, despite a few headaches (mostly around FAT’s obscure history), it wasn’t difficult to throw all this together, stir it around a bit, and produce what I needed (would’ve been nice to make it into a “learn Rust” project, but I was already waaay overdue on this post series, so I wound up going with my comfortable old Python).

The result is nobodd: a TFTP boot-server that will read files directly out of the FAT partition of an image without mounting it.

NBD! NBD!

Let’s walk through a setup using nobodd and nbd-server. The client side is exactly the same as before, so I’ll just refer you to the start of the last post for that part. Work through the requirements (a Pi and a server), configure the Pi for netboot, add linux-modules-extra- and nbd-client, transfer the regenerated initramfs and identity to the server, and shut down the Pi [3].

Now, on the server we’re going to do things a bit differently. This time we’re going to configure a “template” image that we can copy to “instance” images any time we want a new install. And this time we’re going to use the fancy little nobodd-prep tool from nobodd to do it.

Start as we did before by downloading the image and verifying it:

$ sudo -i
Password:
# mkdir /srv/images
# cd /srv/images
# wget http://cdimage.ubuntu.com/releases/22.04.3/release/ubuntu-22.04.3-preinstalled-server-arm64+raspi.img.xz
 ...
# wget http://cdimage.ubuntu.com/releases/22.04.3/release/SHA256SUMS
 ...
# sha256sum --check --ignore-missing SHA256SUMS
ubuntu-22.04.3-preinstalled-server-arm64+raspi.img.xz: OK
# rm SHA256SUMS
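If you’d rather script the verification step, the following is roughly what `sha256sum --check --ignore-missing` does. It is a Python sketch with a hypothetical function name. It hashes in 1 MiB chunks so a multi-gigabyte image never has to sit in memory.

```python
import hashlib
from pathlib import Path

def check_sums(sums_file: Path) -> dict[str, bool]:
    """Verify files listed in a SHA256SUMS file that exist next to it,
    silently skipping missing ones (like --ignore-missing)."""
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # a leading '*' marks binary mode
        path = sums_file.parent / name
        if not path.exists():
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        results[name] = h.hexdigest() == digest.lower()
    return results
```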

Now we simply use nobodd-prep to customize it. This tool can resize the image, rewrite the cmdline.txt parameters, and copy files onto the boot partition (including the customized initrd.img and a cloud-init user-data file), all in one go. It can only operate on uncompressed images, though, so we still need to decompress it first:

# unxz ubuntu-22.04.3-preinstalled-server-arm64+raspi.img.xz
# mv ubuntu-22.04.3-preinstalled-server-arm64+raspi.img jammy-template.img
# apt install nobodd-tools
 ...
# cat << EOF > user-data
#cloud-config
package_update: true
packages:
- avahi-daemon
- nbd-client
- linux-modules-extra-raspi
EOF
# cp jammy-template.img jammy.img
# nobodd-prep --copy user-data --copy initrd.img --size 16GB jammy.img

And that’s it! My hope is that, in noble (24.04), most of these steps will not be necessary. In that release, it should be possible to download the image, unpack it, give it a nice name, run nobodd-prep to customize it, and go from there.

Hi, Future Dave here! Well, the MIR (Main Inclusion Request) for nbd-client, which I had hoped would be a simple affair, erm, wasn’t [4]. It didn’t make it into noble (24.04) or oracular (24.10). It might make it into plucky (25.04), thanks largely to the efforts of my new sidekick [5]. But this update is so overdue that I’m just going to post it anyway, and hope it’ll be of use to Future You if/when the relevant pieces finally make it into the image.

—Future Dave

Next, we’ll install and configure the required daemons. As before, we’ll use dnsmasq as a DHCP proxy, and nbd-server. However, this time instead of dnsmasq also handling TFTP duties, we’ll use nobodd-tftpd for that. First up, install the packages:

# apt install dnsmasq nbd-server nobodd-tftpd

Next, configure dnsmasq:

# cat << EOF >> /etc/dnsmasq.conf
interface=eth0
bind-interfaces
dhcp-range=192.168.255.255,proxy
pxe-service=0,"Raspberry Pi Boot"
EOF
# systemctl restart dnsmasq

Note

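Adjust the reference to eth0 if your Ethernet NIC is named something else. The dhcp-range address just identifies the subnet to proxy for. 192.168.255.255 is the broadcast address of 192.168.0.0/16, so if your network is something else (say 192.168.1.0/24), substitute an address on that subnet (such as 192.168.1.255).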
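If you’re unsure what address to use, Python’s ipaddress module can work it out from the server’s address and prefix length (as shown by `ip -4 addr show eth0`). This hypothetical helper emits the network address, which is the form dnsmasq’s own example configuration uses for proxy mode. As I understand it, dnsmasq only uses this address to select the subnet, so the broadcast form shown above should be equivalent.

```python
import ipaddress

def proxy_dhcp_range(interface_addr: str) -> str:
    """Build a dnsmasq proxy-DHCP 'dhcp-range' line from an address in CIDR
    form, e.g. '192.168.1.23/24' (hypothetical helper)."""
    iface = ipaddress.ip_interface(interface_addr)
    return f"dhcp-range={iface.network.network_address},proxy"
```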

Now the NBD server:

# chown nbd:nbd jammy.img
# ls -lh jammy.img
-rw-r--r-- 1 nbd  nbd 16.0G Dec  3 13:48 jammy.img
# cat << EOF >> /etc/nbd-server/conf.d/jammy.conf
[jammy]
exportname = /srv/images/jammy.img
EOF
# systemctl restart nbd-server

And finally the TFTP server:

# cat ~ubuntu/pi-ident.txt
Serial          : 1000000089025d75
# piserial=$(sed -e '1s/^Serial.*\([0-9a-f]\{8\}\)$/\1/' ~ubuntu/pi-ident.txt)
# echo $piserial
89025d75
# cat << EOF >> /etc/nobodd/conf.d/jammy.conf
[board:$piserial]
image = /srv/images/jammy.img
partition = 1
EOF
# systemctl reload nobodd-tftpd
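For anyone scripting this, the sed expression above simply keeps the last 8 hex digits of the Serial line. Here is a Python equivalent (hypothetical function name). It is slightly more lenient than the sed, which only looks at the first line, because it searches every line for the Serial entry.

```python
import re

def pi_serial(ident_text: str) -> str:
    """Return the last 8 hex digits of the 'Serial' line in a Pi identity dump."""
    m = re.search(r"^Serial\s*:\s*[0-9a-f]*([0-9a-f]{8})\s*$", ident_text, re.MULTILINE)
    if m is None:
        raise ValueError("no Serial line found")
    return m.group(1)
```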

The final config piece (/etc/nobodd/conf.d/jammy.conf) ties the Pi’s board serial number to the image that we want to actually boot. The cmdline.txt on the boot partition of that image needs to point to the NBD server serving that same image. This was actually handled implicitly by the nobodd-prep command above. It assumes:

  • The NBD server is the machine it is running on
  • The NBD share will be named after the stem of the image’s filename (in other words it transforms “jammy.img” to “jammy”)
  • The root partition is the first non-FAT-type partition in the image (partition 2 in this case)

Hence, in this case it will have automatically inserted the following in the cmdline.txt of the image:

ip=dhcp nbdroot=ubuntu/jammy root=/dev/nbd0p2 ...
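The three assumptions above are mechanical enough to express in a few lines. The following is a sketch of that derivation, not nobodd-prep’s actual code, and the names are hypothetical. It assumes the image uses MBR partition type IDs; nobodd-prep presumably handles GPT type GUIDs as well. The server name defaults to this machine’s hostname.

```python
import socket
from pathlib import Path
from typing import Optional

# MBR type IDs commonly used for FAT: FAT12, FAT16 <32M, FAT16, FAT32 (CHS),
# FAT32 (LBA), FAT16 (LBA)
FAT_TYPE_IDS = {0x01, 0x04, 0x06, 0x0B, 0x0C, 0x0E}

def nbd_cmdline(image: Path, part_types: list[int], server: Optional[str] = None) -> str:
    """Derive the NBD kernel parameters: this host serves the share, the share
    is named after the image's stem, root is the first non-FAT partition."""
    server = server or socket.gethostname()
    share = image.stem  # "jammy.img" -> "jammy"
    for number, type_id in enumerate(part_types, start=1):
        if type_id not in FAT_TYPE_IDS:
            return f"ip=dhcp nbdroot={server}/{share} root=/dev/nbd0p{number}"
    raise ValueError("no non-FAT partition found")
```

Called with the image path, the partition type IDs in order (for example 0x0C then 0x83), and `server="ubuntu"`, this reproduces the prefix shown above.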

When the end comes, I know

At this point, you should be ready to go. Power up the Pi, and if everything is working correctly, it should netboot into your jammy image, applying the custom cloud-init along the way.

How do you add more boards? Now you’ve got your template image, the process should be as simple as:

  • Find your next board’s serial number
  • Create another copy of jammy-template.img
  • Customize it with nobodd-prep
  • Add the NBD server configuration, reload the NBD server
  • Add the nobodd-tftpd configuration, reload nobodd-tftpd
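The last two steps are just templated text. The following hypothetical helper renders the same two snippets written by hand earlier for a given board serial and image, following the same conventions: the NBD share is named after the image’s stem, and the boot partition is partition 1. It only returns text. Writing the files and reloading the daemons is left to you (or to nobodd-prep).

```python
from pathlib import Path

def board_configs(serial: str, image: Path) -> dict[str, str]:
    """Map config file paths to contents for one board (hypothetical helper)."""
    share = image.stem
    return {
        f"/etc/nbd-server/conf.d/{share}.conf":
            f"[{share}]\nexportname = {image}\n",
        f"/etc/nobodd/conf.d/{share}.conf":
            f"[board:{serial}]\nimage = {image}\npartition = 1\n",
    }
```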

It’s worth noting that nobodd-prep is technically capable of generating the required configurations for the last two items. However, I’m a bit unhappy with its design at the moment. It’s become a horribly convoluted “big” tool, and I think things would be much more flexible if it were broken up into a series of much simpler tools emulating cat, cp, rm, and ls, plus something that combines those pieces to perform the customization the current tool does.

Another aspect of this setup worth noting is that we haven’t used mount anywhere. Neither nbd-server nor nobodd-tftpd requires mounts or loop devices to operate. Why is this important? Because the entire setup can now run in an unprivileged container, which makes testing much simpler. There are still hoops to jump through [6] if you want to go this route, which are beyond the scope of this post, but it is at least possible.


[1] Alongside naming things and off-by-one errors …
[2] I’ve written stuff very similar to a FAT file system driver in Python before.
[3] In other words, follow the last post until “Why’d you have to be so good?”. Incidentally, I make no apologies for the musical puns in my titles!
[4] After several years in Foundations, you’d think I’d have learned by now, wouldn’t you? If it looks hard, it is. If it looks easy, it isn’t.
[5] Yes! It’s not just me anymore! Say “hi” to r41k0u, who’s also doing some sterling work on the camera stack this cycle.
[6] Specifically, this involves getting your container to appear directly on your network, rather than hiding behind NAT, so that it can proxy DHCP requests.