It's Pi all the way down...

Making Impish Impier

Fri 15 October 2021
by Dave Jones

One of the things that constantly annoys me on Ubuntu (and therefore, one of the things I’m intending to fix this cycle) is how bloody long it takes for kernel upgrades to install! Most of this turns out to be down to two components (which … ahem … my team is responsible for): update-initramfs which spends nearly two minutes re-building the initrd.img for the boot partition, and flash-kernel which spends another two minutes copying everything to the boot partition.

Under (Com)pressure

At some point last week when I was testing a lot of different kernel bits (see the prior post for context!), this annoyed me to the point that I sat down to dig into it. The first bit, update-initramfs, was a bit tricky to analyze: there’s no really good means of profiling shell scripts, sadly, so I wound up using the old PS4-calling-date trick and writing a quick Python script to analyze the results. Once this was done, however, one line stood out like a sore thumb: the compression of the resulting image.

Impish has switched to using zstd as its compression algorithm and, while this certainly compresses things better than gzip, it takes ages to run on an ARM processor. Specifically, even on an overclocked Pi 400 running at 2GHz it was taking 85 seconds (!) to compress the image. A quick change to /etc/initramfs-tools/initramfs.conf and we can chop that down to 10 seconds with lz4:

#
# COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
#

#COMPRESS=zstd
COMPRESS=lz4

Admittedly the resulting image is much larger (~35MB with lz4 vs ~22MB with zstd) and that will affect boot speed. However, even at SD card read speeds that’s only about one second lost per boot, versus 75 seconds saved per kernel install (and lately I’ve been doing about one kernel install per boot!). Anyway, consider whether you do more or fewer than 75 boots per kernel install, and adjust accordingly!
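
If you want to gauge the trade-off on your own hardware, a rough benchmark is easy to knock together. This is a sketch only: the sample data is generated text, so point it at a real initrd.img for meaningful numbers, and note the default flags used here may differ from what initramfs-tools actually passes.

```shell
#!/bin/sh
# Rough sketch: time each available compressor against a sample file.
# The sample is generated text; substitute your own initrd.img for real
# numbers (flags/levels here are the tools' defaults, which may differ
# from what initramfs-tools passes).
sample=$(mktemp)
seq 1 200000 > "$sample"        # compressible stand-in for an initrd

for tool in gzip lz4 zstd; do
    command -v "$tool" > /dev/null 2>&1 || continue
    start=$(date +%s%N)
    "$tool" -c "$sample" > "$sample.$tool" 2>/dev/null
    end=$(date +%s%N)
    printf '%-5s %5d ms %9d bytes\n' \
        "$tool" $(( (end - start) / 1000000 )) "$(wc -c < "$sample.$tool")"
done
```

On an ARM board the time column is where the gap between zstd and lz4 really opens up.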

Flash, ahh ahh!

On Ubuntu (and for that matter, Debian), the flash-kernel tool is charged with installing the bootloader, device-tree, kernel, and initrd (basically everything needed to get the system started) to … wherever they need to be on a given board. Sometimes that’s some NVRAM, sometimes a special ext4 partition, sometimes (as in the Pi’s case) that’s a FAT partition on some storage medium.

The Pi is quite an unusual case here inasmuch as its root storage is fully expected to go walkies (such as when upgrading an SD card on an old Pi and moving it to a new Pi model). Due to this, flash-kernel (in Ubuntu) copies the device-tree files for all Pi models (not just the one it finds itself on) to the boot partition every time the kernel is updated (along with everything else it usually copies). Unfortunately, the flash-kernel function that handles each transfer is quite slow and has a lot of overhead.

For most boards, this doesn’t matter as they copy (maybe) five files: a bootloader, a kernel, an initrd, a device-tree, and possibly some firmware or other scripts, or more often a single file which is some amalgam of the aforementioned pieces. But for the Pi there are well over 200 individual files to copy and the result is a major slow-down.

Once I dug into the code the cause was pretty obvious. In the /usr/share/flash-kernel/functions script, the backup_and_install function on line 642 is the function we’re interested in:

642 backup_and_install() {
643     local source="$1"
644     local dest="$2"
645     local do_dot_bak=$(get_dot_bak_preference)
646     local mtd_backup_dir=$(get_mtd_backup_dir)
647     if [ -e "$dest" ]; then
648         if [ -n "$do_dot_bak" ]; then
649             echo "Taking backup of $(basename "$dest")." >&2
650             mv "$dest" "$dest.bak"
651         else
652             echo "Skipping backup of $(basename "$dest")." >&2
653         fi
654     fi
655     # If we are installing to a filesystem which is not normally mounted
656     # then take a second copy in /var/backups, where they can e.g. be
657     # backed up.
658     if [ -n "$boot_mnt_dir" ] && [ -n "$mtd_backup_dir" ] ; then
659         local bak="$mtd_backup_dir/"$(basename "$dest")
660         #echo "Saving $boot_device:"$(basename "$source")" in $bak"
661         mkdir -p "$mtd_backup_dir"
662         cp "$source" "$bak"
663     fi
664     echo "Installing new $(basename "$dest")." >&2
665     mv "$source" "$dest"
666     maybe_defrag "$dest"
667 }

Yes, it’s convoluted, but it needs to be for certain copying cases, and mostly it’s pretty quick. However, it’s the last line that’s interesting. That leads us to the “maybe_defrag” function on line 471:

471 maybe_defrag() {
472     local file="$1"
473     local field="Bootloader-Has-Broken-Ext4-Extent-Support"
474     local broken_fw
475 
476     if ! broken_fw=$(get_machine_field "$machine" "$field"); then
477         return
478     fi
479     if [ "$broken_fw" != "yes" ]; then
480         return
481     fi
482     if [ "$(df --output=fstype ${file} | sed -e 1d)" != "ext4" ]; then
483         return
484     fi
485     if ! command -v e4defrag > /dev/null; then
486         error "e4defrag command not found, unable to defrag $file"
487     fi
488     if ! e4defrag "$file" > /dev/null 2>&1; then
489         error "e4defrag of $file failed. Try freeing up space in /boot and re-executing flash-kernel"
490     fi
491 }

From working on flash-kernel in the past, I know that “get_machine_field” calls are quite expensive (it’s essentially using shell script to parse a text-file database). That’s why there are a ton of cached lookups around line 1008. Ultimately the “proper” fix is to cache the result of that call too. However, on the Pi the “maybe_defrag” function never does anything (and never will) anyway, so a quick hack is to simply comment out that final line in backup_and_install:
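
The caching fix can be sketched in shell. To be clear, this is an illustrative memoisation pattern, not the actual flash-kernel patch: get_machine_field is stubbed out (with a call counter) so the sketch is self-contained, and the function and variable names around it are mine.

```shell
#!/bin/sh
# Sketch of the "proper" fix: memoise the expensive get_machine_field
# lookup so it runs once per flash-kernel invocation rather than once
# per copied file. get_machine_field is stubbed here; the real one
# parses flash-kernel's machine database. The stub records its call
# count in a temp file so we can see the caching work.
count_file=$(mktemp)
echo 0 > "$count_file"

get_machine_field() {
    echo $(( $(cat "$count_file") + 1 )) > "$count_file"
    echo "no"       # the Pi's entry never enables the defrag path
}

machine="Raspberry Pi 400"
broken_fw_cache=""

# Populate $broken_fw_cache on first use; every later call is cheap.
ensure_broken_fw() {
    if [ -z "$broken_fw_cache" ]; then
        broken_fw_cache=$(get_machine_field "$machine" \
            "Bootloader-Has-Broken-Ext4-Extent-Support")
    fi
}

# Simulate copying 200+ files: the database lookup still happens once.
for i in $(seq 1 200); do
    ensure_broken_fw
done
echo "lookups: $(cat "$count_file")"
```

The same one-variable cache dropped into maybe_defrag would keep the defrag behaviour intact on boards that need it while sparing the Pi 200+ database parses per install.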

642 backup_and_install() {
643     local source="$1"
644     local dest="$2"
645     local do_dot_bak=$(get_dot_bak_preference)
646     local mtd_backup_dir=$(get_mtd_backup_dir)
647     if [ -e "$dest" ]; then
648         if [ -n "$do_dot_bak" ]; then
649             echo "Taking backup of $(basename "$dest")." >&2
650             mv "$dest" "$dest.bak"
651         else
652             echo "Skipping backup of $(basename "$dest")." >&2
653         fi
654     fi
655     # If we are installing to a filesystem which is not normally mounted
656     # then take a second copy in /var/backups, where they can e.g. be
657     # backed up.
658     if [ -n "$boot_mnt_dir" ] && [ -n "$mtd_backup_dir" ] ; then
659         local bak="$mtd_backup_dir/"$(basename "$dest")
660         #echo "Saving $boot_device:"$(basename "$source")" in $bak"
661         mkdir -p "$mtd_backup_dir"
662         cp "$source" "$bak"
663     fi
664     echo "Installing new $(basename "$dest")." >&2
665     mv "$source" "$dest"
666     #maybe_defrag "$dest"
667 }

Once I’d put these two changes in place, I found that kernel installs went from taking more than 4 minutes to slightly less than 2. Okay, still not fantastic, but I am operating on an SD card, and halving the time with two trivial changes is not bad!

I’ll get the “proper” fixes in place during the next development cycle.