It's Pi all the way down...

Making Impish Impier

Fri 15 October 2021
by Dave Jones

One of the things that constantly annoys me on Ubuntu (and therefore, one of the things I’m intending to fix this cycle) is how bloody long it takes for kernel upgrades to install! Most of this turns out to be down to two components (which … ahem … my team is responsible for): update-initramfs which spends nearly two minutes re-building the initrd.img for the boot partition, and flash-kernel which spends another two minutes copying everything to the boot partition.

Under (Com)pressure

At some point last week when I was testing a lot of different kernel bits (see the prior post for context!), this annoyed me to the point that I sat down to dig into it. The first bit, update-initramfs, was a bit tricky to analyze: there’s no really good means of profiling shell scripts, sadly, so I wound up using the old PS4-calling-date trick and writing a quick Python script to analyze the results. Once this was done, however, one line stood out like a sore thumb: the compression of the resulting image.

Impish has switched to using zstd as its compression algorithm and, while this certainly compresses things better than gzip, it takes ages to run on an ARM processor. Specifically, even on an overclocked Pi 400 running at 2GHz it was taking 85 seconds (!) to compress the image. A quick change to /etc/initramfs-tools/initramfs.conf and we can chop that down to 10 seconds with lz4:

#
# COMPRESS: [ gzip | bzip2 | lz4 | lzma | lzop | xz | zstd ]
#

#COMPRESS=zstd
COMPRESS=lz4

Admittedly the resulting image is much larger (~35MB with lz4 vs ~22MB with zstd) and that will affect boot speed. However, even at SD card read speeds that’s only about one second lost per boot, versus 75 seconds saved per kernel install (and lately I’ve been doing about one kernel install per boot!). Anyway, consider whether you do more or fewer than 75 boots per kernel install, and adjust accordingly!
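
If you want to gauge the trade-off on your own hardware, a rough benchmark is easy to knock together. This is a sketch only: the sample data is generated text, so point it at a real initrd.img for meaningful numbers, and note the default flags used here may differ from what initramfs-tools actually passes.

```shell
#!/bin/sh
# Rough sketch: time each available compressor against a sample file.
# The sample is generated text; substitute your own initrd.img for real
# numbers (flags/levels here are the tools' defaults, which may differ
# from what initramfs-tools passes).
sample=$(mktemp)
seq 1 200000 > "$sample"        # compressible stand-in for an initrd

for tool in gzip lz4 zstd; do
    command -v "$tool" > /dev/null 2>&1 || continue
    start=$(date +%s%N)
    "$tool" -c "$sample" > "$sample.$tool" 2>/dev/null
    end=$(date +%s%N)
    printf '%-5s %5d ms %9d bytes\n' \
        "$tool" $(( (end - start) / 1000000 )) "$(wc -c < "$sample.$tool")"
done
```

On an ARM board the time column is where the gap between zstd and lz4 really opens up.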

Flash, ahh ahh!

On Ubuntu (and for that matter, Debian), the flash-kernel tool is charged with installing the bootloader, device-tree, kernel, and initrd (basically everything needed to get the system started) to … wherever they need to be on a given board. Sometimes that’s some NVRAM, sometimes a special ext4 partition, sometimes (as in the Pi’s case) that’s a FAT partition on some storage medium.

The Pi is quite an unusual case here inasmuch as its root storage is fully expected to go walkies (such as when upgrading an SD card on an old Pi and moving it to a new Pi model). Due to this, flash-kernel (in Ubuntu) copies the device-tree files for all Pi models (not just the one it finds itself on) to the boot partition every time the kernel is updated (along with everything else it usually copies). Unfortunately, the flash-kernel function that handles each transfer is quite slow and has a lot of overhead.

For most boards, this doesn’t matter as they copy (maybe) five files: a bootloader, a kernel, an initrd, a device-tree, and possibly some firmware or other scripts, or more often a single file which is some amalgam of the aforementioned pieces. But for the Pi there are well over 200 individual files to copy and the result is a major slow-down.

Once I dug into the code the cause was pretty obvious. In the /usr/share/flash-kernel/functions script, the backup_and_install function on line 642 is the function we’re interested in:

642 backup_and_install() {
643     local source="$1"
644     local dest="$2"
645     local do_dot_bak=$(get_dot_bak_preference)
646     local mtd_backup_dir=$(get_mtd_backup_dir)
647     if [ -e "$dest" ]; then
648         if [ -n "$do_dot_bak" ]; then
649             echo "Taking backup of $(basename "$dest")." >&2
650             mv "$dest" "$dest.bak"
651         else
652             echo "Skipping backup of $(basename "$dest")." >&2
653         fi
654     fi
655     # If we are installing to a filesystem which is not normally mounted
656     # then take a second copy in /var/backups, where they can e.g. be
657     # backed up.
658     if [ -n "$boot_mnt_dir" ] && [ -n "$mtd_backup_dir" ] ; then
659         local bak="$mtd_backup_dir/"$(basename "$dest")
660         #echo "Saving $boot_device:"$(basename "$source")" in $bak"
661         mkdir -p "$mtd_backup_dir"
662         cp "$source" "$bak"
663     fi
664     echo "Installing new $(basename "$dest")." >&2
665     mv "$source" "$dest"
666     maybe_defrag "$dest"
667 }

Yes, it’s convoluted, but it needs to be for certain copying cases, and mostly it’s pretty quick. However, it’s the last line that’s interesting. That leads us to the “maybe_defrag” function on line 471:

471 maybe_defrag() {
472     local file="$1"
473     local field="Bootloader-Has-Broken-Ext4-Extent-Support"
474     local broken_fw
475 
476     if ! broken_fw=$(get_machine_field "$machine" "$field"); then
477         return
478     fi
479     if [ "$broken_fw" != "yes" ]; then
480         return
481     fi
482     if [ "$(df --output=fstype ${file} | sed -e 1d)" != "ext4" ]; then
483         return
484     fi
485     if ! command -v e4defrag > /dev/null; then
486         error "e4defrag command not found, unable to defrag $file"
487     fi
488     if ! e4defrag "$file" > /dev/null 2>&1; then
489         error "e4defrag of $file failed. Try freeing up space in /boot and re-executing flash-kernel"
490     fi
491 }

From working on flash-kernel in the past, I know that “get_machine_field” calls are quite expensive (it’s essentially using shell script to parse a text-file database). That’s why there are a ton of cached lookups around line 1008. Ultimately the “proper” fix is to cache the result of that call too. However, on the Pi the “maybe_defrag” function never does anything (and never will) anyway, so a quick hack is to simply comment out that final line in backup_and_install:
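
The caching fix can be sketched in shell. To be clear, this is an illustrative memoisation pattern, not the actual flash-kernel patch: get_machine_field is stubbed out (with a call counter) so the sketch is self-contained, and the function and variable names around it are mine.

```shell
#!/bin/sh
# Sketch of the "proper" fix: memoise the expensive get_machine_field
# lookup so it runs once per flash-kernel invocation rather than once
# per copied file. get_machine_field is stubbed here; the real one
# parses flash-kernel's machine database. The stub records its call
# count in a temp file so we can see the caching work.
count_file=$(mktemp)
echo 0 > "$count_file"

get_machine_field() {
    echo $(( $(cat "$count_file") + 1 )) > "$count_file"
    echo "no"       # the Pi's entry never enables the defrag path
}

machine="Raspberry Pi 400"
broken_fw_cache=""

# Populate $broken_fw_cache on first use; every later call is cheap.
ensure_broken_fw() {
    if [ -z "$broken_fw_cache" ]; then
        broken_fw_cache=$(get_machine_field "$machine" \
            "Bootloader-Has-Broken-Ext4-Extent-Support")
    fi
}

# Simulate copying 200+ files: the database lookup still happens once.
for i in $(seq 1 200); do
    ensure_broken_fw
done
echo "lookups: $(cat "$count_file")"
```

The same one-variable cache dropped into maybe_defrag would keep the defrag behaviour intact on boards that need it while sparing the Pi 200+ database parses per install.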

642 backup_and_install() {
643     local source="$1"
644     local dest="$2"
645     local do_dot_bak=$(get_dot_bak_preference)
646     local mtd_backup_dir=$(get_mtd_backup_dir)
647     if [ -e "$dest" ]; then
648         if [ -n "$do_dot_bak" ]; then
649             echo "Taking backup of $(basename "$dest")." >&2
650             mv "$dest" "$dest.bak"
651         else
652             echo "Skipping backup of $(basename "$dest")." >&2
653         fi
654     fi
655     # If we are installing to a filesystem which is not normally mounted
656     # then take a second copy in /var/backups, where they can e.g. be
657     # backed up.
658     if [ -n "$boot_mnt_dir" ] && [ -n "$mtd_backup_dir" ] ; then
659         local bak="$mtd_backup_dir/"$(basename "$dest")
660         #echo "Saving $boot_device:"$(basename "$source")" in $bak"
661         mkdir -p "$mtd_backup_dir"
662         cp "$source" "$bak"
663     fi
664     echo "Installing new $(basename "$dest")." >&2
665     mv "$source" "$dest"
666     #maybe_defrag "$dest"
667 }

Once I’d put these two changes in place, I found that kernel installs went from taking more than 4 minutes to slightly less than 2. Okay, still not fantastic, but I am operating on an SD card, and halving the time with two trivial changes is not bad!

I’ll get the “proper” fixes in place during the next development cycle.