Running QEMU-ARM Debian Guests with libvirt

Like most embedded devices on the market, our Blickwerk sensors are ARM-based and powered by an NXP (formerly Freescale Semiconductor) i.MX283 processor. ARM cores are great for low-power devices or devices under thermal constraints, but they are not exactly powerful. Even though we try to use cross-compile toolchains whenever and wherever we can, we sometimes need to resort to an actual ARM environment for building libraries and applications.

Until now we’ve used actual hardware for this, namely an ODROID-XU4 and a Banana Pi BPI-M3, because they have relatively large amounts of RAM and enough processing power to complete builds in a reasonable amount of time. Each of these SBCs runs a Buildbot worker.

With the release of Debian Buster we wanted to revisit ARM virtualization with QEMU. Peter Maydell wrote an excellent article on QEMU’s virt board back in 2016. Unlike the often-used versatilepb board, virt does not try to emulate specific ARM hardware and therefore supports large amounts of RAM and a configurable number of CPUs. The article was a great starting point for our endeavor, but one of the stumbling blocks was proper libvirt integration. libvirt is a virtualization daemon and API that provides an abstraction around common virtualization and process isolation techniques such as LXC, KVM, or Xen, and we use it basically everywhere.

From QEMU arguments to libvirt XML

Peter Maydell’s article closes with the following qemu-system-arm call:

qemu-system-arm -M virt -m 1024 \
  -kernel vmlinuz-3.16.0-4-armmp-lpae \
  -initrd initrd.img-3.16.0-4-armmp-lpae \
  -append 'root=/dev/vda2' \
  -drive if=none,file=hda.qcow2,format=qcow2,id=hd \
  -device virtio-blk-device,drive=hd \
  -netdev user,id=mynet \
  -device virtio-net-device,netdev=mynet \
  -nographic

The specific problem we encountered was translating the -device arguments into something that libvirt understands. virsh, the command-line management tool that comes with libvirt, even has a domxml-from-native command that was designed to convert qemu arguments into the appropriate libvirt format. Unfortunately it isn’t exactly well maintained, as outlined in this mail from Cole Robinson. Our second approach was to start the machine with the known-working qemu arguments and see which buses and drivers are used inside the system, in order to replicate that environment in libvirt. Et voilà: the crucial hint is exposed by the /dev/disk/by-path/ directory, which shows the shortest physical path to each disk in the system. In our case a simple call to ls /dev/disk/by-path/ reveals the following output:

platform-a003c00.virtio_mmio  platform-a003e00.virtio_mmio

According to the libvirt documentation, virtio-mmio is a valid type attribute for the <address/> element of each device. libvirt, however, generates pci address types by default, which don’t seem to be supported in current versions of Debian. In the end the only thing we needed to do to turn the default libvirt configuration for our ARM guest into something that actually works was to replace all <address type="pci" .../> lines with <address type="virtio-mmio"/>. This worked for both drives and network devices.
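
If the domain is already defined with the default pci addresses, a rough way to perform that substitution might look like the following sketch (the domain name and temporary file path are just examples, and the pattern assumes libvirt’s usual single-quoted attribute style):

# Dump the existing definition, replace every pci address with a virtio-mmio one,
# and feed the result back to libvirt.
virsh dumpxml usain > /tmp/usain.xml
sed -i "s|<address type='pci'[^/]*/>|<address type='virtio-mmio'/>|g" /tmp/usain.xml
virsh define /tmp/usain.xml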

This is what our working libvirt configuration looks like:

<domain type='qemu' id='38'>
  <name>usain</name>
  <uuid>3bf6e58f-e513-47b4-9b64-e00b32d9d9f4</uuid>
  <memory unit='KiB'>3145728</memory>
  <currentMemory unit='KiB'>3145728</currentMemory>
  <vcpu placement='static'>3</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='armv7l' machine='virt-3.1'>hvm</type>
    <kernel>/var/lib/libvirt/boot/usain/vmlinuz</kernel>
    <initrd>/var/lib/libvirt/boot/usain/initrd.img</initrd>
    <cmdline>root=UUID=7a7f1855-2536-4342-a481-4853a125560f</cmdline>
    <boot dev='hd'/>
  </os>
  <features>
    <gic version='2'/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-system-arm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/lvm-uhura/usain-boot'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='virtio-mmio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/lvm-uhura/usain-root'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='virtio-mmio'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:79:39:16'/>
      <source bridge='br-virt'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='virtio-mmio'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target type='system-serial' port='0'>
        <model name='pl011'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'>
    <label>libvirt-3bf6e58f-e513-47b4-9b64-e00b32d9d9f4</label>
    <imagelabel>libvirt-3bf6e58f-e513-47b4-9b64-e00b32d9d9f4</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+64055:+64055</label>
    <imagelabel>+64055:+64055</imagelabel>
  </seclabel>
</domain>
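
Working with such a configuration follows the usual virsh workflow; a minimal sketch (the XML file name is just an example) looks like this:

# Define the domain from the XML above and start it.
# Direct kernel boot means the <kernel>/<initrd> paths must already exist on the host.
virsh define usain.xml
virsh start usain
# The guest only has a serial console, so attach to it directly:
virsh console usain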

Automatic Kernel Updates

Another complication with ARM virtualization is that we currently have to use the direct kernel boot feature in order to start the system. libvirt supports direct kernel boot out of the box, but the kernel and initrd must be accessible from the virtualization host’s filesystem. This boils down to a simple problem: the kernel and initrd are updated inside the virtual machine but booted from the outside. So whenever either one is updated, we would have to mount the guest’s boot partition and copy both files somewhere libvirt can access them.
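
Done by hand, that would amount to something like the following sketch (the volume and guest names are only examples, matching the configuration above):

# Mount the guest's boot volume read-only and copy kernel and initrd to where
# the <kernel>/<initrd> elements of the domain definition point.
mount -o ro /dev/lvm-uhura/usain-boot /mnt
cp /mnt/vmlinuz /mnt/initrd.img /var/lib/libvirt/boot/usain/
umount /mnt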

Fortunately libvirt also supports hooks that are executed whenever a qemu guest is started. We usually use LVM as the storage backend for libvirt, and each ARM guest has a $name-boot and a $name-root volume associated with it. So whenever we start an ARM guest we can automatically mount the appropriate LVM volume, copy the kernel and initrd, and let libvirt handle the rest. This works great so far and lightens our maintenance burden when it comes to automatic system updates of our ARM guests.
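
For reference, creating the backing volumes for a new guest might look roughly like this (volume group name and sizes are just placeholders):

# One small boot volume and one root volume per guest, following the $name-boot/$name-root convention.
lvcreate -L 512M -n usain-boot lvm-uhura
lvcreate -L 20G -n usain-root lvm-uhura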

This is the hook we use for that:

#!/bin/sh

set -eu

GUEST="$1"
ACTION="$2"
PHASE="$3"

BOOT_IMAGE_BASE_PATH=/var/lib/libvirt/boot

_is_mounted() {
    grep -qwF "$1" /proc/mounts
}

_is_host_running() {
    # calling virsh domstate here will cause the process to hang so we use ps instead
    ps --no-headers -u libvirt-qemu -o cmd | grep -q -- "-name guest=$1"
}

_get_boot_volume() {
    local guest="$1"
    local configured_volume
    local dm_path

    # looks for a volume whose name ends with "-boot"
    configured_volume="$(
        xmllint --xpath 'string(/domain/devices/disk[@type="block"]/source[substring(@dev, string-length(@dev) - string-length("-boot") + 1) = "-boot"]/@dev)' \
            "/etc/libvirt/qemu/$guest.xml"
    )"

    # the configured volume might contain any path that refers to a volume but /proc/mounts
    # will contain paths from /dev/mapper so we need to find the path of the actual devices
    # and then find the corresponding symlink in /dev/mapper
    dm_path="$(realpath "$configured_volume")"
    find /dev/mapper -type l | while read -r mapper_path; do
        if [ "$(readlink -f "$mapper_path")" = "$dm_path" ]; then
            echo "$mapper_path"
            break
        fi
    done
}

update_guest_kernel_and_initrd() {
    # ARM guests cannot be booted like an x86_64 guest.
    # Instead we need libvirt to boot the kernel directly along with the guest’s generated initrd.
    # We update the kernel and initrd on every guest startup, so that a system update will behave
    # as expected on the next reboot.
    local guest="$1"
    local boot_image_path="$BOOT_IMAGE_BASE_PATH/$guest"
    local boot_volume
    local tmp_mount_path
    boot_volume="${2:-$(_get_boot_volume "$guest")}"

    if [ -z "$boot_volume" ]; then
        echo "Boot volume for guest $guest not found." >&2
        return 1
    fi

    if [ ! -e "$boot_volume" ]; then
        echo "Boot volume for guest $guest does not exist in '$boot_volume'. Cannot extract kernel and initrd." >&2
        return 1
    fi

    if _is_host_running "$guest"; then
        # this should not happen, but maybe someone is calling this script manually
        echo "Guest $guest is not shut down. Refusing to mount volumes." >&2
        return 1
    fi

    if _is_mounted "$boot_volume"; then
        echo "Boot volume '$boot_volume' is already mounted in the system. Mounting the volume twice may cause data loss." >&2
        return 1
    fi

    mkdir -p --mode 750 "$boot_image_path"
    chgrp libvirt-qemu "$boot_image_path"
    tmp_mount_path="$(mktemp -d)"
    trap "umount '$tmp_mount_path'; rmdir '$tmp_mount_path'" EXIT
    mount -o ro "$boot_volume" "$tmp_mount_path"
    cp "$tmp_mount_path/vmlinuz" "$tmp_mount_path/initrd.img" "$boot_image_path/"
}

if [ "$ACTION" = prepare ] && [ "$PHASE" = begin ]; then
    # kernel and initrd of guests that use ARM emulation should be updated before being started
    if grep -qwF /usr/bin/qemu-system-arm "/etc/libvirt/qemu/$GUEST.xml"; then
        update_guest_kernel_and_initrd "$GUEST"
    fi
fi
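
libvirt looks for the qemu hook at a fixed path, so installing the script is a matter of dropping it there and making it executable; a short sketch (the source file name is just an example, and the reload step may vary by distribution):

# libvirtd only picks up a newly created hook script after a restart.
install -m 755 qemu-hook.sh /etc/libvirt/hooks/qemu
systemctl restart libvirtd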

Conclusion

ARM virtualization works great on Debian Buster, and even though the performance is not on par with actual ARM hardware, it’s fast enough for our use cases. Virtualized environments are much easier to scale, manage, and replicate than the hardware boards we used before, which often rely on ancient kernels due to non-free firmware.