Considerations for Updating the Bootloader Over-the-Air (OTA)
August 27, 2020
Modern electronic devices are increasingly complex and internet-connected. As a general rule, complexity runs counter to security, and insecure internet-connected devices are ripe for abuse by attackers. When designing these systems, we must assume that all software will have bugs, and that some of those bugs will be exploitable vulnerabilities. The first step in addressing these issues is to ensure that software updates can be delivered to your systems, preferably automatically and over-the-air (OTA). The European Union draft standard “Cyber Security for Consumer Internet of Things: Baseline Requirements” (ETSI EN 303 645) specifically includes timely, automatic updates as one of its requirements. It does make an exception for an immutable first-stage bootloader to minimize the risk of leaving devices in the field in a non-booting state (also referred to as “bricking”[1] the board).
This article will discuss the issue of updating the bootloader in a connected device. Note that while the principles discussed here are true of any software system, we will specifically be discussing systems running Linux. Smaller, more custom-designed systems may offer additional options that are unique to those systems.
System Design
Figure 1 shows a generalized Linux system with the main components of interest for possible updates. The Storage Media will be some kind of block device, such as eMMC or a SATA hard drive. Within that device, there will be the bootloader, the kernel, the device tree (depending on the CPU in use), and a root filesystem containing all the files needed to run the system. More complicated architectures are used in some cases, but for the purposes of this discussion we will limit ourselves to the simplest case.
System update utilities such as Mender[2], swupdate[3], and others are able to update the kernel, device tree, and root filesystem out of the box, and in many cases this level of updatability is sufficient.
The bootloader is the component of the system that is responsible for initializing the system at power-on, starting with the CPU reset instruction. It is responsible for tasks such as:
• Initialize and scrub RAM
• Set up power rails and clocks
• Set up all peripherals to a known and quiescent state to avoid unexpected interrupts
• Load and launch the Linux kernel
As discussed earlier, all software has bugs, so we can assume there will be bootloader bugs as well. We can reduce the attack surface by minimizing the functionality of the bootloader; however, we cannot remove the risk of bugs completely. Why is updating the bootloader more complicated than updating the other components of the system? What is the risk if we try? What is the risk if we do not try?
Over-the-air Capable Systems
This block diagram shows the basic system design for a system capable of robust over-the-air (OTA) updates.[4] The bootloader is responsible for system initialization and for interacting with the OTA client to select which kernel, device tree, and root filesystem to use. Robustness is provided by having full redundancy of the components needed for a running Linux image. This ensures that there is always a known-good image to roll back to in case of a broken OTA update. Additionally, this ensures completely atomic updates, since the update client is the only component in the system aware that an update is in progress until the update is completed and ready to run.
Any component that is updated OTA can potentially result in a non-functioning device; therefore, the robustness of the system is directly tied to the capability of the bootloader to roll back to a previously known-good configuration. The implication is that there must be a component in the system that is immutable, so that bad updates can always be handled properly.
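The rollback behavior described above can be modeled in a few lines. The following is a minimal sketch in Python rather than real bootloader code; the `BootState` fields (`upgrade_available`, `boot_count`, `boot_limit`) and the slot names are illustrative assumptions, not the interface of any particular bootloader or OTA client.

```python
from dataclasses import dataclass


@dataclass
class BootState:
    """Persistent state shared by the bootloader and the OTA client (hypothetical layout)."""
    active_slot: str = "A"           # slot currently known to be good
    upgrade_available: bool = False  # set by the OTA client after writing the other slot
    boot_count: int = 0              # trial boots attempted so far
    boot_limit: int = 1              # trial boots allowed before rolling back


def other(slot: str) -> str:
    return "B" if slot == "A" else "A"


def select_slot(state: BootState) -> str:
    """Run by the bootloader on every boot; returns the slot to boot."""
    if not state.upgrade_available:
        return state.active_slot
    if state.boot_count >= state.boot_limit:
        # The trial image was never committed: roll back to the known-good slot.
        state.upgrade_available = False
        state.boot_count = 0
        return state.active_slot
    state.boot_count += 1
    return other(state.active_slot)


def commit(state: BootState) -> None:
    """Called by the OTA client once the new image is running and healthy."""
    state.active_slot = other(state.active_slot)
    state.upgrade_available = False
    state.boot_count = 0
```

Note that the rollback decision lives entirely in `select_slot`, which is why the code that implements it cannot itself be replaced without some other fallback.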
Bootloader Updates
In most cases, the immutable component handling rollback _is_ the bootloader. In a typical embedded Linux application, this is Das U-Boot[5]. If we try to update the bootloader, we are at risk of bricking our board since there is no redundancy. If the board power cycles after we have started writing the new bootloader image, but before the write has completed, then our image contains part of the old version and part of the new. The behavior in this case is undefined, and the only mitigation is to physically access the device in order to write a correct bootloader, typically using USB or another hard-wired connection.
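To make the failure mode concrete, the following small Python simulation (an illustration only; the image contents and sizes are made up) shows what the storage holds if power is lost partway through an in-place write: a mix of both versions that matches neither image.

```python
import hashlib


def interrupted_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Storage contents if power is lost after `bytes_written` bytes of `new`
    have been written in place over `old`."""
    return new[:bytes_written] + old[bytes_written:]


def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


# Placeholder images; a real bootloader binary would be used in practice.
old_image = bytes([0xAA]) * 1024
new_image = bytes([0x55]) * 1024

# Power fails after 300 of 1024 bytes: the result is neither old nor new.
torn_image = interrupted_write(old_image, new_image, 300)
matches_known_image = digest(torn_image) in (digest(old_image), digest(new_image))
```

A checksum can detect such a torn image, but detection only helps if some other piece of code is still intact to act on it, which is the motivation for the options discussed later.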
But why would we want to update the bootloader? At a bare minimum, the bootloader is simply used as a means to initialize the hardware and then pass control over to the Linux kernel. The risk of issues with the bootloader is minimized due to the limited functionality.
For many designs, this level of risk is acceptable, and architects can make the decision to simply not provide for OTA bootloader updates in their deployed devices. A hard-wired mechanism is still available as a last resort.
For many designs, however, this level of risk is deemed unacceptable and some mechanism must be provided for OTA updates of the bootloader. Additionally, many designs add more functionality into the bootloader; things such as system diagnostics or other application-specific requirements may be implemented in the bootloader, resulting in even more likelihood of needing updates. So how do we handle this?
Options for Providing Bootloader Updates
There are a number of options to allow updating the bootloader. This discussion is not intended to be a full solution, but rather a high-level description of approaches that may work for your design. Each has its tradeoffs.
Option 1: No Redundancy
If the risk of a bricked board is acceptable for a particular application, then you can simply try to deploy the bootloader updates OTA and deal with the consequences when a failure happens. If the size of your fleet is small and the cost of getting physical access to the devices is low, then this may work well. If a bootloader update is needed and an OTA attempt has failed, you are no worse off for having tried. The case of a failed OTA bootloader update is identical to the case of not having OTA bootloader update capability: you have to obtain physical access to the device and use the manufacturer-provided mechanism for re-flashing the bootloader.
Option 2: Multi-Stage Bootloader
This architecture splits the bootloader functionality into two stages (or more, depending on the complexity of your design). Ultimately this still requires an immutable piece of code in stage 1. You do have redundancy and robustness in updating stage 2, so if you choose carefully where to implement functionality, you can provide for OTA updates of bootloader functionality. This is a good option because the amount of code in the immutable stage 1 binary is reduced, resulting in lower overall risk.
U-Boot implements multi-stage booting using SPL (Secondary Program Loader) and TPL (Tertiary Program Loader). This mechanism was introduced to allow for support of systems with separate boot ROMs that were too small to store a full U-Boot image. In this case, the U-Boot SPL image will contain enough initialization code to load and launch the full U-Boot image, typically off of a large block device such as MMC. SPL will need to be able to initialize enough RAM and the device that contains the full U-Boot image.
Even for devices that do not have the limitation of a small boot ROM, we can take advantage of this architecture to implement our updatable functionality in stage 2, while leaving the bare minimum, including the proper handling of the redundant blocks, in stage 1.
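The "proper handling of the redundant blocks" in stage 1 can be sketched as follows. This is a Python model of the selection logic only, under assumed conventions: the header layout, the `STG2` magic value, and the use of CRC32 are hypothetical choices for illustration and are not U-Boot's SPL image format.

```python
import struct
import zlib

# Hypothetical stage 2 header: magic, version, payload length, CRC32 of payload.
HEADER = struct.Struct("<4sIII")
MAGIC = b"STG2"


def pack_image(version: int, payload: bytes) -> bytes:
    """Build a stage 2 image as the OTA client would write it."""
    return HEADER.pack(MAGIC, version, len(payload), zlib.crc32(payload)) + payload


def parse_image(blob: bytes):
    """Return (version, payload) if the copy is intact, otherwise None."""
    if len(blob) < HEADER.size:
        return None
    magic, version, length, crc = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:HEADER.size + length]
    if magic != MAGIC or len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return version, payload


def select_stage2(copies):
    """Stage 1 logic: boot the newest copy that passes validation."""
    valid = [parsed for parsed in map(parse_image, copies) if parsed is not None]
    return max(valid, key=lambda item: item[0]) if valid else None
```

A CRC only detects accidental corruption such as an interrupted write. A design that also needs to reject tampered images would verify a cryptographic signature at this point instead.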
There is a risk that there will be issues with stage 1, requiring physical access to address. Given the reduced functionality in stage 1, in many cases this level of risk is acceptable.
Option 3: Parallel Bootloaders
Many boards provide the capability to boot from multiple devices. For example, many boards can boot off of either an onboard eMMC or removable SD/MMC card. Alternatively, they may use a dedicated NOR flash device for the bootloader but still be able to run a bootloader out of the eMMC block media.
These kinds of boards can be configured to store an immutable bootloader in one of the supported devices and then store the OTA-updatable bootloader in the other device. Typically the updatable bootloader will be in the same media (e.g., eMMC) as the root filesystem, making it quite easy to update. Since the bootloader in the “alternate” media is immutable, it can be relied on to recover from a broken OTA update of the bootloader in the “standard” location.
The issue with this approach is that the selection of the boot device usually requires physical access to the board to move a jumper or change a switch setting. If your devices are in locations where end users can access them, this may be a viable option as the end user can make the selection of the recovery media in the case of failures. This can be done either through documentation or as instructed by support staff.
Some systems use external hardware to select the bootloader. A small MCU running an RTOS can monitor for proper system activity and select the alternate bootloader if the Linux system is not running. This can be tricky to detect properly from an external source, but a watchdog timer toggling a GPIO pin or writing to shared memory may be sufficient. This is also a more complex design, which needs to be weighed against your system's requirements. Note that you may need to consider OTA updates to the MCU firmware image, which is yet another level of complexity.
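The supervising logic on the external MCU can be modeled roughly as below. Real firmware would normally be written in C against the MCU's GPIO and timer peripherals; this Python sketch only illustrates the decision logic, and the class name, the timeout handling, and the "primary"/"recovery" labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BootSupervisor:
    """Model of an external MCU that watches a heartbeat from the Linux system."""
    timeout_s: float               # longest tolerated gap between heartbeats
    last_heartbeat_s: float = 0.0  # timestamp of the most recent heartbeat
    source: str = "primary"        # boot source currently selected

    def heartbeat(self, now_s: float) -> None:
        """Called when the Linux system toggles its GPIO (or similar signal)."""
        self.last_heartbeat_s = now_s

    def poll(self, now_s: float) -> str:
        """Called periodically; selects the recovery bootloader on timeout."""
        if now_s - self.last_heartbeat_s > self.timeout_s:
            self.source = "recovery"
            # Restart the timer so the recovery boot gets a full timeout window.
            self.last_heartbeat_s = now_s
        return self.source
```

In this sketch the selection is sticky: once the supervisor falls back to the recovery source it stays there until it is reset, which keeps the device reachable while the broken bootloader is replaced.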
Option 4: eMMC Boot Partitions
Version 4.3 of the eMMC[6] specification requires 2 separate hardware boot partitions. These partitions are typically 4 MB each and are intended to store bootloaders. They are readable and writable from Linux user space; however, they are read-only by default. Read-write capability is enabled by writing to a file in the /sys pseudo-filesystem (the force_ro attribute of the boot partition's block device).
Bootloaders can then be written to these partitions using the dd utility.
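The unlock-and-write sequence described above can be sketched in Python as follows. The paths are passed in as parameters; on a real target they would be something like /dev/mmcblk0boot0 and /sys/block/mmcblk0boot0/force_ro, where the device name mmcblk0 and the image file name are assumptions that depend on your board.

```python
import os
from pathlib import Path


def write_boot_partition(image: Path, device: Path, force_ro: Path) -> None:
    """Write a bootloader image to an eMMC boot partition and verify it.

    force_ro is the sysfs attribute that guards the partition: writing "0"
    makes it writable, writing "1" restores the read-only default.
    """
    data = image.read_bytes()
    force_ro.write_text("0")
    try:
        # Equivalent in effect to copying the image with dd.
        with open(device, "r+b") as dev:
            dev.write(data)
            dev.flush()
            os.fsync(dev.fileno())
        # Read back to catch short or failed writes. This may be served from
        # the page cache; a stricter check would bypass the cache.
        with open(device, "rb") as dev:
            if dev.read(len(data)) != data:
                raise OSError("boot partition verification failed")
    finally:
        # Always restore write protection, even if the write failed.
        force_ro.write_text("1")


# Example call on a target (hypothetical device and file names):
# write_boot_partition(Path("u-boot.bin"),
#                      Path("/dev/mmcblk0boot0"),
#                      Path("/sys/block/mmcblk0boot0/force_ro"))
```

Writing the new bootloader to the boot partition that is not currently selected keeps the running bootloader untouched until the write has been verified.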
The partition used by the eMMC device as the boot block is determined by a parameter that is set within the device itself. This can be done from the U-Boot prompt or from Linux user space.
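From Linux user space, one way to set this parameter is the `mmc` tool from the mmc-utils package, whose `bootpart enable` subcommand takes a boot partition number, a send-ack flag, and the device node. The Python wrapper below is a sketch under that assumption; the device path and the mapping of names to partition numbers should be checked against the mmc-utils version on your target. From the U-Boot prompt, the `mmc partconf` command serves the same purpose.

```python
import subprocess

# Boot partition numbers as used by mmc-utils (1 = boot0, 2 = boot1).
BOOT_PARTITION_IDS = {"boot0": 1, "boot1": 2}


def bootpart_enable_command(partition: str, device: str, send_ack: bool = True):
    """Build the mmc-utils command line that selects the boot partition."""
    return [
        "mmc", "bootpart", "enable",
        str(BOOT_PARTITION_IDS[partition]),
        "1" if send_ack else "0",
        device,
    ]


def select_boot_partition(partition: str, device: str) -> None:
    """Run the command on the target; raises CalledProcessError on failure."""
    subprocess.run(bootpart_enable_command(partition, device), check=True)


# Example on a target (hypothetical device name):
# select_boot_partition("boot1", "/dev/mmcblk0")
```

Switching the selection only after the new image has been written and verified keeps the window in which the device has no valid bootloader as small as possible.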
With eMMC boot partitions, updates to the boot partition are atomic and independent of updates to the root filesystem. There is no automatic failover between the eMMC boot partitions, so this does not mitigate the risk of bricked devices due to a failed bootloader update. It does, however, make it easy to update just the bootloader without making any specific accommodations for the root filesystem.
Mender’s Approach
Due to the risk of bricked boards and the reduced robustness of the OTA update process when providing bootloader updates, Mender[7] does not provide out-of-the-box bootloader updates. As discussed, this is difficult to do in a generic fashion and will likely end up being quite application- and hardware-specific. The Mender Update Module framework[8] allows custom update types to be supported through a plugin architecture: custom scripts handle specific payload types, so any arbitrary payload type can be supported by writing a custom Update Module. Bootloader updates for a specific system can therefore be implemented using Update Modules. Any of the approaches discussed above can be used, depending on the needs of the application and the capabilities of the hardware in use.
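As a rough illustration, a bootloader Update Module could be structured like the sketch below. It assumes the calling convention described in the Mender Update Module documentation, in which the module is an executable invoked with a state name and a working directory, payload files are found in a `files` subdirectory, and queries such as NeedsArtifactReboot and SupportsRollback are answered on standard output. Verify these details against the Mender version you use. The `install_bootloader` function is a hypothetical placeholder for whichever hardware-specific approach you choose.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path


def handle(state: str, files_dir: Path, install) -> str:
    """Dispatch one Update Module state; returns the text to print, if any."""
    if state == "ArtifactInstall":
        # Payload files are assumed to live under <files_dir>/files.
        for payload in sorted((files_dir / "files").iterdir()):
            install(payload)
        return ""
    if state == "NeedsArtifactReboot":
        # A new bootloader only takes effect after a reboot.
        return "Automatic"
    if state == "SupportsRollback":
        # No redundancy is assumed here, so rollback is not offered.
        return "No"
    return ""  # other states need no action in this sketch


def install_bootloader(payload: Path) -> None:
    """Hypothetical placeholder: write `payload` using a board-specific method."""
    raise NotImplementedError("hardware-specific bootloader write goes here")


if __name__ == "__main__" and len(sys.argv) == 3:
    output = handle(sys.argv[1], Path(sys.argv[2]), install_bootloader)
    if output:
        print(output)
```

Answering "No" to SupportsRollback reflects the point made throughout this article: unless the hardware provides a redundant bootloader location, a failed bootloader write cannot be undone by the update client.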
Wrapping Up
There are many risks associated with updating system bootloaders in field-deployed devices. A power failure at an inopportune time can leave devices bricked in the field, resulting in a potentially costly recall process. However, not providing a mechanism for bootloader updates can represent an unacceptable risk, depending on the risk profile of a particular application. We presented a number of approaches for allowing bootloader updates, with a discussion of the advantages and disadvantages of each. This will hopefully allow you, as a system designer, to make appropriate choices for your system and help you get to market quickly with an appropriate understanding of the risks of your design.
About the Author
Drew Moseley is currently part of the Mender.io open source project to deploy OTA software updates to embedded Linux devices. He has worked on embedded projects such as RAID storage controllers, Direct and Network attached storage devices and graphical pagers.
He has spent the last 7 years working in Operating System Professional Services helping customers develop production embedded Linux systems. He has spent his career in embedded software and developer tools and has focused on Embedded Linux and Yocto for about 10 years.
Moseley has spoken at various conferences, including Embedded Linux Conference, Embedded Systems Conference, Texas Linux Fest, OSCON, and other technology conferences.
[2] https://mender.io: The author is currently employed full time on this project.