A/B System Updates

IN THIS DOCUMENT

Overview
1. Bootloader state examples
2. Update Engine features
Life of an A/B update
1. Post-install step
Implementation
1. Kernel patches
2. Kernel command line arguments
3. Recovery
4. Build variables
5. Partitions
6. Fstab
7. Kernel slot arguments
8. OTA package generation
Configuration
1. Partitions
2. Post-install
3. App compilation in background

A/B system updates ensure a workable booting system remains on the disk during an over-the-air (OTA) update. This reduces the likelihood of an inactive device afterward, which means fewer device replacements and device reflashes at repair/warranty centers.

Customers can continue to use their devices during an OTA. The only downtime during an update is when the device reboots into the updated disk partition. If the OTA fails, the device is still usable because it boots into the pre-OTA disk partition, and the download of the OTA can be attempted again. A/B system updates implemented through OTA are recommended for new devices only.

A/B system updates affect:

Interactions with the bootloader
Partition selection
The build process
OTA update package generation

The existing dm-verity feature guarantees the device will boot an uncorrupted image. If a device doesn't boot, because of a bad OTA or dm-verity issue, the device can reboot into an old image.

The A/B system is robust because any errors (such as I/O errors) affect only the unused partition set and can be retried. Such errors also become less likely because the I/O load is deliberately low to avoid degrading the user experience.

OTA updates can occur while the system is running, without interrupting the user. This includes the app optimizations that occur after a reboot. Additionally, the cache partition is no longer used to store OTA update packages; there is no need for sizing the cache partition.

Overview

A/B system updates use a background daemon called update_engine and two sets of partitions. The two sets of partitions are referred to as slots, normally as slot A and slot B. The system runs from one slot, the current slot, while the partitions in the unused slot are not accessed by the running system (for normal operation).

The goal of this feature is to make updates fault resistant by keeping the unused slot as a fallback. If there is an error during an update or immediately after an update, the system can roll back to the old slot and continue to have a working system. To achieve this goal, none of the partitions used by the current slot should be updated as part of the OTA update (including partitions for which there is only one copy).

Each slot has a bootable attribute, which states whether the slot contains a correct system from which the device can boot. The current slot is clearly bootable when the system is running, but the other slot may have an old (still correct) version of the system, a newer version, or invalid data. Regardless of what the current slot is, there is one slot which is the active or preferred slot. The active slot is the one the bootloader will boot from on the next boot. Finally, each slot has a successful attribute set by the user space, which is only relevant if the slot is also bootable.

A successful slot should be able to boot, run, and update itself. A bootable slot that was not marked as successful (after several attempts were made to boot from it) should be marked as unbootable by the bootloader, including changing the active slot to another bootable slot (normally to the slot running right before the attempt to boot into the new, active one). The specific details of the interface are defined in boot_control.h.

Bootloader state examples

The boot_control HAL is used by update_engine (and possibly other daemons) to instruct the bootloader what to boot from. These are common example scenarios and their associated states:

Normal case: The system is running from its current slot, either slot A or B. No updates have been applied so far. The system's current slot is bootable, successful, and the active slot.
Update in progress: The system is running from slot B, so slot B is the bootable, successful, and active slot. Slot A was marked as unbootable since the contents of slot A are being updated but not yet completed. A reboot in this state should continue booting from slot B.
Update applied, reboot pending: The system is running from slot B, slot B is bootable and successful, but slot A was marked as active (and therefore is marked as bootable). Slot A is not yet marked as successful and some number of attempts to boot from slot A should be made by the bootloader.
System rebooted into new update: The system is running from slot A for the first time, slot B is still bootable and successful while slot A is only bootable, and still active but not successful. A user space daemon should mark slot A as successful after some checks are made.
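
The scenarios above can be written down as per-slot metadata. The following Python sketch is only an illustration: the Slot class, its field names, and the next_boot_slot helper are assumptions made for this example and are not part of boot_control.h. The normal case is left out because the text does not specify the state of the other slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """Per-slot attributes named in the text; the class itself is illustrative."""
    bootable: bool
    successful: bool
    active: bool


# Slot states for the scenarios in which both slots are described.
# Slot A is assumed to be "not successful" while it is being updated.
SCENARIOS = {
    "update in progress": {
        "A": Slot(bootable=False, successful=False, active=False),
        "B": Slot(bootable=True, successful=True, active=True),
    },
    "update applied, reboot pending": {
        "A": Slot(bootable=True, successful=False, active=True),
        "B": Slot(bootable=True, successful=True, active=False),
    },
    "rebooted into new update": {
        "A": Slot(bootable=True, successful=False, active=True),
        "B": Slot(bootable=True, successful=True, active=False),
    },
}


def next_boot_slot(slots):
    """Return the name of the slot the bootloader would try on the next boot."""
    active = [name for name, slot in slots.items() if slot.active]
    if len(active) != 1:
        raise ValueError("exactly one slot must be active")
    name = active[0]
    if not slots[name].bootable:
        raise ValueError("the active slot must be bootable")
    return name


for scenario, slots in SCENARIOS.items():
    print(scenario, "->", next_boot_slot(slots))
```

The helper only encodes the two invariants stated in the text: there is exactly one active slot, and marking a slot active implies it is bootable.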

Update Engine features

The update_engine daemon runs in the background and prepares the system to boot into a new, updated version. The update_engine daemon is not involved in the boot process itself and is limited in what it can do during an update. The update_engine daemon can do the following:

Read from the current slot A/B partitions and write any data to the unused slot A/B partitions as instructed by the OTA package
Call the boot_control interface in a pre-defined workflow
Run a post-install program from the new partition after writing all the unused slot partitions, as instructed by the OTA package

The post-install step is described in detail below. Note that the update_engine daemon is limited by the SELinux policies and features in the current slot; those policies and features can't be updated until the system boots into a new version. To achieve a robustness goal, the update process should not:

Modify the partition table
Modify the contents of partitions in the current slot
Modify the contents of non-A/B partitions that can't be wiped with a factory reset

Life of an A/B update

The update process starts when an OTA package, referred to in code as a payload, is available for downloading. Policies in the device may defer the payload download and application based on battery level, user activity, whether it is connected to a charger, or other policies. But since the update runs in the background, the user might not know that an update is in progress and the process can be interrupted at any point due to policies or unexpected reboots.

The steps in the update process after a payload is available are as follows:

Step 1: The current slot (or "source slot") is marked as successful (if not already marked) with markBootSuccessful().

Step 2: The unused slot (or "target slot") is marked as unbootable by calling the function setSlotAsUnbootable().

The current slot is always marked as successful at the beginning of the update to prevent the bootloader from falling back to the unused slot, which will soon have invalid data. If the system has reached the point where it can start applying an update, the current slot is marked as successful even if other major components are broken (such as the UI in a crash loop) since it's possible to push new software to fix these major problems.

The update payload is an opaque blob with the instructions to update to the new version. The update payload consists of two parts: the metadata and the extra data associated with the instructions. The metadata is relatively small and contains a list of operations to produce and verify the new version on the target slot. For example, an operation could decompress a certain blob and write it to certain blocks in a target partition, or read from a source partition, apply a binary patch, and write to certain blocks in a target partition. The extra data associated with the operations, not included in the metadata, is the bulk of the update payload and would consist of the compressed blob or binary patch in these examples.
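
To make the first kind of operation concrete, here is a minimal Python sketch of "decompress a blob and write it to certain blocks in a target partition". It is not the update_engine payload format: the ReplaceOp class, the 4096-byte block size, and the use of zlib are all assumptions chosen for illustration, and the target partition is simulated with an in-memory bytearray.

```python
import zlib
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size for this sketch


@dataclass
class ReplaceOp:
    """Hypothetical operation: decompress `blob` and write it at `dst_block`."""
    dst_block: int  # first target block to write (part of the metadata)
    blob: bytes     # zlib-compressed data (the "extra data" of the payload)


def apply_replace(target: bytearray, op: ReplaceOp) -> None:
    """Write the decompressed blob into the simulated target partition."""
    data = zlib.decompress(op.blob)
    start = op.dst_block * BLOCK_SIZE
    if start + len(data) > len(target):
        raise ValueError("operation writes past the end of the partition")
    target[start:start + len(data)] = data


# Simulated four-block target partition, initially zeroed.
target = bytearray(4 * BLOCK_SIZE)
new_block = b"\x2a" * BLOCK_SIZE
apply_replace(target, ReplaceOp(dst_block=2, blob=zlib.compress(new_block)))
```

Only block 2 of the simulated partition changes; the metadata (the block index) stays small while the blob carries the bulk of the data, which mirrors the split described above.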

Step 3: The payload metadata is downloaded.

Step 4: For each operation defined in the metadata, in order, the associated data (if any) is downloaded to memory, the operation is applied, and the associated memory is discarded.

These two steps take most of the update time, as they involve writing and downloading large amounts of data, and are likely to be interrupted for reasons of policy or reboot.

Step 5: The whole partitions are re-read and verified against the expected hash.

Step 6: The post-install step (if any) is run.

In the case of an error during the execution of any step, the update fails and is re-attempted with possibly a different payload. If all the steps so far have succeeded, the update succeeds and the last step is executed.

Step 7: The unused slot is marked as active by calling setActiveBootSlot().

Marking the unused slot as active doesn't mean it will finish booting. The bootloader (or the system itself) can switch the active slot back if it doesn't read a successful state.
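
The ordering of the seven steps can be sketched as follows. This is a simplified Python model, not update_engine code: FakeBootControl only records calls (its method names follow the functions mentioned in the text), operations are plain callables standing in for downloaded payload operations, and SHA-256 is an assumed choice for the "expected hash".

```python
import hashlib


class FakeBootControl:
    """Stand-in for the boot_control interface; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def markBootSuccessful(self):
        self.calls.append("markBootSuccessful")

    def setSlotAsUnbootable(self, slot):
        self.calls.append(f"setSlotAsUnbootable({slot})")

    def setActiveBootSlot(self, slot):
        self.calls.append(f"setActiveBootSlot({slot})")


def run_update(boot_control, target_slot, operations, expected_sha256, postinstall=None):
    """Apply an update to target_slot; return True on success, False on failure."""
    boot_control.markBootSuccessful()                 # Step 1
    boot_control.setSlotAsUnbootable(target_slot)     # Step 2
    partition = bytearray()                           # simulated target partition
    for operation in operations:                      # Steps 3-4 (download elided)
        operation(partition)
    if hashlib.sha256(partition).hexdigest() != expected_sha256:
        return False                                  # Step 5 failed
    if postinstall is not None and postinstall() != 0:
        return False                                  # Step 6 failed
    boot_control.setActiveBootSlot(target_slot)       # Step 7
    return True


boot_control = FakeBootControl()
operations = [lambda p: p.extend(b"new "), lambda p: p.extend(b"system")]
expected = hashlib.sha256(b"new system").hexdigest()
succeeded = run_update(boot_control, 0, operations, expected)
```

Note that a failed verification or post-install step returns before setActiveBootSlot() is reached, so the target slot stays unbootable and the device keeps booting from the current slot.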

Post-install step

The post-install step consists of running a program from the "new update" version while still running in the old version. If defined in the OTA package, this step is mandatory and the program must return with exit code 0; otherwise, the update fails.

For every partition where a post-install step is defined, update_engine mounts the new partition into a specific location and executes the program specified in the OTA relative to the mounted partition. For example, if the post-install program is defined as usr/bin/postinstall in the system partition, this partition from the unused slot will be mounted in a fixed location (for example, in /postinstall_mount) and the /postinstall_mount/usr/bin/postinstall command will be executed. Note that for this step to work, the following are required:

The old kernel needs to be able to mount the new filesystem format. The filesystem type cannot change unless there's support for it in the old kernel (which includes details such as the compression algorithm used if using a compressed filesystem like SquashFS).
The old kernel needs to understand the new partition's post-install program format. If using an ELF binary, it should be compatible with the old kernel (for example, a new 64-bit program cannot run on an old 32-bit kernel if the architecture switched from 32- to 64-bit builds). Also, the libraries will be loaded from the old system image, not the new one, unless the loader (ld) is instructed to use other paths or the program is built as a static binary.
The new post-install program will be limited by the SELinux policies defined in the old system.
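
The path resolution and the exit-code rule described above can be sketched in a few lines of Python. The helper names postinstall_command and postinstall_succeeded are hypothetical, and the sketch does not mount anything; it only shows how the fixed mount point and the relative path combine, and that any nonzero exit code counts as failure.

```python
import posixpath
import subprocess
import sys


def postinstall_command(mount_point: str, relative_path: str) -> str:
    """Join the fixed mount point with the program path given in the OTA package."""
    if posixpath.isabs(relative_path):
        raise ValueError("post-install path must be relative to the partition root")
    return posixpath.join(mount_point, relative_path)


def postinstall_succeeded(argv) -> bool:
    """The post-install step passes only if the program exits with code 0."""
    return subprocess.run(argv).returncode == 0


command = postinstall_command("/postinstall_mount", "usr/bin/postinstall")

# Stand-in programs: the Python interpreter exiting with a chosen status code.
ok = postinstall_succeeded([sys.executable, "-c", "raise SystemExit(0)"])
failed = postinstall_succeeded([sys.executable, "-c", "raise SystemExit(3)"])
```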

An example case is to use a shell script as a post-install program (interpreted by the old system's shell binary with a #! marker at the top) and then set up library paths from the new environment for executing a more complex binary post-install program.

Another example case is to run the post-install step from a dedicated smaller partition, so the filesystem format in the main system partition can be updated without incurring backward compatibility issues or stepping-stone updates, allowing users to update straight to the latest version from a factory image.

Due to the SELinux policies, the post-install step is suitable for performing tasks required by design on a given device or other best-effort tasks: update the A/B-capable firmware or bootloader, prepare copies of some databases for the new version, etc. This step is not suitable for one-off bug fixes before reboot that require unforeseen permissions.

The selected post-install program runs in the postinstall SELinux context. All the files in the new mounted partition will be tagged with postinstall_file, regardless of what their attributes are after rebooting into that new system. Changes to the SELinux attributes in the new system won't impact the post-install step. If the post-install program needs extra permissions, those must be added to the post-install context.

Implementation

OEMs and SoC vendors who wish to implement the feature must add the following support to their bootloaders:

Pass the correct parameters to the kernel
Implement the boot_control HAL (https://android.googlesource.com/platform/hardware/libhardware/+/master/include/hardware/boot_control.h)
Implement the state machine as shown in Figure 1:

Figure 1. Bootloader state machine

The boot control HAL can be tested using the bootctl utility.

Some tests have been implemented for Brillo:

https://android.googlesource.com/platform/system/extras/+/refs/heads/master/tests/bootloader/
https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/server/site_tests/brillo_BootLoader/brillo_BootLoader.py

Kernel patches

https://android-review.googlesource.com/#/c/158491/
https://android-review.googlesource.com/#/q/status:merged+project:kernel/common+branch:android-3.18+topic:A_B_Changes_3.18

Kernel command line arguments

The kernel command line arguments must contain the following extra arguments:

skip_initramfs rootwait ro init=/init root="/dev/dm-0 dm=system none ro,0 1 \
  android-verity <public-key-id> <path-to-system-partition>"

The <public-key-id> value is the ID of the public key used to verify the verity table signature (see dm-verity).

To add the .X509 certificate containing the public key to the system keyring:

Copy the .X509 certificate formatted in the .der format to the root of the kernel directory. Use the following openssl command to convert from .pem to .der format (if the .X509 certificate is formatted in .pem format):
```
openssl x509 -in <x509-pem-certificate> -outform der -out <x509-der-certificate>
```

Once copied to the kernel build root, build the zImage to include the certificate as part of the system keyring. This can be verified from the following procfs entry (requires CONFIG_KEYS_DEBUG_PROC_KEYS to be enabled):

angler:/# cat /proc/keys
1c8a217e I------     1 perm 1f010000     0     0 asymmetri Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f: X509.RSA 0492b40f []
2d454e3e I------     1 perm 1f030000     0     0 keyring   .system_keyring: 1/4

Successful inclusion of the .X509 certificate indicates the presence of the public key in the system keyring. The public key ID is the key description shown before X509.RSA (here, Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f).

As the next step, replace the space with '#' and pass the result as <public-key-id> in the kernel command line. For example, in the above case, the following is passed in place of <public-key-id>: Android:#7e4333f9bba00adfe0ede979e28ed1920492b40f
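
The conversion is a plain character substitution. A minimal Python sketch, assuming the key description has already been copied out of /proc/keys (the function name verity_key_id is made up for this example):

```python
def verity_key_id(description: str) -> str:
    """Replace the space in the key description with '#' for the command line."""
    return description.strip().replace(" ", "#")


key_id = verity_key_id("Android: 7e4333f9bba00adfe0ede979e28ed1920492b40f")
print(key_id)
```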

Recovery

The recovery RAM disk is now contained in the boot.img file. When going into recovery, the bootloader cannot put the skip_initramfs option on the kernel command line.

Build variables

Must define for the A/B target:

AB_OTA_UPDATER := true
AB_OTA_PARTITIONS := \
  boot \
  system \
  vendor
  (and other partitions updated through update_engine, such as radio and bootloader)
BOARD_BUILD_SYSTEM_ROOT_IMAGE := true
TARGET_NO_RECOVERY := true
BOARD_USES_RECOVERY_AS_BOOT := true
PRODUCT_PACKAGES += \
  update_engine \
  update_verifier

Optionally define for debug builds:

PRODUCT_PACKAGES_DEBUG += update_engine_client

Cannot define for the A/B target:

BOARD_RECOVERYIMAGE_PARTITION_SIZE
BOARD_CACHEIMAGE_PARTITION_SIZE
BOARD_CACHEIMAGE_FILE_SYSTEM_TYPE

Partitions

A/B devices do not need a recovery partition or cache partition because Android no longer uses these partitions. The data partition is now used for the downloaded OTA package, and the recovery image code is on the boot partition.
All partitions that are A/B-ed should be named as follows (assuming the suffix chosen is _a and _b): boot_a, boot_b, system_a, system_b, vendor_a, vendor_b.

Fstab

The slotselect argument must be on the line for the A/B-ed partitions. For example:

/vendor  /vendor  ext4  ro  wait,verify=/metadata,slotselect

Please note that there should be no partition named vendor but instead the partition vendor_a or vendor_b will be selected and mounted on the /vendor mount point.
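
The effect of slotselect can be sketched as a simple name rewrite. This Python snippet is illustrative only and is not the fs_mgr implementation: resolve_partition is a hypothetical helper, and /dev/block/by-name/vendor is a made-up device path used to show the suffix being appended.

```python
def resolve_partition(device: str, fs_mgr_flags: str, slot_suffix: str) -> str:
    """Append the slot suffix to the device name when slotselect is set."""
    flags = fs_mgr_flags.split(",")
    if "slotselect" in flags:
        return device + slot_suffix
    return device


# Hypothetical device path; the flags are those from the example line above.
selected = resolve_partition(
    "/dev/block/by-name/vendor", "wait,verify=/metadata,slotselect", "_a"
)
unchanged = resolve_partition("/dev/block/by-name/userdata", "wait", "_a")
```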

Kernel slot arguments

The current slot suffix should be passed either through a specific DT node (/firmware/android/slot_suffix) or through the androidboot.slot_suffix command line argument.
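
For the command line variant, reading the suffix amounts to finding one key=value argument. The Python sketch below is an assumption-laden illustration (the function name and the sample command line are invented); it splits on whitespace, so it would not handle a quoted argument that itself contains androidboot.slot_suffix.

```python
def slot_suffix_from_cmdline(cmdline: str):
    """Return the value of androidboot.slot_suffix, or None if it is absent."""
    for argument in cmdline.split():
        key, separator, value = argument.partition("=")
        if key == "androidboot.slot_suffix" and separator:
            return value
    return None


# Invented sample command line for demonstration.
suffix = slot_suffix_from_cmdline("rootwait ro init=/init androidboot.slot_suffix=_b")
missing = slot_suffix_from_cmdline("rootwait ro init=/init")
```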

Optionally, if the bootloader implements fastboot, the following commands and variables should be supported:

Commands

set_active <slot-suffix> - Sets the current active slot to the given suffix. This must also clear the unbootable flag for that slot, and reset the retry count to default values.

Variables

has-slot:<partition> - Returns "yes" if the given partition supports slots, "no" otherwise.
current-slot - Returns the slot suffix that will be booted from next.
slot-suffixes - Returns a comma-separated list of slot suffixes supported by the device.
slot-successful:<slot-suffix> - Returns "yes" if the given slot has been marked as successfully booting, "no" otherwise.
slot-unbootable:<slot-suffix> - Returns "yes" if the given slot is marked as unbootable, "no" otherwise.
slot-retry-count:<slot-suffix> - Number of retries remaining to attempt to boot the given slot.
These variables should all appear under the following: fastboot getvar all

OTA package generation

The OTA package tools follow the same commands as the commands for non-A/B devices. The target_files.zip file must be generated by defining the build variables for the A/B target. The OTA package tools automatically identify and generate packages in the format for the A/B updater.

For example, use the following to generate a full OTA:

./build/tools/releasetools/ota_from_target_files \
  dist_output/tardis-target_files.zip ota_update.zip

Or, generate an incremental OTA:

./build/tools/releasetools/ota_from_target_files \
  -i PREVIOUS-tardis-target_files.zip \
  dist_output/tardis-target_files.zip incremental_ota_update.zip
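
If the two invocations are driven from a script, the only difference is the optional -i argument. A small Python sketch that builds the argument list (the ota_command helper is hypothetical; the tool path, file names, and -i flag are taken from the examples above, and nothing is executed here):

```python
def ota_command(target_files, output, previous=None):
    """Build the ota_from_target_files argument list for a full or incremental OTA."""
    command = ["./build/tools/releasetools/ota_from_target_files"]
    if previous is not None:
        command += ["-i", previous]  # incremental OTA against a previous build
    command += [target_files, output]
    return command


full = ota_command("dist_output/tardis-target_files.zip", "ota_update.zip")
incremental = ota_command(
    "dist_output/tardis-target_files.zip",
    "incremental_ota_update.zip",
    previous="PREVIOUS-tardis-target_files.zip",
)
```

The resulting lists could be passed to subprocess.run from the root of the source tree.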

Configuration

Partitions

The Update Engine can update any pair of A/B partitions defined in the same disk.

A pair of partitions has a common prefix (such as system or boot) and per-slot suffix (such as _a or -a) as defined by the boot_control HAL in the function getSuffix(). The list of partitions for which the payload generator defines an update is configured by the AB_OTA_PARTITIONS make variable. For example, if a pair of partitions bootloader_a and bootloader_b are included (assuming _a and _b are the slot suffixes), these partitions can be updated by specifying the following on the product or board configuration:

AB_OTA_PARTITIONS := \
  boot \
  system \
  bootloader

All the partitions updated by the Update Engine must not be modified by the rest of the system. During incremental or delta updates, the binary data from the current slot is used to generate the data in the new slot. Any modification may cause the new slot data to fail verification during the update process, and therefore fail the update.

Post-install

The post-install step can be configured differently for each updated partition using a set of key-value pairs.

To run a program located at /system/usr/bin/postinst in a new image, specify the path relative to the root of the filesystem in the system partition. For example, usr/bin/postinst is system/usr/bin/postinst (if not using a RAM disk). Additionally, specify the filesystem type to pass to the mount(2) system call. Add the following to the product or device .mk files (if applicable):

AB_OTA_POSTINSTALL_CONFIG += \
  RUN_POSTINSTALL_system=true \
  POSTINSTALL_PATH_system=usr/bin/postinst \
  FILESYSTEM_TYPE_system=ext4
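
Each key in this list carries the partition name as a suffix, so the pairs group naturally by partition. The following Python sketch shows one way to read them; it is not how the build system or update_engine parses the variable, and parse_postinstall_config is a hypothetical helper that recognizes only the key prefixes appearing in this document.

```python
KEY_PREFIXES = (
    "RUN_POSTINSTALL_",
    "POSTINSTALL_PATH_",
    "FILESYSTEM_TYPE_",
    "POSTINSTALL_OPTIONAL_",
)


def parse_postinstall_config(pairs):
    """Group KEY_<partition>=value pairs into a dict keyed by partition name."""
    config = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        for prefix in KEY_PREFIXES:
            if key.startswith(prefix):
                partition = key[len(prefix):]
                config.setdefault(partition, {})[prefix.rstrip("_")] = value
                break
        else:
            # No known prefix matched this key.
            raise ValueError(f"unrecognized key: {key}")
    return config


config = parse_postinstall_config([
    "RUN_POSTINSTALL_system=true",
    "POSTINSTALL_PATH_system=usr/bin/postinst",
    "FILESYSTEM_TYPE_system=ext4",
])
```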

App compilation in background

Compiling apps in the background for A/B updates requires the following two additions to the product's device configuration (in the product's device.mk):

Include the native components in the build. This ensures the compilation script and binaries are compiled and included in the system image.
```
  # A/B OTA dexopt package
  PRODUCT_PACKAGES += otapreopt_script
```

Connect the compilation script to update_engine such that it is run as a post-install step.

  # A/B OTA dexopt update_engine hookup
  AB_OTA_POSTINSTALL_CONFIG += \
    RUN_POSTINSTALL_system=true \
    POSTINSTALL_PATH_system=system/bin/otapreopt_script \
    FILESYSTEM_TYPE_system=ext4 \
    POSTINSTALL_OPTIONAL_system=true

See First boot installation of DEX_PREOPT files to install the preopted files in the unused second system partition.
