From 5fc34d81a002f6ca0706f5285ee15b919c3d8d2e Mon Sep 17 00:00:00 2001
From: Daniel Watkins
Date: Wed, 16 Sep 2020 16:49:34 -0400
Subject: boot.rst: add First Boot Determination section (#568)

LP: #1888858
---
 doc/rtd/topics/boot.rst | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

(limited to 'doc')

diff --git a/doc/rtd/topics/boot.rst b/doc/rtd/topics/boot.rst
index 4e79c958..a5282e35 100644
--- a/doc/rtd/topics/boot.rst
+++ b/doc/rtd/topics/boot.rst
@@ -157,4 +157,90 @@ finished, the ``cloud-init status`` subcommand can help block external
 scripts until cloud-init is done without having to write your own systemd
 units dependency chains. See :ref:`cli_status` for more info.
 
+First Boot Determination
+************************
+
+cloud-init has to determine whether or not the current boot is the first boot
+of a new instance or not, so that it applies the appropriate configuration. On
+an instance's first boot, it should run all "per-instance" configuration,
+whereas on a subsequent boot it should run only "per-boot" configuration. This
+section describes how cloud-init performs this determination, as well as why it
+is necessary.
+
+When it runs, cloud-init stores a cache of its internal state for use across
+stages and boots.
+
+If this cache is present, then cloud-init has run on this system before.
+[#not-present]_ There are two cases where this could occur. Most commonly,
+the instance has been rebooted, and this is a second/subsequent boot.
+Alternatively, the filesystem has been attached to a *new* instance, and this
+is an instance's first boot. The most obvious case where this happens is when
+an instance is launched from an image captured from a launched instance.
+
+By default, cloud-init attempts to determine which case it is running in by
+checking the instance ID in the cache against the instance ID it determines at
+runtime. If they do not match, then this is an instance's first boot;
+otherwise, it's a subsequent boot. Internally, cloud-init refers to this
+behavior as ``check``.
+
+This behavior is required for images captured from launched instances to
+behave correctly, and so is the default which generic cloud images ship with.
+However, there are cases where it can cause problems. [#problems]_ For these
+cases, cloud-init has support for modifying its behavior to trust the instance
+ID that is present in the system unconditionally. This means that cloud-init
+will never detect a new instance when the cache is present, and it follows that
+the only way to cause cloud-init to detect a new instance (and therefore its
+first boot) is to manually remove cloud-init's cache. Internally, this
+behavior is referred to as ``trust``.
+
+To configure which of these behaviors to use, cloud-init exposes the
+``manual_cache_clean`` configuration option. When ``false`` (the default),
+cloud-init will ``check`` and clean the cache if the instance IDs do not match
+(this is the default, as discussed above). When ``true``, cloud-init will
+``trust`` the existing cache (and therefore not clean it).
+
+Manual Cache Cleaning
+=====================
+
+cloud-init ships a command for manually cleaning the cache: ``cloud-init
+clean``. See :ref:`cli_clean`'s documentation for further details.
+
+Reverting ``manual_cache_clean`` Setting
+========================================
+
+Currently there is no support for switching an instance that is launched with
+``manual_cache_clean: true`` from ``trust`` behavior to ``check`` behavior,
+other than manually cleaning the cache.
+
+.. warning:: If you want to capture an instance that is currently in ``trust``
+   mode as an image for launching other instances, you **must** manually clean
+   the cache. If you do not do so, then instances launched from the captured
+   image will all detect their first boot as a subsequent boot of the captured
+   instance, and will not apply any per-instance configuration.
+
+   This is a functional issue, but also a potential security one: cloud-init is
+   responsible for rotating SSH host keys on first boot, and this will not
+   happen on these instances.
+
+.. [#not-present] It follows that if this cache is not present, cloud-init has
+   not run on this system before, so this is unambiguously this instance's
+   first boot.
+
+.. [#problems] A couple of ways in which this strict reliance on the presence
+   of a datasource has been observed to cause problems:
+
+   * If a cloud's metadata service is flaky and cloud-init cannot obtain the
+     instance ID locally on that platform, cloud-init's instance ID
+     determination will sometimes fail to determine the current instance ID,
+     which makes it impossible to determine if this is an instance's first or
+     subsequent boot (`#1885527`_).
+   * If cloud-init is used to provision a physical appliance or device and an
+     attacker can present a datasource to the device with a different instance
+     ID, then cloud-init's default behavior will detect this as an instance's
+     first boot and reset the device using the attacker's configuration
+     (this has been observed with the NoCloud datasource in `#1879530`_).
+
+.. _#1885527: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1885527
+.. _#1879530: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1879530
+
 .. vi: textwidth=79
--
cgit v1.2.3
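
Note (not part of the patch): the decision described in the new section can be summarized as a small truth table. The following is a minimal Python sketch of that logic as the text describes it, not cloud-init's actual implementation; the function name ``is_first_boot`` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Optional


def is_first_boot(
    cached_instance_id: Optional[str],
    current_instance_id: str,
    manual_cache_clean: bool = False,
) -> bool:
    """Illustrative model of the first-boot decision (hypothetical helper).

    cached_instance_id is None when no cache exists on the filesystem.
    """
    # No cache: cloud-init has not run on this system, so this is a first boot.
    if cached_instance_id is None:
        return True
    # "trust" behavior: a present cache is always believed, so a new
    # instance is never detected until the cache is removed manually.
    if manual_cache_clean:
        return False
    # "check" behavior (the default): compare the cached instance ID with
    # the one determined at runtime.
    return cached_instance_id != current_instance_id
```

Under this model, an image captured from a ``trust``-mode instance without cleaning the cache corresponds to ``is_first_boot("i-old", "i-new", manual_cache_clean=True)``, which returns ``False``. This is the situation the warning in the patch describes, where per-instance configuration is skipped on the new instance.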