Age | Commit message (Collapse) | Author |
|
With the changes for SSH public keys to be retrieved from IMDS as a
first option, when a key is passed through not in the raw SSH public key
format it causes an issue and the key is not added to the user's
authorized_keys file.
This PR will temporarily disable this behavior until a permanent fix is
put in place.
|
|
Prevent network interfaces without IP addresses from being added to the
generated network configuration.
|
|
Adds the ability to run the Azure preprovisioned VMs as NIC-less and
then hot-attach them when assigned for reprovision.
The NIC on the preprovisioned VM is hot-detached as soon as it reports
ready and goes into wait for one or more interfaces to be hot-attached.
Once they are attached, cloud-init gets the expected number of NICs (in
case there are more than one) that will be attached from IMDS and waits
until all of them are attached. After all the NICs are attached,
reprovision proceeds as usual.
|
|
cc_set_password will only update the password for the default user if
cfg['password'] is set. The existing code of datasource Azure will fail
to update the default user's password because it does not set that
metadata. If the default user doesn't exist in the image, the current
code works fine because the password is set during user create and
not in cc_set_password
|
|
On systems where the Azure datasource
is a viable platform for crawling metadata,
cloud-init occasionally encounters fatal
irrecoverable errors during the crawling
of the Azure datasource.
When this happens, cloud-init crashes,
and Azure VM provisioning would fail.
However, instead of failing immediately,
the user will continue seeing provisioning
for a long time until it times out with
"OS Provisioning Timed Out" message.
In these situations, cloud-init should
report failure to the Azure datasource
endpoint indicating provisioning failure.
The user will immediately see provisioning
terminate, giving them a much better
failure experience instead of pointlessly
waiting for OS provisioning timeout.
|
|
This just separates the reading of dmi values into its own file.
Some things of note:
* left import of util in dmi.py only for 'is_container'
It'd be good if is_container was not in util.
* just the use of 'util.is_x86' to dmi.py
* open() is used directly rather than load_file.
|
|
DataSourceAzure previously writes the preprovisioning
reported ready marker file before it goes through the
report ready workflow. On certain VM instances, the
marker file is successfully written but then reporting
ready fails.
Upon rare VM reboots by the platform, cloud-init sees
that the report ready marker file already exists.
The existence of this marker file tells cloud-init
not to report ready again (because it mistakenly
assumes that it already reported ready in
preprovisioning).
In this scenario, cloud-init instead erroneously
takes the reprovisioning workflow instead of
reporting ready again.
|
|
enumeration of physical network devices (#591)
|
|
fails (#549)
Azure datasource's `parse_network_config` throws a fatal uncaught exception when an exception is raised during generation of network config from IMDS metadata. This happens when IMDS metadata is invalid/corrupted (such as when it is missing network or interface metadata). This causes the rest of provisioning to fail.
This changes `parse_network_config` to be a non-fatal implementation. Additionally, when generating network config from IMDS metadata fails, fall back on generating fallback network config (`_generate_network_config_from_fallback_config`).
This also changes fallback network config generation (`_generate_network_config_from_fallback_config`) to blacklist an additional driver: `mlx5_core`.
|
|
* pull ssh keys from imds first and fall back to ovf if unavailable
* refactor log and diagnostic messages
* refactor the OpenSSLManager instantiation and certificate usage
* fix unit test where exception was being silenced for generate cert
* fix tests now that certificate is not always generated
* add documentation for ssh key retrieval
* add ability to check if http client has security enabled
* refactor certificate logic to GoalState
|
|
This fixes a long delay during boot of some instances. For Azure instance types using SR-IOV via the Hyper-V netvsc network driver, two network interfaces are created that share the same MAC, but only the virtual device should be configured and used. Updating the netplan configuration to filter on the hv_netvsc driver prevents netplan from trying to figure both devices.
LP: #1830740
|
|
DataSourceAzure: Gracefully handle the case of set hostname failure during provisioning
|
|
|
|
This was painful, but it finishes a TODO from cloudinit/subp.py.
It moves the following from util to subp:
ProcessExecutionError
subp
which
target_path
I moved subp_blob_in_tempfile into cc_chef, which is its only caller.
That saved us from having to deal with it using write_file
and temp_utils from subp (which does not import any cloudinit things now).
It is arguable that 'target_path' could be moved to a 'path_utils' or
something, but in order to use it from subp and also from utils,
we had to get it out of utils.
|
|
Improving the debugability of this code path by logging the thrown exception details for the non 404 exceptions.
Retry IMDS on HTTP Error 404 and 410, re-run DHCP on other exceptions.
|
|
This fixes issues with closing brackets not matching the opening
bracket's line and continuation line under-idented for hanging indent.
|
|
These libraries provide backports of Python 3's stdlib components to Python 2. As we only support Python 3, we can simply use the stdlib now. This pull request does the following:
* removes some unneeded compatibility code for the old spelling of `assertRaisesRegex`
* replaces invocations of the Python 2-only `assertItemsEqual` with its new name, `assertCountEqual`
* replaces all usage of `unittest2` with `unittest`
* replaces all usage of `contextlib2` with `contextlib`
* drops `unittest2` and `contextlib2` from requirements files and tox.ini
It also rewrites some `test_azure` helpers to use bare asserts. We were seeing a strange error in xenial builds of this branch which appear to be stemming from the AssertionError that pytest produces being _different_ from the standard AssertionError. This means that the modified helpers weren't behaving correctly, because they weren't catching AssertionErrors as one would expect. (I believe this is related, in some way, to https://github.com/pytest-dev/pytest/issues/645, but the only version of pytest where we're affected is so far in the past that it's not worth pursuing it any further as we have a workaround.)
|
|
These classes don't use `self.logs` anywhere in their body, so we can
remove the `with_logs = True` setting from them.
These instances were found using astpath[0], with the following
invocation:
astpath "//Name[@id='with_logs' and not(ancestor::ClassDef//Attribute[@attr='logs'])]"
[0] https://github.com/hchasestevens/astpath
|
|
Azure stores the instance ID with an incorrect byte ordering for the
first three hyphen delimited parts. This results in invalid
is_new_instance checks forcing Azure datasource to recrawl the metadata
service.
When persisting instance-id from the metadata service, swap the
instance-id string byte order such that it is consistent with
that returned by dmi information. Check whether the instance-id
string is a byte-swapped match when determining correctly whether
the Azure platform instance-id has actually changed.
|
|
Azure's Instance Metadata Service (IMDS) reports multiple IPv6
addresses, via the http://169.254.169.254/metadata/instance/network
route. Any additional values after the first in 'ipAddresses' under the
'ipv6' interface key are extracted and configured as static IPs on
the interface.
|
|
Network v2 configuration for Azure will set both dhcp4 and
dhcp6 to False by default.
When IPv6 privateIpAddresses are present for an interface in Azure's
Instance Metadata Service (IMDS), set dhcp6: True and provide a
route-metric value that will match the corresponding dhcp4 route-metric.
The route-metric value will increase by 100 for each additional
interface present to ensure the primary interface has a route to IMDS.
Also fix dhcp route-metric rendering for eni and sysconfig distros.
LP: #1850308
|
|
After initial boot ovf-env.xml is copied to agent dir
(/var/lib/waagent/) with REDACTED password.
On subsequent boots DataSourceAzure loads with a configuration where the
user specified in /var/lib/waagent/ovf-env.xml is locked.
If instance id changes, cc_users_groups action will lock the user.
Fix this behavior by not locking the user if its password is REDACTED.
LP: #1849677
|
|
Collect and record the following information through KVP:
+ timestamps related to kernel initialization and systemd activation
of cloud-init services
+ system information including cloud-init version, kernel version,
distro version, and python version
+ diagnostic events for the most common provisioning error issues
such as empty dhcp lease, corrupted ovf-env.xml, etc.
+ increasing the log frequency of polling IMDS during reprovision.
|
|
The function generate_fallback_config is used by Azure by default when
not consuming IMDS configuration data. This function is also used by any
datasource which does not implement it's own network config. This simple
fallback configuration sets up dhcp on the most likely NIC. It will now
emit network v2 instead of network v1.
This is a step toward moving all components talking in v2 and allows us
to avoid costly conversions between v1 and v2 for newer distributions
which rely on netplan.
|
|
The EphemeralDHCP context manager did not parse or handle
rfc3442 classless static routes which prevented reading
datasource metadata in some clouds. This branch adds support
for extracting the field from the leases output, parsing the
format and then adding the required iproute2 ip commands to
apply (and teardown) the static routes.
LP: #1821102
|
|
This allows cloud-init query region to show valid region data for Azure
|
|
- UFS file system support
- GPT partition table support
- add support for newfs's -L parameter (label)
- move freebsd specific test from Azure to freebsd
|
|
If the IMDS primary server is not available, falling back to the
secondary server takes about 1s. The net result is that the
expected E2E time is slightly more than 1s. This change increases
the timeout to 2s to prevent the infinite loop of timeouts.
|
|
Mock util.SeLinuxGuard to do nothing within tests that mock functions
used by the guard, when those mocks confuse the guard. This has no
impact when executing unit tests on systems which do not enable selinux
(e.g. Ubuntu).
LP: #1825253
|
|
The Azure platform surfaces random bytes into /sys via Hyper-V.
Python 2.7 json.dump() raises an exception if asked to convert
a str with non-character content, and python 3.0 json.dump()
won't serialize a "bytes" value. As a result, c-i instance
data is often not written by Azure, making reboots slower (c-i
has to repeat work).
The random data is base64-encoded and then decoded into a string
(str or unicode depending on the version of Python in use). The
base64 string has just as many bits of entropy, so we're not
throwing away useful "information", but we can be certain
json.dump() will correctly serialize the bits.
|
|
- Remove the last few places that use `if PY26`
- Replace our Python version detection logic with six's (which we were
already using in most places)
|
|
In test_ds_identify, don't mutate otherwise-static test data. When
running tests in a random order, this was causing failures due to
breaking preconditions for other tests.
In tests/helpers, reset logging level in tearDown. Some of the CLI
tests set the level of the root logger in a way that isn't correctly
reset.
For test_poll_imds_re_dhcp_on_timeout and
test_dhcp_discovery_run_in_sandbox_warns_invalid_pid, mock out
time.sleep; this saves ~11 seconds (or ~40% of previous test time!).
|
|
Replace Azure pre-provision polling on IMDS with a blocking call
which watches for netlink link state change messages. The media
change event happens when a pre-provisioned VM has been activated
and is connected to the users virtual network and cloud-init can
then resume operation to complete image instantiation.
|
|
Upon URL timeout, _poll_imds is expected to re-dhcp to get updated
IP configuration. We don't want to indefinitely retry because the
instance likely has invalid IP configuration.
LP: #1803598
|
|
There is an infrequent race when the booting instance can hit the IMDS
service before it is fully available. This results in a
requests.ConnectTimeout being raised.
Azure's retry_callback logic now retries on either 404s or Timeouts.
LP:1800223
|
|
If Azure detects an ntfs filesystem type during mount attempt, it should
still report the resource device as reformattable. There are slight
differences in error message format on RedHat and SuSE. This patch
simplifies the expected error match to work on both distributions.
LP: #1799338
|
|
In commitish 9073951 azure datasource tried to leverage stale DHCP
information obtained from EphemeralDHCPv4 context manager to report
updated provisioning status to the fabric earlier in the boot process.
Unfortunately the stale ephemeral network configuration had already been
torn down in preparation to bring up IMDS network config so the report
attempt failed on timeout.
This branch introduces obtain_lease and clean_network public methods on
EphemeralDHCPv4 to allow for setup and teardown of ephemeral network
configuration without using a context manager. Azure datasource now uses
this to persist ephemeral network configuration across multiple contexts
during provisioning to avoid multiple DHCP roundtrips.
|
|
When reusing a preprovisioned VM, report ready to Azure fabric as soon as
we get the reprovision data and the goal state so that we are not delayed
by the cloud-init stage switch, saving 2-3 seconds. Also reduce logging
when polling IMDS for reprovision data.
LP: #1799594
|
|
Azure generates network configuration from the IMDS service and removes
any preexisting hotplug network scripts which exist in Azure cloud images.
Add a datasource configuration option which allows for writing a default
network configuration which sets up dhcp on eth0 and leave the hotplug
handling to the cloud-image scripts.
To disable network-config from Azure IMDS, add the following to
/etc/cloud/cloud.cfg.d/99-azure-no-imds-network.cfg:
datasource:
Azure:
apply_network_config: False
LP: #1798424
|
|
Add the following instance-data.json standardized keys:
* v1._beta_keys: List any v1 keys in beta development,
e.g. ['subplatform'].
* v1.public_ssh_keys: List of any cloud-provided ssh keys for the
instance.
* v1.platform: String representing the cloud platform api supporting the
datasource. For example: 'ec2' for aws, aliyun and brightbox cloud
names.
* v1.subplatform: String with more details about the source of the
metadata consumed. For example, metadata uri, config drive device path
or seed directory.
To support the new platform and subplatform standardized instance-data,
DataSource and its subclasses grew platform and subplatform attributes.
The platform attribute defaults to the lowercase string datasource name at
self.dsname. This method is overridden in NoCloud, Ec2 and ConfigDrive
datasources.
The subplatform attribute calls a _get_subplatform method which will
return a string containing a simple slug for subplatform type such as
metadata, seed-dir or config-drive followed by a detailed uri, device or
directory path where the datasource consumed its configuration.
As part of this work, DatasourceEC2 methods _get_data and _crawl_metadata
have been refactored for a few reasons:
- crawl_metadata is now a read-only operation, persisting no attributes on
the datasource instance and returns a dictionary of consumed metadata.
- crawl_metadata now closely represents the raw stucture of the ec2
metadata consumed, so that end-users can leverage public ec2 metadata
documentation where possible.
- crawl_metadata adds a '_metadata_api_version' key to the crawled
ds.metadata to advertise what version of EC2's api was consumed by
cloud-init.
- _get_data now does all the processing of crawl_metadata and saves
datasource instance attributes userdata_raw, metadata etc.
Additional drive-bys:
* unit test rework for test_altcloud and test_azure to simplify mocks
and make use of existing util and test_helpers functions.
|
|
Azure datasource now queries IMDS metadata service for network
configuration at link local address
http://169.254.169.254/metadata/instance?api-version=2017-12-01. The
azure metadata service presents a list of macs and allocated ip addresses
associated with this instance. Azure will now also regenerate network
configuration on every boot because it subscribes to EventType.BOOT
maintenance events as well as the 'first boot'
EventType.BOOT_NEW_INSTANCE.
For testing add azure-imds --kind to cloud-init devel net_convert tool
for debugging IMDS metadata.
Also refactor _get_data into 3 discrete methods:
- is_platform_viable: check quickly whether the datasource is
potentially compatible with the platform on which is is running
- crawl_metadata: walk all potential metadata candidates, returning a
structured dict of all metadata and userdata. Raise InvalidMetaData on
error.
- _get_data: call crawl_metadata and process results or error. Cache
instance data on class attributes: metadata, userdata_raw etc.
|
|
The Azure data source provides a method to check whether a NTFS partition
on the ephemeral disk is safe for reformatting to ext4. The method checks
to see if there are customer data files on the disk. However, mounting
the partition fails on systems that do not have the capability of
mounting NTFS. Note that in this case, it is also very unlikely that the
NTFS partition would have been used by the system (since it can't mount
it). The only case would be where an update to the system removed the
capability to mount NTFS, the likelihood of which is also very small.
This change allows the reformatting of the ephemeral disk to ext4 on
systems where mounting NTFS is not supported.
|
|
This change is for Azure VM Preprovisioning. A bug was found when after
azure VMs report ready the first time, during the time when VM is polling
indefinitely for the new ovf-env.xml from Instance Metadata Service
(IMDS), if a reboot happens, we send another report ready signal to the
fabric, which deletes the reprovisioning data on the node.
This marker file is used to fix this issue so that we will only send a
report ready signal to the fabric when no marker file is present. Then,
create a marker file so that when a reboot does occur, we check if a
marker file has been created and decide whether we would like to send the
repot ready signal.
LP: #1765214
|
|
This enables warnings produced by pylint for unused variables (W0612),
and fixes the existing errors.
|
|
Reducing timeout to 1 second as IMDS responds within a handful
of milliseconds. Also get rid of max_retries to prevent exiting
out of polling loop early due to IMDS outage / upgrade.
Reduce Azure PreProvisioning HTTP timeouts during polling to
avoid waiting an extra minute.
LP: #1752977
|
|
LP: #1754495
|
|
This change will enable azure vms to report provisioning has completed
twice, first to tell the fabric it has completed then a second time to
enable customer settings. The datasource for the second provisioning is
the Instance Metadata Service (IMDS),and the VM will poll indefinitely for
the new ovf-env.xml from IMDS.
This branch introduces EphemeralDHCPv4 which encapsulates common logic
used by both DataSourceEc2 an DataSourceAzure for temporary DHCP
interactions without side-effects.
LP: #1734991
|
|
This fixes a traceback when attempting to bounce the network after
hostname resets.
In artful and bionic ifupdown package is no longer installed in default
cloud images. As such, Azure can't use those tools to bounce the network
informing DDNS about hostname changes. This doesn't affect DDNS updates
though because systemd-networkd is now watching hostname deltas and with
default behavior to SendHostname=True over dhcp for all hostname updates
which publishes DDNS for us.
LP: #1722668
|
|
The motivation for this is that
a.) 1.7.1 runs with python 3.6 (bionic)
b.) we want to run pylint on tests/ and tools for the same reasons
that we want to run it on cloudinit/
The changes are described below.
- Update tox.ini to invoke pylint v1.7.1.
- Modify .pylintrc generated-members ignore mocked object members (m_.*)
- Replace "dangerous" params defaulting to {}
- Fix up cloud_tests use of platforms
- Cast some instance objects to with dict()
- Handle python2.7 vs 3+ ConfigParser use of readfp (deprecated)
- Update use of assertEqual(<boolean>, value) to assert<Boolean>(value)
- replace depricated assertRegexp -> assertRegex
- Remove useless test-class calls to super class
- Assign class property accessors a result and use it
- Fix missing class member in CepkoResultTests
- Fix Cheetah test import
|
|
Each DataSource subclass must define its own get_data method. This branch
formalizes our DataSource class to require that subclasses define an
explicit dsname for sourcing cloud-config datasource configuration.
Subclasses must also override the _get_data method or a
NotImplementedError is raised.
The branch also writes /run/cloud-init/instance-data.json. This file
contains all meta-data, user-data and vendor-data and a standardized set
of metadata keys in a json blob which other utilities with root-access
could make use of. Because some meta-data or user-data is potentially
sensitive the file is only readable by root.
Generally most metadata content types should be json serializable. If
specific keys or values are not serializable, those specific values will
be base64encoded and the key path will be listed under the top-level key
'base64-encoded-keys' in instance-data.json. If json writing fails due to
other TypeErrors or UnicodeDecodeErrors, a warning log will be emitted to
/var/log/cloud-init.log and no instance-data.json will be created.
|