Age | Commit message (Collapse) | Author |
|
This attempts to standardize unit test file location under test/unittests/
such that any source file located at cloudinit/path/to/file.py may have a
corresponding unit test file at test/unittests/path/to/test_file.py.
Noteworthy Comments:
====================
Four different duplicate test files existed:
test_{gpg,util,cc_mounts,cc_resolv_conf}.py
Each of these duplicate file pairs has been merged together. This is a
break in git history for these files.
The test suite appears to have a dependency on test order. Changing test
order causes some tests to fail. This should be rectified, but for now
some tests have been modified in
tests/unittests/config/test_set_passwords.py.
A helper class name starts with "Test" which causes pytest to try
executing it as a test case, which then throws warnings "due to Class
having __init__()". Silence by changing the name of the class.
# helpers.py is imported in many test files, import paths change
cloudinit/tests/helpers.py -> tests/unittests/helpers.py
# Move directories:
cloudinit/distros/tests -> tests/unittests/distros
cloudinit/cmd/devel/tests -> tests/unittests/cmd/devel
cloudinit/cmd/tests -> tests/unittests/cmd/
cloudinit/sources/helpers/tests -> tests/unittests/sources/helpers
cloudinit/sources/tests -> tests/unittests/sources
cloudinit/net/tests -> tests/unittests/net
cloudinit/config/tests -> tests/unittests/config
cloudinit/analyze/tests/ -> tests/unittests/analyze/
# Standardize tests already in tests/unittests/
test_datasource -> sources
test_distros -> distros
test_vmware -> sources/vmware
test_handler -> config # this contains cloudconfig module tests
test_runs -> runs
|
|
Some references were missed in the removal of the agent command
in PR #799. This simply removes the remaining references.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
testing: monkeypatch system_info call in unit tests
system_info can make calls that read or write from the filesystem, which
should require special mocking. It is also decorated with 'lru_cache',
which means test authors often don't realize they need to be mocking.
Also, we don't actually want the results from the user's local
machine, so monkeypatching it across all tests should be reasonable.
Additionally, moved some of 'system_info` into a helper function to
reduce the surface area of the monkeypatch, added tests for the new
function (and fixed a bug as a result), and removed related mocks that
should be no longer needed.
|
|
Without UDF support, DS Azure cannot mount the provisioning ISO,
which contains platform metadata necessary to support
pre-provisioning. The required metadata is made available in IMDS
starting with api version 2021-08-01. This change will leverage IMDS
to obtain the required metadata to support pre-preprovisioning if
provisioning ISO was not available.
|
|
tox: bump the pinned flake8 and pylint version
* pylint: fix W1406 (redundant-u-string-prefix)
The u prefix for strings is no longer necessary in Python >=3.0.
* pylint: disable W1514 (unspecified-encoding)
From https://www.python.org/dev/peps/pep-0597/ (Python 3.10):
The new warning stems form https://www.python.org/dev/peps/pep-0597,
which says:
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8. [...] Even Python experts may assume that the
default encoding is UTF-8. This creates bugs that only happen on Windows.
The warning could be fixed by always specifying encoding='utf-8',
however we should be careful to not break environments which are not
utf-8 (or explicitly state that only utf-8 is supported). Let's silence
the warning for now.
* _quick_read_instance_id: cover the case where load_yaml() returns None
Spotted by pylint:
- E1135 (unsupported-membership-test)
- E1136 (unsubscriptable-object)
LP: #1944414
|
|
In the nic attach path, we skip doing dhcp since we already did it
when bringing the interface up. However when polling for
reprovisiondata, it is possible for the request to timeout due to
platform issues. In those cases we still need to do dhcp and try again
since we tear down the context. We can only skip the first dhcp
attempt.
|
|
before rebinding again (#990)
Add 10 second polling loop in wait_for_link_up after performing
an unbind and re-bind of primary NIC in hv_netvsc driver.
Also reduce cloud-init logging levels to debug for these operations.
|
|
When bringing interface up by unbinding and then binding hv_netvsc
driver, it might take a short delay after binding for the link to be
up. So before trying unbind/bind again after sleep, check if the link
is up. This is a corner case when a preprovisioned VM is reused and
the NICs are hot-attached.
|
|
|
|
With a few exceptions, Azure VM deployments receive provisioning
metadata through the provisioning iso presented as a cdrom device
(/dev/sr0). The existing code attempts to find this device by calling
blkid to find all devices that have either type iso9660 or udf. This
can be very expensive if the VM has a lot of disks. This commit will
attempt to mount the default iso location first and only tries to use
blkid to locate the iso location if the default mounting location fails
|
|
Control is currently limited to boot events, though this should
allow us to more easily incorporate HOTPLUG support. Disabling
'instance-first-boot' is not supported as we apply networking config
too early in boot to have processed userdata (along with the fact
that this would be a pretty big foot-gun).
The concept of update events on datasource has been split into
supported update events and default update events. Defaults will be
used if there is no user-defined update events, but user-defined
events won't be supplied if they aren't supported.
When applying the networking config, we now check to see if the event
is supported by the datasource as well as if it is enabled.
Configuration looks like:
updates:
network:
when: ['boot']
|
|
See https://bugs.launchpad.net/cloud-init/+bug/1910835
|
|
|
|
When network interfaces are hot-attached to the VM, attempting to get
network metadata might return 410 (or 500, 503 etc) because the info
is not yet available. In those cases, we retry getting the metadata
before giving up. The only case where we can move on to wait for more
nic attach events is if the call times out despite retries, which
means the interface is not likely a primary interface, and we should
try for more nic attach events.
|
|
This change allows us to retrieve the username and hostname from
IMDS instead of having to rely on the mounted OVF.
|
|
Invoking walinuxagent from within cloud-init is no longer
supported/necessary
|
|
Add flexibility to IMDS api-version by having both a desired IMDS
api-version and a minimum api-version. The desired api-version will
be used first, and if that fails it will fall back to the minimum
api-version.
|
|
Changes:
* Only merge in default Azure cloud ephemeral disk configs
during DataSourceAzure._get_data() if the ephemeral disk
exists.
* DataSourceAzure.address_ephemeral_resize() (which is
invoked in DataSourceAzure.activate() should only set up
the ephemeral disk if the disk exists.
Azure VMs may or may not come with ephemeral resource disks
depending on the VM SKU. For VM SKUs that come with
ephemeral resource disks, the Azure platform guarantees that
the ephemeral resource disk is attached to the VM before
the VM is booted. For VM SKUs that do not come with
ephemeral resource disks, cloud-init currently attempts
to wait and set up a non-existent ephemeral resource
disk, which wastes boot time. It also causes disk setup
modules to fail (due to non-existent references to the
ephemeral resource disk).
udevadm settle is invoked by cloud-init very early in boot.
udevadm settle is invoked very early, before
DataSourceAzure's _get_data() and activate() methods.
Within DataSourceAzure's _get_data() and activate() methods,
the ephemeral resource disk path should exist if the
VM SKU comes with an ephemeral resource disk.
The ephemeral resource disk path should not exist if the
VM SKU does not come with an ephemeral resource disk.
LP: #1901011
|
|
Kernel's newer than 4.15 present /sys/dmi/id/product_uuid as a
lowercase value. Previously UUID was uppercase.
Azure datasource reads the product_uuid directly as their platform's
instance-id. This presents a problem if a kernel is either
upgraded or downgraded across the 4.15 kernel version boundary because
the case of the UUID will change, resulting in cloud-init seeing a
"new" instance id and re-running all modules.
Re-running cc_ssh in cloud-init deletes and regenerates ssh_host keys
on a system which can cause concern on long-running instances that
somethingnefarious has happened.
Also add:
- An integration test for this for Azure Bionic Ubuntu FIPS upgrading from
a FIPS kernel with uppercase UUID to a lowercase UUID in linux-azure
- A new pytest.mark.sru_next to collect all integration tests related to our
next SRU
LP: #1835584
|
|
With the changes for SSH public keys to be retrieved from IMDS as a
first option, when a key is passed through not in the raw SSH public key
format it causes an issue and the key is not added to the user's
authorized_keys file.
This PR will temporarily disable this behavior until a permanent fix is
put in place.
|
|
Prevent network interfaces without IP addresses from being added to the
generated network configuration.
|
|
Adds the ability to run the Azure preprovisioned VMs as NIC-less and
then hot-attach them when assigned for reprovision.
The NIC on the preprovisioned VM is hot-detached as soon as it reports
ready and goes into wait for one or more interfaces to be hot-attached.
Once they are attached, cloud-init gets the expected number of NICs (in
case there are more than one) that will be attached from IMDS and waits
until all of them are attached. After all the NICs are attached,
reprovision proceeds as usual.
|
|
cc_set_password will only update the password for the default user if
cfg['password'] is set. The existing code of datasource Azure will fail
to update the default user's password because it does not set that
metadata. If the default user doesn't exist in the image, the current
code works fine because the password is set during user create and
not in cc_set_password
|
|
On systems where the Azure datasource
is a viable platform for crawling metadata,
cloud-init occasionally encounters fatal
irrecoverable errors during the crawling
of the Azure datasource.
When this happens, cloud-init crashes,
and Azure VM provisioning would fail.
However, instead of failing immediately,
the user will continue seeing provisioning
for a long time until it times out with
"OS Provisioning Timed Out" message.
In these situations, cloud-init should
report failure to the Azure datasource
endpoint indicating provisioning failure.
The user will immediately see provisioning
terminate, giving them a much better
failure experience instead of pointlessly
waiting for OS provisioning timeout.
|
|
This just separates the reading of dmi values into its own file.
Some things of note:
* left import of util in dmi.py only for 'is_container'
It'd be good if is_container was not in util.
* just the use of 'util.is_x86' to dmi.py
* open() is used directly rather than load_file.
|
|
DataSourceAzure previously writes the preprovisioning
reported ready marker file before it goes through the
report ready workflow. On certain VM instances, the
marker file is successfully written but then reporting
ready fails.
Upon rare VM reboots by the platform, cloud-init sees
that the report ready marker file already exists.
The existence of this marker file tells cloud-init
not to report ready again (because it mistakenly
assumes that it already reported ready in
preprovisioning).
In this scenario, cloud-init instead erroneously
takes the reprovisioning workflow instead of
reporting ready again.
|
|
enumeration of physical network devices (#591)
|
|
fails (#549)
Azure datasource's `parse_network_config` throws a fatal uncaught exception when an exception is raised during generation of network config from IMDS metadata. This happens when IMDS metadata is invalid/corrupted (such as when it is missing network or interface metadata). This causes the rest of provisioning to fail.
This changes `parse_network_config` to be a non-fatal implementation. Additionally, when generating network config from IMDS metadata fails, fall back on generating fallback network config (`_generate_network_config_from_fallback_config`).
This also changes fallback network config generation (`_generate_network_config_from_fallback_config`) to blacklist an additional driver: `mlx5_core`.
|
|
* pull ssh keys from imds first and fall back to ovf if unavailable
* refactor log and diagnostic messages
* refactor the OpenSSLManager instantiation and certificate usage
* fix unit test where exception was being silenced for generate cert
* fix tests now that certificate is not always generated
* add documentation for ssh key retrieval
* add ability to check if http client has security enabled
* refactor certificate logic to GoalState
|
|
This fixes a long delay during boot of some instances. For Azure instance types using SR-IOV via the Hyper-V netvsc network driver, two network interfaces are created that share the same MAC, but only the virtual device should be configured and used. Updating the netplan configuration to filter on the hv_netvsc driver prevents netplan from trying to figure both devices.
LP: #1830740
|
|
DataSourceAzure: Gracefully handle the case of set hostname failure during provisioning
|
|
|
|
This was painful, but it finishes a TODO from cloudinit/subp.py.
It moves the following from util to subp:
ProcessExecutionError
subp
which
target_path
I moved subp_blob_in_tempfile into cc_chef, which is its only caller.
That saved us from having to deal with it using write_file
and temp_utils from subp (which does not import any cloudinit things now).
It is arguable that 'target_path' could be moved to a 'path_utils' or
something, but in order to use it from subp and also from utils,
we had to get it out of utils.
|
|
Improving the debugability of this code path by logging the thrown exception details for the non 404 exceptions.
Retry IMDS on HTTP Error 404 and 410, re-run DHCP on other exceptions.
|
|
This fixes issues with closing brackets not matching the opening
bracket's line and continuation line under-idented for hanging indent.
|
|
These libraries provide backports of Python 3's stdlib components to Python 2. As we only support Python 3, we can simply use the stdlib now. This pull request does the following:
* removes some unneeded compatibility code for the old spelling of `assertRaisesRegex`
* replaces invocations of the Python 2-only `assertItemsEqual` with its new name, `assertCountEqual`
* replaces all usage of `unittest2` with `unittest`
* replaces all usage of `contextlib2` with `contextlib`
* drops `unittest2` and `contextlib2` from requirements files and tox.ini
It also rewrites some `test_azure` helpers to use bare asserts. We were seeing a strange error in xenial builds of this branch which appear to be stemming from the AssertionError that pytest produces being _different_ from the standard AssertionError. This means that the modified helpers weren't behaving correctly, because they weren't catching AssertionErrors as one would expect. (I believe this is related, in some way, to https://github.com/pytest-dev/pytest/issues/645, but the only version of pytest where we're affected is so far in the past that it's not worth pursuing it any further as we have a workaround.)
|
|
These classes don't use `self.logs` anywhere in their body, so we can
remove the `with_logs = True` setting from them.
These instances were found using astpath[0], with the following
invocation:
astpath "//Name[@id='with_logs' and not(ancestor::ClassDef//Attribute[@attr='logs'])]"
[0] https://github.com/hchasestevens/astpath
|
|
Azure stores the instance ID with an incorrect byte ordering for the
first three hyphen delimited parts. This results in invalid
is_new_instance checks forcing Azure datasource to recrawl the metadata
service.
When persisting instance-id from the metadata service, swap the
instance-id string byte order such that it is consistent with
that returned by dmi information. Check whether the instance-id
string is a byte-swapped match when determining correctly whether
the Azure platform instance-id has actually changed.
|
|
Azure's Instance Metadata Service (IMDS) reports multiple IPv6
addresses, via the http://169.254.169.254/metadata/instance/network
route. Any additional values after the first in 'ipAddresses' under the
'ipv6' interface key are extracted and configured as static IPs on
the interface.
|
|
Network v2 configuration for Azure will set both dhcp4 and
dhcp6 to False by default.
When IPv6 privateIpAddresses are present for an interface in Azure's
Instance Metadata Service (IMDS), set dhcp6: True and provide a
route-metric value that will match the corresponding dhcp4 route-metric.
The route-metric value will increase by 100 for each additional
interface present to ensure the primary interface has a route to IMDS.
Also fix dhcp route-metric rendering for eni and sysconfig distros.
LP: #1850308
|
|
After initial boot ovf-env.xml is copied to agent dir
(/var/lib/waagent/) with REDACTED password.
On subsequent boots DataSourceAzure loads with a configuration where the
user specified in /var/lib/waagent/ovf-env.xml is locked.
If instance id changes, cc_users_groups action will lock the user.
Fix this behavior by not locking the user if its password is REDACTED.
LP: #1849677
|
|
Collect and record the following information through KVP:
+ timestamps related to kernel initialization and systemd activation
of cloud-init services
+ system information including cloud-init version, kernel version,
distro version, and python version
+ diagnostic events for the most common provisioning error issues
such as empty dhcp lease, corrupted ovf-env.xml, etc.
+ increasing the log frequency of polling IMDS during reprovision.
|
|
The function generate_fallback_config is used by Azure by default when
not consuming IMDS configuration data. This function is also used by any
datasource which does not implement it's own network config. This simple
fallback configuration sets up dhcp on the most likely NIC. It will now
emit network v2 instead of network v1.
This is a step toward moving all components talking in v2 and allows us
to avoid costly conversions between v1 and v2 for newer distributions
which rely on netplan.
|
|
The EphemeralDHCP context manager did not parse or handle
rfc3442 classless static routes which prevented reading
datasource metadata in some clouds. This branch adds support
for extracting the field from the leases output, parsing the
format and then adding the required iproute2 ip commands to
apply (and teardown) the static routes.
LP: #1821102
|
|
This allows cloud-init query region to show valid region data for Azure
|
|
- UFS file system support
- GPT partition table support
- add support for newfs's -L parameter (label)
- move freebsd specific test from Azure to freebsd
|
|
If the IMDS primary server is not available, falling back to the
secondary server takes about 1s. The net result is that the
expected E2E time is slightly more than 1s. This change increases
the timeout to 2s to prevent the infinite loop of timeouts.
|
|
Mock util.SeLinuxGuard to do nothing within tests that mock functions
used by the guard, when those mocks confuse the guard. This has no
impact when executing unit tests on systems which do not enable selinux
(e.g. Ubuntu).
LP: #1825253
|
|
The Azure platform surfaces random bytes into /sys via Hyper-V.
Python 2.7 json.dump() raises an exception if asked to convert
a str with non-character content, and python 3.0 json.dump()
won't serialize a "bytes" value. As a result, c-i instance
data is often not written by Azure, making reboots slower (c-i
has to repeat work).
The random data is base64-encoded and then decoded into a string
(str or unicode depending on the version of Python in use). The
base64 string has just as many bits of entropy, so we're not
throwing away useful "information", but we can be certain
json.dump() will correctly serialize the bits.
|
|
- Remove the last few places that use `if PY26`
- Replace our Python version detection logic with six's (which we were
already using in most places)
|