|
Pre-provisioned instances report ready early in the local phase and
again in the non-local phase, during setup(). Non-PPS instances only
report ready during the non-local phase.
Update the process to report ready during the local phase for all
cases. Only attempt to do so if networking is up to prevent stalling
boot. We've already waited at least 20 minutes for DHCP if we're
provisioning, or 5 minutes for DHCP on normal boot requesting updated
network configuration.
- Extend _report_ready() with pubkey_info and raise an exception
on error to consolidate reporting done in _negotiate() and
_report_ready().
- Remove setup(), moving relevant logic into crawl_metadata().
- Move remaining _negotiate() logic into _cleanup_markers() and
_determine_wireserver_pubkey_info().
These changes effectively fix two issues that were present:
(1) _negotiated is incorrectly set to True when failing to report ready
_negotiate() squashed the exception and the return value was not
checked. This was probably masked by the forced removal of obj.pkl on
Ubuntu instances, but the fix matters once we start persisting obj.pkl,
to prevent unnecessary re-negotiation.
(2) provisioning media is not ejected for non-PPS
_negotiate() did not pass iso_dev parameter when reporting ready. The
host will ensure this operation takes place, but it is preferable to
eject /dev/sr0 from within the guest when we're done with it.
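The consolidated reporting described above can be sketched as follows.
This is a minimal illustration, not cloud-init's code: ReadyReporter,
report_fn and networking_up are hypothetical stand-ins. Failures raise
instead of being squashed, _negotiated is only set on success, iso_dev
is always passed, and the local-phase attempt is skipped if networking
is down.

```python
from typing import Callable, List, Optional


class ReportReadyError(Exception):
    """Raised when reporting ready to the wireserver fails."""


class ReadyReporter:
    """Hypothetical stand-in for the relevant DataSourceAzure state."""

    def __init__(
        self,
        report_fn: Callable[[Optional[str], Optional[list]], List[str]],
        networking_up: Callable[[], bool],
        iso_dev: Optional[str] = "/dev/sr0",
    ) -> None:
        self._report_fn = report_fn
        self._networking_up = networking_up
        self._iso_dev = iso_dev
        self._negotiated = False

    def report_ready(self, pubkey_info: Optional[list] = None) -> List[str]:
        try:
            # Pass iso_dev in every case (PPS or not) so media gets ejected.
            keys = self._report_fn(self._iso_dev, pubkey_info)
        except Exception as e:
            # Surface the failure instead of squashing it.
            raise ReportReadyError(str(e)) from e
        # Only mark negotiation complete once reporting actually succeeded.
        self._negotiated = True
        return keys

    def report_ready_in_local_phase(self, pubkey_info=None) -> bool:
        # Skip (rather than stall boot) if ephemeral networking is not up.
        if not self._networking_up():
            return False
        self.report_ready(pubkey_info)
        return True
```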
Lastly, this removes any need for lease file parsing as the wireserver
address is tracked for ephemeral DHCP. A follow-up PR will remove
this now-unused logic.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Due to race conditions and caching, IMDS may return stale or incomplete
metadata. Add some validation to detect these scenarios and report
appropriate telemetry.
Introduce normalize_mac_address() to allow for comparison of MAC
addresses, replacing the equivalent inline logic in
_generate_network_config_from_imds_metadata().
Add validation of final fetch of IMDS metadata.
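A minimal sketch of MAC normalization, assuming IMDS reports MACs
without separators (e.g. "000D3A047598") while the guest uses
colon-separated lower-case form. This illustrates the idea and is not
necessarily the committed implementation.

```python
def normalize_mac_address(mac: str) -> str:
    """Return a MAC address as lower-case, colon-separated hex pairs."""
    # Assumption: inputs may be separator-free (IMDS style) or use
    # colons/dashes (kernel/udev style); strip separators first.
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"invalid MAC address: {mac!r}")
    return ":".join(digits[i : i + 2] for i in range(0, 12, 2))
```

With both sides normalized, comparing an IMDS interface to a local NIC
becomes a plain string equality check.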
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Eliminated the duplicate code and now run the entire configuration
routine against both public and private interfaces.
Also addressed an inconsistency from our metadata api for ipv6
address configuration.
|
|
Raise runtime errors for unhandled cases which would cause other
exceptions. Ignore types for a few cases where a non-trivial
refactor would be required to prevent the warning.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Once a valid datasource is detected, publish the following artifacts
to expedite cloud-identification without having to invoke cloud-id from
shell scripts or shelling out from python.
These files can also be relied on in systemd ConditionPathExists
directives to limit execution of services and units to specific
clouds.
/run/cloud-init/cloud-id:
- A symlink with content that is the canonical cloud-id of the
datasource detected. This content is the same lower-case value
as the output of /usr/bin/cloud-id.
/run/cloud-init/cloud-id-<canonical-cloud-id>:
- A single file which will contain the canonical cloud-id encoded
in the filename
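A minimal sketch of publishing the two artifacts, assuming the
cloud-id-<id> file holds the id followed by a newline and that
/run/cloud-init/cloud-id points at it. publish_cloud_id is hypothetical;
run_dir is a parameter so the layout can be tried outside /run. A unit
could then use, for example, ConditionPathExists=/run/cloud-init/cloud-id-azure.

```python
import os


def publish_cloud_id(run_dir: str, cloud_id: str) -> str:
    """Write run_dir/cloud-id-<cloud_id> and point run_dir/cloud-id at it."""
    cloud_id = cloud_id.lower()  # same lower-case value as cloud-id output
    target = os.path.join(run_dir, f"cloud-id-{cloud_id}")
    link = os.path.join(run_dir, "cloud-id")
    # Assumption: file content is the id plus a trailing newline.
    with open(target, "w", encoding="utf-8") as f:
        f.write(f"{cloud_id}\n")
    # Replace any existing link so it always names the current cloud.
    if os.path.lexists(link):
        os.unlink(link)
    os.symlink(target, link)
    return target
```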
|
|
Split _get_public_ssh_keys_and_source() into
_get_public_keys_from_imds() and _get_public_keys_from_ovf().
Set _get_public_keys_from_imds() to take a parameter of the
IMDS metadata rather than assuming it is already set in
self.metadata. This will allow us to move negotiation into
local phase where self.metadata may not be set yet. Update this
method to raise KeyError if IMDS metadata is missing/malformed,
and ValueError if SSH key format is not supported. Update
get_public_ssh_keys() to catch these errors and fall back to the
OVF/Wireserver keys as needed.
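The error contract above (KeyError for missing/malformed IMDS data,
ValueError for unsupported key formats, caller falls back to OVF keys)
can be sketched like this. The compute.publicKeys[].keyData layout
follows Azure IMDS; the prefix-based format check and the ovf_keys
callable are simplifying assumptions, not cloud-init's actual checks.

```python
from typing import Callable, List


def get_public_keys_from_imds(imds_md: dict) -> List[str]:
    """Raise KeyError on missing/malformed metadata, ValueError on bad keys."""
    keys = [entry["keyData"] for entry in imds_md["compute"]["publicKeys"]]
    for key in keys:
        # Simplified, hypothetical check for OpenSSH-formatted keys.
        if not key.startswith(("ssh-", "ecdsa-")):
            raise ValueError(f"unsupported SSH key format: {key[:16]!r}")
    return keys


def get_public_ssh_keys(imds_md: dict, ovf_keys: Callable[[], List[str]]) -> List[str]:
    try:
        return get_public_keys_from_imds(imds_md)
    except (KeyError, ValueError):
        # Fall back to the OVF/Wireserver-provided keys.
        return ovf_keys()
```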
To improve clarity, update register_with_azure_and_fetch_data()
to return the list of SSH keys, rather than bundling them into
a dictionary for updating against the metadata dictionary.
There should be no change in behavior with this refactor.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
When the datasource was originally submitted, EphemeralDHCPv4 was not
yet available. Also avoid race conditions by skipping network
configuration if metadata service can be reached.
Signed-off-by: Markus Schade <markus.schade@hetzner.com>
|
|
This change converts the IPv6 netmask from the network_data.json[1]
format to the CIDR style, <IPv6_addr>/<prefix>.
Using an IPv6 netmask like ffff:ffff:ffff:ffff:: works with neither
NetworkManager nor network-scripts.
NetworkManager will ignore the route, logging:
ifcfg-rh: ignoring invalid route at \
"::/:: via fd00:fd00:fd00:2::fffe dev $DEV" \
(/etc/sysconfig/network-scripts/route6-$DEV:3): \
Argument for "::/::" is not ADDR/PREFIX format
Similarly, when using network-scripts, ip route fails with:
Error: inet6 prefix is expected rather than \
"fd00:fd00:fd00::/ffff:ffff:ffff:ffff::".
Also a bit of refactoring ...
cloudinit.net.sysconfig.Route.to_string:
* Move a couple of lines around to reduce repeated code.
* if "ADDRESS" not in key -> continue, so that the
code block following it can be de-indented.
cloudinit.net.network_state:
* Refactor the ipv4_mask_to_net_prefix and ipv6_mask_to_net_prefix
methods and remove mask_to_net_prefix. Use the ipaddress library to
do some of the heavy lifting.
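A minimal sketch of the netmask-to-prefix conversion using the
standard ipaddress module. ipv6_mask_to_prefix and to_cidr are
hypothetical names, not cloud-init's functions; the sketch accepts an
existing prefix length and rejects non-contiguous masks.

```python
import ipaddress


def ipv6_mask_to_prefix(mask: str) -> int:
    """Convert an IPv6 netmask (or an existing prefix length) to a prefix length."""
    if mask.isdigit():
        prefix = int(mask)
        if prefix > 128:
            raise ValueError(f"invalid IPv6 prefix length: {mask}")
        return prefix
    # Raises AddressValueError (a ValueError) for non-address input.
    value = int(ipaddress.IPv6Address(mask))
    prefix = bin(value).count("1")
    # A valid netmask is `prefix` one-bits followed only by zero-bits.
    if value != ((1 << prefix) - 1) << (128 - prefix):
        raise ValueError(f"non-contiguous IPv6 netmask: {mask}")
    return prefix


def to_cidr(address: str, mask: str) -> str:
    """Render the <IPv6_addr>/<prefix> form."""
    return f"{address}/{ipv6_mask_to_prefix(mask)}"
```

For example, "fd00:fd00:fd00::" with mask "ffff:ffff:ffff:ffff::"
becomes "fd00:fd00:fd00::/64", and the "::/::" route from the log
above becomes "::/0".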
LP: #1959148
|
|
Remove debug print that snuck in on a previous fixup.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Introduce:
- _setup_ephemeral_networking() to bring up networking.
If no iface is specified, it will use net.find_fallback_nic()
which is consistent with the previous usage of fallback_interface.
This method now tracks the encoded address of the wireserver
with a new property `_wireserver_endpoint`. Introduce a
timeout parameter to allow for retrying for a specified amount
of time.
- _teardown_ephemeral_networking() to bring down networking.
- _is_ephemeral_networking_up() to check status.
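The setup/teardown/status trio introduced above can be sketched as a
small class. Everything here is hypothetical: obtain_lease,
release_lease and find_fallback_nic stand in for the real DHCP and
net helpers, and the "wireserver" lease key is an assumed location for
the encoded wireserver address (on Azure this arrives as a hex-encoded
DHCP option, e.g. "a8:3f:81:10"). Retries continue until timeout.

```python
import time
from typing import Callable, Optional


class EphemeralNetworking:
    """Hypothetical stand-in for the _setup/_teardown/_is_up helpers."""

    def __init__(
        self,
        obtain_lease: Callable[[str], dict],
        release_lease: Callable[[dict], None],
        find_fallback_nic: Callable[[], str],
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._obtain_lease = obtain_lease
        self._release_lease = release_lease
        self._find_fallback_nic = find_fallback_nic
        self._sleep = sleep
        self._clock = clock
        self._lease: Optional[dict] = None
        self._wireserver_endpoint: Optional[str] = None

    def setup(self, iface: Optional[str] = None, timeout: float = 0.0,
              retry_sleep: float = 1.0) -> None:
        if self._lease is not None:
            raise RuntimeError("ephemeral networking is already up")
        # No iface given -> use the fallback NIC, as before.
        iface = iface or self._find_fallback_nic()
        deadline = self._clock() + timeout
        while True:
            try:
                lease = self._obtain_lease(iface)
                break
            except OSError:
                # Give up once another attempt would exceed the timeout.
                if self._clock() + retry_sleep > deadline:
                    raise
                self._sleep(retry_sleep)
        self._lease = lease
        # Assumed lease key holding the encoded wireserver address.
        self._wireserver_endpoint = lease.get("wireserver")

    def teardown(self) -> None:
        if self._lease is not None:
            self._release_lease(self._lease)
        self._lease = None
        self._wireserver_endpoint = None

    def is_up(self) -> bool:
        return self._lease is not None
```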
Ephemeral networking is now:
- Brought up prior to checking IMDS.
- Torn down following metadata crawl.
- For Savable PPS, torn down prior to waiting for NIC detach.
The link must be torn down in advance or we will see errors
from cleaning up network after the interface is unplugged.
- For Running PPS, torn down after waiting for media switch.
The link must be up for media switch to be detected.
- For all PPS, after network switch is complete, networking is
brought back up to poll for reprovision data and report ready.
It will be torn down after metadata crawl is complete like
non-PPS paths.
Additionally:
- Remove EphemeralDHCPv4WithReporting variant in favor of directly
using EphemeralDHCPv4. The reporting was only for __enter__ usage
which is no longer a used path. Continue to use dhcp_log_cb
callback.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Currently _check_if_nic_is_primary() checks whether imds_md is None,
but imds_md is returned as an empty dictionary on error fetching
metadata.
Fix this check and the tests that are incorrectly vetting IMDS
polling code.
Additionally, use response.contents instead of str(response) when
loading the JSON data returned from readurl. This helps simplify the
mocking and avoids an unnecessary conversion.
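A minimal sketch of both points, with hypothetical helper names:
parse the raw bytes directly (json.loads accepts bytes), return {} on
error as the fetch path does, and test for "missing" with a falsiness
check so the empty-dict case is caught too.

```python
import json
from typing import Optional


def parse_imds_contents(contents: Optional[bytes]) -> dict:
    """Decode readurl()-style raw bytes; return {} on any error."""
    if not contents:
        return {}
    try:
        md = json.loads(contents)  # no str() round-trip needed
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return {}
    return md if isinstance(md, dict) else {}


def imds_metadata_missing(imds_md: Optional[dict]) -> bool:
    # `imds_md is None` misses the {} returned on fetch errors;
    # a falsiness check covers both.
    return not imds_md
```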
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
|
|
Refactor _report_ready_if_needed() to work for both Savable PPS
and Runnable PPS:
* rename _report_ready_if_needed() to _report_ready_for_pps()
* return interface name from lease to support _poll_imds() behavior
without changing it.
* fixes an issue where reporting ready return value was silently
ignored for Savable PPS.
* add explicit handling for failure to obtain DHCP lease to
result in sources.InvalidMetaDataException.
Refactor _poll_imds():
* use _report_ready_for_pps() for reporting ready, removing this logic
to simplify loop logic.
* move netlink and vnetswitch out of while loop to simplify loop logic,
leaving only reprovision polling in loop.
* add explicit handling for failure to obtain DHCP lease and
retry in the next iteration.
Signed-off-by: Chris Patterson cpatterson@microsoft.com
|
|
|
|
Consolidate _should_reprovision_after_nic_attach() with
_should_reprovision() into the following:
_write_reprovision_marker() to write provisioning marker for
reboot-during-provisioning case.
PPSType enum and _determine_pps_type() for determining which
provisioning mode, if any, we're running under.
PPSType.UNKNOWN is when the reprovisioning marker is found and we
do not have the context to know what the original mode was. In this
scenario, we must resort to polling for reprovision data.
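A sketch of the enum and mode detection described above. The OVF keys
(PreprovisionedVm, PreprovisionedVMType) and the IMDS path
extended.compute.ppsType are assumptions about where the PPS state
lives, and marker_exists replaces the real marker-file check so the
logic can be exercised directly.

```python
from enum import Enum


class PPSType(Enum):
    NONE = "None"
    RUNNING = "Running"
    SAVABLE = "Savable"
    UNKNOWN = "Unknown"


def determine_pps_type(ovf_cfg: dict, imds_md: dict, marker_exists: bool) -> PPSType:
    # A leftover reprovision marker means we rebooted mid-provisioning and
    # lost the original mode, so the caller must poll for reprovision data.
    if marker_exists:
        return PPSType.UNKNOWN
    # Assumed metadata locations; see lead-in.
    imds_pps = (imds_md.get("extended") or {}).get("compute", {}).get("ppsType")
    ovf_pps = ovf_cfg.get("PreprovisionedVMType")
    if PPSType.SAVABLE.value in (ovf_pps, imds_pps):
        return PPSType.SAVABLE
    if ovf_cfg.get("PreprovisionedVm") is True or PPSType.RUNNING.value in (ovf_pps, imds_pps):
        return PPSType.RUNNING
    return PPSType.NONE
```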
Tests:
Introduce a simple data source fixture to allow fine-grained
control of mocking with pytest without unittest.
Migrate relevant _should_reprovision() tests into a combination of
TestDeterminePPSTypeScenarios cases.
Signed-off-by: Chris Patterson cpatterson@microsoft.com
|
|
According to the documentation in the tests:
```
We expect 3 calls to report_failure_to_fabric,
because we try 3 different methods of calling report failure.
The different methods are attempted in the following order:
1. Using cached ephemeral dhcp context to report failure to Azure
2. Using new ephemeral dhcp to report failure to Azure
3. Using fallback lease to report failure to Azure
```
Cases 1 and 2 make sense. If networking is established, use it.
Should failure occur using current network configuration, retry
with fresh DHCP.
Case 3 suggests that we can fall back to a lease file and retry.
Given that:
1. The wireserver address has never changed to date.
2. The wireserver address should be in the DHCP lease.
3. Parsing the lease file does not improve connectivity over the
prior attempts.
...we can safely remove this case without regression.
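With case 3 gone, the remaining flow reduces to two attempts. A minimal
sketch with hypothetical callables: report stands in for the failure
report to the wireserver and start_fresh_dhcp for obtaining a new
lease. The broad except is deliberate in this sketch because any
failure should trigger the next attempt.

```python
from typing import Callable


def report_failure(
    report: Callable[[], None],
    networking_up: bool,
    start_fresh_dhcp: Callable[[], None],
) -> bool:
    # 1. Reuse the cached ephemeral DHCP context, if networking is up.
    if networking_up:
        try:
            report()
            return True
        except Exception:
            pass
    # 2. Retry once with a fresh DHCP lease; no lease-file fallback.
    try:
        start_fresh_dhcp()
        report()
        return True
    except Exception:
        return False
```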
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Avoid requirement of getattr() and ensure _ephemeral_dhcp_ctx isn't
persisted in the cache.
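One standard Python way to get both properties, shown here as an
assumption rather than what the commit necessarily does: always
initialize the attribute (so no getattr() default is needed) and blank
it in __getstate__ so a live DHCP context is never pickled into the
cache. DataSourceSketch is hypothetical.

```python
from typing import Any, Dict, Optional


class DataSourceSketch:
    def __init__(self) -> None:
        self.metadata: Dict[str, Any] = {}
        # Always defined, so callers never need getattr(self, "...", None).
        self._ephemeral_dhcp_ctx: Optional[object] = None

    def __getstate__(self) -> Dict[str, Any]:
        state = self.__dict__.copy()
        # Never persist a live DHCP context in the pickled cache.
        state["_ephemeral_dhcp_ctx"] = None
        return state

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)
        # Caches written by older versions may lack the attribute entirely.
        self.__dict__.setdefault("_ephemeral_dhcp_ctx", None)
```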
Signed-off-by: Chris Patterson cpatterson@microsoft.com
|
|
load_azure_ds_dir() always returns a tuple. Instead of saving this
tuple as ret, expand it immediately as md, userdata_raw, cfg, files.
This allows more fine-grained passing of data instead of expanding
the tuple later.
- Update _should_reprovision methods to use cfg instead of tuple.
- Update _should_reprovision methods to remove the ovf_md guard.
This should be a safe refactor as the OVF is not required, and the
config is initialized to an empty dict. In practice, a mount failure
would have initialized ret anyway if the OVF was not found. If a
mount failure wasn't seen and ret was None, this guard could be
causing other failures by ignoring the PPS state that should be
available from IMDS metadata.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
We were seeing issues where if anything showed up before the
expected first adapter, booting could fail. This switches to searching
for a working interface to handle edge cases.
Also fixes region code handling.
|
|
The if-statement set ovf_is_accessible to True if the OVF is read
from /dev/sr0, but not from other data sources. It defaults to
True, but may get flipped to False while processing an invalid
source, and never get set back to True when reading from the data
directory.
Instead, default ovf_is_accessible to False, and only set it to
True once we've read an OVF successfully (and end the search).
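The corrected search can be sketched as follows. find_ovf and
read_ovf are hypothetical, and treating OSError/ValueError as
"invalid source" is an assumption. The flag starts False and is only
flipped after a successful read, which also ends the search.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_ovf(
    candidates: Iterable[str], read_ovf: Callable[[str], dict]
) -> Tuple[Optional[str], dict, bool]:
    """Return (source, ovf_cfg, ovf_is_accessible) for the first readable OVF."""
    ovf_is_accessible = False  # pessimistic default
    for src in candidates:
        try:
            cfg = read_ovf(src)
        except (OSError, ValueError):
            # Invalid or unreadable source: keep searching.
            continue
        ovf_is_accessible = True  # only set after a successful read
        return src, cfg, ovf_is_accessible
    return None, {}, ovf_is_accessible
```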
This fixes an error when OVF is read from data_dir and IMDS
data is unavailable (failing with "No OVF or IMDS available").
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
- Update EphemeralDHCPv4WithReporting to subclass EphemeralDHCPv4 for
consistency (non-functional change).
- Replace all usage of EphemeralDHCPv4 with EphemeralDHCPv4WithReporting.
- Converging to one DHCP class exposed an issue with ExitStack patches
being mixed with decorators. Specifically, it appeared that tests
that did not enable azure.EphemeralDHCPv4WithReporting mocks had it
applied anyways from previous tests.
Presumably ExitStack was overwriting the actual value with the
mock provided by the decorator? For now, remove some mock patches
that trigger failures, but future work should move towards a
consistent approach to prevent undetected effects.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
In 2c52e6e88b19f5db8d55eb7280ee27703e05d75f, the order of
reading network config was changed for Oracle due to initramfs
needing to take lower precedence than the datasource. However,
this also bumped system_cfg to a lower precedence than ds, which
means that any network configuration specified in /etc/cloud will not
be applied. system_cfg should instead be moved above ds so network
configuration in /etc/cloud takes precedence.
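Precedence here means "first source in the order that yields a
config wins". A minimal sketch: ORACLE_ORDER reflects the ordering
described above (system_cfg above ds, initramfs below ds) and is this
sketch's reading of the fix, not a copy of cloud-init's tuple.

```python
from typing import Callable, Dict, Optional, Sequence

# Assumed order after this fix: /etc/cloud (system_cfg) beats the datasource.
ORACLE_ORDER = ("cmdline", "system_cfg", "ds", "initramfs")


def find_network_config(
    order: Sequence[str], sources: Dict[str, Callable[[], Optional[dict]]]
) -> Optional[dict]:
    """Return the config from the highest-precedence source that has one."""
    for name in order:
        cfg = sources.get(name, lambda: None)()
        if cfg:
            return cfg
    return None
```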
LP: #1956788
|
|
distutils is getting deprecated soon. Let's replace it with the
alternatives suggested in:
https://www.python.org/dev/peps/pep-0632/
Remove `requests` version check and related code from url_helper.py
as the versions specified are old enough to no longer be relevant.
Signed-off-by: Shreenidhi Shedi <sshedi@vmware.com>
|
|
Sometimes an import can fail for different reasons: the module name
is mistyped, or the module has a dependency that is not installed.
We should log the import error; otherwise it can be very difficult
to identify the root cause. Currently, cloud-init just ignores the
error and continues. This can have fatal consequences when the import
is used to pick the datasource.
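A minimal sketch of a module search that tells the two failure modes
apart: if the missing module is the candidate itself (or one of its
parent packages), the candidate simply is not on this search path and
is logged at debug level; any other ImportError means the module
exists but is broken, so it is logged as a warning. find_module is a
hypothetical name.

```python
import importlib
import logging
from types import ModuleType
from typing import Optional, Sequence

LOG = logging.getLogger(__name__)


def find_module(name: str, search_paths: Sequence[str]) -> Optional[ModuleType]:
    for prefix in search_paths:
        full_name = f"{prefix}.{name}" if prefix else name
        try:
            return importlib.import_module(full_name)
        except ModuleNotFoundError as e:
            missing = e.name or ""
            if full_name == missing or full_name.startswith(missing + "."):
                # Not present under this prefix; try the next one.
                LOG.debug("%s not found", full_name)
                continue
            # The module exists but one of its dependencies is missing.
            LOG.warning("Failed to import %s: %s", full_name, e)
        except ImportError as e:
            LOG.warning("Failed to import %s: %s", full_name, e)
    return None
```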
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
|
|
Format tweak to match naming conventions for classes & enums.
No functional changes.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
If get_imds_data_with_api_fallback() falls back to the minimum required
API version, it is effectively pinned to the old API version forever.
Remove the failed_desired_api_version property to prevent persistence of
the flag between calls and/or reboots.
The continued presence of this flag in obj.pkl should be harmless.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Applied Black and isort, fixed any linting issues, updated tox.ini
and CI.
|
|
Thanks to [1], the hostname is set prior to network bring-up.
The Azure data source has been bouncing the hostname during
setup(), occurring after the hostname has already been
properly configured.
Note that this doesn't prevent leaking the image's hostname
during Azure's _get_data() when it brings up ephemeral DHCP.
However, as we are not guaranteed to have the hostname metadata
available from a truly "local" source, this behavior is to
be expected unless we disable `send host-name` from dhclient
config.
[1]: https://github.com/canonical/cloud-init/commit/133ad2cb327ad17b7b81319fac8f9f14577c04df
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
This attempts to standardize unit test file location under tests/unittests/
such that any source file located at cloudinit/path/to/file.py may have a
corresponding unit test file at tests/unittests/path/to/test_file.py.
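The mapping convention can be expressed as a small path function, for
example to find the test file for a changed source file. unit_test_path
is a hypothetical helper, not part of the test suite.

```python
from pathlib import PurePosixPath


def unit_test_path(source: str) -> str:
    """Map cloudinit/path/to/file.py -> tests/unittests/path/to/test_file.py."""
    src = PurePosixPath(source)
    if not src.parts or src.parts[0] != "cloudinit" or src.suffix != ".py":
        raise ValueError(f"not a cloudinit source file: {source}")
    rel = PurePosixPath(*src.parts[1:])
    # rel.parent.parts is () for top-level modules such as cloudinit/util.py
    return str(PurePosixPath("tests", "unittests", *rel.parent.parts, f"test_{rel.name}"))
```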
Noteworthy Comments:
====================
Four different duplicate test files existed:
test_{gpg,util,cc_mounts,cc_resolv_conf}.py
Each of these duplicate file pairs has been merged together. This is a
break in git history for these files.
The test suite appears to have a dependency on test order. Changing test
order causes some tests to fail. This should be rectified, but for now
some tests have been modified in
tests/unittests/config/test_set_passwords.py.
A helper class name starts with "Test", which causes pytest to try
executing it as a test case and then emit a warning "due to Class
having __init__()". Silence this by renaming the class.
# helpers.py is imported in many test files, import paths change
cloudinit/tests/helpers.py -> tests/unittests/helpers.py
# Move directories:
cloudinit/distros/tests -> tests/unittests/distros
cloudinit/cmd/devel/tests -> tests/unittests/cmd/devel
cloudinit/cmd/tests -> tests/unittests/cmd/
cloudinit/sources/helpers/tests -> tests/unittests/sources/helpers
cloudinit/sources/tests -> tests/unittests/sources
cloudinit/net/tests -> tests/unittests/net
cloudinit/config/tests -> tests/unittests/config
cloudinit/analyze/tests/ -> tests/unittests/analyze/
# Standardize tests already in tests/unittests/
test_datasource -> sources
test_distros -> distros
test_vmware -> sources/vmware
test_handler -> config # this contains cloudconfig module tests
test_runs -> runs
|
|
(#1123)
Allow #cloud-config and cloud-init query to use underscore-delimited
"jinja-safe" key aliases for any instance-data.json keys
containing jinja operator characters.
This provides a means to use Jinja's dot-notation instead of square brackets
and quoting to reference "unsafe" attribute names.
Support for these aliased keys is available to both #cloud-config user-data and
`cloud-init query`.
For example #cloud-config alias access can look like:
{{ ds.config.user_network_config }}
- instead of -
{{ ds.config["user.network-config"] }}
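A sketch of how such aliases could be added to the instance data
before rendering. The set of characters treated as unsafe ("-", ".",
"/") is an assumption, and the helpers are hypothetical. Existing
keys are never overwritten, and the original key is kept alongside
its alias.

```python
import re
from typing import Any

# Assumption: characters treated as jinja operators in instance-data keys.
JINJA_UNSAFE_CHARS = re.compile(r"[-./]")


def jinja_safe_alias(key: str) -> str:
    return JINJA_UNSAFE_CHARS.sub("_", key)


def add_jinja_aliases(data: Any) -> Any:
    """Recursively add underscore aliases next to the original keys."""
    if not isinstance(data, dict):
        return data
    result = {}
    for key, value in data.items():
        value = add_jinja_aliases(value)
        result[key] = value
        if not isinstance(key, str):
            continue
        alias = jinja_safe_alias(key)
        # Never shadow a real key or an alias added earlier.
        if alias != key and alias not in data and alias not in result:
            result[alias] = value
    return result
```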
|
|
GCE currently fetches metadata after network has come up. There's no
reason we can't fetch at init-local time, so update GCE to fetch at
init-local time to be more performant and consistent with other
datasources.
|
|
Vultr uses 169.254.169.254 for the metadata server. Some distros are
having trouble with this on IPv6 only servers because the route is
not being assigned to the link-local interface by default as it is in
other distros. This change sets that route before attempting to fetch
the metadata avoiding the current issue.
|
|
Some references were missed in the removal of the agent command
in PR #799. This simply removes the remaining references.
Signed-off-by: Chris Patterson <cpatterson@microsoft.com>
|
|
Some Vultr datacenters can experience latency in the connection due
to the location of one of the dependent APIs. The timeouts need to be
adjusted so this isn't a failure in the future.
|
|
LXD now adds cloud-init scoped configuration keys network-config,
user-data and vendor-data. The existing user.user-data,
user.vendor-data, user.network-config and meta-data will be
deprecated in newer LXD.
cloud-init will prefer LXD cloud-init.* config keys over user.* keys
even if both are present. Warnings will be emitted for ignored user.*
keys if cloud-init.* overrides are present.
The expectation is that the configuration keys user.network-config,
user.meta-data, user.user-data and user.vendor-data* should not be
present at the same time as the comparable cloud-init.* keys.
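A minimal sketch of the selection rule for the three scoped keys
(user-data, vendor-data, network-config): cloud-init.* wins, a warning
is logged for a shadowed user.* key, and user.* is used only when no
cloud-init.* key exists. select_lxd_config is hypothetical, and the
flat dict input is an assumed shape for the LXD config.

```python
import logging
from typing import Dict, Optional

LOG = logging.getLogger(__name__)

SCOPED_KEYS = ("user-data", "vendor-data", "network-config")


def select_lxd_config(config: Dict[str, str]) -> Dict[str, Optional[str]]:
    selected: Dict[str, Optional[str]] = {}
    for key in SCOPED_KEYS:
        preferred, legacy = f"cloud-init.{key}", f"user.{key}"
        if preferred in config:
            if legacy in config:
                LOG.warning("Ignoring LXD config %s in favor of %s", legacy, preferred)
            selected[key] = config[preferred]
        else:
            # Fall back to the deprecated user.* key (or None if absent).
            selected[key] = config.get(legacy)
    return selected
```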
|
|
On Debian, the network configuration file was named
/etc/network/interfaces.d/50-cloud-init rather than 50-cloud-init.cfg
(see
https://github.com/canonical/cloud-init/blob/62721ae71057530e41779ff02ce578b7b802a60f/cloudinit/distros/debian.py#L56),
so static IP customization on Debian fails because of
"source /etc/network/interfaces.d/*.cfg".
This change fixes the issue.
LP: #1950136
|
|
During reprovisioning, the VM network will change. The fallback NIC
should be cleared after use so that it can be re-evaluated after
reprovisioning.
|
|
Without UDF support, DS Azure cannot mount the provisioning ISO,
which contains platform metadata necessary to support
pre-provisioning. The required metadata is made available in IMDS
starting with api version 2021-08-01. This change will leverage IMDS
to obtain the required metadata to support pre-provisioning if the
provisioning ISO is not available.
|
|
Add DataSourceLXD which knows how to talk to the dev-lxd socket to
obtain all instance metadata API:
https://linuxcontainers.org/lxd/docs/master/dev-lxd.
This first branch is to deliver feature parity with the existing
NoCloud datasource which is currently used to initialize LXC instances
on first boot.
Introduce a SocketConnectionPool and LXDSocketAdapter to support
performing HTTP GETs on the following routes which are surfaced by the
LXD host to all containers:
http://unix.socket/1.0/meta-data
http://unix.socket/1.0/config/user.user-data
http://unix.socket/1.0/config/user.network-config
http://unix.socket/1.0/config/user.vendor-data
These 4 routes minimally replace the static content provided in the
following nocloud-net seed files:
/var/lib/cloud/nocloud-net/{meta-data,vendor-data,user-data,network-config}
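HTTP over the dev-lxd Unix socket can be illustrated with the standard
library alone. This swaps in http.client for the SocketConnectionPool
and LXDSocketAdapter described above, so it shows the technique rather
than the datasource's code. The "unix.socket" host in the URLs above is
a placeholder; only the route (e.g. /1.0/meta-data) is sent to
/dev/lxd/sock.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        # The host name only fills the Host: header; it is never resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def lxd_get(route: str, socket_path: str = "/dev/lxd/sock") -> bytes:
    """GET a dev-lxd route such as /1.0/meta-data and return the body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", route)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"GET {route} returned HTTP {resp.status}")
        return body
    finally:
        conn.close()
```

Inside a container, lxd_get("/1.0/meta-data") would return the same
content NoCloud previously read from its meta-data seed file.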
The intent of this commit is to set a foundation for LXD socket
communication that will allow us to build network hot-plug features
by eventually consuming LXD's websocket upgrade route 1.0/events to
react to network, meta-data and user-data config changes over time.
In the event that no custom network-config is provided, default to the
same network-config definition provided by LXD to the NoCloud
network-config seed file.
Supplemental features beyond the NoCloud datasource:
- surface all custom instance data config keys via cloud-init query ds,
which aids discoverability of features/tags/labels as well as
conditional #cloud-config Jinja template operations based on custom
config options.
TBD: better cloud-init query support for dot-delimited keys
|
|
When we added the install hotplug module, we forgot to update the
redhat/cloud-init.spec.in file and allow for execution on /usr/libexec.
This PR adds that functionality.
|
|
In some cases, the system-product-name is just google.
This is useful in the case of NoCloud, where we use the disk to load the datasource.
|
|
When self.failed_desired_api_version was added to DataSourceAzure, the
attribute was never added to the _unpickle method using the upgrade
framework. This commit adds the attribute.
LP: #1946644
|
|
There is no reason for the ISO to lack this functionality.
As discussed in https://github.com/canonical/cloud-init/pull/947/files#r707338489
|
|
CloudStack DNS resolution should be done against
the DNS search domain (with a trailing dot, DNS
resolution does not work on, e.g., Fedora 34).
LP: #1942232
|
|
Due to multiarch, libdeployPkgPlugin.so is deployed into
/usr/lib/<multiarch name>/open-vm-tools, so this path needs to be
added to search_paths.
LP: #1944946
|
|
OpenNebula 6.1.80 (current dev. version) is introducing new IPv6 gateway
contextualization variable ETHx_IP6_GATEWAY, which mimics existing
variable ETHx_GATEWAY6. The ETHx_GATEWAY6 used until now will
be deprecated in a future release (expected spring 2022).
See:
- new variable - https://github.com/OpenNebula/one/commit/e4d2cc11b9f3c6d01b53774b831f48d9d089c1cc
- deprecation tracking issue - https://github.com/OpenNebula/one/issues/5536
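A minimal sketch of the lookup order: prefer the new ETHx_IP6_GATEWAY
and fall back to the deprecated ETHx_GATEWAY6. ipv6_gateway is a
hypothetical helper; context is assumed to be the parsed context
variables as a flat dict, and context_prefix is the ETHx part (e.g.
"ETH0"), which need not match the guest interface name.

```python
from typing import Dict, Optional


def ipv6_gateway(context: Dict[str, str], context_prefix: str) -> Optional[str]:
    """Prefer ETHx_IP6_GATEWAY, falling back to the deprecated ETHx_GATEWAY6."""
    for var in (f"{context_prefix}_IP6_GATEWAY", f"{context_prefix}_GATEWAY6"):
        value = context.get(var, "").strip()
        if value:  # empty values count as unset
            return value
    return None
```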
Also, added support for SET_HOSTNAME context variable, which is
currently widely used variable to configure guest VM hostname. See
https://docs.opennebula.io/6.0/management_and_operations/references/template.html#context-section
|
|
Add MTU, accept-ra, routes, options and a direct way to provide intact
cloud configs for networking, as opposed to relying on configurations
that may need to change often.
|
|
Offload Vultr's vendordata assembly to the backend, correct vendordata
storage and parsing, allow passing critical data via the useragent,
better networking configuration for additional interfaces.
|
|
tox: bump the pinned flake8 and pylint version
* pylint: fix W1406 (redundant-u-string-prefix)
The u prefix for strings is no longer necessary in Python >=3.0.
* pylint: disable W1514 (unspecified-encoding)
The new warning stems from https://www.python.org/dev/peps/pep-0597/
(Python 3.10), which says:
Developers using macOS or Linux may forget that the default encoding
is not always UTF-8. [...] Even Python experts may assume that the
default encoding is UTF-8. This creates bugs that only happen on Windows.
The warning could be fixed by always specifying encoding='utf-8',
however we should be careful to not break environments which are not
utf-8 (or explicitly state that only utf-8 is supported). Let's silence
the warning for now.
* _quick_read_instance_id: cover the case where load_yaml() returns None
Spotted by pylint:
- E1135 (unsupported-membership-test)
- E1136 (unsubscriptable-object)
LP: #1944414
|
|
Add retries to DatasourceGCE when connecting to GCE.
Sometimes, when trying to fetch the metadata, cloud-init fails and
falls back to the NoCloud datasource, which is not expected. Add
retries to ensure the data source loads.
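A minimal retry sketch. The retry count, fixed delay, and catching
only OSError (which urllib's URLError subclasses) are assumptions, and
fetch stands in for the GCE metadata request. Once retries are
exhausted the last error propagates, so the caller can still fall back.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                raise  # out of retries: let the caller fall back
            sleep(delay)
    raise AssertionError("unreachable")
```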
|