Age | Commit message (Collapse) | Author |
|
Changes:
* Only merge in default Azure cloud ephemeral disk configs
during DataSourceAzure._get_data() if the ephemeral disk
exists.
* DataSourceAzure.address_ephemeral_resize() (which is
invoked in DataSourceAzure.activate() should only set up
the ephemeral disk if the disk exists.
Azure VMs may or may not come with ephemeral resource disks
depending on the VM SKU. For VM SKUs that come with
ephemeral resource disks, the Azure platform guarantees that
the ephemeral resource disk is attached to the VM before
the VM is booted. For VM SKUs that do not come with
ephemeral resource disks, cloud-init currently attempts
to wait and set up a non-existent ephemeral resource
disk, which wastes boot time. It also causes disk setup
modules to fail (due to non-existent references to the
ephemeral resource disk).
udevadm settle is invoked by cloud-init very early in boot.
udevadm settle is invoked very early, before
DataSourceAzure's _get_data() and activate() methods.
Within DataSourceAzure's _get_data() and activate() methods,
the ephemeral resource disk path should exist if the
VM SKU comes with an ephemeral resource disk.
The ephemeral resource disk path should not exist if the
VM SKU does not come with an ephemeral resource disk.
LP: #1901011
|
|
Kernel's newer than 4.15 present /sys/dmi/id/product_uuid as a
lowercase value. Previously UUID was uppercase.
Azure datasource reads the product_uuid directly as their platform's
instance-id. This presents a problem if a kernel is either
upgraded or downgraded across the 4.15 kernel version boundary because
the case of the UUID will change, resulting in cloud-init seeing a
"new" instance id and re-running all modules.
Re-running cc_ssh in cloud-init deletes and regenerates ssh_host keys
on a system which can cause concern on long-running instances that
somethingnefarious has happened.
Also add:
- An integration test for this for Azure Bionic Ubuntu FIPS upgrading from
a FIPS kernel with uppercase UUID to a lowercase UUID in linux-azure
- A new pytest.mark.sru_next to collect all integration tests related to our
next SRU
LP: #1835584
|
|
New datasource utilizing UpCloud metadata API, including relevant unit
tests and documentation.
|
|
Add support for openstack's dynamic vendor data, which appears under openstack/latest/vendor_data2.json
This adds vendor_data2 to all pathways; it should be a no-op for non-OpenStack providers.
LP: #1841104
|
|
Since version 1.9.1, @includedir can be used in the sudoers files
instead of #includedir:
https://github.com/sudo-project/sudo/releases/tag/SUDO_1_9_1
Actually "@includedir" is the modern syntax, and "#includedir" the historic
syntax. It has been considered that "#includedir" was too puzzling because
it started with a "#" that otherwise denotes comments.
This happens to be the default in SUSE Linux enterprise sudoer package,
so cloudinit should take this into account.
Otherwise, cloudinit was adding an extra #includedir, which was
resulting on the files under /etc/sudoers.d being included twice, one by
@includedir from the SUSE package, one by the @includedir from
cloudinit. The consequence of this, was that if you were defining an
Cmnd_Alias inside any of those files, this was being defined twice and
creating an error when using sudo.
|
|
This reverts commit b0e73814db4027dba0b7dc0282e295b7f653325c.
|
|
This feature will modify VMware datasource to read from meta data and user data which are specified by VMware vSphere user. If meta data/user data are found in cloud-init configuration directory, datasource will parse the meta data/network and user data from the configuration file, otherwise it will continue to parse them from traditional customization configuration file as before. The supported meta data file is in json or yaml format.
|
|
Route '-net' parameter is incompatible with /32 IPv4 addresses so we
have to use '-host' in that case.
|
|
With the changes for SSH public keys to be retrieved from IMDS as a
first option, when a key is passed through not in the raw SSH public key
format it causes an issue and the key is not added to the user's
authorized_keys file.
This PR will temporarily disable this behavior until a permanent fix is
put in place.
|
|
IPV6_AUTOCONF needs to be set to 'no' on RHEL so NetworkManager can
properly acquire ipv6 address.
rhbz: #1859695
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
|
|
This refactors cc_ca_certs to support non-ca-certificates distros, and
adds RHEL support.
|
|
Prevent network interfaces without IP addresses from being added to the
generated network configuration.
|
|
CA Cert tests will fail on systems that don't have ca-certificates
installed and configured.
Signed-off-by: Daniel Watkins <oddbloke@ubuntu.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
|
|
BOOTPROTO needs to be set to 'dhcp' on RHEL so NetworkManager can
properly acquire ipv6 address.
rhbz: #1859695
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Co-authored-by: Daniel Watkins <oddbloke@ubuntu.com>
Co-authored-by: Scott Moser <smoser@brickies.net>
|
|
Adds the ability to run the Azure preprovisioned VMs as NIC-less and
then hot-attach them when assigned for reprovision.
The NIC on the preprovisioned VM is hot-detached as soon as it reports
ready and goes into wait for one or more interfaces to be hot-attached.
Once they are attached, cloud-init gets the expected number of NICs (in
case there are more than one) that will be attached from IMDS and waits
until all of them are attached. After all the NICs are attached,
reprovision proceeds as usual.
|
|
On FreeBSD, if a UFS has trim: (-t) or MAC multilabel: (-l) flag, resize
FS fail, because the _can_skip_ufs_resize check gets tripped up by the
missing options.
This was reported at FreeBSD Bugzilla:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250496 and as
LP: #1901958
Rather than fixing the parser as in the patches proposed there (and
attempted in #636) this pull-request rips out all of it, and simplifies
the code. We now use `growfs -N` and check if that returns an error. If
it returns the correct kind of error, we can skip the resize, because we
either are at the correct size, or the filesystem in question is broken
or not UFS. If it returns the wrong kind of error, we just re-raise it.
LP: #1901958
|
|
Pushing dmesg log to KVP to help troubleshoot VM boot issues
|
|
cc_set_password will only update the password for the default user if
cfg['password'] is set. The existing code of datasource Azure will fail
to update the default user's password because it does not set that
metadata. If the default user doesn't exist in the image, the current
code works fine because the password is set during user create and
not in cc_set_password
|
|
Increase Azure Endpoint HTTP retries to handle
occasional platform network blips.
Introduce a common method http_with_retries
in the azure.py helper, which will serve as
the common HTTP request handler for
all HTTP requests with the Azure endpoint.
This method has builtin retries and
reporting diagnostics logic.
|
|
On systems where the Azure datasource
is a viable platform for crawling metadata,
cloud-init occasionally encounters fatal
irrecoverable errors during the crawling
of the Azure datasource.
When this happens, cloud-init crashes,
and Azure VM provisioning would fail.
However, instead of failing immediately,
the user will continue seeing provisioning
for a long time until it times out with
"OS Provisioning Timed Out" message.
In these situations, cloud-init should
report failure to the Azure datasource
endpoint indicating provisioning failure.
The user will immediately see provisioning
terminate, giving them a much better
failure experience instead of pointlessly
waiting for OS provisioning timeout.
|
|
Allow root user to validate the userdata provided to the launched
machine using `cloud-init devel schema --system`
|
|
Add code so that specifying "wakeonlan: true" actually results in relevant
configuration entry appearing in /etc/network/interfaces, Netplan, and
sysconfig for RHEL and OpenSuse.
Add testcases for the above.
|
|
FreeBSD lets us read out kernel parameters with kenv(1), a user-space
utility that's shipped in "base" We can use it in place of dmidecode(8),
thus removing the dependency on sysutils/dmidecode, and the restrictions
to i386 and x86_64 architectures that this utility imposes on FreeBSD.
Co-authored-by: Scott Moser <smoser@brickies.net>
|
|
This allows the cloud-init log to be pushed multiple times during boot,
with the latest lines being pushed each time.
|
|
FreeBSD doesn't have blkid, so we want to use geom to list devices and
their fstypes and labels.
This PR also adds `jail` to the list of is_container()
And we now also properly cache geom and blkid output!
A test is added to verify the new behaviour by correctly identifying
NoCloud on FreeBSD.
Co-authored-by: Scott Moser <smoser@brickies.net>
|
|
This just separates the reading of dmi values into its own file.
Some things of note:
* left import of util in dmi.py only for 'is_container'
It'd be good if is_container was not in util.
* just the use of 'util.is_x86' to dmi.py
* open() is used directly rather than load_file.
|
|
Fixes erroneous string/int comparison introduced in 1431c8a
metadata['instance-id'] is an integer but the value read from smbios is
a string. The comparision would cause TypeError.
|
|
Hetzner Cloud also provides the instance ID in SMBIOS information. Use
it to locally check_instance_id and to compared with instance_id from
metadata service.
LP: #1885527
|
|
The static and static6 subnet types for network_data.json were
being ignored by the Openstack handler, this would cause the code to
break and not function properly.
As of today, if a static6 configuration is chosen, the interface will
still eventually be available to receive router advertisements or be set
from NetworkManager to wait for them and cycle the interface in negative
case.
It is safe to assume that if the interface is manually configured to use
static ipv6 address, there's no need to wait for router advertisements.
This patch will set automatically IPV6_AUTOCONF and IPV6_FORCE_ACCEPT_RA
both to "no" in this case.
This patch fixes the specific behavior only for RHEL flavor and
sysconfig renderer. It also introduces new unit tests for the specific
case as well as adjusts some existent tests to be compatible with the
new options. This patch also addresses this problem by assigning the
appropriate subnet type for each case on the openstack handler.
rhbz: #1889635
rhbz: #1889635
Signed-off-by: Eduardo Otubo otubo@redhat.com
|
|
Reliable Scalable Cluster Technology (RSCT) is a set of software
components that together provide a comprehensive clustering
environment(RAS features) for IBM PowerVM based virtual machines. RSCT
includes the Resource Monitoring and Control (RMC) subsystem. RMC is a
generalized framework used for managing, monitoring, and manipulating
resources. RMC runs as a daemon process on individual machines and needs
creation of unique node id and restarts during VM boot.
LP: #1895979
Co-authored-by: Scott Moser <smoser@brickies.net>
|
|
Also update MAC addresses used in testcases to remove quotes where not
required and add single quotes where quotes are required.
|
|
Gentoo's hostname file format instead of being just the host name
is hostname=thename". The old code works fine when the file has no comments
but if there is a comment the line
```
gentoo_hostname_config = 'hostname="%s"' % conf
```
can render an invalid hostname file that looks similar to
```
hostname="#This is the host namehello"
```
The fix inserts the hostname in a gentoo friendly way so that it gets
handled by HostnameConf as a whole and comments are handled and preserved
|
|
update_resolve_conf_file is no longer used. The last reference
to it was removed in c3680475f9c970, which was itself a "remove dead
code" commit.
|
|
The following commit merged all ssh keys into a default user file
`~/.ssh/authorized_keys` in sshd_config had multiple files configured for
AuthorizedKeysFile:
commit f1094b1a539044c0193165a41501480de0f8df14
Author: Eduardo Otubo <otubo@redhat.com>
Date: Thu Dec 5 17:37:35 2019 +0100
Multiple file fix for AuthorizedKeysFile config (#60)
This commit ignored the case when sshd_config would have a single file for
AuthorizedKeysFile, but a non default configuration, for example
`~/.ssh/authorized_keys_foobar`. In this case cloud-init would grab all keys
from this file and write a new one, the default `~/.ssh/authorized_keys`
causing the bug.
rhbz: #1862967
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
|
|
DataSourceAzure previously writes the preprovisioning
reported ready marker file before it goes through the
report ready workflow. On certain VM instances, the
marker file is successfully written but then reporting
ready fails.
Upon rare VM reboots by the platform, cloud-init sees
that the report ready marker file already exists.
The existence of this marker file tells cloud-init
not to report ready again (because it mistakenly
assumes that it already reported ready in
preprovisioning).
In this scenario, cloud-init instead erroneously
takes the reprovisioning workflow instead of
reporting ready again.
|
|
Consider valid product names as valid chassis asset tags when detecting
OpenStack platform before crawling for OpenStack metadata.
As `ds-identify` tool uses product name as valid chassis asset tags,
let's replicate the behaviour in the OpenStack platform detection too.
This change should be backwards compatible and a temporary fix for the
current limitations on the OpenStack platform detection.
LP: #1895976
|
|
This moves logging into `report_diagnostic_event`, to clean up its callsites.
|
|
enumeration of physical network devices (#591)
|
|
fails (#549)
Azure datasource's `parse_network_config` throws a fatal uncaught exception when an exception is raised during generation of network config from IMDS metadata. This happens when IMDS metadata is invalid/corrupted (such as when it is missing network or interface metadata). This causes the rest of provisioning to fail.
This changes `parse_network_config` to be a non-fatal implementation. Additionally, when generating network config from IMDS metadata fails, fall back on generating fallback network config (`_generate_network_config_from_fallback_config`).
This also changes fallback network config generation (`_generate_network_config_from_fallback_config`) to blacklist an additional driver: `mlx5_core`.
|
|
|
|
Under FreeBSD, we want to use "shutdown -p" for poweroff.
Alpine Linux also has some specificities.
We choose to define a method that returns the shutdown command line to
use, rather than a method that actually does the shutdown. This makes it
easier to have the tests in test_handler_power_state do their
verifications.
Two tests are added for the special behaviours that are known so far.
|
|
Prior to this change, vlans were rendered in sysconfig with
'TYPE=Ethernet', and incorrectly rendered the PHYSDEV based on
the name of the vlan device rather than the 'link' provided
in the network config.
The change here fixes:
* rendering of TYPE=Ethernet for a vlan
* adds a warning if the configured device name is not supported
per the RHEL 7 docs "11.5. Naming Scheme for VLAN Interfaces"
LP: #1788915
LP: #1826608
RHBZ: #1861871
|
|
* pull ssh keys from imds first and fall back to ovf if unavailable
* refactor log and diagnostic messages
* refactor the OpenSSLManager instantiation and certificate usage
* fix unit test where exception was being silenced for generate cert
* fix tests now that certificate is not always generated
* add documentation for ssh key retrieval
* add ability to check if http client has security enabled
* refactor certificate logic to GoalState
|
|
* LXD: detach network from profile before deleting it
When cleaning up the bridge network created by default by LXD as part
of the `lxd init` process detach the network its profile before deleting
it. LXD will otherwise refuse to delete it with error:
Error: The network is currently in use.
Discussion with LXD upstream: https://github.com/lxc/lxd/issues/7804.
LP: #1776958
* LXD bridge deletion: fail if bridge exists but can't be deleted
* LXD bridge deletion: remove useless failure logging
|
|
This fixes a long delay during boot of some instances. For Azure instance types using SR-IOV via the Hyper-V netvsc network driver, two network interfaces are created that share the same MAC, but only the virtual device should be configured and used. Updating the netplan configuration to filter on the hv_netvsc driver prevents netplan from trying to figure both devices.
LP: #1830740
|
|
Update ssh_util.py with latest list of keys (from openssh-8.3p1/sshkey.c),
Added keys:
sk-ecdsa-sha2-nistp256-cert-v01@openssh.com
sk-ecdsa-sha2-nistp256@openssh.com
sk-ssh-ed25519-cert-v01@openssh.com
sk-ssh-ed25519@openssh.com
ssh-xmss-cert-v01@openssh.com
ssh-xmss@openssh.com
LP: #1877869
|
|
Push the cloud-init.log file (Up to 500KB at once) to the KVP before reporting ready to the Azure platform.
Based on the analysis done on a large sample of cloud-init.log files, Here's the statistics collected on the log file size:
P50 P90 P95 P99 P99.9 P99.99
137K 423K 537K 3.5MB 6MB 16MB
This change limits the size of cloud-init.log file data that gets dumped to KVP to 500KB. So for ~95% of the cases, the whole log file will be dumped and for the remaining ~5%, we will get the last 500KB of the cloud-init.log file.
To asses the performance of the 500KB limit, 250 VM were deployed with a 500KB cloud-init.log file and the time taken to compress, encode and dump the entries to KVP was measured. Here's the time in milliseconds percentiles:
P50 P99 P999
75.705 232.701 1169.636
Another 250 VMs were deployed with this logic dumping their normal cloud-init.log file to KVP, the same timing was measured as above. Here's the time in milliseconds percentiles:
P50 P99 P999
1.88 5.277 6.992
Added excluded_handlers to the report_event function to be able to opt-out from reporting the events of the compressed cloud-init.log file to the cloud-init.log file.
The KVP break_down logic had a bug, where it will reuse the same key for all the split chunks of KVP which results in overwriting the split KVPs by the last one when consumed by Hyper-V. I added the split chunk index as a differentiator to the KVP key.
The Hyper-V consumes the KVPs from the KVP file as chunks whose key is 512KB and value is 2048KB but the Azure platform expects the value to be 1024KB, thus I introduced the Azure value limit.
|
|
Add new module cc_apk_configure for creating Alpine /etc/apk/repositories file.
Modify cc_ca_certs, cc_ntp, cc_power_state_change, and cc_resolv_conf for Alpine.
Add Alpine template files for Chrony and Busybox NTP support.
Add Alpine template file for /etc/hosts.
|
|
According to man page `man 8 swapon', "Preallocated swap files are
supported on XFS since Linux 4.18". This patch checks for kernel version
before attepting to create swapfile, using dd for XFS only on kernel
versions <= 4.18 or btrfs.
Add new func util.kernel_version which returns a tuple of ints (major, minor)
Signed-off-by: Eduardo Otubo otubo@redhat.com
|
|
This PR refactors Azure report ready code to include more robust tests and telemetry.
|