summaryrefslogtreecommitdiff
path: root/cloudinit/sources/DataSourceAzure.py
AgeCommit message (Collapse)Author
2020-11-23Ability to hot-attach NICs to preprovisioned VMs before reprovisioning (#613)aswinrajamannar
Adds the ability to run the Azure preprovisioned VMs as NIC-less and then hot-attach them when assigned for reprovision. The NIC on the preprovisioned VM is hot-detached as soon as it reports ready and goes into wait for one or more interfaces to be hot-attached. Once they are attached, cloud-init gets the expected number of NICs (in case there are more than one) that will be attached from IMDS and waits until all of them are attached. After all the NICs are attached, reprovision proceeds as usual.
2020-11-18DataSourceAzure: update password for defuser if exists (#671)Anh Vo
cc_set_password will only update the password for the default user if cfg['password'] is set. The existing code of datasource Azure will fail to update the default user's password because it does not set that metadata. If the default user doesn't exist in the image, the current code works fine because the password is set during user create and not in cc_set_password
2020-11-18DataSourceAzure: send failure signal on Azure datasource failure (#594)Johnson Shi
On systems where the Azure datasource is a viable platform for crawling metadata, cloud-init occasionally encounters fatal irrecoverable errors during the crawling of the Azure datasource. When this happens, cloud-init crashes, and Azure VM provisioning would fail. However, instead of failing immediately, the user will continue seeing provisioning for a long time until it times out with "OS Provisioning Timed Out" message. In these situations, cloud-init should report failure to the Azure datasource endpoint indicating provisioning failure. The user will immediately see provisioning terminate, giving them a much better failure experience instead of pointlessly waiting for OS provisioning timeout.
2020-11-02cloudinit: move dmi functions out of util (#622)Scott Moser
This just separates the reading of dmi values into its own file. Some things of note: * left import of util in dmi.py only for 'is_container' It'd be good if is_container was not in util. * just the use of 'util.is_x86' to dmi.py * open() is used directly rather than load_file.
2020-10-16DataSourceAzure: write marker file after report ready in preprovisioning (#590)Johnson Shi
DataSourceAzure previously writes the preprovisioning reported ready marker file before it goes through the report ready workflow. On certain VM instances, the marker file is successfully written but then reporting ready fails. Upon rare VM reboots by the platform, cloud-init sees that the report ready marker file already exists. The existence of this marker file tells cloud-init not to report ready again (because it mistakenly assumes that it already reported ready in preprovisioning). In this scenario, cloud-init instead erroneously takes the reprovisioning workflow instead of reporting ready again.
2020-10-15azure: clean up and refactor report_diagnostic_event (#563)Johnson Shi
This moves logging into `report_diagnostic_event`, to clean up its callsites.
2020-10-13net: add the ability to blacklist network interfaces based on driver during ↵Anh Vo
enumeration of physical network devices (#591)
2020-09-24Azure parse_network_config uses fallback cfg when generate IMDS network cfg ↵Johnson Shi
fails (#549) Azure datasource's `parse_network_config` throws a fatal uncaught exception when an exception is raised during generation of network config from IMDS metadata. This happens when IMDS metadata is invalid/corrupted (such as when it is missing network or interface metadata). This causes the rest of provisioning to fail. This changes `parse_network_config` to be a non-fatal implementation. Additionally, when generating network config from IMDS metadata fails, fall back on generating fallback network config (`_generate_network_config_from_fallback_config`). This also changes fallback network config generation (`_generate_network_config_from_fallback_config`) to blacklist an additional driver: `mlx5_core`.
2020-09-10Retrieve SSH keys from IMDS first with OVF as a fallback (#509)Thomas Stringer
* pull ssh keys from imds first and fall back to ovf if unavailable * refactor log and diagnostic messages * refactor the OpenSSLManager instantiation and certificate usage * fix unit test where exception was being silenced for generate cert * fix tests now that certificate is not always generated * add documentation for ssh key retrieval * add ability to check if http client has security enabled * refactor certificate logic to GoalState
2020-08-25tox: bump the pylint version to 2.6.0 in the default run (#544)Paride Legovini
Changes: tox: bump the pylint version to 2.6.0 in the default run Fix pylint 2.6.0 W0707 warnings (raise-missing-from)
2020-08-24Azure: Add netplan driver filter when using hv_netvsc driver (#539)James Falcon
This fixes a long delay during boot of some instances. For Azure instance types using SR-IOV via the Hyper-V netvsc network driver, two network interfaces are created that share the same MAC, but only the virtual device should be configured and used. Updating the netplan configuration to filter on the hv_netvsc driver prevents netplan from trying to figure both devices. LP: #1830740
2020-08-20Pushing cloud-init log to the KVP (#529)Moustafa Moustafa
Push the cloud-init.log file (Up to 500KB at once) to the KVP before reporting ready to the Azure platform. Based on the analysis done on a large sample of cloud-init.log files, Here's the statistics collected on the log file size: P50 P90 P95 P99 P99.9 P99.99 137K 423K 537K 3.5MB 6MB 16MB This change limits the size of cloud-init.log file data that gets dumped to KVP to 500KB. So for ~95% of the cases, the whole log file will be dumped and for the remaining ~5%, we will get the last 500KB of the cloud-init.log file. To asses the performance of the 500KB limit, 250 VM were deployed with a 500KB cloud-init.log file and the time taken to compress, encode and dump the entries to KVP was measured. Here's the time in milliseconds percentiles: P50 P99 P999 75.705 232.701 1169.636 Another 250 VMs were deployed with this logic dumping their normal cloud-init.log file to KVP, the same timing was measured as above. Here's the time in milliseconds percentiles: P50 P99 P999 1.88 5.277 6.992 Added excluded_handlers to the report_event function to be able to opt-out from reporting the events of the compressed cloud-init.log file to the cloud-init.log file. The KVP break_down logic had a bug, where it will reuse the same key for all the split chunks of KVP which results in overwriting the split KVPs by the last one when consumed by Hyper-V. I added the split chunk index as a differentiator to the KVP key. The Hyper-V consumes the KVPs from the KVP file as chunks whose key is 512KB and value is 2048KB but the Azure platform expects the value to be 1024KB, thus I introduced the Azure value limit.
2020-07-22azure: disable bouncing hostname when setting hostname fails (#494)Anh Vo
DataSourceAzure: Gracefully handle the case of set hostname failure during provisioning
2020-07-15DataSourceAzure: Use ValueError when JSONDecodeError is not available (#490)Anh Vo
JSONDecodeError is only available in Python 3.5+. When it isn't available (i.e. on Python 3.4, which cloud-init still supports) use the more generic ValueError.
2020-07-15cloudinit: remove global disable of pylint W0107 and fix errors (#489)Daniel Watkins
* cloudinit: remove global disable of pylint W0107 and fix errors This includes removing a test class which contained no tests but wasn't detected as empty because of an errant pass statement. * .pylintrc: update disable comment to match arguments
2020-07-13cloudinit: remove global disable of pylint W0105 and fix errors (#480)Daniel Watkins
This includes a fix to a test that had a string concatenation issue, and so was only testing a prefix of what was intended.
2020-06-19printing the error stream of the dhclient process before killing it (#369)Moustafa Moustafa
This introduces a way to log the dhclient error stream, and uses it for the Azure datasource (where we have a specific requirement for this data to be logged).
2020-06-10test: fix all flake8 E126 errors (#425)Joshua Powers
2020-06-08Move subp into its own module. (#416)Scott Moser
This was painful, but it finishes a TODO from cloudinit/subp.py. It moves the following from util to subp: ProcessExecutionError subp which target_path I moved subp_blob_in_tempfile into cc_chef, which is its only caller. That saved us from having to deal with it using write_file and temp_utils from subp (which does not import any cloudinit things now). It is arguable that 'target_path' could be moved to a 'path_utils' or something, but in order to use it from subp and also from utils, we had to get it out of utils.
2020-06-03Enhance poll imds logging (#365)Moustafa Moustafa
Improving the debugability of this code path by logging the thrown exception details for the non 404 exceptions. Retry IMDS on HTTP Error 404 and 410, re-run DHCP on other exceptions.
2019-12-18cloud-init: fix capitalisation of SSH (#126)Daniel Watkins
* cc_ssh: fix capitalisation of SSH * doc: fix capitalisation of SSH * cc_keys_to_console: fix capitalisation of SSH * ssh_util: fix capitalisation of SSH * DataSourceIBMCloud: fix capitalisation of SSH * DataSourceAzure: fix capitalisation of SSH * cs_utils: fix capitalisation of SSH * distros/__init__: fix capitalisation of SSH * cc_set_passwords: fix capitalisation of SSH * cc_ssh_import_id: fix capitalisation of SSH * cc_users_groups: fix capitalisation of SSH * cc_ssh_authkey_fingerprints: fix capitalisation of SSH
2019-12-12azure: avoid re-running cloud-init when instance-id is byte-swapped (#84)AOhassan
Azure stores the instance ID with an incorrect byte ordering for the first three hyphen delimited parts. This results in invalid is_new_instance checks forcing Azure datasource to recrawl the metadata service. When persisting instance-id from the metadata service, swap the instance-id string byte order such that it is consistent with that returned by dmi information. Check whether the instance-id string is a byte-swapped match when determining correctly whether the Azure platform instance-id has actually changed.
2019-11-13azure: support secondary ipv6 addresses (#33)Chad Smith
Azure's Instance Metadata Service (IMDS) reports multiple IPv6 addresses, via the http://169.254.169.254/metadata/instance/network route. Any additional values after the first in 'ipAddresses' under the 'ipv6' interface key are extracted and configured as static IPs on the interface.
2019-11-04azure: support matching dhcp route-metrics for dual-stack ipv4 ipv6Chad Smith
Network v2 configuration for Azure will set both dhcp4 and dhcp6 to False by default. When IPv6 privateIpAddresses are present for an interface in Azure's Instance Metadata Service (IMDS), set dhcp6: True and provide a route-metric value that will match the corresponding dhcp4 route-metric. The route-metric value will increase by 100 for each additional interface present to ensure the primary interface has a route to IMDS. Also fix dhcp route-metric rendering for eni and sysconfig distros. LP: #1850308
2019-10-29azure: Do not lock user on instance id changeSam Eiderman
After initial boot ovf-env.xml is copied to agent dir (/var/lib/waagent/) with REDACTED password. On subsequent boots DataSourceAzure loads with a configuration where the user specified in /var/lib/waagent/ovf-env.xml is locked. If instance id changes, cc_users_groups action will lock the user. Fix this behavior by not locking the user if its password is REDACTED. LP: #1849677
2019-08-14Azure: Record boot timestamps, system information, and diagnostic eventsAnh Vo
Collect and record the following information through KVP:  + timestamps related to kernel initialization and systemd activation    of cloud-init services  + system information including cloud-init version, kernel version,    distro version, and python version  + diagnostic events for the most common provisioning error issues    such as empty dhcp lease, corrupted ovf-env.xml, etc. + increasing the log frequency of polling IMDS during reprovision.
2019-08-13azure/net: generate_fallback_nic emits network v2 config instead of v1Chad Smith
The function generate_fallback_config is used by Azure by default when not consuming IMDS configuration data. This function is also used by any datasource which does not implement it's own network config. This simple fallback configuration sets up dhcp on the most likely NIC. It will now emit network v2 instead of network v1. This is a step toward moving all components talking in v2 and allows us to avoid costly conversions between v1 and v2 for newer distributions which rely on netplan.
2019-06-25azure: add region and AZ properties from imds compute location metadataChad Smith
This allows cloud-init query region to show valid region data for Azure
2019-05-08DataSourceAzure: Adjust timeout for polling IMDSAnh Vo
If the IMDS primary server is not available, falling back to the secondary server takes about 1s. The net result is that the expected E2E time is slightly more than 1s. This change increases the timeout to 2s to prevent the infinite loop of timeouts.
2019-04-18mount_cb: do not pass sync and rw options to mountGonéri Le Bouder
On FreeBSD, mount_cd9660 does not accept the sync option that is enabled by default. In addition, the sync is only useful with the `rw` mode. However the `rw` mode was never used. This patch removes the `rw` and `sync` parameter of `mount_cb` to simplify the code base and resolve the FreeBSD issue. LP: #1645824
2019-04-03Azure: Treat _unset network configuration as if it were absentJason Zions (MSFT)
When the Azure datasource persists all of its metadata to the instance directory, it deliberately sets the self.network_config value to be the sources.UNSET value. The goal is to ensure that each time the system boots, fresh network configuration data is fetched from the cloud platform so that any control plane changes will take effect. When a VM is first created, there's no pickled instance to restore, so self._network_config is None, resulting in self.network_config() properly building a new config. Azure suffered from LP: #1801364 which prevented ds from being stored in obj.pkl in the instance directory, so subsequent reboots always regenerated their network configuration. Commit 0dc3a77f41f4544e4cb5a41637af7693410d4cdf introduced a new bug in which self.network_config() assumed the self._network_config value was either None or trustable; when the config was unpickled, that value was _unset, thus breaking the assumption. LP: #1823084
2019-04-03DatasourceAzure: add additional logging for azure datasourceAnh Vo
Create an Azure logging decorator and use additional ReportEventStack context managers to provide additional logging details.
2019-03-26Azure: Ensure platform random_seed is always serializable as JSON.Jason Zions (MSFT)
The Azure platform surfaces random bytes into /sys via Hyper-V. Python 2.7 json.dump() raises an exception if asked to convert a str with non-character content, and python 3.0 json.dump() won't serialize a "bytes" value. As a result, c-i instance data is often not written by Azure, making reboots slower (c-i has to repeat work). The random data is base64-encoded and then decoded into a string (str or unicode depending on the version of Python in use). The base64 string has just as many bits of entropy, so we're not throwing away useful "information", but we can be certain json.dump() will correctly serialize the bits.
2019-02-22azure: Filter list of ssh keys pulled from fabricJason Zions (MSFT)
The Azure data source is expected to expose a list of ssh keys for the user-to-be-provisioned in the crawled metadata. When configured to use the __builtin__ agent this list is built by the WALinuxAgentShim. The shim retrieves the full set of certificates and public keys exposed to the VM from the wireserver, extracts any ssh keys it can, and returns that list. This fix reduces that list of ssh keys to just the ones whose fingerprints appear in the "administrative user" section of the ovf-env.xml file. The Azure control plane exposes other ssh keys to the VM for other reasons, but those should not be added to the authorized_keys file for the provisioned user.
2019-01-15[Azure] Increase retries when talking to Wireserver during metadata walkJason Zions
Testing startup of large numbers of VMs (of varying distros) in Azure shows that 3 retries results in a small percentage of failed VMs. Increasing that by a few dramatically decreases the occurrence of provisioning timeout errors. The initial choice of "3 retries" was uninformed by heavy testing. Also, the alternate provisioning mechanism for Azure (waagent) retries the Wireserver crawl without limit. 10 retries seems a more reasonable choice.
2018-12-14Update to pylint 2.2.2.Scott Moser
The tip-pylint tox target correctly reported the invalid use of string formatting. The change here is to: a.) Fix the error that was caught. b.) move to pylint 2.2.2 for the default 'pylint' target.
2018-11-29azure: detect vnet migration via netlink media change eventTamilmani Manoharan
Replace Azure pre-provision polling on IMDS with a blocking call which watches for netlink link state change messages. The media change event happens when a pre-provisioned VM has been activated and is connected to the users virtual network and cloud-init can then resume operation to complete image instantiation.
2018-11-29Azure: fix copy/paste error in error handling when reading azure ovf.Adam DePue
Check the appropriate variables based on code review. Correcting what seems to be a copy/paste mistake for the error handling from a few lines above.
2018-11-15azure: _poll_imds only retry on 404. Fail on TimeoutChad Smith
Upon URL timeout, _poll_imds is expected to re-dhcp to get updated IP configuration. We don't want to indefinitely retry because the instance likely has invalid IP configuration. LP: #1803598
2018-11-13azure: retry imds polling on requests.TimeoutChad Smith
There is an infrequent race when the booting instance can hit the IMDS service before it is fully available. This results in a requests.ConnectTimeout being raised. Azure's retry_callback logic now retries on either 404s or Timeouts. LP:1800223
2018-11-12azure: Accept variation in error msg from mount for ntfs volumesJason Zions
If Azure detects an ntfs filesystem type during mount attempt, it should still report the resource device as reformattable. There are slight differences in error message format on RedHat and SuSE. This patch simplifies the expected error match to work on both distributions. LP: #1799338
2018-11-12azure: fix regression introduced when persisting ephemeral dhcp leaseasakkurr
In commitish 9073951 azure datasource tried to leverage stale DHCP information obtained from EphemeralDHCPv4 context manager to report updated provisioning status to the fabric earlier in the boot process. Unfortunately the stale ephemeral network configuration had already been torn down in preparation to bring up IMDS network config so the report attempt failed on timeout. This branch introduces obtain_lease and clean_network public methods on EphemeralDHCPv4 to allow for setup and teardown of ephemeral network configuration without using a context manager. Azure datasource now uses this to persist ephemeral network configuration across multiple contexts during provisioning to avoid multiple DHCP roundtrips.
2018-11-01azure: remove /etc/netplan/90-hotplug-azure.yaml when net from IMDSChad Smith
There was a typo in the seeded filename s/azure-hotplug/hotplug-azure/.
2018-10-31azure: report ready to fabric after reprovision and reduce loggingasakkurr
When reusing a preprovisioned VM, report ready to Azure fabric as soon as we get the reprovision data and the goal state so that we are not delayed by the cloud-init stage switch, saving 2-3 seconds. Also reduce logging when polling IMDS for reprovision data. LP: #1799594
2018-10-17azure: Add apply_network_config option to disable network from IMDSChad Smith
Azure generates network configuration from the IMDS service and removes any preexisting hotplug network scripts which exist in Azure cloud images. Add a datasource configuration option which allows for writing a default network configuration which sets up dhcp on eth0 and leave the hotplug handling to the cloud-image scripts. To disable network-config from Azure IMDS, add the following to /etc/cloud/cloud.cfg.d/99-azure-no-imds-network.cfg: datasource:   Azure:     apply_network_config: False LP: #1798424
2018-10-09instance-data: Add standard keys platform and subplatform. Refactor ec2.Chad Smith
Add the following instance-data.json standardized keys: * v1._beta_keys: List any v1 keys in beta development, e.g. ['subplatform']. * v1.public_ssh_keys: List of any cloud-provided ssh keys for the instance. * v1.platform: String representing the cloud platform api supporting the datasource. For example: 'ec2' for aws, aliyun and brightbox cloud names. * v1.subplatform: String with more details about the source of the metadata consumed. For example, metadata uri, config drive device path or seed directory. To support the new platform and subplatform standardized instance-data, DataSource and its subclasses grew platform and subplatform attributes. The platform attribute defaults to the lowercase string datasource name at self.dsname. This method is overridden in NoCloud, Ec2 and ConfigDrive datasources. The subplatform attribute calls a _get_subplatform method which will return a string containing a simple slug for subplatform type such as metadata, seed-dir or config-drive followed by a detailed uri, device or directory path where the datasource consumed its configuration. As part of this work, DatasourceEC2 methods _get_data and _crawl_metadata have been refactored for a few reasons: - crawl_metadata is now a read-only operation, persisting no attributes on the datasource instance and returns a dictionary of consumed metadata. - crawl_metadata now closely represents the raw stucture of the ec2 metadata consumed, so that end-users can leverage public ec2 metadata documentation where possible. - crawl_metadata adds a '_metadata_api_version' key to the crawled ds.metadata to advertise what version of EC2's api was consumed by cloud-init. - _get_data now does all the processing of crawl_metadata and saves datasource instance attributes userdata_raw, metadata etc. Additional drive-bys: * unit test rework for test_altcloud and test_azure to simplify mocks and make use of existing util and test_helpers functions.
2018-08-17azure: allow azure to generate network configuration from IMDS per boot.Chad Smith
Azure datasource now queries IMDS metadata service for network configuration at link local address http://169.254.169.254/metadata/instance?api-version=2017-12-01. The azure metadata service presents a list of macs and allocated ip addresses associated with this instance. Azure will now also regenerate network configuration on every boot because it subscribes to EventType.BOOT maintenance events as well as the 'first boot' EventType.BOOT_NEW_INSTANCE. For testing add azure-imds --kind to cloud-init devel net_convert tool for debugging IMDS metadata. Also refactor _get_data into 3 discrete methods:   - is_platform_viable: check quickly whether the datasource is     potentially compatible with the platform on which is is running   - crawl_metadata: walk all potential metadata candidates, returning a     structured dict of all metadata and userdata. Raise InvalidMetaData on     error.   - _get_data: call crawl_metadata and process results or error. Cache     instance data on class attributes: metadata, userdata_raw etc.
2018-05-23Azure: Ignore NTFS mount errors when checking ephemeral drivePaul Meyer
The Azure data source provides a method to check whether a NTFS partition on the ephemeral disk is safe for reformatting to ext4. The method checks to see if there are customer data files on the disk. However, mounting the partition fails on systems that do not have the capability of mounting NTFS. Note that in this case, it is also very unlikely that the NTFS partition would have been used by the system (since it can't mount it). The only case would be where an update to the system removed the capability to mount NTFS, the likelihood of which is also very small. This change allows the reformatting of the ephemeral disk to ext4 on systems where mounting NTFS is not supported.
2018-05-03azure: Add reported ready marker file.Joshua Chan
This change is for Azure VM Preprovisioning. A bug was found when after azure VMs report ready the first time, during the time when VM is polling indefinitely for the new ovf-env.xml from Instance Metadata Service (IMDS), if a reboot happens, we send another report ready signal to the fabric, which deletes the reprovisioning data on the node. This marker file is used to fix this issue so that we will only send a report ready signal to the fabric when no marker file is present. Then, create a marker file so that when a reboot does occur, we check if a marker file has been created and decide whether we would like to send the repot ready signal. LP: #1765214
2018-04-19pylint: pay attention to unused variable warnings.Scott Moser
This enables warnings produced by pylint for unused variables (W0612), and fixes the existing errors.