summaryrefslogtreecommitdiff
path: root/cloudinit/sources/DataSourceAzure.py
AgeCommit message (Collapse)Author
2019-06-25azure: add region and AZ properties from imds compute location metadataChad Smith
This allows cloud-init query region to show valid region data for Azure
2019-05-08DataSourceAzure: Adjust timeout for polling IMDSAnh Vo
If the IMDS primary server is not available, falling back to the secondary server takes about 1s. The net result is that the expected E2E time is slightly more than 1s. This change increases the timeout to 2s to prevent the infinite loop of timeouts.
2019-04-18mount_cb: do not pass sync and rw options to mountGonéri Le Bouder
On FreeBSD, mount_cd9660 does not accept the sync option that is enabled by default. In addition, the sync is only useful with the `rw` mode. However the `rw` mode was never used. This patch removes the `rw` and `sync` parameter of `mount_cb` to simplify the code base and resolve the FreeBSD issue. LP: #1645824
2019-04-03Azure: Treat _unset network configuration as if it were absentJason Zions (MSFT)
When the Azure datasource persists all of its metadata to the instance directory, it deliberately sets the self.network_config value to be the sources.UNSET value. The goal is to ensure that each time the system boots, fresh network configuration data is fetched from the cloud platform so that any control plane changes will take effect. When a VM is first created, there's no pickled instance to restore, so self._network_config is None, resulting in self.network_config() properly building a new config. Azure suffered from LP: #1801364 which prevented ds from being stored in obj.pkl in the instance directory, so subsequent reboots always regenerated their network configuration. Commit 0dc3a77f41f4544e4cb5a41637af7693410d4cdf introduced a new bug in which self.network_config() assumed the self._network_config value was either None or trustable; when the config was unpickled, that value was _unset, thus breaking the assumption. LP: #1823084
2019-04-03DatasourceAzure: add additional logging for azure datasourceAnh Vo
Create an Azure logging decorator and use additional ReportEventStack context managers to provide additional logging details.
2019-03-26Azure: Ensure platform random_seed is always serializable as JSON.Jason Zions (MSFT)
The Azure platform surfaces random bytes into /sys via Hyper-V. Python 2.7 json.dump() raises an exception if asked to convert a str with non-character content, and python 3.0 json.dump() won't serialize a "bytes" value. As a result, c-i instance data is often not written by Azure, making reboots slower (c-i has to repeat work). The random data is base64-encoded and then decoded into a string (str or unicode depending on the version of Python in use). The base64 string has just as many bits of entropy, so we're not throwing away useful "information", but we can be certain json.dump() will correctly serialize the bits.
2019-02-22azure: Filter list of ssh keys pulled from fabricJason Zions (MSFT)
The Azure data source is expected to expose a list of ssh keys for the user-to-be-provisioned in the crawled metadata. When configured to use the __builtin__ agent this list is built by the WALinuxAgentShim. The shim retrieves the full set of certificates and public keys exposed to the VM from the wireserver, extracts any ssh keys it can, and returns that list. This fix reduces that list of ssh keys to just the ones whose fingerprints appear in the "administrative user" section of the ovf-env.xml file. The Azure control plane exposes other ssh keys to the VM for other reasons, but those should not be added to the authorized_keys file for the provisioned user.
2019-01-15[Azure] Increase retries when talking to Wireserver during metadata walkJason Zions
Testing startup of large numbers of VMs (of varying distros) in Azure shows that 3 retries results in a small percentage of failed VMs. Increasing that by a few dramatically decreases the occurrence of provisioning timeout errors. The initial choice of "3 retries" was uninformed by heavy testing. Also, the alternate provisioning mechanism for Azure (waagent) retries the Wireserver crawl without limit. 10 retries seems a more reasonable choice.
2018-12-14Update to pylint 2.2.2.Scott Moser
The tip-pylint tox target correctly reported the invalid use of string formatting. The change here is to: a.) Fix the error that was caught. b.) move to pylint 2.2.2 for the default 'pylint' target.
2018-11-29azure: detect vnet migration via netlink media change eventTamilmani Manoharan
Replace Azure pre-provision polling on IMDS with a blocking call which watches for netlink link state change messages. The media change event happens when a pre-provisioned VM has been activated and is connected to the users virtual network and cloud-init can then resume operation to complete image instantiation.
2018-11-29Azure: fix copy/paste error in error handling when reading azure ovf.Adam DePue
Check the appropriate variables based on code review. Correcting what seems to be a copy/paste mistake for the error handling from a few lines above.
2018-11-15azure: _poll_imds only retry on 404. Fail on TimeoutChad Smith
Upon URL timeout, _poll_imds is expected to re-dhcp to get updated IP configuration. We don't want to indefinitely retry because the instance likely has invalid IP configuration. LP: #1803598
2018-11-13azure: retry imds polling on requests.TimeoutChad Smith
There is an infrequent race when the booting instance can hit the IMDS service before it is fully available. This results in a requests.ConnectTimeout being raised. Azure's retry_callback logic now retries on either 404s or Timeouts. LP:1800223
2018-11-12azure: Accept variation in error msg from mount for ntfs volumesJason Zions
If Azure detects an ntfs filesystem type during mount attempt, it should still report the resource device as reformattable. There are slight differences in error message format on RedHat and SuSE. This patch simplifies the expected error match to work on both distributions. LP: #1799338
2018-11-12azure: fix regression introduced when persisting ephemeral dhcp leaseasakkurr
In commitish 9073951 azure datasource tried to leverage stale DHCP information obtained from EphemeralDHCPv4 context manager to report updated provisioning status to the fabric earlier in the boot process. Unfortunately the stale ephemeral network configuration had already been torn down in preparation to bring up IMDS network config so the report attempt failed on timeout. This branch introduces obtain_lease and clean_network public methods on EphemeralDHCPv4 to allow for setup and teardown of ephemeral network configuration without using a context manager. Azure datasource now uses this to persist ephemeral network configuration across multiple contexts during provisioning to avoid multiple DHCP roundtrips.
2018-11-01azure: remove /etc/netplan/90-hotplug-azure.yaml when net from IMDSChad Smith
There was a typo in the seeded filename s/azure-hotplug/hotplug-azure/.
2018-10-31azure: report ready to fabric after reprovision and reduce loggingasakkurr
When reusing a preprovisioned VM, report ready to Azure fabric as soon as we get the reprovision data and the goal state so that we are not delayed by the cloud-init stage switch, saving 2-3 seconds. Also reduce logging when polling IMDS for reprovision data. LP: #1799594
2018-10-17azure: Add apply_network_config option to disable network from IMDSChad Smith
Azure generates network configuration from the IMDS service and removes any preexisting hotplug network scripts which exist in Azure cloud images. Add a datasource configuration option which allows for writing a default network configuration which sets up dhcp on eth0 and leave the hotplug handling to the cloud-image scripts. To disable network-config from Azure IMDS, add the following to /etc/cloud/cloud.cfg.d/99-azure-no-imds-network.cfg: datasource:   Azure:     apply_network_config: False LP: #1798424
2018-10-09instance-data: Add standard keys platform and subplatform. Refactor ec2.Chad Smith
Add the following instance-data.json standardized keys: * v1._beta_keys: List any v1 keys in beta development, e.g. ['subplatform']. * v1.public_ssh_keys: List of any cloud-provided ssh keys for the instance. * v1.platform: String representing the cloud platform api supporting the datasource. For example: 'ec2' for aws, aliyun and brightbox cloud names. * v1.subplatform: String with more details about the source of the metadata consumed. For example, metadata uri, config drive device path or seed directory. To support the new platform and subplatform standardized instance-data, DataSource and its subclasses grew platform and subplatform attributes. The platform attribute defaults to the lowercase string datasource name at self.dsname. This method is overridden in NoCloud, Ec2 and ConfigDrive datasources. The subplatform attribute calls a _get_subplatform method which will return a string containing a simple slug for subplatform type such as metadata, seed-dir or config-drive followed by a detailed uri, device or directory path where the datasource consumed its configuration. As part of this work, DatasourceEC2 methods _get_data and _crawl_metadata have been refactored for a few reasons: - crawl_metadata is now a read-only operation, persisting no attributes on the datasource instance and returns a dictionary of consumed metadata. - crawl_metadata now closely represents the raw stucture of the ec2 metadata consumed, so that end-users can leverage public ec2 metadata documentation where possible. - crawl_metadata adds a '_metadata_api_version' key to the crawled ds.metadata to advertise what version of EC2's api was consumed by cloud-init. - _get_data now does all the processing of crawl_metadata and saves datasource instance attributes userdata_raw, metadata etc. Additional drive-bys: * unit test rework for test_altcloud and test_azure to simplify mocks and make use of existing util and test_helpers functions.
2018-08-17azure: allow azure to generate network configuration from IMDS per boot.Chad Smith
Azure datasource now queries IMDS metadata service for network configuration at link local address http://169.254.169.254/metadata/instance?api-version=2017-12-01. The azure metadata service presents a list of macs and allocated ip addresses associated with this instance. Azure will now also regenerate network configuration on every boot because it subscribes to EventType.BOOT maintenance events as well as the 'first boot' EventType.BOOT_NEW_INSTANCE. For testing add azure-imds --kind to cloud-init devel net_convert tool for debugging IMDS metadata. Also refactor _get_data into 3 discrete methods:   - is_platform_viable: check quickly whether the datasource is     potentially compatible with the platform on which is is running   - crawl_metadata: walk all potential metadata candidates, returning a     structured dict of all metadata and userdata. Raise InvalidMetaData on     error.   - _get_data: call crawl_metadata and process results or error. Cache     instance data on class attributes: metadata, userdata_raw etc.
2018-05-23Azure: Ignore NTFS mount errors when checking ephemeral drivePaul Meyer
The Azure data source provides a method to check whether a NTFS partition on the ephemeral disk is safe for reformatting to ext4. The method checks to see if there are customer data files on the disk. However, mounting the partition fails on systems that do not have the capability of mounting NTFS. Note that in this case, it is also very unlikely that the NTFS partition would have been used by the system (since it can't mount it). The only case would be where an update to the system removed the capability to mount NTFS, the likelihood of which is also very small. This change allows the reformatting of the ephemeral disk to ext4 on systems where mounting NTFS is not supported.
2018-05-03azure: Add reported ready marker file.Joshua Chan
This change is for Azure VM Preprovisioning. A bug was found when after azure VMs report ready the first time, during the time when VM is polling indefinitely for the new ovf-env.xml from Instance Metadata Service (IMDS), if a reboot happens, we send another report ready signal to the fabric, which deletes the reprovisioning data on the node. This marker file is used to fix this issue so that we will only send a report ready signal to the fabric when no marker file is present. Then, create a marker file so that when a reboot does occur, we check if a marker file has been created and decide whether we would like to send the repot ready signal. LP: #1765214
2018-04-19pylint: pay attention to unused variable warnings.Scott Moser
This enables warnings produced by pylint for unused variables (W0612), and fixes the existing errors.
2018-03-23Reduce AzurePreprovisioning HTTP timeouts.Douglas Jordan
Reducing timeout to 1 second as IMDS responds within a handful of milliseconds. Also get rid of max_retries to prevent exiting out of polling loop early due to IMDS outage / upgrade. Reduce Azure PreProvisioning HTTP timeouts during polling to avoid waiting an extra minute. LP: #1752977
2018-03-10This commit fixes get_hostname on the AzureDataSource.Douglas Jordan
LP: #1754495
2018-01-24Azure VM Preprovisioning support.Douglas Jordan
This change will enable azure vms to report provisioning has completed twice, first to tell the fabric it has completed then a second time to enable customer settings. The datasource for the second provisioning is the Instance Metadata Service (IMDS),and the VM will poll indefinitely for the new ovf-env.xml from IMDS. This branch introduces EphemeralDHCPv4 which encapsulates common logic used by both DataSourceEc2 an DataSourceAzure for temporary DHCP interactions without side-effects. LP: #1734991
2017-12-20Azure: Only bounce network when necessary.Chad Smith
This fixes a traceback when attempting to bounce the network after hostname resets. In artful and bionic ifupdown package is no longer installed in default cloud images. As such, Azure can't use those tools to bounce the network informing DDNS about hostname changes. This doesn't affect DDNS updates though because systemd-networkd is now watching hostname deltas and with default behavior to SendHostname=True over dhcp for all hostname updates which publishes DDNS for us. LP: #1722668
2017-12-15lint: Fix lints seen by pylint version 1.8.1.Chad Smith
This branch resolves lints seen by pylint revision 1.8.1 and updates our pinned tox pylint dependency used by our tox pylint target.
2017-12-05Datasources: Formalize DataSource get_data and related properties.Chad Smith
Each DataSource subclass must define its own get_data method. This branch formalizes our DataSource class to require that subclasses define an explicit dsname for sourcing cloud-config datasource configuration. Subclasses must also override the _get_data method or a NotImplementedError is raised. The branch also writes /run/cloud-init/instance-data.json. This file contains all meta-data, user-data and vendor-data and a standardized set of metadata keys in a json blob which other utilities with root-access could make use of. Because some meta-data or user-data is potentially sensitive the file is only readable by root. Generally most metadata content types should be json serializable. If specific keys or values are not serializable, those specific values will be base64encoded and the key path will be listed under the top-level key 'base64-encoded-keys' in instance-data.json. If json writing fails due to other TypeErrors or UnicodeDecodeErrors, a warning log will be emitted to /var/log/cloud-init.log and no instance-data.json will be created.
2017-11-30ec2: Fix sandboxed dhclient background process cleanup.Chad Smith
There is a race condition where our sandboxed dhclient properly writes a lease file but has not yet written a pid file. If the sandbox temporary directory is torn down before the dhclient subprocess writes a pidfile DataSourceEc2Local gets a traceback and the instance will fallback to DataSourceEc2 in the init-network stage. This wastes boot cycles we'd rather not spend. Fix handling of sandboxed dhclient to wait for both pidfile and leasefile before proceding. If either file doesn't show in 5 seconds, log a warning and return empty lease results {}. LP: #1735331
2017-11-09Azure: don't generate network configuration for SRIOV devicesScott Moser
Azure kernel now configures the SRIOV devices itself so cloud-init does not need to provide any SRIOV device configuration or udev naming rules. LP: #1721579
2017-09-18Azure: wait longer for SSH pub keys to arrive.Paul Meyer
Currently the Azure data source waits up to 60 seconds. This has proven not to be sufficient to provide resiliency to unrelated transient failures in other parts of the infrastructure. Azure already has logic outside of the VM to abort hung provisioning. This changes lengthens the time out to 15 minutes. LP: #1717611
2017-06-27Azure: Add network-config, Refactor net layer to handle duplicate macs.Ryan Harper
On systems with network devices with duplicate mac addresses, cloud-init will fail to rename the devices according to the specified network configuration. Refactor net layer to search by device driver and device id if available. Azure systems may have duplicate mac addresses by design. Update Azure datasource to run at init-local time and let Azure datasource generate a fallback networking config to handle advanced networking configurations. Lastly, add a 'setup' method to the datasources that is called before userdata/vendordata is processed but after networking is up. That is used here on Azure to interact with the 'fabric'.
2017-06-15FreeBSD: Make freebsd a variant, fix unittests and tools/build-on-freebsd.Scott Moser
- Simplify the logic of 'variant' in util.system_info much of the data from https://github.com/hpcugent/easybuild/wiki/OS_flavor_name_version - fix get_resource_disk_on_freebsd when running on a system without an Azure resource disk. - fix tools/build-on-freebsd to replace oauth with oauthlib and add bash which is a dependency for tests. - update a fiew places that were checking for freebsd but not using the util.is_FreeBSD()
2017-06-15FreeBSD: fix test failureScott Moser
The previous commit caused test failure. This separates out _check_freebsd_cdrom and mocks it in a test rather than patching open.
2017-06-15FreeBSD: replace ifdown/ifup with "ifconfig down" and "ifconfig up".Hongjiang Zhang
Fix the issue caused by different commands on Linux and FreeBSD. On Linux, we used ifdown and ifup to enable and disable a NIC, but on FreeBSD, the counterpart is "ifconfig down" and "ifconfig up". LP: #1697815
2017-06-15FreeBSD: fix cdrom mounting failure if /mnt/cdrom/secure did not exist.Hongjiang Zhang
The current method is to attempt to mount the cdrom (/dev/cd0), if it is successful, /dev/cd0 is configured, otherwise, it is not configured. The problem is it forgets to check whether the mounting destination folder is created or not. As a result, mounting attempt failed even if cdrom is ready. LP: #1696295
2017-06-01azure: remove accidental duplicate line in merge.Scott Moser
In previous commit I inadvertantly left two calls to asset_tag = util.read_dmi_data('chassis-asset-tag') The second did not do anything useful. Thus, remove it.
2017-06-01azure: identify platform by well known value in chassis asset tag.Chad Smith
Azure sets a known chassis asset tag to 7783-7084-3265-9085-8269-3286-77. We can inspect this in both ds-identify and DataSource.get_data to determine whether we are on Azure. Added unit tests to cover these changes and some minor tweaks to Exception error message content to give more context on malformed or missing ovf-env.xml files. LP: #1693939
2017-05-23flake8: move the pinned version of flake8 up to 3.3.0Scott Moser
This just moves flake8 and related tools up to newer versions and fixes the complaints associated with that. We added to the list of flake8 ignores: H102: do not put vim info in source files H304: no relative imports Also updates and pins the following in the flake8 environment: pep8: 1.7.0 => drop (although hacking still pulls it in). pyflakes 1.1.0 => 1.5.0 hacking 0.10.2 => 0.13.0 flake8 2.5.4 => 3.3.0 pycodestyle none => 2.3.1
2017-05-17Azure: fix reformatting of ephemeral disks on resize to large types.Scott Moser
Large instance types have a different disk format on the newly partitioned ephemeral drive. So we have to adjust the logic in the Azure datasource to recognize that a disk with 2 partitions and an empty ntfs filesystem on the second one is acceptable. This also adjusts the datasources's builtin fs_setup config to remove the 'replace_fs' entry. This entry was previously ignored, and confusing. I've clarified the doc on that also. LP: #1686514
2017-05-10FreeBSD: improvements and fixes for use on AzureHongjiang Zhang
This patch targets to make FreeBSD 10.3 or 11 work on Azure. The modifications abide by the rule of: * making as less modification as possible * delegate to the distro or datasource where possible. The main modifications are: 1. network configuration improvements, and movement into distro path. 2. Fix setting of password. Password setting through "pw" can only work through pipe. 3. Add 'root:wheel' to syslog_fix_perms field. 4. Support resizing default file system (ufs) 5. copy cloud.cfg for freebsd to /etc/cloud/cloud.cfg rather than /usr/local/etc/cloud/cloud.cfg. 6. Azure specific changes: a. When reading the azure endpoint, search in a different path and read a different option name (option-245 vs. unknown-245). so, the lease file path should be generated according to platform. b. adjust the handling of ephemeral mounts for ufs filesystem and for finding the ephemeral device. c. fix mounting of cdrom LP: #1636345
2017-04-21pylint: fix all logging warningsJoshua Powers
This will change all instances of LOG.warn to LOG.warning as warn is now a deprecated method. It will also make sure any logging uses lazy logging by passing string format arguments as function parameters.
2017-03-21Bounce network interface for Azure when using the built-in path.Brent Baude
When deploying on Azure and using only cloud-init, you must "bounce" the network interface to trigger a DDNS update. This allows dhclient to register the hostname with Azure so that DNS works correctly on their private networks (i.e. between vm and vm). The agent path was already doing the bounce so this creates parity between the built-in path and the agent. LP: #1674685
2016-12-22LICENSE: Allow dual licensing GPL-3 or Apache 2.0Jon Grimm
This has been a recurring ask and we had initially just made the change to the cloud-init 2.0 codebase. As the current thinking is we'll just continue to enhance the current codebase, its desirable to relicense to match what we'd intended as part of the 2.0 plan here. - put a brief description of license in LICENSE file - put full license versions in LICENSE-GPLv3 and LICENSE-Apache2.0 - simplify the per-file header to reference LICENSE - tox: ignore H102 (Apache License Header check) Add license header to files that ship. Reformat headers, make sure everything has vi: at end of file. Non-shipping files do not need the copyright header, but at the moment tests/ have it.
2016-12-02fix problems found in python2.6 test.Joshua Harlow
These are just simple syntax fixes to work correctly on python2.6. Found when testing in a centos 6 container.
2016-11-22Azure: No longer rely on walinux agent.Scott Moser
Cloud-init has for some time relied on walinuxagent to do some bits of work necessary for instance initialization. That reliance has not been needed for a while, but we have still defaulted to it. This change uses the "builtin" path that Daniel Watkins added some time ago by default. Also, Adjust tests that assumed the non-__builtin__ Azure agent_command. LP: #1538522
2016-11-18Add activate_datasource, for datasource specific code paths.Scott Moser
This adds a call to 'activate_datasource'. That will be called during init stage (or init-local in the event of a 'local' dsmode). It is present so that the datasource can do platform specific operations that may be necessary. It is passed the fully rendered cloud-config and whether or not the instance is a new instance. The Azure datasource uses this to address formatting of the ephemeral devices. It does so by a.) waiting for the device to come online b.) removing the marker files for the disk_setup and mounts modules if it finds that the ephemeral device has been reset. LP: #1611074
2016-09-20Adjust mounts and disk configuration for systemd.Scott Moser
The end result of all of these changes is to get mounts managed by cloud-init to occur only after cloud-init.service is done. We need to do that so that filesystems that are set up by cloud-init (in disk_setup) do not get mounted by stale entries in /etc/fstab before the setup occurs. This can occur in 2 ways: a.) new instance with old /etc/fstab b.) same instance where disk needs adjusting (Azure resize will re-format the ephemeral disk). The list of changes here is: - move mounts and disk_setup module to cloud-init.service rather than config. cloud-init.service runs earlier in boot so it can get those mount points done earlier. - on systemd add 'x-systemd.requires=cloud-init.service' to fstab options - cloud-init-local.service: add Before=basic.target - cloud-init.service: - extend After, Before, and Wants to multiple lines rather than one long line. - sort consistently with cloud-init-local.service - add DefaultDependencies=no - add Before=default.target - add Conflicts=shutdown.target LP: #1611074
2016-08-22azure dhclient-hook cleanupsScott Moser
This adds some function to the generator to maintain the presense of a flag file '/run/cloud-init/enabled' indicating that cloud-init is enabled. Then, only run the dhclient hooks if on Azure and cloud-init is enabled. The test for is_azure currently only checks to see that the board vendor is Microsoft, not actually that we are on azure. Running should not be harmful anywhere, other than slowing down dhclient. The value of this additional code is that then dhclient having run does not task the system with the load of cloud-init. Additionally, some changes to config are done here. * rename 'dhclient_leases' to 'dhclient_lease_file' * move that to the datasource config (datasource/Azure/dhclient_lease_file) Also, it removes the config in config/cloud.cfg that set agent_command to __builtin__. This means that by default cloud-init still needs the agent installed. The suggested follow-on improvement is to use __builtin__ if there is no walinux-agent installed.