summaryrefslogtreecommitdiff
path: root/cloudinit/sources/__init__.py
AgeCommit message (Collapse)Author
2020-03-10instance-data: add cloud-init merged_cfg and sys_info keys to json (#214)Chad Smith
Cloud-config userdata provided as jinja templates are now distro, platform and merged cloud config aware. The cloud-init query command will also surface this config data. Now users can selectively render portions of cloud-config based on: * distro name, version, release * python version * merged cloud config values * machine platform * kernel To support template handling of this config, add new top-level keys to /run/cloud-init/instance-data.json. The new 'merged_cfg' key represents merged cloud config from /etc/cloud/cloud.cfg and /etc/cloud/cloud.cfg.d/*. The new 'sys_info' key which captures distro and platform info from cloudinit.util.system_info. Cloud config userdata templates can render conditional content based on these additional environmental checks such as the following simple example: ``` ## template: jinja #cloud-config runcmd: {% if distro == 'opensuse' %} - sh /custom-setup-sles {% elif distro == 'centos' %} - sh /custom-setup-centos {% elif distro == 'debian' %} - sh /custom-setup-debian {% endif %} ``` To see all values: sudo cloud-init query --all Any keys added to the standardized v1 keys are guaranteed to not change or drop on future released of cloud-init. 'v1' keys will be retained for backward-compatibility even if a new standardized 'v2' set of keys are introduced The following standardized v1 keys are added: * distro, distro_release, distro_version, kernel_version, machine, python_version, system_platform, variant LP: #1865969
2020-03-04instance-data: write redacted cfg to instance-data.json (#233)Chad Smith
When cloud-init persisted instance metadata to instance-data.json if failed to redact the sensitive value. Currently, the only sensitive key 'security-credentials' is omitted as cloud-init does not fetch this value from IMDS. Fix this by properly redacting the content from the public instance-metadata.json file while retaining the value in the root-only instance-data-sensitive.json file. LP: #1865947
2020-01-21Start removing dependency on six (#178)Daniel Watkins
* url_helper: drop six * url_helper: sort imports * log: drop six * log: sort imports * handlers/__init__: drop six * handlers/__init__: sort imports * user_data: drop six * user_data: sort imports * sources/__init__: drop six * sources/__init__: sort imports * DataSourceOVF: drop six * DataSourceOVF: sort imports * sources/helpers/openstack: drop six * sources/helpers/openstack: sort imports * mergers/m_str: drop six This also allowed simplification of the logic, as we will never encounter a non-string text type. * type_utils: drop six * mergers/m_dict: drop six * mergers/m_list: drop six * cmd/query: drop six * mergers/__init__: drop six * net/cmdline: drop six * reporting/handlers: drop six * reporting/handlers: sort imports
2019-11-13Fix metadata check when local-hostname is null (#32)Mark Goddard
Fix traceback when running with a config drive containing a metadata file which has local-hostname set to null. Cloud-init ignores absent local-hostname or None values. LP: #1852100
2019-08-09Add support for publishing host keys to GCE guest attributesRick Wright
This adds an empty publish_host_keys() method to the default datasource that is called by cc_ssh.py. This feature can be controlled by the 'ssh_publish_hostkeys' config option. It is enabled by default but can be disabled by setting 'enabled' to false. Also, a blacklist of key types is supported. In addition, this change implements ssh_publish_hostkeys() for the GCE datasource, attempting to write the hostkeys to the instance's guest attributes. Using these hostkeys for ssh connections is currently supported by the alpha version of Google's 'gcloud' command-line tool. (On Google Compute Engine, this feature will be enabled by setting the 'enable-guest-attributes' metadata key to 'true' for the project/instance that you would like to use this feature for. When connecting to the instance for the first time using 'gcloud compute ssh' the hostkeys will be read from the guest attributes for the instance and written to the user's local known_hosts file for Google Compute Engine instances.)
2019-07-26net/cmdline: split interfaces_by_mac and init network config determinationDaniel Watkins
Previously "cmdline" network configuration could be either user-specified network-config=... configuration data, or initramfs-provided configuration data. Before data sources could modify the order in which network config sources were considered, this conflation didn't matter (and, indeed, in the default data source configuration it will continue to not matter). However, it _is_ desirable for a data source to be able to specify that its network configuration should be preferred over the initramfs-provided network configuration but still allow explicit network-config=... configuration passed to the kernel cmdline to continue to override both of those sources. (This also modifies the Oracle data source to use read_initramfs_config directly, which is effectively what it was using read_kernel_cmdline_config for previously.)
2019-07-23stages: allow data sources to override network config source orderDaniel Watkins
Currently, if a platform provides any network configuration via the "cmdline" method (i.e. network-data=... on the kernel command line, ip=... on the kernel command line, or iBFT config via /run/net-*.conf), the value of the data source's network_config property is completely ignored. This means that on platforms that use iSCSI boot (such as Oracle Compute Infrastructure), there is no way for the data source to configure any network interfaces other than those that have already been configured by the initramfs. This change allows data sources to specify the order in which network configuration sources are considered. Data sources that opt to use this mechanism will be expected to consume the command line network data and integrate it themselves. (The generic merging of network configuration sources was considered, but we concluded that the single use case we have presently (a) didn't warrant the increased complexity, and (b) didn't give us a broad enough view to be sure that our generic implementation would be sufficiently generic. This change in no way precludes a merging strategy in future.)
2019-04-10Revert "DataSource: move update_events from a class to an instance..."Daniel Watkins
Moving update_events from a class attribute to an instance attribute means that it doesn't exist on DataSource objects that are unpickled, causing tracebacks on cloud-init upgrade. As this change is only required for cloud-init installations which don't utilise ds-identify, we're backing it out to be reintroduced once the upgrade path bug has been addressed. This reverts commit f2fd6eac4407e60d0e98826ab03847dda4cde138.
2019-03-14DataSource: move update_events from a class to an instance attributeDaniel Watkins
Currently, DataSourceAzure updates self.update_events in __init__. As update_events is a class attribute on DataSource, this updates it for all instances of classes derived from DataSource including those for other clouds. This means that if DataSourceAzure is even instantiated, its behaviour is applied to whichever data source ends up being used for boot. To address this, update_events is moved from a class attribute to an instance attribute (that is therefore populated at instantiation time). This retains the defaults for all DataSource sub-class instances, but avoids them being able to mutate the state in instances of other DataSource sub-classes. update_events is only ever referenced on an instance of DataSource (or a sub-class); no code relies on it being a class attribute. (In fact, it's only used within methods on DataSource or its sub-classes, so it doesn't even _need_ to remain public, though I think it's appropriate for it to be public.) DataSourceScaleway is also updated to move update_events from a class attribute to an instance attribute, as the class attribute would now be masked by the DataSource instance attribute. LP: #1819913
2018-10-09tools: Add cloud-id command line utilityChad Smith
Add a quick cloud lookup utility in order to more easily determine the cloud on which an instance is running. The utility parses standardized attributes from /run/cloud-init/instance-data.json to print the canonical cloud-id for the instance. It uses known region maps if necessary to determine on which specific cloud the instance is running. Examples: aws, aws-gov, aws-china, rackspace, azure-china, lxd, openstack, unknown
2018-10-09instance-data: Add standard keys platform and subplatform. Refactor ec2.Chad Smith
Add the following instance-data.json standardized keys: * v1._beta_keys: List any v1 keys in beta development, e.g. ['subplatform']. * v1.public_ssh_keys: List of any cloud-provided ssh keys for the instance. * v1.platform: String representing the cloud platform api supporting the datasource. For example: 'ec2' for aws, aliyun and brightbox cloud names. * v1.subplatform: String with more details about the source of the metadata consumed. For example, metadata uri, config drive device path or seed directory. To support the new platform and subplatform standardized instance-data, DataSource and its subclasses grew platform and subplatform attributes. The platform attribute defaults to the lowercase string datasource name at self.dsname. This method is overridden in NoCloud, Ec2 and ConfigDrive datasources. The subplatform attribute calls a _get_subplatform method which will return a string containing a simple slug for subplatform type such as metadata, seed-dir or config-drive followed by a detailed uri, device or directory path where the datasource consumed its configuration. As part of this work, DatasourceEC2 methods _get_data and _crawl_metadata have been refactored for a few reasons: - crawl_metadata is now a read-only operation, persisting no attributes on the datasource instance and returns a dictionary of consumed metadata. - crawl_metadata now closely represents the raw stucture of the ec2 metadata consumed, so that end-users can leverage public ec2 metadata documentation where possible. - crawl_metadata adds a '_metadata_api_version' key to the crawled ds.metadata to advertise what version of EC2's api was consumed by cloud-init. - _get_data now does all the processing of crawl_metadata and saves datasource instance attributes userdata_raw, metadata etc. Additional drive-bys: * unit test rework for test_altcloud and test_azure to simplify mocks and make use of existing util and test_helpers functions.
2018-09-26docs: surface experimental doc in instance-data.jsonChad Smith
2018-09-25cli: add cloud-init query subcommand to query instance metadataChad Smith
Cloud-init caches any cloud metadata crawled during boot in the file /run/cloud-init/instance-data.json. Cloud-init also standardizes some of that metadata across all clouds. The command 'cloud-init query' surfaces a simple CLI to query or format any cached instance metadata so that scripts or end-users do not have to write tools to crawl metadata themselves. Since 'cloud-init query' is runnable by non-root users, redact any sensitive data from instance-data.json and provide a root-readable unredacted instance-data-sensitive.json. Datasources can now define a sensitive_metadata_keys tuple which will redact any matching keys which could contain passwords or credentials from instance-data.json. Also add the following standardized 'v1' instance-data.json keys:   - user_data: The base64encoded user-data provided at instance launch   - vendor_data: Any vendor_data provided to the instance at launch   - underscore_delimited versions of existing hyphenated keys:     instance_id, local_hostname, availability_zone, cloud_name
2018-09-11user-data: jinja template to render instance-data.json in cloud-configChad Smith
Allow users to provide '## template: jinja' as the first line or their #cloud-config or custom script user-data parts. When this header exists, the cloud-config or script will be rendered as a jinja template. All instance metadata keys and values present in /run/cloud-init/instance-data.json will be available as jinja variables for the template. This means any cloud-config module or script can reference any standardized instance data in templates and scripts. Additionally, any standardized instance-data.json keys scoped below a '<v#>' key will be promoted as a top-level key for ease of reference in templates. This means that '{{ local_hostname }}' is the same as using the latest '{{ v#.local_hostname }}'. Since instance-data is written to /run/cloud-init/instance-data.json, make sure it is persisted across reboots when the cached datasource opject is reloaded. LP: #1791781
2018-08-17Add datasource Oracle Compute Infrastructure (OCI).Scott Moser
This adds a Oracle specific datasource that functions with OCI. It is a simplified version of the OpenStack metadata server with support for vendor-data. It does not support the OCI-C (classic) platform. Also here is a move of BrokenMetadata to common 'sources' as this was the third occurrence of that class.
2018-07-26update_metadata re-config on every boot comments and tests not quite rightMike Gerdts
The comment in update_metadata() that explains how a datasource should enable network reconfig on every boot presumes that EventType.BOOT_NEW_INSTANCE is a subset of EventType.BOOT. That's not the case, and as such a datasource that needs to configure networking when it is a new instance and every boot needs to include both event types. To make the situation above easier to debug, update_metadata() now logs when it returns false. To make it so that datasources do not need to test before appending to the update_events['network'], it is changed from a list to a set. test_update_metadata_only_acts_on_supported_update_events is updated to allow datasources to support EventType.BOOT. Author: Mike Gerdts <mike.gerdts@joyent.com>
2018-07-01update_metadata: a datasource can support network re-config every bootChad Smith
Very basic type definitions are now defined to distinguish 'boot' events from 'new instance (first boot)'. Event types will now be handed to a datasource.update_metadata method which can determine whether to refresh its metadata and re-render configuration based on that source event. A datasource can 'subscribe' to an event by setting up the update_events attribute on the datasource class which describe what config scope is updated by a list of matching events. By default datasources will have the following update_events: {'network': [EventType.BOOT_NEW_INSTANCE]} This setting says the datasource will re-write network configuration only on first boot of a new instance or when the instance id changes. New methods are now present on the datasource: - clear_cached_attrs: Resets cached datasource attributes to values listed in datasource.cached_attr_defaults. This is performed prior to processing a fresh metadata process to avoid keeping old/invalid cached data around. - update_metadata: accepts source_event_types to determine if the metadata should be crawled again and processed
2018-05-23openstack: Allow discovery in init-local using dhclient in a sandbox.Chad Smith
Network has not yet been configured in the init-local stage so the openstack datasource will use dhcp-client to temporarily obtain an ipv4 address and query the metadata service at http://169.254.169.254 to get network_data.json configuration. If present, the datasource will return network_config version 1 config based on that network_data.json content. Previously OpenStack datasource only setup dhcp on the fallback interface so this represents a change in behavior to react to the full config provided by openstack. Also significant to OpenStack is the separation of a _crawl_data operation from get_data(). crawl_data walks the available metadata services and returns a dict of discovered content. get_data consumes the crawled_data,  caches it in the datasource and reacts to that data. /run/cloud-init/instance-data.json now published network_data.json or ec2_metadata key if that data is present on any datasource. The main reasons for the separation of crawl from get_data:  * Enable performance metrics of cloud-init's metadata crawls on each  * Enable cloud-init modules and scripts to query and consume metadata    content which may have updated/changed after cloud-init's initial cache    during instance boot. (Think hotplug) Also generalize common logic to base DataSource class/module:  * Move to a common UNSET variable up into base datasource module fix EC2,    ConfigDrive, OpenStack, SmartOS to use the global.  * Drop get_url_settings from Ec2, CloudStack and OpenStack and generalize    DataSource.get_url_params(). Allow subclasses to override url_max_wait,    url_timeout and url_retries params.  * Rename get_network_metadata bool to perform_dhcp_setup as it designates    whether EphemeralDHCPv4 setup is required before crawling metadata. LP: #1749717
2018-03-14set_hostname: When present in metadata, set it before network bringup.Chad Smith
When instance meta-data provides hostname information, run cc_set_hostname in the init-local or init-net stage before network comes up. Prevent an initial DHCP request which leaks the stock cloud-image default hostname before the meta-data provided hostname was processed. A leaked cloud-image hostname adversely affects Dynamic DNS which would reallocate 'ubuntu' hostname in DNS to every instance brought up by cloud-init. These instances would only update DNS to the cloud-init configured hostname upon DHCP lease renewal. This branch extends the get_hostname methods in datasource, cloud and util to limit results to metadata_only to avoid extra cost of querying the distro for hostname information if metadata does not provide that information. LP: #1746455
2018-01-24docs: Fix typos in docs and one debug message.aRkadeFR
Fix obvious typos. Replace 'for for' with a 'for'.
2017-12-05Datasources: Formalize DataSource get_data and related properties.Chad Smith
Each DataSource subclass must define its own get_data method. This branch formalizes our DataSource class to require that subclasses define an explicit dsname for sourcing cloud-config datasource configuration. Subclasses must also override the _get_data method or a NotImplementedError is raised. The branch also writes /run/cloud-init/instance-data.json. This file contains all meta-data, user-data and vendor-data and a standardized set of metadata keys in a json blob which other utilities with root-access could make use of. Because some meta-data or user-data is potentially sensitive the file is only readable by root. Generally most metadata content types should be json serializable. If specific keys or values are not serializable, those specific values will be base64encoded and the key path will be listed under the top-level key 'base64-encoded-keys' in instance-data.json. If json writing fails due to other TypeErrors or UnicodeDecodeErrors, a warning log will be emitted to /var/log/cloud-init.log and no instance-data.json will be created.
2017-08-30distro: allow distro to specify a default localeRyan Harper
Currently the cloud-init default locale (en_US.UTF-8) is set by the base datasource class. This patch allows a distro to overide the fallback value with one that's available in the distro but continues to respect an image which has preconfigured a locale. - Distro object now has a get_locale method which will return a preconfigure locale setting by checking the distros locale system configuration file. If not set or not present, return the default locale of en_US.UTF-8 which retains behavior of all previous cloud-init releases. - Apply locale now handles regenerating locales or system configuration files as needed. - Adjust apply_locale logic to skip locale-regen if the specified LANG value is C.UTF-8,C, or POSIX; they do not require regeneration. - Further add unittests to exercise the default paths for Ubuntu and non-ubuntu paths to validate they get the LANG expected.
2017-06-27Azure: Add network-config, Refactor net layer to handle duplicate macs.Ryan Harper
On systems with network devices with duplicate mac addresses, cloud-init will fail to rename the devices according to the specified network configuration. Refactor net layer to search by device driver and device id if available. Azure systems may have duplicate mac addresses by design. Update Azure datasource to run at init-local time and let Azure datasource generate a fallback networking config to handle advanced networking configurations. Lastly, add a 'setup' method to the datasources that is called before userdata/vendordata is processed but after networking is up. That is used here on Azure to interact with the 'fabric'.
2017-04-21pylint: fix all logging warningsJoshua Powers
This will change all instances of LOG.warn to LOG.warning as warn is now a deprecated method. It will also make sure any logging uses lazy logging by passing string format arguments as function parameters.
2017-03-24test: add running of pylintJoshua Powers
Now tox will run pylint. The .pylintrc file sets pylint to only produce errors, and will ignore certain classes that are known problematic (six).
2017-03-21Bounce network interface for Azure when using the built-in path.Brent Baude
When deploying on Azure and using only cloud-init, you must "bounce" the network interface to trigger a DDNS update. This allows dhclient to register the hostname with Azure so that DNS works correctly on their private networks (i.e. between vm and vm). The agent path was already doing the bounce so this creates parity between the built-in path and the agent. LP: #1674685
2016-12-22LICENSE: Allow dual licensing GPL-3 or Apache 2.0Jon Grimm
This has been a recurring ask and we had initially just made the change to the cloud-init 2.0 codebase. As the current thinking is we'll just continue to enhance the current codebase, its desirable to relicense to match what we'd intended as part of the 2.0 plan here. - put a brief description of license in LICENSE file - put full license versions in LICENSE-GPLv3 and LICENSE-Apache2.0 - simplify the per-file header to reference LICENSE - tox: ignore H102 (Apache License Header check) Add license header to files that ship. Reformat headers, make sure everything has vi: at end of file. Non-shipping files do not need the copyright header, but at the moment tests/ have it.
2016-12-19set_hostname: avoid erroneously appending domain to fqdnLars Kellogg-Stedman
In some situations, cloud-init will erroneously append a default domain to an already fully qualified hostname, resulting in something like 'localhost.localdomain.localdomain'. This patch checks to see if the value returned by util.get_hostname() contains a '.', and if it does treats it as a fully qualified name. Resolves: rhbz#1389048 LP: #1647910
2016-11-18Add activate_datasource, for datasource specific code paths.Scott Moser
This adds a call to 'activate_datasource'. That will be called during init stage (or init-local in the event of a 'local' dsmode). It is present so that the datasource can do platform specific operations that may be necessary. It is passed the fully rendered cloud-config and whether or not the instance is a new instance. The Azure datasource uses this to address formatting of the ephemeral devices. It does so by a.) waiting for the device to come online b.) removing the marker files for the disk_setup and mounts modules if it finds that the ephemeral device has been reset. LP: #1611074
2016-08-12MAAS: add vendor-data supportScott Moser
Add vendor-data support to maas which will behave like the openstack vendor-data does. Data returned from maas must be yaml loadable. Also update the main in DataSourceMAAS to "just work" on a maas deployed system. LP: #1612313
2016-06-27fix restoring from a datasource that did not have dsmodeScott Moser
On upgrade and reboot, if datasource restored from obj.pkl did not have a dsmode attribute, then 'init --local' would fail due to stack trace. LP: #1596690
2016-05-26fix typos in namesScott Moser
2016-05-25commit to push for fear of loss.Scott Moser
== background == DataSource Mode (dsmode) is present in many datasources in cloud-init. dsmode was originally added to cloud-init to specify when this datasource should be 'realized'. cloud-init has 4 stages of boot. a.) cloud-init --local . network is guaranteed not present. b.) cloud-init (--network). network is guaranteed present. c.) cloud-config d.) cloud-init final 'init_modules' [1] are run "as early as possible". And as such, are executed in either 'a' or 'b' based on the datasource. However, executing them means that user-data has been fully consumed. User-data and vendor-data may have '#include http://...' which then rely on the network being present. boothooks are an example of the things run in init_modules. The 'dsmode' was a way for a user to indicate that init_modules should run at 'a' (dsmode=local) or 'b' (dsmode=net) directly. Things were further confused when a datasource could provide networking configuration. Then, we needed to apply the networking config at 'a' but if the user had provided boothooks that expected networking, then the init_modules would need to be executed at 'b'. The config drive datasource hacked its way through this and applies networking if *it* detects it is a new instance. == Suggested Change == The plan is to 1. incorporate 'dsmode' into DataSource superclass 2. make all existing datasources default to network 3. apply any networking configuration from a datasource on first boot only apply_networking will always rename network devices when it runs. for bug 1579130. 4. run init_modules at cloud-init (network) time frame unless datasource is 'local'. 5. Datasources can provide a 'first_boot' method that will be called when a new instance_id is found. This will allow the config drive's write_files to be applied once. Over all, this will very much simplify things. We'll no longer have 2 sources like DataSourceNoCloud and DataSourceNoCloudNet, but would just have one source with a dsmode. == Concerns == Some things have odd reliance on dsmode. For example, OpenNebula's get_hostname uses it to determine if it should do a lookup of an ip address. == Bugs to fix here == http://pad.lv/1577982 ConfigDrive: cloud-init fails to configure network from network_data.json http://pad.lv/1579130 need to support systemd.link renaming of devices in container http://pad.lv/1577844 Drop unnecessary blocking of all net udev rules
2016-05-12Fix up a ton of flake8 issuesJoshua Harlow
2016-04-04DataSource: set ds_cfg to be a dictionaryScott Moser
if the Datasource does not have an entry in config, then set it to be a empty dictionary rather than None. Also remove places that did this elsewhere.
2016-03-24provide datasource.check_instance_id with access to system configScott Moser
Changing this interface to allow for easy change later. The thing that this will enable is: a.) maas datasource to look at the system config and see if it is configured with the same consumer_key b.) datasource config could allow setting a variable that it would look at.
2016-03-22add code to invoke networking configScott Moser
there is no data source that has a populated network_config() so at this point this doesn't do anything.
2016-03-21quickly check to see if the previous instance id is still validScott Moser
This adds a check in cloud-init to see if the existing (cached) datasource is still valid. It relies on support from the Datasource to implement 'check_instance_id'. That method should quickly determine (if possible) if the instance id found in the datasource is still valid. This means that we can still notice new instance ids without depending on a network datasource on every boot. I've also implemented check_instance_id for the superclass and for 3 classes: DataSourceAzure (check dmi data) DataSourceOpenstack (check dmi data) DataSourceNocloud (check the seeded data or kernel command line) LP: #1553815
2015-08-31split 'events' portion of reporting into separate fileScott Moser
this just separates events from other things that could conceivably be reported.
2015-08-04fix pep8Scott Moser
2015-08-02event name doesnt need mode as it is run through init-local or init-netScott Moser
2015-08-02fix tests from syncScott Moser
change ReportStack to ReportEventStack change default ReportEventStack to be status.SUCCESS instead of None
2015-07-31plumb the rest the reporting throughScott Moser
2015-07-31move 'mode' out of SearchReportStackScott Moser
2015-07-31address Daniel's comments in reviewScott Moser
2015-07-31add nicer formating and messages for datasource searchingScott Moser
2015-07-31fix issues found when testingScott Moser
2015-07-30tests passScott Moser
2015-07-22Add DataSource.region and use it in mirror selection.Daniel Watkins
Also implement DataSource.region for EC2 and GCE data sources.
2015-07-22Make full data source available to code that handles mirror selection.Daniel Watkins