.. _datasources:

***********
Datasources
***********

What is a datasource?
=====================

Datasources are sources of configuration data for cloud-init that typically
come from the user (userdata) or from the stack that created the
configuration drive (metadata). Typical userdata includes files, YAML, and
shell scripts, while typical metadata includes the server name, instance id,
display name, and other cloud-specific details. Since there are multiple ways
to provide this data (each cloud solution seems to prefer its own way),
cloud-init defines an abstract datasource class internally. Each cloud
platform's method of providing this data is implemented as a subclass, so the
rest of cloud-init can access the data in a single, uniform way.

Any metadata processed by cloud-init's datasources is persisted as
``/run/cloud-init/instance-data.json``. Cloud-init provides tooling
to quickly introspect some of that data. See :ref:`instance_metadata` for
more information.
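
As a minimal sketch of reading that file directly, the snippet below loads the
JSON and looks up a nested value. The helper names ``load_instance_data`` and
``lookup`` are hypothetical and not part of cloud-init, and the key shown in
the usage note is an assumption; inspect the file on your own instance to see
which keys your datasource provides. Reading the file may require elevated
privileges on some systems.

.. sourcecode:: python

    import json

    INSTANCE_DATA_PATH = "/run/cloud-init/instance-data.json"

    def load_instance_data(path=INSTANCE_DATA_PATH):
        """Return the persisted instance data as a dict."""
        with open(path, encoding="utf-8") as stream:
            return json.load(stream)

    def lookup(data, dotted_key, default=None):
        """Walk nested dicts using a dotted key such as 'a.b.c'."""
        current = data
        for part in dotted_key.split("."):
            if not isinstance(current, dict) or part not in current:
                return default
            current = current[part]
        return current

For example, ``lookup(load_instance_data(), "v1.region")`` would return the
value stored under that (assumed) key, or ``None`` if it is absent.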


Datasource API
--------------
The current interface that a datasource object must provide is the following:

.. sourcecode:: python

    # returns a mime multipart message that contains
    # all the various fully-expanded components that
    # were found from processing the raw userdata string
    # - when filtering, only the mime messages targeting
    #   this instance id will be returned (or messages with
    #   no instance id)
    def get_userdata(self, apply_filter=False): ...

    # returns the raw userdata string (or none)
    def get_userdata_raw(self): ...

    # returns an integer (or none) which can be used to identify
    # this instance in a group of instances which are typically
    # created from a single command, thus allowing programmatic
    # filtering on this launch index (or other selective actions)
    @property
    def launch_index(self): ...

    # the datasource's config_obj is a cloud-config formatted
    # object that came to it from ways other than cloud-config
    # because cloud-config content would be handled elsewhere
    def get_config_obj(self): ...

    # returns a list of public ssh keys
    def get_public_ssh_keys(self): ...

    # translates a device 'short' name into the actual physical device
    # fully qualified name (or none if said physical device is not attached
    # or does not exist)
    def device_name_to_device(self, name): ...

    # gets the locale string this instance should be applying,
    # which is typically used to adjust the instance's locale settings files
    def get_locale(self): ...

    @property
    def availability_zone(self): ...

    # gets the instance id that was assigned to this instance by the
    # cloud provider or, when said instance id does not exist in the backing
    # metadata, returns 'iid-datasource'
    def get_instance_id(self): ...

    # gets the fully qualified domain name that this host should be using
    # when configuring network or hostname related settings, typically
    # assigned either by the cloud provider or the user creating the vm
    def get_hostname(self, fqdn=False): ...

    def get_package_mirror_info(self): ...
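
To illustrate how a few of these methods behave, here is a minimal,
self-contained sketch. ``ToyDataSource`` is a hypothetical stand-in that does
not import or subclass anything from cloud-init, and the metadata key names it
reads (``instance-id``, ``local-hostname``, and so on) are assumptions chosen
for the example. A real datasource would subclass cloud-init's abstract
datasource class instead.

.. sourcecode:: python

    class ToyDataSource:
        """Hypothetical illustration of part of the datasource interface."""

        def __init__(self, metadata=None, userdata_raw=None):
            self.metadata = metadata or {}
            self.userdata_raw = userdata_raw

        def get_userdata_raw(self):
            # The raw userdata string, or None if none was provided.
            return self.userdata_raw

        @property
        def launch_index(self):
            # An integer launch index, or None when the metadata lacks one.
            value = self.metadata.get("launch-index")
            return int(value) if value is not None else None

        @property
        def availability_zone(self):
            return self.metadata.get("availability-zone")

        def get_instance_id(self):
            # Fall back to 'iid-datasource' when no instance id is present.
            return str(self.metadata.get("instance-id") or "iid-datasource")

        def get_public_ssh_keys(self):
            return list(self.metadata.get("public-keys", []))

        def get_hostname(self, fqdn=False):
            # Return the full name when fqdn=True, else only the first label.
            name = self.metadata.get("local-hostname", "localhost")
            return name if fqdn else name.split(".")[0]

With metadata ``{"local-hostname": "web.example.com"}``, ``get_hostname()``
returns the short name and ``get_hostname(fqdn=True)`` returns the full name.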


Adding a new Datasource
-----------------------
The datasource objects have a few touch points with cloud-init.  If you
are interested in adding a new datasource for your cloud platform you'll
need to take care of the following items:

* **Identify a mechanism for positive identification of the platform**:
  It is good practice for a cloud platform to positively identify itself
  to the guest.  This allows the guest to make educated decisions based
  on the platform on which it is running. On the x86 and arm64 architectures,
  many clouds identify themselves through DMI data.  For example,
  Oracle's public cloud provides the string 'OracleCloud.com' in the
  DMI chassis-asset field.

  cloud-init enabled images produce a log file with details about the
  platform.  Reading through this log in ``/run/cloud-init/ds-identify.log``
  may provide the information needed to uniquely identify the platform.
  If the log is not present, you can generate it by running from source
  ``./tools/ds-identify`` or the installed location
  ``/usr/lib/cloud-init/ds-identify``.

  The mechanism used to identify the platform will be required for the
  ds-identify and datasource module sections below.

* **Add datasource module** ``cloudinit/sources/DataSource<CloudPlatform>.py``:
  It is suggested that you start by copying one of the simpler datasources
  such as DataSourceHetzner.

* **Add tests for datasource module**:
  Add a new file with some tests for the module to
  ``cloudinit/sources/tests/test_<yourplatform>.py``.  For example, see
  ``cloudinit/sources/tests/test_oracle.py``.

* **Update ds-identify**:  On systemd systems, ds-identify is used to detect
  which datasource should be enabled or if cloud-init should run at all.
  You'll need to make changes to ``tools/ds-identify``.

* **Add tests for ds-identify**: Add relevant tests in a new class to
  ``tests/unittests/test_ds_identify.py``.  You can use ``TestOracle`` as an
  example.

* **Add your datasource name to the builtin list of datasources:** Add
  your datasource module name to the end of the ``datasource_list``
  entry in ``cloudinit/settings.py``.

* **Add your cloud platform to apport collection prompts:** Update the
  list of cloud platforms in ``cloudinit/apport.py``.  This list will be
  provided to the user who invokes ``ubuntu-bug cloud-init``.

* **Enable datasource by default in ubuntu packaging branches:**
  Ubuntu packaging branches contain a template file
  ``debian/cloud-init.templates`` that ultimately sets the default
  datasource_list when installed via package.  This file needs updating when
  the commit gets into a package.

* **Add documentation for your datasource**: You should add a new
  file in ``doc/rtd/topics/datasources/<cloudplatform>.rst``.
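
The DMI-based identification described in the first item above can be sketched
as follows. This is only an illustration under stated assumptions: the helper
names ``read_dmi_field`` and ``is_oracle_platform`` are hypothetical, and the
code reads the Linux sysfs file ``/sys/class/dmi/id/chassis_asset_tag``
directly, whereas a real datasource would normally rely on cloud-init's own
helpers and on ds-identify.

.. sourcecode:: python

    import os

    DMI_SYSFS_DIR = "/sys/class/dmi/id"

    def read_dmi_field(name, base_dir=DMI_SYSFS_DIR):
        """Return the stripped content of a DMI field, or None if unreadable."""
        path = os.path.join(base_dir, name)
        try:
            with open(path, encoding="utf-8", errors="replace") as stream:
                return stream.read().strip()
        except OSError:
            # Field missing, or not readable without more privileges.
            return None

    def is_oracle_platform(base_dir=DMI_SYSFS_DIR):
        """Check the chassis asset tag for the string named in the text above."""
        return read_dmi_field("chassis_asset_tag", base_dir) == "OracleCloud.com"

Passing ``base_dir`` explicitly makes the check easy to exercise in unit tests
against a temporary directory instead of the real sysfs tree.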


Datasource Documentation
========================
The following is a list of the implemented datasources.
Follow the links for more information.

.. toctree::
   :maxdepth: 2

   datasources/aliyun.rst
   datasources/altcloud.rst
   datasources/azure.rst
   datasources/cloudsigma.rst
   datasources/cloudstack.rst
   datasources/configdrive.rst
   datasources/digitalocean.rst
   datasources/ec2.rst
   datasources/maas.rst
   datasources/nocloud.rst
   datasources/opennebula.rst
   datasources/openstack.rst
   datasources/oracle.rst
   datasources/ovf.rst
   datasources/smartos.rst
   datasources/fallback.rst
   datasources/gce.rst

.. vi: textwidth=78