summaryrefslogtreecommitdiff
path: root/doc/rtd/topics/debugging.rst
diff options
context:
space:
mode:
authorChad Smith <chad.smith@canonical.com>2017-08-21 13:46:23 -0600
committerChad Smith <chad.smith@canonical.com>2017-08-21 13:46:23 -0600
commite74d7752f1761c3a8d3c19877de4707d00c49d08 (patch)
tree0627dd5d06d83b770b38196bceabddd06c0b1b77 /doc/rtd/topics/debugging.rst
parentdc2bd79949492bccdc1d7df0132f98c354d51943 (diff)
downloadvyos-cloud-init-e74d7752f1761c3a8d3c19877de4707d00c49d08.tar.gz
vyos-cloud-init-e74d7752f1761c3a8d3c19877de4707d00c49d08.zip
tools: Add tooling for basic cloud-init performance analysis.
This branch adds cloudinit-analyze into cloud-init proper. It adds an "analyze" subcommand to the cloud-init command line utility for quick performance assessment of cloud-init stages and events. On a cloud-init configured instance, running "cloud-init analyze blame" will now report which cloud-init events cost the most wall time. This allows for quick assessment of the most costly stages of cloud-init. This functionality is pulled from Ryan Harper's analyze work. The cloudinit-analyze main script itself has been refactored a bit for inclusion as a subcommand of cloud-init CLI. There will be a followup branch at some point which will optionally instrument detailed strace profiling, but that approach needs a bit more discussion first. This branch also adds: * additional debugging topic to the sphinx-generated docs describing cloud-init analyze, dump and show as well as cloud-init single usage. * Updates the Makefile unittests target to include cloudinit directory because we now have unittests within that package. LP: #1709761
Diffstat (limited to 'doc/rtd/topics/debugging.rst')
-rw-r--r--doc/rtd/topics/debugging.rst146
1 files changed, 146 insertions, 0 deletions
diff --git a/doc/rtd/topics/debugging.rst b/doc/rtd/topics/debugging.rst
new file mode 100644
index 00000000..4e43dd57
--- /dev/null
+++ b/doc/rtd/topics/debugging.rst
@@ -0,0 +1,146 @@
+**********************
+Testing and debugging cloud-init
+**********************
+
+Overview
+========
+This topic will discuss general approaches for test and debug of cloud-init on
+deployed instances.
+
+
+Boot Time Analysis - cloud-init analyze
+======================================
+Occasionally instances don't appear as performant as we would like and
+cloud-init packages a simple facility to inspect what operations took
+cloud-init the longest during boot and setup.
+
+The script **/usr/bin/cloud-init** has an analyze sub-command **analyze**
+which parses any cloud-init.log file into formatted and sorted events. It
+allows for detailed analysis of the most costly cloud-init operations are to
+determine the long-pole in cloud-init configuration and setup. These
+subcommands default to reading /var/log/cloud-init.log.
+
+* ``analyze show`` Parse and organize cloud-init.log events by stage and
+include each sub-stage granularity with time delta reports.
+
+.. code-block:: bash
+
+ $ cloud-init analyze show -i my-cloud-init.log
+ -- Boot Record 01 --
+ The total time elapsed since completing an event is printed after the "@"
+ character.
+ The time the event takes is printed after the "+" character.
+
+ Starting stage: modules-config
+ |`->config-emit_upstart ran successfully @05.47600s +00.00100s
+ |`->config-snap_config ran successfully @05.47700s +00.00100s
+ |`->config-ssh-import-id ran successfully @05.47800s +00.00200s
+ |`->config-locale ran successfully @05.48000s +00.00100s
+ ...
+
+
+* ``analyze dump`` Parse cloud-init.log into event records and return a list of
+dictionaries that can be consumed for other reporting needs.
+
+.. code-block:: bash
+
+ $ cloud-init analyze blame -i my-cloud-init.log
+ [
+ {
+ "description": "running config modules",
+ "event_type": "start",
+ "name": "modules-config",
+ "origin": "cloudinit",
+ "timestamp": 1510807493.0
+ },...
+
+* ``analyze blame`` Parse cloud-init.log into event records and sort them based
+on highest time cost for quick assessment of areas of cloud-init that may need
+improvement.
+
+.. code-block:: bash
+
+ $ cloud-init analyze blame -i my-cloud-init.log
+ -- Boot Record 11 --
+ 00.01300s (modules-final/config-scripts-per-boot)
+ 00.00400s (modules-final/config-final-message)
+ 00.00100s (modules-final/config-rightscale_userdata)
+ ...
+
+
+Analyze quickstart - LXC
+---------------------------
+To quickly obtain a cloud-init log try using lxc on any ubuntu system:
+
+.. code-block:: bash
+
+ $ lxc init ubuntu-daily:xenial x1
+ $ lxc start x1
+ # Take lxc's cloud-init.log and pipe it to the analyzer
+ $ lxc file pull x1/var/log/cloud-init.log - | cloud-init analyze dump -i -
+ $ lxc file pull x1/var/log/cloud-init.log - | \
+ python3 -m cloudinit.analyze dump -i -
+
+Analyze quickstart - KVM
+---------------------------
+To quickly analyze a KVM a cloud-init log:
+
+1. Download the current cloud image
+ wget https://cloud-images.ubuntu.com/daily/server/xenial/current/xenial-server-cloudimg-amd64.img
+2. Create a snapshot image to preserve the original cloud-image
+
+.. code-block:: bash
+
+ $ qemu-img create -b xenial-server-cloudimg-amd64.img -f qcow2 \
+ test-cloudinit.qcow2
+
+3. Create a seed image with metadata using `cloud-localds`
+
+.. code-block:: bash
+
+ $ cat > user-data <<EOF
+ #cloud-config
+ password: passw0rd
+ chpasswd: { expire: False }
+ EOF
+ $ cloud-localds my-seed.img user-data
+
+4. Launch your modified VM
+
+.. code-block:: bash
+
+ $ kvm -m 512 -net nic -net user -redir tcp:2222::22 \
+ -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \
+ -drive file=my-seed.img,if=virtio,format=raw
+
+5. Analyze the boot (blame, dump, show)
+
+.. code-block:: bash
+
+ $ ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | \
+ cloud-init analyze blame -i -
+
+
+Running single cloud config modules
+===================================
+This subcommand is not called by the init system. It can be called manually to
+load the configured datasource and run a single cloud-config module once using
+the cached userdata and metadata after the instance has booted. Each
+cloud-config module has a module FREQUENCY configured: PER_INSTANCE, PER_BOOT,
+PER_ONCE or PER_ALWAYS. When a module is run by cloud-init, it stores a
+semaphore file in
+``/var/lib/cloud/instance/sem/config_<module_name>.<frequency>`` which marks
+when the module last successfully ran. Presence of this semaphore file
+prevents a module from running again if it has already been run. To ensure that
+a module is run again, the desired frequency can be overridden on the
+commandline:
+
+.. code-block:: bash
+
+ $ sudo cloud-init single --name cc_ssh --frequency always
+ ...
+ Generating public/private ed25519 key pair
+ ...
+
+Inspect cloud-init.log for output of what operations were performed as a
+result.