Thursday, 1 September 2016

Complex data transformations with nested Heat intrinsic functions

Disclaimer: what follows is either pretty neat or pure evil, depending on your viewpoint ;)  But it's based on a real use case and it works, so I'm posting this to document the approach and why it's needed, and hopefully to stimulate some discussion around optimizations leading to an improved/simplified implementation in the future.

Friday, 12 August 2016

TripleO Deploy Artifacts (and puppet development workflow)

For a while now, TripleO has supported a "DeployArtifacts" interface, aimed at making it easier to deploy modified/additional files on your overcloud, without the overhead of frequently rebuilding images.

This started out as a way to enable faster iteration on puppet module development (the puppet modules are by default stored inside the images deployed by TripleO, and generally you'll want to do development in a git checkout on the undercloud node), but it is actually a generic interface that can be used for a variety of deployment time customizations.

Friday, 5 August 2016

TripleO Composable Services 101

Over the Newton cycle, we've been working very hard on a major refactor of our heat templates and puppet manifests, such that a much more granular and flexible "Composable Services" pattern is followed throughout our implementation.

It's been a lot of work, but it's been a frequently requested feature for some time, so I'm excited to be in a position to say it's complete for Newton (kudos to everyone involved in making that happen!) :)

This post aims to provide an introduction to this work, an overview of how it works under the hood, some simple usage examples and a roadmap for some related follow-on work.

Thursday, 9 June 2016

TripleO partial stack updates

Recently I was asked if it's possible to do a partial update of a TripleO overcloud - the answer is yes, so I thought I'd write a post showing how to do it.  Much of what follows is basically an update on my old post on nested resource introspection (some interfaces have changed a bit since I wrote that), combined with an introduction to heat PATCH updates.

Partial update?!  Why?

So, the first question is: why would you do this? TripleO heat templates are designed to enforce a consistent state for your entire OpenStack deployment, so in most cases you really should update the entire overcloud, and not mess with the underlying nested stacks directly.

However, for some development usage, this creates a long feedback loop - you change something (perhaps one line in a puppet manifest or heat template), then have to wait several minutes for Heat to walk the entire tree of nested stacks, puppet to run all steps on all nodes, etc.

So, while you would probably never do this in production (seriously, please don't!), it can be a useful technique for developers seeking a quicker hack-then-test cycle, and also when attempting to isolate root-causes for some subset of overcloud stack update behavior.

Ok, with that disclaimer clearly stated, here's how you do it:

Step 1 - Find the nested stack to update

Let's take a specific example: I want to update only the ControllerNodesPostDeployment resource, which is defined in overcloud.yaml. This resource maps to a nested stack that uses the cluster configuration interfaces I described in this previous post to apply puppet in a series of steps to all controller nodes.

Here's our overcloud (some CLI output removed for brevity):

$ heat stack-list
| 01c51e7e-ad2f-41d3-b056-3c4c84395114 | overcloud  | CREATE_COMPLETE | 2016-06-08T18:07:00 | None         |

Here's the ControllerNodesPostDeployment resource:

$ heat resource-list overcloud | grep ControllerNodesPost
| ControllerNodesPostDeployment             | e67fff24-8089-4cf8-adf4-9c6064bf01d6          | OS::TripleO::ControllerPostDeployment             | CREATE_COMPLETE | 2016-06-08T18:07:00 |

e67fff24-8089-4cf8-adf4-9c6064bf01d6 is the resource ID of ControllerNodesPostDeployment, which is a nested stack; you can confirm this via:

$ heat stack-list -n | grep "^| e67fff24-8089-4cf8-adf4-9c6064bf01d6"
| e67fff24-8089-4cf8-adf4-9c6064bf01d6 | overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | UPDATE_COMPLETE | 2016-06-08T18:10:34 | 2016-06-09T08:52:45 | 01c51e7e-ad2f-41d3-b056-3c4c84395114 |

Note that the first column is the stack ID, and the last is the parent stack ID (i.e. the "overcloud" stack above).

overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 is the name of the stack that implements ControllerNodesPostDeployment - we can refer to it by either that name or the ID (e67fff24-8089-4cf8-adf4-9c6064bf01d6).
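If you want to script Step 1 rather than copy the ID by hand, it can be pulled out of the heat resource-list table. This is a minimal sketch, assuming each table row is on a single line (the wrapping above is just line-length damage) and that the physical resource ID is the second column, as in the output shown; resource_id_of is a hypothetical helper name, not part of heatclient.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `heat resource-list` table output on stdin and
# print the physical resource ID (second column) of the named resource.
resource_id_of() {
  local name="$1"
  awk -F'|' -v name="$name" '
    # Table rows start with "|"; border lines start with "+".
    /^\|/ {
      res = $2; id = $3
      gsub(/^[ \t]+|[ \t]+$/, "", res)   # trim the resource_name cell
      gsub(/^[ \t]+|[ \t]+$/, "", id)    # trim the physical_resource_id cell
      if (res == name) { print id; exit }
    }
  '
}

# Example:
#   heat resource-list overcloud | resource_id_of ControllerNodesPostDeployment
```

Matching on the exact, trimmed resource name avoids the false positives a bare grep can give when one resource name is a prefix of another.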

Step 2 - Basic update of the stack

Heat supports PATCH updates, so it is possible to trigger a no-op update without passing any template or parameters (the existing data will be used), or to patch in some specific modification.

Here's how it works: we simply use either the name or ID we discovered above with heat stack-update (or the equivalent commands in the new openstack client).
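For reference, the openstack client spelling of a PATCH update is openstack stack update --existing, which re-uses the stored template, parameters and environment in the same way as heat stack-update -x. The sketch below only prints the command it would run, so you can check it before executing it; patch_update_cmd and the USE_OSC switch are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a PATCH-style stack update command.
# Usage: patch_update_cmd STACK [ENV_FILE...]
# Set USE_OSC=1 to get the openstack client form instead of heatclient.
patch_update_cmd() {
  local stack="$1"; shift
  local args=()
  local env
  for env in "$@"; do
    args+=(-e "$env")   # each environment file is patched in on top of the existing data
  done
  if [ "${USE_OSC:-0}" = 1 ]; then
    echo openstack stack update --existing "${args[@]}" "$stack"
  else
    echo heat stack-update -x "$stack" "${args[@]}"
  fi
}

# Example:
#   patch_update_cmd overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
#   USE_OSC=1 patch_update_cmd overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 update_env.yaml
```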

First, however, we want to get the last event ID before triggering the update (or, on recent heatclient versions you can instead use openstack stack event list --follow):

$ heat event-list overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | tac | head -n2
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 89e535ef-d414-4121-b726-9924eccb4fc3 | Stack UPDATE completed successfully | UPDATE_COMPLETE    | 2016-06-09T09:09:09 |

So the last event logged by this nested stack has the ID 89e535ef-d414-4121-b726-9924eccb4fc3; we can use this as a marker to hide all previous events for the stack:
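Rather than reading the marker off the tac | head -n2 output, it can be captured into a shell variable. This is a sketch that assumes the table layout shown here (event ID in the second column, a header row labelled id, events listed oldest first, which is what the tac trick relies on); last_event_id is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `heat event-list` table output on stdin and
# print the ID of the last listed (most recent) event, for use with -m.
last_event_id() {
  awk -F'|' '
    /^\|/ {
      id = $3
      gsub(/^[ \t]+|[ \t]+$/, "", id)
      if (id != "id") last = id        # skip the header row
    }
    END { if (last != "") print last }
  '
}

# Example:
#   stack=overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
#   marker=$(heat event-list "$stack" | last_event_id)
#   heat event-list -m "$marker" "$stack"
```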

$ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
| id | resource_status_reason | resource_status | event_time |

Now, we can trigger the update, and use the marker event-list to follow progress:

$ heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26

<wait a short time>

$ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
| resource_name | id | resource_status_reason | resource_status | event_time |
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 2e08a022-ce0a-4e57-bf30-719fea6cbb74 | Stack UPDATE started | UPDATE_IN_PROGRESS | 2016-06-09T10:00:52 |
| ControllerArtifactsConfig | a55f9b17-f26c-4664-9ea5-535949c368e8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | 21679c7f-c354-4319-9688-7fa290168664 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | f5761452-91dd-45dc-92e8-a5c371fa5004 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsConfig | 01abec3c-f472-4ec2-893d-0fddb8fc1696 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | f8f7a21f-9169-4f8c-ab46-46ecbb141be8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | 75937a57-e2f0-4d66-9b4c-2308593e56b1 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | 6058e29c-cded-4ad3-94d9-65909fd4911d | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | c9f93f1f-177c-4721-827f-a7d409b2cd50 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | 92409e4c-24f2-4e68-bad9-47ce09107d7a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | a9203aa1-c438-47c0-977b-8e34669777bc | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | aa7d78dc-d243-4d54-8ea6-3b59a6ed302a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | 4a1a6885-29d7-4708-a884-01f481ac1b35 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 7afd52c1-cbbc-431a-a22c-dd7459ed2255 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 0dac2e72-0919-4e91-ac94-100d8d811c67 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | ec57867f-e401-4756-bd30-0a566eced343 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | 427582fb-acd1-4939-a13c-7b3cbbc7527b | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:15 |
| ExtraConfig | 760fd961-fff6-4f4c-848e-80773e09e04b | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:15 |
| ExtraConfig | caee58b6-01bb-4805-b41f-4c48a8c7d767 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:16 |
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 35f527a5-0761-46bb-aecb-6eee0e0f083e | Stack UPDATE completed successfully | UPDATE_COMPLETE | 2016-06-09T10:01:25 |

So, we can see that we triggered an update on the nested stack, and it ran to completion in around 30 seconds (much less time than updating the entire overcloud).
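The "around 30 seconds" figure can be read straight off the first and last event_time values. A small sketch, assuming GNU date (the -d option used here is not available in BSD/macOS date); event_span_seconds is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds elapsed between two Heat event_time stamps.
# Relies on GNU date, which parses ISO 8601 values such as 2016-06-09T10:00:52.
event_span_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# First and last events of the no-op update above:
event_span_seconds 2016-06-09T10:00:52 2016-06-09T10:01:25   # prints 33
```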

Step 3 - Update of the stack with modifications

So, those paying attention may have noticed that 30 seconds is too fast for puppet to run on all the controller nodes, and it is: because we did a no-op update, Heat detects that no inputs have changed, so it doesn't cause puppet to re-run.

To work around this, and enable puppet to re-assert state on every overcloud update, we have an identifier in the nested stack that is normally updated to a value that changes on every update (it includes a timestamp when updates are triggered via python-tripleoclient rather than heatclient directly).

We can emulate this behavior in our patch update and force puppet to re-run through all the deployment steps. Let's first look at the NodeConfigIdentifiers parameter value:

$ heat stack-show overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | grep NodeConfigIdentifiers
"NodeConfigIdentifiers": "{u'deployment_identifier': u'1465409217', u'controller_config': {u'0': u'os-apply-config deployment bb67a1d5-f0a5-48ec-9883-1f2ae578a8bd completed,Root CA cert injection not enabled.,TLS not enabled.,None,'}, u'allnodes_extra': u'none'}"

Here we can see various data, including a deployment_identifier, which is the timestamp-derived unique identifier normally passed via python-tripleoclient.
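To see just the current deployment_identifier without reading the whole blob, it can be extracted from the heat stack-show line with sed. This sketch assumes the Python-repr style quoting shown above (u'...'), which is simply how this heatclient version printed the parameter and may differ in other versions; deployment_identifier_of is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract deployment_identifier from the
# NodeConfigIdentifiers line printed by `heat stack-show` (read on stdin).
deployment_identifier_of() {
  # Match u'deployment_identifier': u'<value>' and print only <value>.
  sed -n "s/.*u'deployment_identifier': u'\([^']*\)'.*/\1/p"
}

# Example:
#   heat stack-show overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 \
#     | grep NodeConfigIdentifiers | deployment_identifier_of
```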

We could update just that field, but the content of this mapping isn't important, only that it changes (this data is not currently consumed by puppet on update, it's just used to trigger the SoftwareDeployment to re-apply the config due to an input value changing).

So we can create an environment file that looks like this (note this must use parameters, not parameter_defaults, so that it overrides the value passed from the parent stack). Any value can be used, but you must change it on each update if you want the SoftwareDeployment resources to be re-applied to the nodes.

$ cat update_env.yaml
parameters:
  NodeConfigIdentifiers: 123
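Since the value has to change on every run, it can be convenient to regenerate the file each time from the current Unix time, loosely mirroring the timestamp-derived deployment_identifier mentioned above. A minimal sketch; write_update_env is a hypothetical helper, and two runs within the same second would produce the same value (and therefore no re-run of puppet).

```bash
#!/usr/bin/env bash
# Hypothetical helper: (re)write an environment file that sets
# NodeConfigIdentifiers to a fresh value so SoftwareDeployments re-apply.
write_update_env() {
  local file="${1:-update_env.yaml}"
  # "parameters" (not parameter_defaults) so it overrides the parent stack's value
  printf 'parameters:\n  NodeConfigIdentifiers: %s\n' "$(date +%s)" > "$file"
}

# Example:
#   write_update_env update_env.yaml
#   heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml
```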

Then we can trigger another PATCH update including this data:

$ heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml

This time I'm using the new openstack stack event list --follow approach to monitor progress (if you don't have this, you can repeat the marker event-list approach described above):

$ openstack stack event list --follow
2016-06-09 08:52:46 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_IN_PROGRESS  Stack UPDATE started
2016-06-09 08:52:54 [ControllerPuppetConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:54 [ControllerArtifactsConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:56 [ControllerPuppetConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsDeploy]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:58 [ControllerArtifactsDeploy]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:58 [ControllerLoadBalancerDeployment_Step1]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:53:32 [ControllerLoadBalancerDeployment_Step1]: UPDATE_COMPLETE  state changed
2016-06-09 08:53:32 [ControllerServicesBaseDeployment_Step2]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:00 [ControllerServicesBaseDeployment_Step2]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:00 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_COMPLETE  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:16 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:16 [ExtraConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:17 [ExtraConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:26 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_COMPLETE  Stack UPDATE completed successfully

So, here we can see the update of the stack took a little longer (around 5 minutes in my environment), and if you were to check the os-collect-config logs on each controller node, you would see puppet re-applying on each node for every step defined in the template.

This approach can be extended if you want to, e.g., test changes to the stack template (or files it references, such as puppet manifests or scripts). You would do something like:

$ cp -r /usr/share/openstack-tripleo-heat-templates .
$ cd openstack-tripleo-heat-templates/
$ heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml -f puppet/controller-post.yaml

Note that if you want to do a final update of the entire overcloud, you would need to point to this copied tree (assuming you want to keep your changes), e.g.:

$ openstack overcloud deploy --templates /path/to/copy/openstack-tripleo-heat-templates

Sunday, 17 May 2015

TripleO Heat templates Part 3 - Cluster configuration, introduction/primer

In my previous two posts I covered an overview of TripleO template roles and groups, and specifics of how initial deployment of a node happens.  Today I'm planning to introduce the next step of the deployment process - taking the deployed groups of nodes, and configuring them to work together as clusters running the various OpenStack services encapsulated by each role.

This post will provide an introduction to the patterns and Heat features used to configure the groups of nodes, then in the next instalment I'll dig into the specifics of exactly what configuration takes place in the TripleO heat templates.

Wednesday, 13 May 2015

TripleO Heat templates Part 2 - Node initial deployment & config

In my previous post "TripleO Heat templates Part 1 - roles and groups", I provided an overview of the various TripleO roles, the way the role implementation is abstracted via provider resources, and how they are grouped and scaled via OS::Heat::ResourceGroup.

In this post, I'm aiming to dig into the next level of template implementation, specifically how a role is implemented behind the provider resource alias used in the top-level template.

I'm only going to cover one role type for now, OS::TripleO::Controller, because the patterns described are directly applicable to all other role types.  I'm also going to focus on the puppet-based implementation (because that's what I'm most familiar with), but again most concepts apply to the element/container/etc.-based implementations too.

Throughout this post, I'll be referring to templates in the tripleo-heat-templates repo, so if you haven't already, now might be a good time to clone that so you can follow along looking at the templates themselves.

Thursday, 7 May 2015

TripleO Heat templates Part 1 - Roles and Groups

This is the start of a series of posts aiming to de-construct the TripleO heat templates, explaining the abstractions that exist, and the heat features which enable them.

If you're not already a little familiar with ResourceGroups, "Provider Resources" used for template composition, and SoftwareConfig resources, it's probably not a bad idea to check out my previous posts on those topics, as well as our user guide and other documentation - TripleO makes heavy use of all of these features.

Overcloud "Roles"

TripleO typically refers to the deployed OpenStack cloud as the "overcloud", because the tools used to perform that deployment mirror those in the deployed cloud, e.g. a small OpenStack is used to bootstrap and manage a bigger one (normally the small OpenStack is called either a "seed" or "undercloud", depending on your environment).

The definition of what is deployed in your overcloud exists in a number of Heat templates, with the top-level one defining a number of groups of different node types, or "roles".

  • Controller: Contains API services, e.g. Keystone, Neutron, Heat, Glance, Ceilometer, Horizon, and the API parts of Nova, Cinder & Swift.  It can also optionally host the storage parts for Cinder, Swift and Ceph if these are not deployed separately (see below).
  • Compute: Contains the Nova Hypervisor components.
  • BlockStorage: Contains the Cinder storage components (if not hosted on the Controller(s)).
  • ObjectStorage: Contains the Swift storage components (if not hosted on the Controller(s)).
  • CephStorage: Contains the Ceph storage components (if not hosted on the Controller(s)).

Roles & resource types

Each of the roles (or node types) is mapped to a type defined in the resource_registry in the environment passed to Heat.

So, for example, the "Controller" role is defined in the heat template as a type OS::TripleO::Controller, and similar aliases exist for all the other roles.

The resource registry maps this type alias to another heat template, which implements whatever is required to deploy one node with that role.

So, to create a node of type "OS::TripleO::Controller", Heat may create a stack based on the template in "puppet/controller-puppet.yaml", or some other implementation, depending on whatever mapping exists in the resource_registry.

This makes it very easy to plug in an alternate implementation while maintaining the top-level template interfaces and deployment topology.  For example, work is currently in progress on an implementation using docker containers, as an alternative to the existing puppet and element based implementations.
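Plugging in an alternate implementation then comes down to passing an environment whose resource_registry points the alias at a different template. A sketch that writes such an environment file; my-controller.yaml and controller_override.yaml are hypothetical file names, write_controller_override is a hypothetical helper, and the replacement template would need to accept the same properties as the one it replaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an environment file that maps the
# OS::TripleO::Controller alias to an alternate template of your choosing.
# Usage: write_controller_override ENV_FILE TEMPLATE
write_controller_override() {
  local file="$1" template="$2"
  printf 'resource_registry:\n  OS::TripleO::Controller: %s\n' "$template" > "$file"
}

# Example:
#   write_controller_override controller_override.yaml my-controller.yaml
#   openstack overcloud deploy --templates -e controller_override.yaml
```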

Roles & ResourceGroups

Each of these roles may be scaled independently, because each is defined in an OS::Heat::ResourceGroup.  The minimum you can deploy is one "Controller" and one "Compute" node (some roles may be deployed with zero nodes in the group).

Here's an example of what that looks like in the top level "overcloud-without-mergepy" template (this is the name of the main template TripleO uses to deploy OpenStack, the "without-mergepy" part is historical and refers to an older, now deprecated, implementation.)

    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: ControllerCount}
      resource_def:
        type: OS::TripleO::Controller
        properties:
          AdminPassword: {get_param: AdminPassword}
          AdminToken: {get_param: AdminToken}


Here, you can see we've defined a group of OS::TripleO::Controller resources in an OS::Heat::ResourceGroup.  The number of nodes deployed is controlled via a template parameter, "ControllerCount", and similarly a number of template parameters are referenced to provide the input properties that configure each deployed controller node (I've abbreviated the full list of properties).

This pattern is repeated for all roles, so building a specified number of nodes for a particular role (or adding/removing them via a stack-update), is as simple as passing a different number into Heat as a stack parameter :)
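For example, scaling the controllers is just a parameter change. A minimal sketch that writes a parameters environment from NAME=COUNT pairs, such as the ControllerCount parameter shown above; write_scale_env is a hypothetical helper, and the count parameter names for the other roles are not given here, so check the top-level template before using them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a "parameters" environment file from
# NAME=COUNT pairs, e.g. ControllerCount=3.
write_scale_env() {
  local file="$1"; shift
  local pair name count
  # Validate every pair first so a bad argument never leaves a half-written file.
  for pair in "$@"; do
    name="${pair%%=*}"; count="${pair#*=}"
    if [ -z "$name" ] || [ "$name" = "$pair" ]; then
      echo "expected NAME=COUNT, got '$pair'" >&2; return 1
    fi
    case "$count" in
      ''|*[!0-9]*) echo "count must be a non-negative integer: '$pair'" >&2; return 1 ;;
    esac
  done
  {
    echo 'parameters:'
    for pair in "$@"; do
      printf '  %s: %s\n' "${pair%%=*}" "${pair#*=}"
    done
  } > "$file"
}

# Example:
#   write_scale_env scale_env.yaml ControllerCount=3
#   heat stack-update overcloud ... -e scale_env.yaml   (or re-run your usual deploy command with -e)
```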

That's all, folks

That's all for today - hopefully it provides an overview of the top-level interfaces provided by the TripleO Heat templates, and illustrates the following:

  • There are clearly defined node "roles", containing the various parts of your OpenStack deployment.
  • The patterns used to define and implement these roles are repeated, which makes the templates easier to understand despite them being fairly large.
  • The implementation is modular, and abstractions exist which make implementing different "back end" implementations relatively simple.
  • Deployments can be easily scaled thanks to Heat's ResourceGroup functionality.

In future instalments I'll dig further into the individual node implementations, ways to easily plug in site-specific additional configuration, and ways in which you can control and validate the deployments performed via TripleO.